August 26, 2020
Introducing The Top Data & BI Victors of 2020
Welcome back to our “Analytics Leader Spotlight” series, where we get to know the data champions who are transforming the way that teams use and think about data. In today’s spotlight, we are excited to introduce you to Mark Stange-Tregear, Vice President of Analytics at Rakuten Rewards.
Thank you for joining us for another interview in our Analytics Leader Spotlight Series. Can you introduce yourself?
A: I’m Mark Stange-Tregear, VP of Analytics at Rakuten Rewards, formerly Ebates. I’ve been with the company for six years. I was brought in to work on the BI infrastructure of the company and that’s what I’ve been doing ever since. I joined when the company was much smaller than it is today. Since that time, we’ve gone through a couple of business intelligence transitions, moving from an older SQL Server stack through a Hadoop implementation and now into Snowflake for our enterprise data storage and business intelligence needs.
How did you get started in the data and analytics field? How has your career ultimately led you to where you are today?
A: I’d love to say that it was a deliberate, well thought-out process. But I think a random walk is much more accurate. I have a background in computer science, but my degree is in philosophy. I have an MA in philosophy, and I was working on a PhD in philosophy when I met my significant other. I moved to the U.S. to follow her and went into non-profit book publishing. I was doing page design work, and from there I transitioned into marketing. I quickly found that databases made marketing much easier. I was working with a fairly small company at that point, but I started working more with marketing databases, and a little while after that I moved to a startup called WorldWinner. They were in the online gaming space, and I was working as a marketing analyst. I worked on traditional acquisition and retention marketing and started getting much more involved in the BI space in general. After a little time at WorldWinner, I moved to California where I went to work for a startup called MyNewPlace in San Francisco. From there, branching out from marketing analytics, I was recruited as a general analyst. Along with a data-savvy executive team, I was pulling data and working on pretty much every aspect of the company. We eventually sold MyNewPlace to a large strategic player in the apartments listing space called RealPage. I stayed on with them for a couple of years until the opportunity at Ebates came along. By that point, I had a reasonable background in BI architecture, report building, and general analytics. Coming into Ebates when I did and having the chance to really help shape its BI program from the foundation up was a great opportunity to develop a skill set and really learn.
Your job and responsibilities have changed over time. What do they look like today at Rakuten Rewards?
A: Today, it’s very different to a little startup; it’s much more about having a programmatic approach, and systematically developing business intelligence and data savviness within the company. There’s also a constant question of scalability. My role today is really about empowering the company with data, specifically with enterprise-level analytical data. How do we drive value from data using analytical decision-making mechanisms? What is the technology plan around making that data as available and visible and usable as possible in as many significant use cases as possible? Increasingly, the question is how do we empower users in the business or engineering fields to use data and to feel comfortable using that data themselves?
It’s not just about access and governance and control, it’s also about teaching. We run a lot of seminars and workshops to help people learn data skills, and to learn how to not have to depend on an analyst. It’s a lot about building confidence and trying to convey the approaches and techniques that my team has learned over the years. It’s important that the wider group can feel like we can move the company forward using data. That’s a key priority.
The executive team at Rakuten Rewards has been extremely data focused since before I joined. A lot of them have backgrounds in the data industry. So, the expectation within the company, and something I will say that we generally accomplish, is that our decisions are driven with data. It’s an interesting position for someone like me to be in. I know that some people in the BI space have a challenge just convincing people that they should be using data, but that’s never really been too much of an issue for me. The difficulty I face may be a little bit different to the difficulties faced by some other people: the desire is for data to be ready and available, and to be used in every context we can think of. This puts an enormous amount of pressure on the BI systems, the stability of those systems, the availability of the data, and the richness of the data. Things like sub-siloing, or locking data away, or not having data at your fingertips don’t fly in our environment. That mindset presents a lot of interesting challenges just trying to keep up with a growing company. It’s a nice problem to have in many ways and I think that it’s allowed us to move a little faster on some projects and initiatives than we could otherwise have done.
There is no convincing people to use data. How do you create an architecture to promote that self-service so people continue to trust the data?
A: Trust is huge for me. It’s a very difficult challenge. The minute you start getting into a situation where one question has multiple answers and different systems give you different numbers is the minute that most people stop believing any of the data. It doesn’t matter if one of the systems is right, or all of them are right just with slightly different conditions. The minute multiple answers come out trying to answer one question is the minute people stop believing any of the answers. I’ve seen that happen over the years with some expensive consequences. The approach I’ve taken is to keep the data and business logic as consolidated as humanly possible. So what does that mean in practice? It means trying not to have separate pockets of marketing data, cut off from sales data, with a different hub of data for finance and product analytics. The idea is to consolidate all of the data into one technology stack where it can be joined and queried as a whole.
This also means that in general, we’re taking the approach that the data should be stored at the most granular level we can think of and that we are storing data even if we don’t have an immediate use case for it. Meaning that we can get access to that data on a dime without a large engineering project to make it available. The idea behind the granularity is that you can always go from more granular data up to more aggregated data, but you can’t go the other way around as easily. If you’re storing data at an aggregate level and a question comes up, you’ve got a research project to figure out where the number came from. If it’s all built from the same granular data, all that you need to do is take a look at the report logic and you basically have your answer.
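The point about granularity can be illustrated with a minimal sketch. The order rows, column layout, and the `total_by` helper below are all hypothetical and are not taken from Rakuten Rewards' systems; the sketch only assumes that reports are built by rolling up one shared set of granular rows.

```python
from collections import defaultdict

# Hypothetical order-level rows: (order_date, channel, order_amount).
ORDERS = [
    ("2020-08-01", "email", 40.0),
    ("2020-08-01", "search", 25.0),
    ("2020-08-02", "email", 10.0),
    ("2020-08-02", "search", 55.0),
]


def total_by(rows, key_index):
    """Roll granular rows up to one total per value of the chosen column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


# Two different "reports" derived from the same granular data.
daily_totals = total_by(ORDERS, 0)
channel_totals = total_by(ORDERS, 1)

# Because both reports share one source, their grand totals reconcile.
print(daily_totals)
print(channel_totals)
print(sum(daily_totals.values()) == sum(channel_totals.values()))
```

Going the other way is not possible: if only `daily_totals` had been stored, the per-channel breakdown could not be recovered from it, which is the "research project" described above.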
Whilst I can’t guarantee that no one is ever going to come up with two answers to the same question, I think we’ve done a pretty good job of minimizing the amount of research to find out why. As a result, we’ve been able to maintain a degree of confidence within the data. I think that’s the core part of what we do and the approach that we’ve taken. However, it does put some pretty hard strain on the technology.
In six years, we’re now on our third enterprise BI system. We found that the SQL Server implementation we were running just didn’t scale. We bought the biggest machine and it was still not able to handle the load. So we did what a lot of people did and we splintered the architecture. We basically said “Okay, let’s get another SQL Server machine, a really big one, and process this bit of data here and that bit there and some in a third system.” This is fine in theory, right up to the point where you need data from one server merged with data from another. Then you need to start shifting data around. Or you end up in a situation where you are duplicating entire chunks of ETL or business logic across different servers; things inevitably get dropped and you’re back in the situation where you get multiple answers for a single question.
We moved into Hadoop with the idea that, with a clustered approach, we could go back to single storage and a single source of truth but also manage the concurrent loads that we were trying to deal with. We kinda got there with Hadoop. For a while, Hadoop was the only system that we had. We weren’t running any other subsidiary systems; we were running with a single source of truth. The difficulty was that it wasn’t very scalable. Since we were having to buy the technology ourselves, we either had to spend an enormous amount of money and accept the machines sitting idle quite a lot of the time (this upset my CFO, and it was difficult to justify in terms of electricity and staff maintenance), or we had a situation where the machines got crunched. You can protect them with various techniques, and you can try to control memory use and CPU use for different users, but ultimately where we hit a fundamental roadblock was with the hard drives. We were trying to write so much data onto the disk at the same time that we were trying to read data off the disk that the hard drives just slowed down and gave up. You face some pretty interesting technical challenges with Hadoop.
So when the new era of cloud warehouses came along, they sounded pretty much perfect to us. We took our time. We weren’t the very first adopters, but we weren’t a long way behind. We started trialing Snowflake, but I think a lot of the principles apply to the other cloud warehouses as well. With Snowflake we’ve got centralization of storage and we’ve got horizontal scaling of processing power, which means that we can maintain consistency of the data. You only have to deploy ETL pipelines once, and your business logic is all consolidated… you don’t have to silo your data. But at the same time, you can have very large scale multi-concurrency, with very different types of workloads. We can have independent workloads with little competition for resources. We found that to be a huge help in moving toward our vision of centralizing logic and storage, but separating usage and consumption.
You’ve made your migration to the cloud and that’s helped you scale out on the hardware side. You went through that transition from data warehouse to the lake to the cloud. You’ve always had a pretty tight team to service a lot of data-hungry consumers. When you look at the people side, how did you build a team to respond to a really data-hungry culture but do it in the most efficient way possible?
A: Phenomenal good luck played a big part. I couldn’t have done this without a number of members of the team. Several members of the team pre-date me or joined not long after me, and having a phenomenal group of people obviously helps. I’ve also been lucky to work with analysts and engineers who have a shared vision, centered around the same notion of “Let’s get a single answer, let’s not silo data. This is going to be a little hard, but it’s worth it.” And we’ve stuck to that commitment. That shared ideology has helped us maintain a vision, and it has helped us to move forward faster than we thought we could with the size of the team involved. Now that we’re out of the business of managing hardware headaches, we’ve got more space to scale. For us there’s no real limit to the capacity of the systems anymore, it’s just a matter of keeping the costs optimized. The question now becomes how best to empower those beyond our team.
I like to say that for every analysis question we answer, there are going to be two more that follow. It never stops, and it shouldn’t. A good answer should answer the question, but it should also provide additional insight, and generate new thoughts for people to chase down. In a situation where your analysts are significantly outnumbered, how do you keep your business users provided with the data that they need, without having to funnel everything back through a limited group of analysts? That’s an interesting, challenging question and I think that one part is getting the right tools in place. The other part has to do with training and helping to make data a fundamental part of a general business skill set. Most business users have probably used Excel. In the future I think most business employees will use some kind of BI tool. As a result we’re rapidly expanding our training programs, teaching people at different skill levels how to code SQL. We’ve graduated around 120 people through our SQL program since the start of the year; in a mid-size company, that’s a good percentage of the total employee count.
Ultimately, not everyone is going to want to speak SQL, some will want to use Python. So what do you do for those users? And, is it always appropriate that someone has to code SQL to answer a question? Ideally, not. It can be very repetitive and it can get inefficient. I think that’s where tools like Tableau, Looker and AtScale come in; we can provide tools to business users and non-analysts that allow them to do some of the tasks that they need to do without having the more technical analytical skills. I think most people can use a drag and drop interface to build a chart. A lot of people are familiar with pivot tables. If you make that kind of functionality available to them, they will try and self-serve and what I’ve found is that that desire to self-serve will expand and grow, and people will adapt and really start to enjoy the freedom they have. In the end, you can get a lot more analytical productivity out of the business if you don’t have to linearly scale your analytics group.
Given your experience, what advice would you give to others in the industry who are building their careers in the data and analytics space?
A: Analytics is not mysterious or secretive. It’s a way of thinking. There are definitely some technical skills involved. There are definitely people that are more inclined or interested in thinking in an analytical way, but it’s not something that only a tiny proportion of the population can do. It can be done by more people, you just have to make sure that people are interested in doing it and then provide them with the training to do so. There’s definitely a good number of people coming from computing or more quantitative degrees where it’s logical for them to go into analytics. There’s a saying that “data is the new oil,” but that’s only the case if you have people who know how to use it. The more people who can use data and can feel confident using it, the better. From my point of view, that means decentralizing the analytics workflows. You have to teach other people how to do it, and build that confidence.