Three Data Horror Stories Shared by the AtScale Team
Do you have a data horror story that keeps you up at night? In the spirit of Halloween, Sarah Gerweck (AtScale Co-Founder and Chief Technology Officer), Chris Oshiro (Field CTO) and Dave Mariani (AtScale Co-Founder and Chief Strategy Officer) sat down for a new episode of our podcast, CHATSCALE, to discuss three horror stories they’ve encountered throughout their careers. In this blog post, we share the highlights.
Starting from Scratch
Dave recalls his time at Yahoo!, where he was responsible for delivering analytics to the business to power its display advertising revenue. What was powering the analytics behind all of this? An SSAS cube that they built that “took those 360 million ads every day and made all of them available in an interactive OLAP cube powered by SQL Server Analysis Services.” Dave recalls that the customers loved this because they could “make fine-tuned adjustments at the ad level depending on how an ad was performing.” He continues, “The cube, which was the largest cube in the world at 24 terabytes, actually generated $50 million of extra lift in the Yahoo! advertising business every year.”
Mariani then describes the downside: “Making a 24-terabyte SSAS cube work meant I had to build that cube from scratch and then update it with incremental data every day. Building it from scratch took seven days. That’s one week of 24/7 processing.”
Dave then goes into detail about the complexities, sharing how they had to create an “A Cube” and a “B Cube” so that customers could keep running queries while one cube was offline being updated in SSAS. He recalls using NetApp snapshots to “make one of those cubes (Cube A, for example) available for query while we updated the data on Cube B. You can probably see that that’s a delicate dance, because once we updated Cube B, we then had to switch users from Cube A to Cube B, all behind the scenes. We did that with DNS tricks and some other tricks using NetApp snapshots.”
Sounds like a solution? Dave says otherwise: “The snapshot failed. Then there was the backup (of course we always had a backup, because if we lost that data we would be down for seven days). Well, one of the engineers who was working on restoring the backup actually deleted the backup file. So what we were left with was no data to run our display advertising analytics business for seven full days.” Sounds frightening! Mariani continues the tale: “My internal users, of which there were hundreds, all had to go blind for one week, which cost us roughly $1 million of lost revenue. All the while, I had to sit there and watch that cube build 24 hours a day, seven days a week, knowing that my customers had no data to query.”
What’s the lesson learned? “That was a lesson in ‘too big to fail,’ and that one definitely failed big. Trying to move that much data, materialize it and reformat it was a really bad idea.” Mariani concludes his story by sharing that this is where AtScale came from: “The concept of giving customers the ability to do ad-hoc queries using a multi-dimensional interface, without having to physically build a cube.”
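To make the A Cube / B Cube arrangement concrete, here is a toy, in-memory sketch of the general pattern: reads go to the live copy while the standby copy absorbs the daily increment, then the two swap roles. This is an illustration under stated assumptions, not how Yahoo! or SSAS actually did it; the Cube and ABRouter classes and the ad-level rows are hypothetical, and the NetApp snapshots and DNS switching from the story are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Cube:
    """Hypothetical stand-in for one queryable copy of the cube."""
    name: str
    rows: dict = field(default_factory=dict)  # e.g. {ad_id: impressions}

    def query(self, ad_id):
        return self.rows.get(ad_id, 0)


class ABRouter:
    """Hypothetical router: send reads to the live copy while the standby is updated."""

    def __init__(self, a: Cube, b: Cube):
        self.live, self.standby = a, b

    def query(self, ad_id):
        return self.live.query(ad_id)

    def apply_incremental(self, new_rows: dict):
        # 1. Load the increment into the offline copy; reads still hit self.live.
        self.standby.rows.update(new_rows)
        # 2. Flip roles so users see the fresh data (the real setup used DNS for this).
        self.live, self.standby = self.standby, self.live
        # 3. Catch the now-offline copy up so it is ready for the next cycle.
        self.standby.rows.update(new_rows)


router = ABRouter(Cube("A", {"ad1": 10}), Cube("B", {"ad1": 10}))
router.apply_incremental({"ad1": 15, "ad2": 3})
print(router.live.name, router.query("ad2"))  # B 3
```

The swap keeps queries flowing during updates, but both copies still depend on the same underlying storage and process; as the story shows, the pattern is no substitute for a protected backup.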
Blind Data Prep
Gerweck is familiar with the challenges that large companies encounter when processing, loading, moving and transforming their data. “Rather than being about a catastrophic failure, this is more about a catastrophic calcification that set in as this company tried to prep their data in a way that would allow them to get performance across all of their use cases.” She continues, “This company was using traditional data warehouses. They had data coming from multiple servers and multiple regions through multiple pipelines, and of course, with the technology at the time, they needed to get all that data into one place so that their analytics systems could operate on it. Then they needed to prep that data in such a way that it would be useful to their end users.” What did the company do? Gerweck continues the story, sharing how they “built lots and lots of complicated ETL jobs. There were dozens of engineers involved in building these things, taking this data, dressing it up in just the right way so that it could be loaded into this data warehouse, and then also doing projections, effectively all of the different projections of that data that they might require to be able to quickly answer questions for end users.”
Where did they run into trouble? Sarah says, “They were doing this effectively blind. They couldn’t roll it out to the users and then say, ‘We’ll see what people are using,’ because maybe the queries would be too big and take down the data warehouse. They had to basically try to figure out in advance everything they were going to need.” What did this lead to? The company had “more than two thousand different pre-aggregations of data that constantly had to be built, and they didn’t know until months later which ones were really going to be useful, which ones were getting used most frequently and which least frequently. This led to millions and millions of dollars in terms of data storage costs, database license fees and engineering time. But even worse than all those things, it calcified the system.”
Gerweck also shares how this story was the inspiration behind AtScale’s Autonomous Data Engineering, because there was a need “to have a reactive system that could look at the stream of data as it’s coming in and the stream of user activity, and basically build intelligent optimizations on the fly, make changes as needed, use cutting-edge statistical techniques to transform non-additive problems into additive problems as well, and avoid the huge costs of having to call an engineer… So another one of the motivations for AtScale was to handle all those issues for you and allow you to be dynamic, allow you to use the original data landed directly onto the system, but also get the performance you get with a cube build or a traditionally aggregated data warehouse.”
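Why does turning non-additive problems into additive ones matter for pre-aggregation? An average is the classic non-additive measure: you cannot roll up per-day averages into a per-region average by averaging them again. A common technique (not necessarily the one AtScale uses) is to store additive components, here a sum and a count, and divide only at query time. The sample events below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw events: (region, day, revenue)
events = [
    ("east", "mon", 10.0),
    ("east", "mon", 30.0),
    ("east", "tue", 50.0),
    ("west", "mon", 40.0),
]

# Pre-aggregate by (region, day), keeping only additive measures: sum and count.
agg = defaultdict(lambda: [0.0, 0])
for region, day, revenue in events:
    agg[(region, day)][0] += revenue
    agg[(region, day)][1] += 1


def avg_revenue_by_region(agg):
    """Correct roll-up: add sums and counts across days, then divide."""
    rollup = defaultdict(lambda: [0.0, 0])
    for (region, _day), (s, c) in agg.items():
        rollup[region][0] += s
        rollup[region][1] += c
    return {region: s / c for region, (s, c) in rollup.items()}


def naive_avg_of_avgs(agg):
    """Wrong roll-up: averaging per-day averages ignores how many rows each day had."""
    per_region = defaultdict(list)
    for (region, _day), (s, c) in agg.items():
        per_region[region].append(s / c)
    return {region: sum(v) / len(v) for region, v in per_region.items()}


print(avg_revenue_by_region(agg))  # {'east': 30.0, 'west': 40.0}
print(naive_avg_of_avgs(agg))      # {'east': 35.0, 'west': 40.0}
```

The same idea extends to harder measures; distinct counts, for example, are often handled with mergeable approximate sketches such as HyperLogLog rather than exact sets.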
Chris Oshiro is no stranger to organizations that are putting their data to work and transforming their analytics. “Along that journey, there are certainly a lot of scary stories,” says Oshiro.
“A lot of the analytics that our customers are doing today is very data-intensive and very scale-intensive. To find the kind of insights that they are looking for, they need to gather a lot of data, put it into these big repositories and scan through it.”
He describes how customers go about this: “They stand up systems that are very large and distributed to be able to crunch through that. They purchase machines, they have big farms, they inevitably capitalize some amount of capacity, and then they run through all that data and find their insights. And that’s all well and good.” Well, for the time being.
Oshiro continues the tale: “Fast forward a couple of years, and there is a need to migrate those platforms away from data centers to cloud computing, which is what a lot of our customers are doing today… There are a lot of horror stories just in the transmission of that data into the cloud. Once you have all that data in the cloud, you have the same developers trying to find insights, and now they go about doing the same thing in the cloud. As they do that, they find that a lot of the data is very large. Sometimes they need to run these queries overnight or through the weekend.”
Chris recalls hearing stories of “developer A or analyst A” who would “do something on Friday, run it, walk away, come back on Monday, and the horror story was really around the cloud bill.” He recalls customers telling him about “$10,000 and $50,000 queries that were run overnight or over the weekend, and they had no idea.”
How does he look at this? “From an AtScale perspective, Autonomous Data Engineering gives us the ability to create aggregate tables based on insights from past queries and past behavior. That allows AtScale to accelerate those queries. So instead of crunching all the data, now we can crunch our autonomously engineered acceleration structures, and we get responses really quickly.” Oshiro also discusses the reduction in compute: “Instead of going through terabytes of data, now AtScale is able to redirect to an acceleration structure, which is a faster query mechanism… Instead of coming back with a $50,000 query, maybe that’s a five-dollar query or a one-dollar query.”
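The redirection Oshiro describes can be sketched as a simple routing rule: if an aggregate table contains every dimension and (additive) measure a query needs, answer from the smallest such table instead of scanning the base data. This is a hypothetical illustration of aggregate-aware routing in general, not AtScale’s actual algorithm; the table names, columns and sizes are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    dimensions: frozenset  # columns the table can group by
    measures: frozenset    # additive measures it stores
    size_bytes: int


def route(query_dims, query_measures, base, aggregates):
    """Return the smallest table that can answer the query.

    An aggregate qualifies if it holds every requested dimension and measure;
    because its measures are additive, it can be re-aggregated to the coarser
    grain the query asks for. Otherwise fall back to the base table.
    """
    dims, measures = frozenset(query_dims), frozenset(query_measures)
    candidates = [
        t for t in aggregates
        if dims <= t.dimensions and measures <= t.measures
    ]
    return min(candidates, key=lambda t: t.size_bytes, default=base)


# Hypothetical sizes: a 40 TB fact table and two much smaller aggregates.
base = Table("ad_events", frozenset({"ad_id", "day", "region", "device"}),
             frozenset({"impressions", "clicks"}), 40 * 10**12)
aggs = [
    Table("agg_day_region", frozenset({"day", "region"}),
          frozenset({"impressions", "clicks"}), 2 * 10**9),
    Table("agg_day_region_device", frozenset({"day", "region", "device"}),
          frozenset({"impressions", "clicks"}), 8 * 10**9),
]

print(route({"region"}, {"clicks"}, base, aggs).name)  # agg_day_region
print(route({"device"}, {"clicks"}, base, aggs).name)  # agg_day_region_device
print(route({"ad_id"}, {"clicks"}, base, aggs).name)   # ad_events
```

If compute is billed roughly in proportion to bytes scanned (true for some cloud platforms, not all), reading the 2 GB aggregate instead of the 40 TB base table in this example cuts the scan by a factor of 20,000, which is the kind of gap between a $50,000 query and a few-dollar one.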