December 17, 2019
Big Data Analytics in the Cloud for Today’s Distributed and Diverse Data
Before You Break Up with OLAP
While relationships can be challenging at times, they are usually worth the effort. When your significant other repeatedly leaves the cap off the toothpaste or leaves dirty dishes in the sink, it might drive you crazy. Our relationship with technology is no different. If you’re like me, you still daydream about a perfect world where you have all the good without the bad. In this post, I’ll discuss my “relationship” with OLAP and SQL Server Analysis Services (SSAS) and how it drove me to create AtScale. In particular, I focus on how OLAP as a technology has a critical place in our new cloud world and how you can enjoy the benefits of OLAP without the ugly baggage.
The Shotgun Marriage: SSAS & Big Data
While running data pipelines and analytics for Yahoo! in 2012, I struggled to keep up with the growth of our advertising data and make it usable for the business. We had truly big data before the term “Big Data” was invented. On top of the data volume challenge, I had several internal and external customers who all wanted to consume this data using different tools and applications. The result was utter chaos. Even our simplest terms, such as “page views” and “ad impressions,” had different definitions across our business units. There was no single source of truth for our most basic business metrics, and my users didn’t trust our data.
I needed a universal semantic layer, or data API, that would:
- Work with a variety of data consumers and tools
- Deliver fast, consistent query performance
- Define secure metrics in one place with an easy-to-use business interface
- Provide a self-serve interface for business analysts and data scientists
- Scale with our business
- Minimize manual data engineering
- Avoid data copies and data movement
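To make the "define metrics in one place" idea concrete, here is a minimal sketch of a semantic layer in Python. It is not how AtScale or SSAS works internally. The `traffic` table, its columns, and the `Metric`, `compile_query`, and `demo` names are all hypothetical, and SQLite stands in for a real warehouse. The point is only that every consumer asks for `page_views` by name and gets the same definition.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    sql: str  # aggregate expression over the fact table


# Hypothetical definitions: each business metric is defined exactly once.
METRICS = {
    "page_views": Metric("page_views", "SUM(views)"),
    "ad_impressions": Metric("ad_impressions", "SUM(impressions)"),
}
DIMENSIONS = {"business_unit", "day"}


def compile_query(metrics, dimensions, table="traffic"):
    """Turn metric and dimension names into one SQL statement."""
    unknown = [d for d in dimensions if d not in DIMENSIONS]
    if unknown:
        raise KeyError(f"unknown dimension(s): {unknown}")
    # Looking up METRICS[m] raises KeyError for an undefined metric.
    select = list(dimensions) + [f"{METRICS[m].sql} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if dimensions:
        cols = ", ".join(dimensions)
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql


def demo():
    """Run a compiled query against a tiny in-memory table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE traffic "
        "(business_unit TEXT, day TEXT, views INTEGER, impressions INTEGER)"
    )
    conn.executemany(
        "INSERT INTO traffic VALUES (?, ?, ?, ?)",
        [
            ("news", "2012-01-01", 100, 40),
            ("news", "2012-01-02", 150, 60),
            ("sports", "2012-01-01", 200, 90),
        ],
    )
    sql = compile_query(["page_views", "ad_impressions"], ["business_unit"])
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

Because the aggregate expressions live only in `METRICS`, changing what "page views" means is a one-line edit rather than a hunt through every report and dashboard.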
At the time, there was no single solution that could meet my objectives. So, to satisfy these requirements, I forced a shotgun marriage of Hadoop, Oracle and SQL Server Analysis Services (SSAS). The data landed in Hadoop, was pre-aggregated and ETL’ed into Oracle as a staging area, and then processed into an SSAS cube for end user consumption.
My SSAS cube ended up becoming the largest SSAS cube in the world at 24TB for 3 months of data. While 5 times larger than the next largest cube, 24TB is really not “big data” in my definition. Yet, keeping that beast fed required a delicate dance involving NetApp snapshots and tricked-out ETL code. But my end users loved it and they were able to do wonderful, revenue-producing things with it. I really needed a way of having my cake (OLAP functionality) and eating it too (without the OLAP architecture).
Why I Broke Up with SSAS
My relationship with the above stack was not great. While I achieved my business objectives of delivering real value for consumers and promoting an amazing degree of self-service, the architecture was too rigid, too fragile, didn’t scale, wasn’t securable and was not sustainable. Here’s how I scored our results:
Business results: Terrific. Our SSAS display advertising cube delivered over $50 million in lift to that business every year and consumers generated a crazy amount of user-generated, high-value analytical content.
The level of effort: Unbearably high. It took my team 4-6 weeks to add a new dimension or metric and 7 days(!) of 24×7 processing to rebuild the cube. On top of that, I could only deliver 3 months of data, not the 15 months my users needed to do year-over-year comparisons.
Meeting 7 objectives: Poor (3 out of 7). The solution hit the mark on ease of use, query performance and semantic consistency but failed miserably at scaling. Just as bad, the architecture required an army of highly skilled data engineers to make it work and generated at least 3 copies of (huge) data.
It was clear I wasn’t getting what I needed out of this relationship. I needed a break to reassess our future together.
…But wanted to stay friends with OLAP
While I was in love with OLAP’s ease of use, its query speed and its centralized business logic, I had a hard time with the underlying architecture that made my team’s life hell. It was clear to me that the dimensional model was a winner: end users loved it, they trusted the results, and it promoted self-service and got me and my data team out of the reporting business. However, OLAP’s fundamental architectural approach of pre-calculating data into an array and laying it out on disk was not scalable and introduced too much latency and complexity.
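A rough back-of-the-envelope sketch shows why pre-calculating everything scales so badly. It assumes the simplest possible model: every combination of dimension members, plus an "All" rollup per dimension, is an addressable cell. Real engines, SSAS included, store cubes sparsely and aggregate selectively, so treat this as an upper bound on the address space, not a storage estimate. The cardinalities in the usage note are made up.

```python
from math import prod


def cube_cells(cardinalities):
    """Upper bound on addressable cells per measure.

    Each dimension contributes its members plus one 'All' rollup member,
    and the cell space is the product across dimensions.
    """
    return prod(c + 1 for c in cardinalities)


def growth_factor(cardinalities, new_cardinality):
    """How many times larger the cell space gets when one dimension is added."""
    before = cube_cells(cardinalities)
    after = cube_cells(list(cardinalities) + [new_cardinality])
    return after // before
```

With hypothetical cardinalities of 90 days, 1,000 advertisers and 200 countries, `cube_cells([90, 1000, 200])` is about 18.3 million cells per measure, and `growth_factor([90, 1000, 200], 50)` shows that adding one 50-member dimension multiplies that space by 51. Every new dimension multiplies rather than adds, which is why growing the cube was so painful.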
OLAP Functionality Without the SSAS Headaches – A Healthy Relationship
So, my co-founders and I started AtScale to solve this problem once and for all. We took the best of OLAP and married it with a modern approach to data management. With AtScale, we implemented the MDX specification as defined by Microsoft and SSAS, but delivered it through a direct-query architecture with no data movement.
With AtScale, we delivered a solution that has the best of both worlds:
- SSAS (OLAP) compatibility with Excel & any other BI tool that speaks MDX or SQL
- Fast, consistent and low-cost multidimensional and tabular queries
- Simple, delightful modeling tools to add new data elements and data sources instantly
- Direct access to any data lake or data warehouse, including new cloud data warehouses and nested data
- Security and governance controls to manage data access in one place
- A single source of truth for critical business metrics, defined server-side
OLAP and Big Data – A Match Made in Heaven
It wasn’t easy to deliver on the above goals. OLAP is inherently a cell-based calculation engine. In other words, it works like a spreadsheet. Translating OLAP’s cell-based calculations into dialect-optimized queries against Hadoop, Oracle, Teradata, Google BigQuery, Snowflake, Amazon Redshift and many more was enormously difficult. In fact, it took us almost 3 years of engineering work before the product saw the light of day. More than 6 years later, we think we have cracked the code. Trust me: if I had had AtScale when I was at Yahoo!, I would have a lot less gray hair.
So, here’s my relationship advice to you. Don’t throw the OLAP baby out with the bathwater when you look to modernize your analytics infrastructure. Sending your business users and data scientists back to using SQL to do their work is asking them to become data engineers. An OLAP-powered semantic layer makes data available to everyone, regardless of their skill sets, and lets them use the analytics tools they already know and love. That is true data democratization.
The secret to a successful relationship? Give the people what they want: access to fast, accurate, secure, granular data that’s ready-made for self-service.
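To give a flavor of the "dialect" part of that problem, here is a deliberately tiny sketch in Python. It is not AtScale's translator, and it does not touch MDX at all. It only shows one logical "top N by measure" request rendered as different SQL text per engine. The `Dialect` class, the `render_top_n` function, and the example table and column names are hypothetical. The dialect facts it relies on are limited to identifier quoting (backticks in BigQuery, double quotes in Snowflake and Oracle) and row limiting (`LIMIT` versus Oracle 12c+ `FETCH FIRST n ROWS ONLY`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dialect:
    name: str
    quote_open: str
    quote_close: str
    limit_template: str  # how this engine spells "first n rows"


DIALECTS = {
    "bigquery": Dialect("bigquery", "`", "`", "LIMIT {n}"),
    "snowflake": Dialect("snowflake", '"', '"', "LIMIT {n}"),
    "oracle": Dialect("oracle", '"', '"', "FETCH FIRST {n} ROWS ONLY"),
}


def render_top_n(dialect_name, table, dimension, measure, n):
    """Render 'top n dimension members by summed measure' for one dialect.

    Sketch only: identifiers are not escaped, and quoted identifiers are
    case-sensitive in Oracle and Snowflake, which a real generator must handle.
    """
    d = DIALECTS[dialect_name]

    def q(ident):
        return f"{d.quote_open}{ident}{d.quote_close}"

    return (
        f"SELECT {q(dimension)}, SUM({q(measure)}) AS measure_total "
        f"FROM {q(table)} GROUP BY {q(dimension)} "
        f"ORDER BY measure_total DESC {d.limit_template.format(n=n)}"
    )
```

For example, `render_top_n("oracle", "ads", "country", "impressions", 5)` ends in `FETCH FIRST 5 ROWS ONLY`, while the same call with `"bigquery"` uses backtick-quoted names and `LIMIT 5`. Multiply that by every function, data type and optimizer quirk across engines, and the years of engineering start to make sense.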