November 4, 2020
The Three Opportunities for Cloud OLAP to Improve Enterprise Analytics
Before You Break Up with OLAP
While relationships can be challenging at times, they are hopefully worth the effort. When your significant other repeatedly leaves the cap off the toothpaste or leaves dirty dishes in the sink, it might drive you crazy. Our relationship with technology is no different. If you’re like me, you still daydream about a perfect world where you have all the good without the bad.
The Shotgun Marriage: SSAS & Big Data
While running data pipelines and analytics for Yahoo! in 2012, I struggled to keep up with the growth of our advertising data and make it usable for the business. We had truly big data before the term “Big Data” was invented. On top of the data volume challenge, I had several internal and external customers who all wanted to consume this data using different tools and applications. The result was utter chaos. Even our simplest terms, “page views” and “ad impressions,” had different definitions across our business units. There was no single source of truth for our most basic business metrics, and my users didn’t trust our data.
I needed a universal semantic layer, or data API that would:
- Work with a variety of data consumers and tools
- Deliver fast, consistent query performance
- Define secure metrics in one place with an easy-to-use business interface
- Provide a self-serve interface for business analysts and data scientists
- Scale with our business
- Minimize or eliminate manual data engineering
- Avoid data copies and data movement
My SSAS cube ended up becoming the largest SSAS cube in the world at 24TB for 3 months of data. While 5 times larger than the next largest cube, 24TB is really not “big data” in my definition. Yet, keeping that beast fed required a delicate dance involving NetApp snapshots and tricked-out ETL code. But my end users loved it and they were able to do wonderful, revenue-producing things with it. I really needed a way of having my cake (OLAP functionality) and eating it too (without the OLAP architecture).
At the time, there was no single solution that could meet my objectives. So, to satisfy these requirements, I forced a shotgun marriage of Hadoop, Oracle and SQL Server Analysis Services (SSAS). The data landed in Hadoop, was pre-aggregated and ETL’ed into Oracle as a staging area, and then processed into an SSAS cube for end-user consumption.
Why I Broke Up with SSAS
My relationship with the above stack was not great. While I achieved my business objectives of delivering real value for my customers and promoting an amazing degree of self-service, the architecture was too rigid, too fragile, didn’t scale, wasn’t securable and was not sustainable. Here’s how I scored our results:
Business results: Terrific. Our SSAS display advertising cube delivered over $50 million in lift to that business every year, and consumers produced a crazy amount of user-generated, high-value analytical content.
Level of effort: Unbearably high. It took my team 4-6 weeks to add a new dimension or metric and 7 days(!) of 24×7 processing to rebuild the cube. On top of that, I could only deliver 3 months of data, not the 15 months my users needed to do year-over-year comparisons.
Meeting my 7 objectives: Poor (3 out of 7). The solution hit the mark on ease of use, query performance and semantic consistency but failed miserably at scaling. Just as bad, the architecture required an army of highly skilled data engineers to make it work and generated at least 3 copies of (huge) data.
While I was in love with OLAP’s ease of use, its query speed and its business-friendly semantic layer, I had a hard time with the underlying architecture that made my team’s life hell. It was clear to me that the dimensional model was a winner: end users loved it, they trusted the results, and it promoted self-service and got me and my data team out of the reporting business. However, OLAP’s fundamental architectural approach of pre-calculating every intersection of data into an array and storing it on disk was not scalable and introduced too much latency and complexity. It was clear I wasn’t getting what I needed out of this relationship. I needed a break to reassess our future together.
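To get a feel for why pre-calculating every intersection stops scaling, here is a back-of-the-envelope sketch in Python. The dimension names and cardinalities are made up for illustration (they are not Yahoo!'s real numbers), and the result is an upper bound: it assumes every dimension also carries an "All" member and that no cell is empty, whereas real cubes are sparse and store far fewer cells.

```python
from math import prod


def full_cube_cells(cardinalities):
    """Upper bound on the number of cells if every intersection is materialized.

    Each dimension contributes its members plus one 'All' (rollup) member,
    so the bound is the product of (cardinality + 1) over all dimensions.
    """
    return prod(c + 1 for c in cardinalities)


# Hypothetical dimensions for an advertising cube (illustrative only).
dims = {"date": 90, "advertiser": 10_000, "site": 500, "country": 200}

before = full_cube_cells(dims.values())

# Adding one more modest dimension multiplies the bound by (members + 1).
dims["device"] = 9
after = full_cube_cells(dims.values())

print(f"cells before: {before:,}")
print(f"cells after adding 'device': {after:,} ({after // before}x)")
```

The point is the multiplication: each new dimension scales the potential cell count by its member count, which is in line with the long rebuild times and the limit on history described above.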
OLAP Functionality Without the Headache – A Healthy Relationship
So, my co-founders and I started AtScale to solve this problem once and for all. We took the best of OLAP and married it with a modern approach for data management. With AtScale, we implemented the MDX, DAX, Python, REST and SQL protocols but delivered them through a direct-query architecture with no data movement.
With AtScale, we delivered a solution that has the best of both worlds:
- SSAS (OLAP) compatibility with Excel, Power BI & any other BI tool that speaks MDX, DAX, Python, REST or SQL
- Fast, consistent and low-cost multidimensional and tabular queries
- Simple, delightful modeling tools to add new data elements and data sources instantly
- Direct access to any data lake or data warehouse, including new cloud data warehouses and nested data
- Security and governance controls to manage data access in one place
- A single source of truth for critical business metrics, defined server-side
It wasn’t easy to deliver on the above goals. OLAP is inherently a cell-based calculation engine. In other words, it works like a spreadsheet. Translating OLAP’s cell-based calculations into dialect-optimized queries against Hadoop, Google BigQuery, Snowflake, Amazon Redshift, Azure Synapse SQL and many more was enormously difficult. In fact, it took us almost 3 years of engineering work before the product even saw the light of day. More than 8 years later, we think we have cracked the code. Trust me. If I had had AtScale when I was at Yahoo!, I would have a lot less gray hair.
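To make the idea of "define the metric once, query it directly" a bit more concrete, here is a toy Python sketch. It is emphatically not how AtScale works internally: the `METRICS` registry, the `build_query` helper, and the `ad_events` table and its columns are all hypothetical names invented for this illustration, and a real engine also has to deal with SQL dialects, joins, security and cell-level calculations.

```python
# Hypothetical metric registry: each business metric is defined exactly once.
METRICS = {
    "ad_impressions": "SUM(impressions)",
    "page_views": "SUM(page_views)",
}


def build_query(metric, dimensions, table="ad_events"):
    """Turn a (metric, dimensions) request into a plain SQL aggregate query.

    The query runs against the source table at request time, so nothing
    is pre-computed or copied into a separate cube.
    """
    if metric not in METRICS:
        raise KeyError(f"unknown metric: {metric}")

    select_cols = list(dimensions) + [f"{METRICS[metric]} AS {metric}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


# Two consumers asking for the same metric get the same definition.
print(build_query("page_views", ["country"]))
print(build_query("page_views", ["country", "device"]))
```

Because every tool goes through the same registry, "page views" cannot quietly mean two different things in two business units, which was exactly the trust problem I had at Yahoo!.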
So, here’s my relationship advice to you. Don’t throw the OLAP baby out with the bathwater when you look to modernize your analytics infrastructure in the cloud. Sending your business users and data scientists back to writing SQL to do their work is asking them to become data engineers. An OLAP-powered semantic layer makes data available to everyone, regardless of their skill sets, and lets them use the analytics tools they already know and love – true data democratization.
The secret to a successful relationship? Give the people what they want: access to fast, accurate, secure, granular data that’s ready-made for self-service.