November 24, 2020
Unlocking Business Insights with Virtual Cubes & Data Modeling
*Author’s Note: This is an update to the blog post “The 1990’s Called, They Want Their OLAP Back,” which was originally published here.
For many enterprises, OLAP and “Cube” had become dirty words. When I asked people whether they had experience with “cubes”, I would hear questions like:
- How do you handle data explosion?
- How do you handle big data?
- How long does it take to process the cube?
- How long does it take to make changes, like adding a dimension?
- Do we have to move data? Do you support HBase? Do you support….?
- How long? How long? How long?
It was easy to see that these OLAP Admins and power users had been living with some pain. To add insult to injury, data was growing exponentially fast and business users still expected “speed of thought” response times. Of course, in many cases “speed of thought” had begun to slow down for them. As expectations were lowered, “Can we get it under 15 seconds?” had morphed into “Can I get it by lunch?”. This is why I founded AtScale. The world needed a new type of OLAP.
REALLY BRIEF HISTORY OF SSAS AND OLAP
In 1992, Arbor Software shipped the first version of Essbase, which stands for Extended Spreadsheet Database.
In 1998, Microsoft shipped OLAP Services with SQL Server 7.0, the product that later became Microsoft SQL Server Analysis Services (SSAS).
The time of multi-dimensional databases had come into full being, and almost 30 years later these OLAP engines are very much still in play. For those who may be new to OLAP, it stands for Online Analytical Processing. If you want to totally geek out, have a look at Dr. Edgar F. Codd, who coined the term. (Hint: It doesn’t stand for OLD Analytical Processing!)
Databases up to this point were essentially two dimensional: records and fields. They required query language knowledge in order to retrieve data. With the onset of OLAP, business users were now able to ask questions of the data in a dimensional fashion, get “speed of thought” answers and not be required to learn any type of query language. They could simply go into Excel and drill down, pivot, and swap dimensions and measures. OLAP quickly became the default language of business. Instead of looking at records, fields, or facts, any user could come in and begin looking at “Sales by Region”, “Sales by Product”, “Sales by Channel”, “Sales by Market”, or sales by a time period. These “by’s” are what define the actual multidimensionality of OLAP. The ability to drag and drop, drill down, and drill up, all within Excel with the click of a mouse (versus writing SQL), became the rage of the 1990’s and in large part still is today.
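To make the “by’s” concrete, here is a minimal Python sketch of what a dimensional question boils down to: grouping fact records by one or more dimensions and summing a measure. The records, the dimension names, and the `sales_by` helper are all made up for illustration; no real OLAP engine works on a four-row list.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, channel, sales). Values are made up.
FACTS = [
    ("East", "Widget", "Online", 120.0),
    ("East", "Gadget", "Retail", 80.0),
    ("West", "Widget", "Retail", 200.0),
    ("West", "Widget", "Online", 50.0),
]

# Position of each dimension inside a fact record.
DIMENSIONS = {"region": 0, "product": 1, "channel": 2}
SALES = 3  # position of the measure


def sales_by(*dims):
    """Sum sales grouped by the named dimensions ("Sales by Region", etc.)."""
    positions = [DIMENSIONS[d] for d in dims]
    totals = defaultdict(float)
    for row in FACTS:
        key = tuple(row[p] for p in positions)
        totals[key] += row[SALES]
    return dict(totals)


by_region = sales_by("region")                 # "Sales by Region"
drill_down = sales_by("region", "product")     # drill down one level
```

Swapping `"region"` for `"channel"` is the pivot; adding a second dimension is the drill down. The business user gets the same effect by dragging fields around in Excel.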
Theoretically, these OLAP engines had no dimensional limit. Of course, in the real world it became evident very quickly that moving data off its source, the cardinality of dimensions, intensive calculations, and the size of the data all played a key role in performance. In the early days, approaching ten to thirteen dimensions could and did begin to degrade performance, and not just query performance. Changing the “cube”, updating the data, and calculating data could begin to take hours if not days. These challenges became acutely apparent as things progressed, and of course truly “Big Data” had not yet arrived.
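A rough back-of-the-envelope sketch shows why dimension count and cardinality hurt so much. It assumes a simplified cube in which every dimension is flat (no hierarchies) and every intersection is pre-computed, so each dimension contributes its members plus one “All” level. Real engines store sparse cubes and will hold far fewer cells, so treat this as an upper bound; the `full_cube_cells` helper is hypothetical.

```python
from math import prod


def full_cube_cells(cardinalities):
    """Upper bound on aggregate cells when every intersection is pre-computed.

    Each dimension contributes its member count plus one "All" level.
    """
    return prod(c + 1 for c in cardinalities)


# Three small dimensions stay manageable ...
small = full_cube_cells([10, 10, 10])   # 11 * 11 * 11 cells

# ... but ten dimensions of 100 members each give 101**10 possible cells,
# which is on the order of 10**20.
large = full_cube_cells([100] * 10)
```

Every added dimension multiplies the total rather than adding to it, which is why a cube that behaved at six dimensions could fall over at twelve.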
CHALLENGES OF OLAP ON DATA IN THE CLOUD
Fast forward. As we move through 2020, “Big Data” and “Hadoop” are steadily marching along. OLAP cubes are still widely in use and definitely “exploding” with data. What does “exploding” really mean? At Yahoo!, I was running a 24TB SQL Server Analysis Services cube that took seven straight days of non-stop computation to build and months to make any type of change. What’s more, enterprises are starting to move their data to the cloud, and they expect access to even more data without the wait.
TOWARDS A BRAVE NEW OLAP: CLOUD OLAP!
Enter AtScale. How do we solve for truly large amounts of data, models with large numbers of dimensions and measures, and the absolute need for interactive query performance?
I would start these meetings with, “This is not last century OLAP”, or with something along the lines of, “The 90’s called and they want their OLAP back!” In reality, what I meant was, you need Cloud OLAP.
So how do we take advantage of concepts like distributed compute? How do we keep the things that worked with regard to Online Analytical Processing but lose the really painful stuff, all while taking Big Data into consideration? I’ve seen many attempts over the last few years, and most of them have failed or, worse, created even more complex environments. For an example of a complicated environment, all one needs to do is look at the Big Data ecosystem. It’s enough to make one wish for the big hair of 1990’s metal bands to come back.
Companies are taking their Big Data and making it small: pushing it to relational databases, moving it off platform, relying on many different technologies, and/or extracting it into various reporting tools. There, unfortunately, they are likely in violation of corporate governance and have the pleasure of fighting over whose numbers are the right numbers. It’s the same old rock band, but it is incredibly more complicated and incredibly more “BIG”. Worse, many vendors are pushing this approach: move data, index data, calculate all intersections. These are the same old MOLAP issues with a bigger problem: massive volumes of data.
So how does AtScale solve this problem? It’s quite simple: Cloud OLAP. Intelligent Data Virtualization is the key. We are the experts at Cloud OLAP, or COLAP. To the end user of Tableau or Excel, we look like the good old big hair band we all knew and loved back in the early 90’s, but without all the limitations, because we aren’t storing and pre-aggregating every single data cell. Business users can still see their business by Time, by Region, by Market, etc. They can still drill down, swap, and pivot, and more importantly, they still have their performance.
This data virtualization layer sits on a thin compute layer of your data platform, where it intercepts inbound SQL or MDX from various BI tools, converts it into the SQL dialect of the underlying platform (Google BigQuery, Snowflake, Redshift, Azure Synapse, Spark, Teradata, Oracle, etc.), and queries the data where it landed. AtScale creates acceleration structures that provide an order of magnitude higher query performance. We are leveraging the power of distributed compute and of these various platforms, and we are only storing smart slices of data informed by user query behavior. These acceleration structures are stored right next to the raw data, meaning there is no data movement off of the platform.
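The general idea of answering a query from a pre-built slice instead of the raw data can be sketched in a few lines of Python. This is an illustration of aggregate-aware query routing in general, not AtScale’s actual implementation: the `Aggregate` class, the table names, the row counts, and the generated SQL are all hypothetical, and the sketch assumes an additive measure (a sum of sums is still a sum) and at least one requested dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    """A hypothetical pre-built summary table stored next to the raw data."""
    table: str
    dims: frozenset
    rows: int


# Made-up aggregates and sizes, for illustration only.
AGGREGATES = [
    Aggregate("agg_region_product", frozenset({"region", "product"}), 1_000),
    Aggregate("agg_region", frozenset({"region"}), 10),
]


def choose_table(query_dims, aggregates, fact_table="sales_fact"):
    """Pick the smallest aggregate covering the query, else the raw fact table."""
    candidates = [a for a in aggregates if set(query_dims) <= a.dims]
    if not candidates:
        return fact_table
    return min(candidates, key=lambda a: a.rows).table


def to_sql(query_dims, measure, aggregates, fact_table="sales_fact"):
    """Build a GROUP BY query against whichever table was chosen."""
    table = choose_table(query_dims, aggregates, fact_table)
    cols = ", ".join(query_dims)
    return f"SELECT {cols}, SUM({measure}) AS {measure} FROM {table} GROUP BY {cols}"


sql = to_sql(["region"], "sales", AGGREGATES)
```

A “Sales by Region” request lands on the ten-row `agg_region` table, while a request by a dimension that no aggregate covers falls through to the raw fact table. Either way the BI tool never knows the difference.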
This means a couple of things. First, you gain all the advantages of a Universal Semantic Layer, including but not limited to a single source of truth, meaning no more fighting over whose numbers are correct. Data governance is ensured because there is no data movement and access rules are applied to every query. Second, because of the virtual nature of AtScale cubes, you no longer have the restrictions of yesteryear. I’ve seen models with hundreds of dimensions and hundreds of measures. There are no “data explosions”, as AtScale’s acceleration structures only represent the fraction of the cube that is actually being queried. You don’t find yourself in a situation where you have to buy a bigger or faster box in order to, in the words of Ricky Bobby, “go fast.” You don’t scale up, you scale across, again leveraging the power of distributed compute on whatever the underlying platform is, and NOT bottlenecking yourself into one box that is trying to do all the heavy lifting and failing. As your data platform gets faster, we get faster, whether it’s on premises or in the cloud. As you add new data nodes and compute power to the platform, we automatically get faster. If you decide, as one customer did, to move from an on-premises Hadoop cluster to Google BigQuery, it’s a simple plumbing redirect, not a months-long project with painful end user disruptions.
It’s like you’ve got the best music from the early 1990’s, without all the big hair. It’s a masterpiece movie from the past digitally remastered to take complete advantage of modern capabilities.
It’s OLAP for the 21st Century without all the baggage. It’s Cloud OLAP.