Unlocking Business Insights with Virtual Cubes & Data Modeling
Following up on my previous blog post, “The Three Opportunities for Cloud OLAP to Improve Enterprise Analytics”: the constructs of business analytics (multi-fact tables, hierarchies, drill-down paths, shared and unshared dimensions) are not new. BI tools have supported this kind of complex analysis for a long time. The challenge was how they did it. They worked against a limited scope of data, or simply less data, and the approach back then was to pre-aggregate and pre-calculate everything. That let analysts use those constructs to answer very complex business questions and get answers quickly. The challenge today is that the scale of data has grown tremendously: machine data, data coming from different silos, third-party data. The old pre-calculation method is simply no longer feasible.
New data arrives rapidly; as soon as you pre-calculate, everything is stale. At the other end of the spectrum, pre-calculating everything produces a massive (and slow to build) data object. Bringing all of your data into one big bucket was quite possible 20 years ago. Today, the bucket is more like an ocean, and that approach is no longer realistic.
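To see why pre-calculating everything stops scaling, consider the classic OLAP cube: with n dimensions there are 2^n possible group-by combinations (cuboids) that a fully materialized cube would have to build. A minimal sketch, with hypothetical dimension names chosen for illustration:

```python
from itertools import combinations


def cuboids(dimensions):
    """Enumerate every group-by combination a fully
    pre-aggregated cube would have to materialize."""
    for r in range(len(dimensions) + 1):
        yield from combinations(dimensions, r)


# Hypothetical dimensions, for illustration only.
dims = ["date", "product", "store", "customer", "channel"]
print(len(list(cuboids(dims))))  # 2^5 = 32 group-by combinations

# At 20 dimensions the count already exceeds a million:
print(2 ** 20)  # 1048576
```

Each added dimension doubles the work, which is why “pre-calculate everything” collapses as data and dimensionality grow.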
Unlocking Insights with Virtual Cubes
In the era of big cloud data, we have cross-platform cloud data that impacts both network and data latency requirements. AtScale decided early on that instead of forcing a massive ETL project to create an old-style cube, we would provide a virtual cube. The virtual cube with Autonomous Data Engineering™ was key: we didn’t have to follow the old style of materializing the entire object before anyone had even asked a question of it. The AtScale virtual cube is an abstraction and data modeling layer. You can ask any question of it, and as it learns the kinds of questions you ask, it begins to prepare for future ones. The acceleration structures AtScale builds are pointed and optimized, and they follow an 80-20 rule: roughly 80% of your questions are answered by 20% of your attributes.
You also don’t want to just build out the entire object because it’s a waste of time. And frankly, it’s not really a scalable solution. We need a new way. And that’s why AtScale has Autonomous Data Engineering and a virtual cube.
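The demand-driven idea above can be sketched in a few lines. This is a toy illustration of the concept, not AtScale’s actual engine: log which attribute combinations queries really use, and only propose acceleration structures for the hot ones, rather than pre-building all 2^n up front.

```python
from collections import Counter


class QueryLearner:
    """Toy sketch of demand-driven aggregation (illustrative
    only -- not AtScale's real implementation): count which
    attribute combinations queries actually hit, and only
    materialize aggregates for the frequently used ones."""

    def __init__(self, threshold=3):
        self.counts = Counter()
        self.threshold = threshold  # hits required before we accelerate

    def observe(self, attributes):
        # Normalize so ["product", "date"] and ["date", "product"] match.
        self.counts[tuple(sorted(attributes))] += 1

    def aggregates_to_build(self):
        # Materialize only combinations seen often enough.
        return [key for key, n in self.counts.items() if n >= self.threshold]


learner = QueryLearner()
for _ in range(5):
    learner.observe(["date", "product"])    # a hot, repeated question
learner.observe(["customer", "channel"])    # a one-off question

print(learner.aggregates_to_build())  # [('date', 'product')]
```

The one-off question is answered live against the underlying data, while the repeated pattern earns an acceleration structure, which is the 80-20 behavior described above.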
Adapting to a Remote, Cloud-First World
What we’ve found amid the challenges of COVID-19 and the remoteness of our employees is, most notably, a heavy dependency on cloud infrastructure. Companies used to depend on a core set of individuals to maintain a data center. The current crisis has really accelerated the movement toward the cloud and cloud services. No one is going into the office these days; the people who used to maintain a data center are no longer there, or they are maintaining it remotely. And many of those systems are no longer even in a data center; they have moved to cloud computing. AtScale started in the data center, but in the era of Big Data. We knew we had to orchestrate data and services, and that copying the data was a bad idea (even when egress/ingress costs were low). Cloud computing was different, but in many ways it was a perfect evolution for AtScale. Again, AtScale doesn’t want to own the data; AtScale orchestrates cloud computing. We choose not to own the data.
It was important to minimize that data movement because of the evolution we’re currently seeing. People move into the cloud, and there’s less dependency on people being in particular places to access data. We need to stay flexible and be that abstraction layer. From an AtScale perspective, we’ve always assumed that some of your systems might evolve to the cloud while you also have big data on-prem, and that the folks who manage a data center might go away at some point. All of that really does evolve.
AtScale never wanted to be another silo of data; frankly, we wanted to be an orchestration of data and data services and provide a shared data modeling layer throughout the organization. I think the current situation makes it clear that decision was a good one. We don’t want to be yet another system of data that you now have to maintain, whether on-prem or in the cloud. Instead, you have to get to a point where you’re focused on orchestrating data and data services. And that’s really what we’ve been doing from day one.