How Cloud OLAP, Data Virtualization, and a Universal Semantic Layer Make Cloud Costs Manageable and Predictable
From the earliest days of business intelligence to today’s predictive, prescriptive, AI- and ML-powered data analytics, organizations like yours have relied on insights derived from their data. Whether they’re analyzing demographics, purchase intent, logistics, customer satisfaction, or even social media engagement, they have endless opportunities to become data-driven organizations, empowered by analytics to take insight-driven actions that make their businesses better.
But as everyone knows, the gap between the dream of becoming a data-driven organization and the reality is often a yawning chasm. Analytics as a discipline struggles to cope with exponential data growth, time-consuming data preparation, disparate data sources, and limited on-premise infrastructure that, over time, becomes less and less agile. These issues result in fewer and fewer analytics-driven decisions, more and more disappointment, and discouraging return on investment from expensive analytics infrastructure, toolkits, and expertise.
To minimize these challenges, many organizations pursue analytics in the cloud, which offers agility advantages, including speed of provisioning, unparalleled scale and flexibility, simplified maintenance, and ease of use. However, many organizations that have gone all-in on cloud discover that while they do enjoy those hyped advantages of cloud, they end up dealing with new, unexpected challenges. Most organizations using cloud end up with complicated, time-consuming migrations. Once in the cloud, they discover that the cloud comes with added latency and might not be as performant as on-premise infrastructure. Then there are the costs of each query, which can be wildly unpredictable. Some have even reported unexpected costs in excess of $50,000 for a single query!
There goes the budget, and consequently, the ROI.
These are sobering problems, and until now, mitigating them hasn’t been easy. In a recent executive survey, fewer than 38% of leading organizations said they have a data-driven culture, despite the fact that nearly 99% report investing in data analytics. A gap remains between investing in data analytics and becoming a data-driven organization that can realize true ROI from its analytics expenditures.
AtScale helps organizations become data-driven by mitigating analytics obstacles and challenges. In this paper, we’ll expose the challenges of data analytics and the ways that they produce unacceptable outcomes with intolerable costs. We’re going to explore some ways to mitigate these challenges. Finally, we’re going to show you how AtScale, a Cloud OLAP platform that provides a single, virtualized semantic layer for all your data and all your analytics tools, has proven to improve performance, scalability, agility with cloud data warehouses like Snowflake and Google BigQuery as well as on-premise data sources for a new approach to reducing costs and improving ROI.
Top Cloud Data & Analytics Challenges
Let’s face facts: analytics isn’t easy, and most organizations find there are fundamental issues that impede their ability to cut analytics costs. Basic issues delay analysis, and fixing those problems increases costs. To put it another way, there are forces inside almost any organization that reduce analytics velocity over time.
Coping with different, siloed sources of data is a typical challenge. Over time, organizations accumulate different application data sources (IoT, customer records, social media, etc.), stored in different locations and accessed by different tools. Being able to conduct analysis across all those sources would be ideal, but spanning and connecting these different sources and data types is a complicated, time-consuming process. Combining data from data lakes and data warehouses often requires time-consuming manual intervention by specialists who are in short supply. Application data exchange becomes an ever-worsening problem.
To make matters worse, traditionally, organizations combining data sources are forced to move large-scale data sets from the cloud to on-premise analytics infrastructure or vice versa. That’s labor intensive, takes time, and often results in unwanted transfer costs.
Many organizations also lack the ability to quickly add new data sources, including third-party data sources, to fill gaps and fix ambiguities. For example, they recognize that purchasing and integrating third-party demographic data could give them new opportunities to ask better questions, but accessing and integrating the data takes time they don’t have.
Many organizations simply avoid these efforts because they come with unpredictable hard and soft costs, including the need for new tools, consequent training, upskilling, even hiring, and the hidden costs of time spent in exchange for unknown advantages.
To sidestep these obstacles and control these costs, organizations need a way to improve three key areas of their analytics infrastructure:
1. Application data exchange. Having a universal “source of truth” that sits between data sources and downstream analytics can give analytics tools like Excel and Tableau access to any and all data types, potentially without data movement. This semantic layer has to be universal and fast.
2. Self-service tooling. Prioritizing self-service tooling makes anything faster, and that’s especially true of analytics, where analysts often spend days in “trouble ticket hell” waiting for access to a data lake or a file share. Self-service tooling empowers analysts to move forward faster and accomplish analysis more quickly.
3. Performance and scale. It’s not uncommon for data growth and analytics demands to outstrip physical or cloud infrastructure. Organizations need a way to use both on-premise infrastructure and cloud infrastructure seamlessly, in combination, to build a highly performant, highly scalable infrastructure. With that, organizations avoid complexity and costs while improving simplicity and speed.
And that’s a perfect transition to our next challenge. Ask data scientists and you’ll hear that, to cope with data velocity issues, analysts are increasingly forced to minimize analytics granularity, that is, the breadth and depth of insight. All of the factors that impede velocity also constrain granularity, because organizations discover that they can have speed or depth, but rarely both.
First, data growth makes conducting analysis harder and more time-consuming. If an organization collects a terabyte of new data every month, data access, searches and queries take more time than ever before. Ultimately it’s not uncommon to have data sets that are too large for a tool to conveniently use. Organizations work to dodge this problem with extracts, but extracts provide a limited, superficial view of data that’s inconsistent over time.
Second, data science teams would love to conduct drill-down analysis, but rarely have the time. Data access and preparation challenges stand in the way of their ability to ask follow-up questions, especially in organizations that have rapidly evolving, changing, and emerging demands of the data science team.
It’s ironic that, in the face of rapid data growth and the availability of more data sources, analysis often isn’t as deep or as broad as it should be. But it’s a known challenge, made worse by the cost complexities of choosing the right infrastructures for analytics requirements.
There are ways around these problems. By using an OLAP tool that supports data virtualization, organizations can mitigate the weaknesses of Excel, Tableau, and Power BI, reducing complexity and accelerating time to value. The right OLAP tool offers a path past the million-row limit, the need for extracts, consistency problems, update handling, and slowdowns in data access.
What happens when organizations use the right OLAP tool?
– The need for extracts is eliminated. By connecting BI tools “live” via its built-in OLAP interface, analysts can continue using Excel to do analysis with full access to the data warehouse. Data access happens in minutes instead of days, and all the challenges of data extracts go away.
– They end up with a universal semantic layer for data’s business definition — enforcing consistency and data governance so that one analysis doesn’t differ from another no matter what BI tool is used.
Many organizations try to drive down their costs by rethinking their infrastructure. But for analytics, that’s a challenging undertaking. Data analytics requirements are constantly emerging and changing, and infrastructure changes made to chase them deliver unpredictable advantages. As a result, cost savings are often hyped, but rarely achieved.
On-premise data analytics costs
Most organizations begin (or began) their analytics initiatives with on-premise infrastructure. Unfortunately, it’s an approach with intrinsic weaknesses because on-premise infrastructure isn’t agile. With on-premise infrastructure, coping with new demands, data growth, or requests for added performance isn’t a trivial task. Adding compute capacity or storage capacity isn’t automatic or easy. It has to be procured, shipped, installed, and set up. Then it has to be monitored, maintained, and repaired.
If organizations want to evaluate emerging technologies, including GPUs, DPUs, and persistent memory, those technologies have to be purchased, evaluated, and a determination made on improvement vs. cost.
Sometimes, it’s actually impossible to perform large-scale analytics with an on-premises data warehouse. Scaling physical hardware to the needs of customers is so cost intensive that you might well find yourself running a data center rather than performing analytics on your data.
Because of these characteristics, on-premise infrastructure requires capital expenditure, operating expenditure, and often comes with delays in time to value.
To address these challenges, many organizations choose to do analytics in the cloud. After all, the cloud is so easy to procure, it literally only takes a credit card and a few minutes. Most public clouds come with specialized analytics capabilities and integrations. Adding performance and capacity takes moments. They’re often on the leading edge from an infrastructure standpoint. As a result of these advantages, the cloud is a no-brainer for organizations that want to conduct broader, deeper, more impactful analysis more quickly and with greater ability to make changes on the fly to adapt to emerging trends, possibilities, and opportunities.
Or is it?
The cloud may be easy to procure, easy to use, and easy to grow or reconfigure, but it’s also easy to overspend on.
As everyone knows, the first challenge is data migration. Migrations are never easy, and they come with costs. If an organization has dozens of terabytes of data sitting in siloed, on-premise storage, they have two choices. First, they can migrate it to the cloud, possibly incurring transfer costs and time-wasting data movement before a single query can be run. Second, they can try to use technologies that make connections between on-premise data and cloud analysis. These usually aren’t intuitive or easy, come with additional costs, and have to be carefully managed, especially across disparate data sources.
Then, once migration is done, you have to cope with the costs of storage. Today, Snowflake costs $23/TB/month, or $828/TB over a three-year lifespan, which doesn’t compare favorably with the roughly $50/TB cost of a typical enterprise hard drive that can last more than three years. To make matters worse, as data grows, storage costs increase, and that increase can, depending on your data sources, be unpredictable. It’s easy to see a 10x or 20x increase in storage costs year on year.
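To make the storage arithmetic concrete, here’s a quick back-of-the-envelope sketch in Python. The $23/TB/month rate is the Snowflake figure quoted above; the growth scenario is purely illustrative, not a forecast for any particular workload:

```python
# Back-of-the-envelope cloud storage cost projection.
# The $23/TB/month rate is quoted above; growth factors are illustrative.

def storage_cost(tb, monthly_rate_per_tb=23.0, months=36):
    """Total storage cost for a fixed data volume over a period."""
    return tb * monthly_rate_per_tb * months

# 1 TB held flat for three years at $23/TB/month:
print(storage_cost(1))  # 828.0

def projected_annual_costs(start_tb, yearly_growth_factor, years=3,
                           monthly_rate_per_tb=23.0):
    """Yearly storage bills when data volume compounds annually."""
    costs, tb = [], start_tb
    for _ in range(years):
        costs.append(tb * monthly_rate_per_tb * 12)
        tb *= yearly_growth_factor
    return costs

# 10 TB growing 10x per year: each year's bill is 10x the last.
print(projected_annual_costs(10, 10))  # [2760.0, 27600.0, 276000.0]
```

The point of the second function is the compounding: with unpredictable growth factors, next year’s bill is not a small increment on this year’s, which is why storage costs deserve explicit projection rather than a flat budget line.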
Next, understanding performance needs is an art rather than a science. Cloud data warehouses offer many different instances and performance levels. It’s not uncommon to find that an organization chose relatively low-performance instances to control costs, but as a result added days to their decision-making time. Other organizations discover a gap between expected performance and actual performance and are forced to change performance tiers or shift to a different storage type.
Finally, overall costs differ, depending on region, preferred cloud provider, and preferred platforms, even for identical sets of data and queries.
In short, controlling costs for data analytics is a challenging balancing act of dozens or even hundreds of decisions each day, week, and month. It’s simply not as easy as it could be. And these costs often influence granularity as well. Drill-down queries cost more money. Expanding data sources or data volumes costs more money. Conducting more data preparation costs more money. So it’s not unusual for organizations to compromise on granularity because costs are spiraling out of control.
Some ideas for addressing these problems include:
Recommendation: Create a centralized cost management system (including a team)
Why: A cost oversight team provides accountability.
Outcome and benefits: If a single department is using too many resources on its queries, there is a concrete reporting structure that can investigate and then rectify the issue. It also allows you to create uniform policies across your departments, such as automatically killing any query that runs longer than two hours, or automatically downscaling warehouse instances that stay scaled too large for too long. Finally, there’s an appeals structure in place, so if a department needs to run a query that would otherwise violate policy, the business leader has someone to ask about it.

Recommendation: Understand how data warehouse sizes map to cost
Why: There are no hard and fast rules governing warehouse size and how it relates to cost.
Outcome and benefits: Simple tasks, such as writing data on and off a disk, tend to need less compute power, so you can run these tasks on a small, less expensive cluster. Even for more complex tasks, you can often reduce the size of a data warehouse with barely any perceptible difference in how the job executes and realize significant cost savings. But sometimes large aggregates and joins take large amounts of computing power; provisioning those jobs with the maximum amount of compute and memory means they’ll get done faster.

Recommendation: Create visibility
Why: Visibility equals (cost) control.
Outcome and benefits: Every job you run generates metadata that is stored and queryable. Since you are running an analytics platform, you have the tools to derive extremely detailed information from the jobs you run, visualize it in a meaningful way, and then use it not just to manage costs, but also to anticipate future costs (and manage those as well). You can condense these metrics into a report that is sent regularly to your cost management team. As they look over this data, they’ll see spikes and discrepancies, giving them new opportunities to manage costs.
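To illustrate the visibility and policy recommendations above, here’s a minimal Python sketch that turns query-history metadata into a per-department cost summary and flags jobs that breach the two-hour policy. The record fields, sample data, and helper names are hypothetical; in practice, this data would come from your warehouse’s own query-history views:

```python
# Sketch: condense query-history metadata into a cost report and flag
# policy violations. Field names and sample records are hypothetical.

MAX_RUNTIME_SECS = 2 * 60 * 60  # policy from above: investigate queries > 2 hours

query_history = [  # stand-in for a warehouse's query log
    {"dept": "marketing", "runtime_secs": 95,   "credits": 0.5},
    {"dept": "marketing", "runtime_secs": 7500, "credits": 6.0},
    {"dept": "sales",     "runtime_secs": 1800, "credits": 1.25},
]

def cost_by_department(history):
    """Roll raw job metadata up into per-department credit totals."""
    totals = {}
    for job in history:
        totals[job["dept"]] = totals.get(job["dept"], 0.0) + job["credits"]
    return totals

def policy_violations(history, limit=MAX_RUNTIME_SECS):
    """Jobs the cost oversight team would investigate (and perhaps kill)."""
    return [job for job in history if job["runtime_secs"] > limit]

print(cost_by_department(query_history))   # {'marketing': 6.5, 'sales': 1.25}
print(len(policy_violations(query_history)))  # 1
```

A report like this, sent regularly to the cost management team, is exactly the kind of condensed metric summary the table describes: spikes and discrepancies become visible per department, and policy breaches are caught automatically rather than on the monthly bill.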
But technologies exist to help mitigate these challenges, giving organizations a new way to optimize their analytics initiatives, improving hard and soft costs while boosting both velocity and granularity.
AtScale helps organizations deliver better analytics velocity and granularity. It’s a powerful OLAP tool for virtualizing data sources without data movement, and it’s used by the Global 2000 to drive better data velocity and granularity while reducing the costs of their most impactful business decisions.
AtScale delivers up to 9.5x improvement for Data & Analytics ROI.
It serves as an engine that accelerates analysis by providing a connection between all data lakes and data warehouses on the one hand, whether on-premise or in the cloud, and all business intelligence tools, AI/ML tools, and applications on the other hand.
How does AtScale help solve the cost conundrum?
Data complexity and data access
AtScale gives organizations a way to virtualize any data source, serving as “the single source of data” for any requirement. Analysts can create a virtual cube that serves any analyst in the organization, across any data source, on-premise or cloud, without migration. That means that organizations can deepen and broaden their analysis faster than ever before, adding new and siloed data sources without added complexity.
Autonomous data engineering
AtScale also improves the efficiency of data preparation efforts. Organizations can minimize manual engineering like cube building and ETL. They can also easily add new datasets and quickly add new dimensions, reducing effort, speeding time to insight.
AtScale also powers fast analytics at scale. Organizations that choose AtScale have a platform that powers live connections to any data source with accelerated performance. Analysts now have a way to access more data, from more sources, use more rows more quickly, and deliver analysis that impacts business performance right away with appropriate velocity and granularity. Better still, query times improve anywhere from 5x to 100x, reducing cloud costs as well as the need to overprovision on-premise infrastructure.
AtScale also offers this power non-disruptively. Whether someone is using a BI tool, an analytics tool, or others, AtScale powers the virtual cube and business definitions. Direct data access from Excel, Tableau, and other tools lets organizations choose the right tool for the job. Organizations don’t have to worry about the costs of hiring new data scientists or upskilling existing staff to cope with a tool change.
Organizations that use AtScale rapidly migrate away from more complex, traditional approaches because the advantages are clear-cut and immediate. AtScale has been proven, in production environments, to easily replace SQL Server Analysis Services (SSAS) with no disruption to operations. One customer used to run 300K queries a month using SSAS, and now runs the same number of queries in AtScale.
AtScale offers anywhere from 2.5-9x improvement of data and analytics ROI. This is a result of AtScale’s positive impact on query performance, user concurrency, query compute costs and SQL complexity.
Based on cloud benchmarks, AtScale has an order of magnitude impact on the ROI of each major cloud data warehouse. Customers powering their analysis with AtScale control the costs and complexity of their cloud analytics, maintain a consistent and compliant view of data across the enterprise and force multiply the effectiveness of their data and analytics teams.
With AtScale, forecasts are performed faster and delivered sooner, accelerating operational decision making from days to seconds. They’re also higher quality, built on consistent data, even across multiple users and over time. Businesses that use AtScale are able to manage their costs more effectively while making smarter decisions faster. These advantages give them unprecedented opportunities to take advantage of evolving economic conditions.
Let’s explore how AtScale’s capabilities provide advantages with Google BigQuery as well as Snowflake on Amazon Web Services, with some customer use cases.
An online retailer in the Netherlands turned to AtScale for help optimizing their data analytics initiatives. They serve over 11 million customers with 23 million items and over 40,000 partners selling their products, and their 2,000 employees analyze data, growing steadily year over year, from over 250 data sources using 3,000 workbooks. From initial initiatives that began with a single Oracle BI stack, they’ve now adopted Tableau, AtScale, and Google BigQuery for all their analytics endeavors.
From their perspective, overcoming data velocity and granularity problems is the essential foundation for ensuring that data analytics provides value to the organization. With 2,000 users, they had to make sure that any analyst could conduct time-efficient, drill-down analysis on rapidly growing data volumes without relying on complicated IT interventions. They took a forward-looking approach to making sure their BigQuery platform had the capacity they needed, the cost model they could tolerate, and the features to support their usage models.
Setting up very detailed, real-time monitoring of their systems, tracking CPU, memory, disk I/O, network traffic, and query response times, helps them identify common bottlenecks. One of the most common, user request queues, is fixable with small configuration changes. These changes avoid larger, more expensive interventions, like adding more machines or more powerful machines, that are never cheap or quick. Without this depth of monitoring, costs could get out of control.

Having AtScale in place offers other cost-improving advantages. AtScale cuts query time and delays due to user concurrency. Its consistent data modeling cuts down on manual intervention, speeding time to question and time to insight, improving velocity. AtScale powers more cost-effective depth and breadth, improving analytics granularity. Finally, AtScale lets them access everything from a single semantic layer, improving data governance without added complexity.
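The kind of threshold-based bottleneck detection the retailer describes can be sketched very simply. The metric names and threshold values below are illustrative assumptions, not the retailer’s actual configuration:

```python
# Sketch: flag common bottlenecks from monitoring samples, so small
# configuration fixes can be tried before scaling up hardware.
# Metric names and thresholds are illustrative assumptions.

THRESHOLDS = {
    "cpu_pct": 90.0,          # sustained CPU saturation
    "queued_requests": 20,    # user request queue depth (the common, cheap fix)
    "query_p95_secs": 120.0,  # slow query responses
}

def bottlenecks(sample, thresholds=THRESHOLDS):
    """Return the metrics in a monitoring sample that breach their thresholds."""
    return {name: value for name, value in sample.items()
            if name in thresholds and value > thresholds[name]}

# A sample where only the request queue is misbehaving:
sample = {"cpu_pct": 72.0, "queued_requests": 35, "query_p95_secs": 48.0}
print(bottlenecks(sample))  # {'queued_requests': 35}
```

The design point mirrors the retailer’s practice: a queue-depth breach with healthy CPU suggests a configuration fix (concurrency settings, workload routing) rather than the expensive default of adding bigger machines.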
These benefits have been borne out in recent benchmarking of AtScale with Google BigQuery.
One AtScale customer, a major retailer that helped shape the way people shop online by offering Cash Back, deals, and shopping rewards on the world’s largest selection of products and services, had a typical trajectory toward Snowflake. They began with SQL-based business intelligence and then shifted to Hadoop. Over time, as their data grew into dozens of discrete, disconnected warehouses, they ran into missed SLAs, problems with user concurrency, and the high costs and complexity of physical, on-premise infrastructure. After some successful POCs, they migrated their entire stack (Tableau, AtScale, and all data) onto Snowflake.
Snowflake helped them solve these problems with three key characteristics. First, the cloud warehouse is vastly more scalable and doesn’t force them to blend data from different levels of granularity across different systems. Second, all their data in Snowflake lives in a single data warehouse, though administrators can and do subdivide that warehouse for different users, by department, for example. Third, user concurrency isn’t an issue: if marketing runs a complex query, it doesn’t force the sales analytics team into a lengthy queue.
But these advantages come with a downside: cost. If you give every department a data warehouse, cloud costs can spiral out of control as each administrator adds additional compute and memory and storage without oversight.
A centralized cost management system, run by a cost oversight team, can help. Applying uniform governance and policies helped them keep an eye on costs, and automate controls, such as killing any query that runs for more than two hours.
Having a cost control team helps everyone understand how data warehouse sizes map to cost. Optimizing every cluster and warehouse keeps costs manageable. Snowflake provides reporting on every job, and using Tableau, it’s easy to provide a weekly report to all stakeholders and departments to improve visibility, help teams make better decisions, and strike a balance between ensuring analytics provides value, and making sure it doesn’t break the bank.
AtScale helps this company in a few critical ways.
There’s a lot to consider when it comes to cost-optimization of data analytics in the cloud, but there are techniques that can save you a significant proportion of your data analytics budget. AtScale helps organizations like yours improve operational velocity, conduct deeper, broader, more granular analysis, and support and sustain cost-cutting initiatives.
AtScale provides the premier platform for data architecture modernization. AtScale connects you to live data using one set of semantics without having to move any data. Leveraging AtScale’s Autonomous Data Engineering™, query performance is improved by an order of magnitude. AtScale inherits native security and provides additional governance and security controls to enable self-service analytics with consistency, safety, and control. AtScale’s Intelligent Data Virtualization™ and intuitive data modeling enable access to new data sources and platforms without ETL or needing to call in data engineering.
AtScale powers the analysis used by the Global 2000 to make million dollar business decisions. The company’s Intelligent Data Virtualization™ platform provides Cloud OLAP, Autonomous Data Engineering™ and a Universal Semantic Layer™ for fast, accurate data-driven business intelligence and machine learning analysis at scale. For more information, visit www.atscale.com.