How to Improve Your Analytics ROI on Snowflake

Five Practical Techniques

INTRODUCTION

Performing large-scale analytics with an on-premises data warehouse is becoming increasingly difficult, and sometimes impossible. Scaling physical hardware is cost-intensive in both dollars and time, bleeding resources that should be reserved for actually performing analytics on your data. To overcome these challenges, you may have migrated, or be planning a migration, to a cloud data warehouse such as Snowflake.

Snowflake is a tremendous data warehouse platform for analytics that promises cost savings and infinite scalability. It also eliminates the need to manually blend data from different systems: if you need to do something like take a data model table and turn it back into a data lake table, you can join the two in a single SQL statement, which adds up to massive time savings in data management.

However, there are some caveats to these promises. Organizations that move to the cloud are frequently surprised by its costs, wondering where the cost-saving benefits went once they see the bill. This happens for two reasons. First, cloud pricing models often scale in unexpected ways. Second, cost, processing power, and efficiency don’t have a linear relationship in the cloud. Companies can save on costs by using smaller data warehouses for less important tasks, but they can also save by using larger warehouses to finish more complex tasks faster. Everything is case-by-case.

Moving your data warehouse to the cloud can give you massive advantages in terms of flexibility, scale, and potential cost savings—but there are key things you should pay attention to in order to avoid paying more than you expected. This guide contains five ways to manage the costs of running analytics on the Snowflake platform and how to increase your ROI at the same time.

One:
Create a Centralized Cost Management System

Technically, all of your data in Snowflake lives in a single, shared storage layer, but administrators can create as many virtual warehouses (independent compute clusters) against that data as they like. This lets the company isolate workloads by department; in other words, if marketing runs a complicated SQL query, it won’t force the sales analytics team into a lengthy queue.

There’s a potential pitfall inherent to this approach. If you give each team its own data warehouse and then look away, you may find yourself suddenly hemorrhaging costs into the cloud.

It usually plays out like this: first, a team gets stuck on a difficult query. Next, they set their data warehouse to 4XL in order to throw additional compute and memory at the query. Finally, they get through the query, but they forget to tune the warehouse back down. The data warehouse is now effectively stuck at 4XL, which is twice as expensive as 3XL, which is twice as expensive as 2XL, and so on.

You can avoid this by creating a cost oversight team to help make sure that your data warehouse scales to the size of its queries—and then scales back. A cost oversight team provides accountability—if a single department is using too many resources on their queries, there is a concrete reporting structure that can investigate and then rectify this issue.

It also allows you to create uniform policies across your departments, such as automatically killing any query that runs longer than two hours, or automatically downscaling warehouse instances that are scaled too large for too long. Finally, there’s an appeals structure in place so if a department needs to run a query that would otherwise violate policy, the business leader has someone to ask about it.
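
Snowflake also gives you built-in controls for enforcing policies like these, so the oversight team isn’t chasing runaway warehouses by hand. The sketch below is illustrative rather than prescriptive: the warehouse and monitor names are hypothetical and the thresholds are examples to tune to your own policy, while STATEMENT_TIMEOUT_IN_SECONDS, AUTO_SUSPEND, and resource monitors are standard Snowflake features.

    -- Kill any query on this warehouse that runs longer than two hours (7,200 seconds).
    ALTER WAREHOUSE MARKETING_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;

    -- Suspend the warehouse after five minutes of inactivity so an oversized warehouse
    -- does not keep burning credits between queries.
    ALTER WAREHOUSE MARKETING_WH SET AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;

    -- Cap the department's monthly credit spend and suspend the warehouse at the quota.
    CREATE RESOURCE MONITOR MARKETING_MONITOR
      WITH CREDIT_QUOTA = 500
           FREQUENCY = MONTHLY
           START_TIMESTAMP = IMMEDIATELY
           TRIGGERS ON 80 PERCENT DO NOTIFY
                    ON 100 PERCENT DO SUSPEND;

    ALTER WAREHOUSE MARKETING_WH SET RESOURCE_MONITOR = MARKETING_MONITOR;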

Two:
Understand How Data Warehouse Size Maps to Cost

There are no hard and fast rules governing warehouse size and how it relates to cost. With that said, in most cases you’ll find yourself making warehouses smaller rather than larger. Simple tasks, such as moving data on and off disk, tend to need less compute power, so you can run them on a smaller, less expensive cluster.

You may be surprised to find that even for more complex tasks, you can often reduce the size of a data warehouse with barely any perceptible difference in how the job executes, and realize significant cost savings. On rare occasions, the reverse is true. Large aggregates and joins take large amounts of computing power, and provisioning those jobs with the maximum amount of compute and memory means they’ll get done faster. Sometimes they’ll get done so fast that running them at 4XL costs less than running them on a medium-sized warehouse.
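
To make the arithmetic concrete, here is a hedged, hypothetical comparison using Snowflake’s published per-hour credit rates (a Medium warehouse consumes 4 credits per hour and a 4X-Large consumes 128); the warehouse name and the job timings are purely illustrative.

    -- Hypothetical experiment: run the same heavy aggregation job at two sizes and compare.
    --   Medium   (4 credits/hr):   job takes ~2 hours   -> ~8.0 credits
    --   4X-Large (128 credits/hr): job takes ~3 minutes -> 128 * (3/60) = 6.4 credits
    ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = '4X-LARGE';
    -- ...run the job...
    ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'MEDIUM';  -- scale back down right away

The savings only materialize if the warehouse is scaled back down (or suspended) as soon as the job finishes, which is exactly what the oversight policies from the previous section are for.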

Think of this as more of an art than a science. You might ask yourself, if you’re running tens of thousands of jobs a week, how can you learn which jobs need to be right-sized in any kind of approachable or scalable way? Fortunately, the jobs themselves contain the answer.

Three:
Visibility Equals (Cost) Control

Every job you run within Snowflake produces metadata that is stored and queryable. In other words, every job you’ve ever run is recorded in a data table. Since you are running an analytics platform, you have the tools to derive extremely detailed information from the jobs you run, visualize this information in a meaningful way, and then use it not just to manage costs, but also to anticipate future costs (and manage those as well).

Let’s say that you take the data tables containing your query metadata and put them into Tableau. This means that you can see your queries as they’ve been run every month, every week, every day, and every hour. You can run reports on multiple axes—how long each query runs, how much each query costs, how many resources each query needs. Finally, you can condense these metrics into a report that is sent regularly to your cost management team.
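
For reference, that metadata lives in views you can query directly, such as SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY and WAREHOUSE_METERING_HISTORY. The sketch below shows the kind of summary you might feed into Tableau or a recurring cost report; the seven-day window and the specific columns chosen are illustrative.

    -- Summarize last week's queries by warehouse and hour: volume, runtime, and queueing.
    SELECT
        warehouse_name,
        warehouse_size,
        DATE_TRUNC('hour', start_time)   AS run_hour,
        COUNT(*)                         AS query_count,
        AVG(total_elapsed_time) / 1000   AS avg_runtime_seconds,
        SUM(queued_overload_time) / 1000 AS total_queue_seconds
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    GROUP BY 1, 2, 3
    ORDER BY total_queue_seconds DESC;

    -- Pair it with metering history to see what each warehouse actually cost in credits.
    SELECT
        warehouse_name,
        DATE_TRUNC('day', start_time) AS usage_day,
        SUM(credits_used)             AS credits_used
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
    ORDER BY credits_used DESC;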

As you look over this data, you’re going to see spikes and whether they are one-off anomalies or recurring patterns, and you’ll be able to predict when the next anomalous spike is coming. If you can predict these spikes, you can see what’s causing them: badly formed queries, queries that are wrong-sized for the warehouses they run on, or tasks that queue for abnormally long periods of time. Once you identify the cause, you can eliminate the slowdowns that accumulate both costs and frustration for you, your users, and your customers.

Four:
Understand Your Control Mechanisms Within Snowflake

There are multiple levers you can pull that will allow you to control costs within your Snowflake implementation. If you’re using multiple warehouses within your implementation, you can easily isolate a single instance that’s causing an issue, and you can experiment with it until you find a mechanism that works to control costs and eliminate slowdowns.

Horizontal Scaling

If a single warehouse doesn’t have enough compute power to handle the number of simultaneous queries coming through, it will begin to queue them until resources free up. Alternatively, you can set Snowflake to scale horizontally, spinning up additional identically sized clusters so the warehouse can handle queries in parallel. You’d think that horizontal scaling would simply increase costs, but the relationship isn’t always linear: there’s real value in being able to finish a queued workload aggressively, provided you scale back down immediately afterwards.
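
This behavior comes from Snowflake’s multi-cluster warehouses, available on Enterprise edition and above. A minimal sketch, with a hypothetical warehouse name and illustrative cluster counts:

    -- Allow the warehouse to add up to three same-sized clusters when queries start to queue,
    -- then shrink back to a single cluster automatically once the queue drains.
    ALTER WAREHOUSE BI_WH SET
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 3
      SCALING_POLICY = 'STANDARD';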

Data Caching

Snowflake caches data within each warehouse, so if you have a duplicative workload, you can consolidate it within a single warehouse. In other words, you can use caching to offset some of the cost of a larger warehouse. This is another situation where a larger warehouse size can save you money: while you’re powering more of the workload with a single warehouse, you can cut your overall costs because that warehouse keeps working against the same cached data, cutting your I/O time.
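
One way to check whether consolidation is actually paying off is to look at how much of each query’s scan is served from the warehouse cache. A minimal sketch against the standard QUERY_HISTORY view (the one-day window is an illustrative choice):

    -- Queries with a high PERCENTAGE_SCANNED_FROM_CACHE are reusing data already sitting
    -- in the warehouse's local cache instead of re-reading it from remote storage.
    SELECT
        warehouse_name,
        query_id,
        bytes_scanned,
        percentage_scanned_from_cache
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
      AND bytes_scanned > 0
    ORDER BY percentage_scanned_from_cache DESC;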

Code Rewrite

No one claims to write SQL perfectly, and simpler is generally better in terms of cost savings. In many cases, rather than performing the same joins over and over, it is more efficient to aggregate the output of a job into a flat data warehouse table and then query off that table.
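
As a hedged sketch of that pattern (all table and column names below are hypothetical): materialize the expensive join and aggregation once into a flat table, then point the frequent, simple queries at that table instead of re-running the join.

    -- Run the expensive join and aggregation once...
    CREATE OR REPLACE TABLE daily_sales_flat AS
    SELECT
        o.order_date,
        c.region,
        SUM(o.amount)              AS total_sales,
        COUNT(DISTINCT o.order_id) AS order_count
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY o.order_date, c.region;

    -- ...then let the frequent, simple queries hit the flat table instead of the join.
    SELECT region, SUM(total_sales) AS sales_last_30_days
    FROM daily_sales_flat
    WHERE order_date >= DATEADD('day', -30, CURRENT_DATE())
    GROUP BY region;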

By optimizing your query performance using these mechanisms, you’ll be able to generate cost savings, avoid anomalies, and increase the productivity of your workforce—but there’s still more you could be doing.

Five:
Create a Universal Semantic Layer™ using AtScale

One last thing to consider is how difficult it can be to write a query in order to answer a business question. If you’re like most businesses, you don’t have enough data scientists to go around, meaning that personnel—and not data warehouses—could be one of the major bottlenecks preventing your business leaders from accessing analytics.

Instead of hiring more data scientists or purchasing larger data warehouse volumes, you can instead create a universal semantic layer. This allows consistent, easy access to data, essentially liberating the productivity of your analytics team. Data scientists and analysts no longer need to spend long hours performing ETL and data preparation. Instead, they can move right to the querying phase, arriving at actionable insights that will drive your business.

AtScale provides a full multi-dimensional engine that provides a rich business-friendly interface for users, allows for centralized data modeling, and ensures consistency for key business metrics and definitions. In addition to the power of a semantic layer, AtScale’s single point of entry delivers a one-stop data governance and security shop. You can apply your data governance and security policies at a logical or physical level while virtualizing and hiding the physical implementation of the data.

Simply put, AtScale’s Universal Semantic Layer™ provides labor-saving automation and makes your data even easier and safer to use.

 

As we said at the beginning of this guide, the advantages that will let you truly accelerate and scale your analytics are within reach. We hope these techniques help you start realizing the ROI you want to achieve (and now can achieve) through your move to the cloud.

AtScale Product Overview

AtScale provides the premier platform for data architecture modernization. AtScale connects you to live data using one set of semantics, without having to move any data. Leveraging AtScale’s Autonomous Data Engineering™, query performance improves by an order of magnitude. AtScale inherits native security and provides additional governance and security controls to enable self-service analytics with consistency, safety, and control. AtScale’s Intelligent Data Virtualization™ and intuitive data modeling enable access to new data sources and platforms without ETL or needing to call in data engineering.

ABOUT ATSCALE

AtScale powers the analysis used by the Global 2000 to make million dollar business decisions. The company’s Intelligent Data Virtualization platform provides Cloud OLAP, Autonomous Data Engineering™ and a Universal Semantic Layer™ for fast, accurate data-driven business intelligence and machine learning analysis at scale. For more information, visit www.atscale.com.
