Editor’s note: This is an update to the “Why You Need a Virtual Data Warehouse” post we published a while back. It expands on the challenges of a traditional data warehouse and outlines the 10 requirements for a virtual data warehouse.
The movement of analytical workloads to cloud data warehouses such as Google BigQuery, Snowflake, Amazon Redshift, and Microsoft Azure SQL Data Warehouse is a trend that will only continue to grow. The performance and cost benefits of these cloud providers compared to legacy data warehouses like Teradata or distributed systems like Hadoop are too great for decision makers at the world's largest enterprises to ignore. However, cloud transformation is a challenging multi-year process for even the most nimble enterprises, and there are doubts about whether all data will live in the cloud in the foreseeable future.
Hybrid cloud environments, meaning a combination of cloud and on-premises data warehouses, are likely to be the corporate standard for data management over the next decade.
While the hybrid cloud is an improvement over purely on-premises data environments, it has challenges of its own:
- It can still be slow, cumbersome, and expensive to maintain both traditional on-premises data warehouses and multiple cloud data sources.
- Migrating to new data repositories can be disruptive for business users who have time-critical analyses to conduct.
- Curating and managing siloed data drains resources. Meanwhile, BI tools proliferate as each department and business unit nurtures its own favorites with different interface requirements, which makes it hard to get consistent, accurate answers to the same question across groups.
- With data in multiple locations and stored in multiple formats, end users are forced to know where to find the data they need, translate it into a common format, and combine it across locations and data platforms.
The solution? Businesses need a way to intelligently virtualize all their siloed data into a single, unified data view from which a variety of BI tools can get fast, consistent answers. AtScale’s Virtual Data Warehouse addresses the critical challenges facing companies with large data warehouses that want to move to the cloud.
The Virtual Data Warehouse Defined
A virtual data warehouse is a collection of data from a wide variety of sources that is used to drive business decisions. Unlike a physical data warehouse, it does not require data to be stored in a single location. Instead, it virtualizes an organization’s siloed data into a single, unified view from which a variety of BI tools can get fast, consistent answers. Data is queried in its native form, “as is,” yet appears to users as one unified data warehouse.
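To make the “queried as is, but appears unified” idea concrete, here is a minimal sketch in Python that uses SQLite as a stand-in. The two attached in-memory databases (`onprem` and `cloud`, names chosen only for illustration) play the role of separate data platforms, and a temporary view plays the role of the unified data view. This is an analogy under stated assumptions, not how AtScale or any production virtual data warehouse is built: real systems federate queries across different engines, whereas everything here runs inside one SQLite engine.

```python
import sqlite3

# Assumption for illustration: two attached in-memory databases stand in for
# two independent data platforms (e.g. an on-premises and a cloud warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS onprem")
conn.execute("ATTACH DATABASE ':memory:' AS cloud")

# Each "platform" keeps its own data; nothing is copied between them.
conn.execute("CREATE TABLE onprem.customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO onprem.customers VALUES (?, ?)",
    [(1, "EMEA"), (2, "AMER"), (3, "EMEA")],
)

conn.execute("CREATE TABLE cloud.orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO cloud.orders VALUES (?, ?)",
    [(1, 100.0), (2, 250.0), (3, 50.0), (1, 25.0)],
)

# The TEMP view is the "unified data view": it stores only the query text,
# so every read goes back to the source tables in place. (SQLite allows TEMP
# views, unlike regular views, to reference tables in attached databases.)
conn.execute("""
    CREATE TEMP VIEW revenue_by_region AS
    SELECT c.region, SUM(o.amount) AS revenue
    FROM cloud.orders AS o
    JOIN onprem.customers AS c ON c.id = o.customer_id
    GROUP BY c.region
""")

rows = conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)  # [('AMER', 250.0), ('EMEA', 175.0)]
```

Because the view holds no data of its own, a new row inserted into `cloud.orders` shows up in the next query against `revenue_by_region` with no reload or pipeline step.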
The birth of the relational data warehouse in the 1980s drove an enormous amount of innovation and revolutionized the way businesses managed data. Making data available to users in one place through a standard query interface (SQL) was a game changer. The ability to correlate and blend data from different physical systems created new opportunities to improve business operations and create value. Virtualizing the data warehouse is the next step in helping enterprises use their data with more agility and at greater scale.
The Challenges of a Traditional Data Warehouse
Users clearly prefer having enterprise data in a single location. The ability to join data sets that originate in different physical locations without programming is a huge time saver. However, moving large amounts of raw data into a single data warehouse to run analytics and data pipelines presents the following challenges:
- Data is too big and too volatile to process and load in one day.
- Some data, such as log files, is not relational, so storing it in a relational database is unnatural and requires a lot of manual ETL engineering.
- Users need access to granular, atomic data, but storing data at that level is prohibitive due to cost and size constraints.
- Users need access to new types of data and analytics, but making them available takes weeks or months.
- Users want to access data with a variety of tools (Tableau, Qlik, MicroStrategy, Excel, custom applications), but a data warehouse isn’t always suitable for every tool, so data has to be moved out of the warehouse for each one.
There has to be a better way. Physically moving data into a single, monolithic database isn’t the answer. In fact, moving data is the real evil. Once you move data, everything falls apart:
- Engineers are needed to build complex data pipelines that move and transform the data.
- Security and control over the data are lost (or must be redefined) once it leaves the source system.
- Incorporating new data sources takes weeks or months because pipelines must be re-engineered and the database remodeled.
- Data must be pre-aggregated to “fit” into a single, monolithic warehouse, so users lose data fidelity and the freedom to explore atomic data.
Why a Virtual Data Warehouse is a Better Solution
The solution: virtualize the data warehouse. In other words, keep all the good attributes of a physical data warehouse (common interface, one location, multiple sources, curated, secure) but leave the data where it landed; don’t move it into a single database or data platform.
A virtual data warehouse will:
- Virtualize all data sources so they can be presented as a single, unified data view to maximize their value to BI tools.
- Provide a universal interface for BI tools so that departments and business units can get consistent answers to the same question, even when using different BI tools.
- Accelerate queries, reduce costs, and protect sensitive data from unauthorized access.
- Improve BI performance on any tool. With the proliferation of excellent BI tools, each department and business unit may have its own favorite. According to Forbes, 60% to 70% of business functions use two or more BI tools. Rather than forcing everyone onto a single BI solution, which would require retraining and would almost certainly suffer from poor user adoption, companies need a solution that lets a variety of BI tools connect to their data.
The 10 Requirements for a Virtual Data Warehouse
While it may sound straightforward, creating a unified semantic layer on top of your physical data sources and making it work at scale is challenging. For a virtual data warehouse to work, it needs the following capabilities:
- No data movement. Data is queried in its native data platform.
- Compatible with any BI or ML tool. Provides both tabular (SQL) and multi-dimensional (OLAP) interfaces.
- No manual data engineering. Data transformations and calculations are applied at query time.
- Scales with the data platforms. Delivers query results in under 5 seconds against both large (billions of rows) and small (hundreds of rows) data sets, without loading data into a separate system.
- Blends data across multiple data sources. Data can be combined at query time across multiple physical data sources.
- Auto-tuned. Query performance is optimized and managed automatically by analyzing user behavior.
- Secure at rest and in flight. Inherits the data platforms’ native security and may include additional runtime controls.
- Centrally governed. User access and controls are applied at runtime consistently regardless of the tools or applications that access it.
- Programmable. The semantic layer, controls, and metadata can be created and updated programmatically.
- Lightweight footprint. Doesn’t require a large number of servers or expensive (e.g., large-memory) or exotic (e.g., GPU) hardware.
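Several of these requirements (blending at query time, central governance, and programmability) can be illustrated together with a toy semantic layer in Python. Everything in it is hypothetical: `crm_source` and `billing_source` stand in for live queries against two separate platforms, `SemanticModel` is an invented structure rather than AtScale’s model format, and `ACCESS_POLICY` is a made-up row-level rule keyed on region. The point is the shape of the design under these assumptions, not a reference implementation.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical physical sources. In a real deployment each would be a live
# query against its own platform; here they are plain functions returning rows.
def crm_source() -> list[dict]:
    return [
        {"customer_id": 1, "region": "EMEA"},
        {"customer_id": 2, "region": "AMER"},
        {"customer_id": 3, "region": "EMEA"},
    ]


def billing_source() -> list[dict]:
    return [
        {"customer_id": 1, "amount": 100.0},
        {"customer_id": 2, "amount": 250.0},
        {"customer_id": 3, "amount": 50.0},
        {"customer_id": 1, "amount": 25.0},
    ]


@dataclass
class SemanticModel:
    # "Programmable": the model is ordinary data, so other programs can
    # generate or update it instead of someone hand-building it.
    dimension_source: Callable[[], list[dict]]
    fact_source: Callable[[], list[dict]]
    join_key: str
    measures: dict[str, Callable[[list[dict]], float]]


# Hypothetical runtime policy: the regions each user may see. It is enforced
# inside query(), so every client tool gets the same restriction.
ACCESS_POLICY = {"alice": {"EMEA", "AMER"}, "bob": {"EMEA"}}


def query(model: SemanticModel, user: str, measure: str, by: str) -> dict[str, float]:
    allowed = ACCESS_POLICY.get(user, set())
    # Blend at query time: read both sources in place and join on the key.
    dims = {row[model.join_key]: row for row in model.dimension_source()}
    groups: dict[str, list[dict]] = {}
    for fact in model.fact_source():
        dim = dims.get(fact[model.join_key])
        if dim is None or dim["region"] not in allowed:
            continue  # drop unmatched rows and rows outside the user's policy
        groups.setdefault(dim[by], []).append(fact)
    aggregate = model.measures[measure]
    return {key: aggregate(rows) for key, rows in sorted(groups.items())}


model = SemanticModel(
    dimension_source=crm_source,
    fact_source=billing_source,
    join_key="customer_id",
    measures={"revenue": lambda rows: sum(r["amount"] for r in rows)},
)

print(query(model, "alice", "revenue", by="region"))  # {'AMER': 250.0, 'EMEA': 175.0}
print(query(model, "bob", "revenue", by="region"))    # {'EMEA': 175.0}
```

Because the policy check runs inside `query` rather than in each client, two tools asking for the same measure on behalf of the same user get the same numbers, and a user with no entry in the policy sees nothing. Performance features such as auto-tuning are deliberately left out of this sketch.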
How AtScale Can Help
The AtScale Virtual Data Warehouse can help you to:
- Seamlessly migrate to the cloud. Avoid business disruption and port analytical workloads without rewriting them.
- Simplify your analytics infrastructure. Use the best tool and platform for the job without moving data or adding new data stores.
- Modernize and future-proof your analytics stack. Take advantage of data lakes and the new cloud data warehouses, and be prepared for future platforms.
- Secure and govern data in one place. With a live connection to your data, there are no more worries about data traveling the world on users’ laptops.
- Turbocharge your analytics and machine learning. Instantly integrate new data sources and deliver a single, super-fast API for all your data.
- See all of your organization’s data in a single, unified view, no matter where it is stored or how it is formatted.
- Conduct interactive and multidimensional analyses using your preferred BI tools, whether that is Excel, Power BI, Tableau, or something else.
- Get consistent answers across departments and business units via AtScale’s Universal Semantic Layer, which standardizes queries regardless of BI tool or query language.
The virtual data warehouse is the new, modern paradigm for data management and will drive the next wave of innovation in analytics and AI. By not moving data, enterprises can put all of their data to work without the trade-offs and compromises that traditional approaches to data warehousing require.
Photo by Marc-Olivier Jodoin on Unsplash