DATA VIRTUALIZATION VS. DATA WAREHOUSE

What Is Intelligent Data Virtualization And Why Do You Need It?

*Editor’s note: This is an update to the “Why You Need An Advanced Analytics Fabric” post we published a while back. This update expands on the challenges of a traditional data warehouse and outlines the 10 requirements of intelligent data virtualization.*

The movement of analytical workloads to cloud data warehouses such as Google BigQuery, Snowflake, Amazon Redshift, and Microsoft Azure SQL Data Warehouse is a trend that will only continue to grow. The performance and cost benefits of these cloud providers compared to legacy data warehouses like Teradata or distributed systems like Hadoop are too great for decision-makers at the world’s largest enterprises to ignore. However, cloud transformation is a challenging multi-year process for even the most nimble enterprises, and there are doubts about whether all data will live in the cloud in the foreseeable future.

Hybrid cloud environments, meaning a combination of cloud and on-premise data warehouses, are the likely corporate standard for data management over the next decade.

While the hybrid cloud represents an improvement over on-premise data environments, it has its challenges:

  • It can still be slow, cumbersome, and expensive to maintain both traditional on-premise data warehouses and multiple online data sources.
  • Migrating to new data repositories can be disruptive for business users who have time-critical analyses to conduct.
  • Curation and management of siloed data is a drain on resources, and the proliferation of BI tools, with each department and business unit nurturing local favorites with different interface requirements, makes getting consistent and accurate answers to questions across groups challenging.
  • With data in multiple locations and stored in multiple formats, end users are forced to know where to get the data they need, translate it into a common format, and combine it across locations and data platforms.

The solution? Businesses need a way to intelligently virtualize all their siloed data into a single, unified data view from which a variety of BI tools can get fast, consistent answers. AtScale’s A3 solves the critical challenges facing companies with large data warehouses that are looking to move to the cloud.

Intelligent Data Virtualization Defined

Intelligent data virtualization makes data from a wide variety of sources available to drive business decisions. Unlike a physical data warehouse, it does not require data to be stored in a single location. It intelligently virtualizes an organization’s siloed data into a single, unified data view from which a variety of BI tools can get fast, consistent answers. Data is queried in its native form, “as is,” but appears to users as a unified data warehouse, as the sketch below illustrates.
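
As a minimal sketch of what this looks like in practice (all schema and table names below are hypothetical, not AtScale syntax), a user or BI tool writes ordinary SQL against one logical schema, and the virtualization layer handles the rest:

```sql
-- Hypothetical unified schema exposed by the virtualization layer.
SELECT c.segment,
       SUM(o.amount) AS total_sales
FROM   unified.customers AS c   -- physically lives in an on-premise warehouse
JOIN   unified.orders    AS o   -- physically lives in a cloud warehouse
  ON   o.customer_id = c.customer_id
GROUP  BY c.segment;

-- The layer translates this into native queries against each source system
-- and combines the results; no data is copied in advance.
```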

The birth of the relational data warehouse in the eighties drove an enormous amount of innovation and revolutionized the way businesses managed data. Making data available to users in one place with a standard query interface (SQL) was a game-changer. The ability to correlate and blend data from different physical systems created new opportunities to improve business operations and create value. Virtualizing the data warehouse is the next step to enabling enterprises to leverage their data with more agility and scale.

The Challenges of a Traditional Data Warehouse

Housing enterprise data in a single location is clearly preferred by users. The ability to join data sets that live in different physical locations without programming is a huge time saver. However, moving large amounts of raw data into a single data warehouse to run analytics and data pipelines presents the following challenges:

  • Data is too big and too volatile to process and load in one day.
  • Data is not relational (e.g., log files), so storing it in a relational database is unnatural and requires lots of manual ETL engineering.
  • Users need access to granular, atomic data, but the data cannot be stored at this level due to cost and size constraints.
  • Users need access to new types of data and analytics, but making them available takes weeks or months.
  • Users want to use a variety of tools to access data (Tableau, Qlik, MicroStrategy, Excel, custom applications), but data warehouses aren’t always suitable for every tool, requiring data to be moved out of the warehouse for each one.

There has to be a better way. Physically moving data into a single, monolithic database isn’t the answer. In fact, it is moving data that is the real evil. Once you move data, everything falls apart:

  • Engineers are needed to create complex data pipelines for moving and transforming data.
  • Security and control of data are lost (or have to be redefined) once it leaves the source system.
  • Incorporating new data sources takes weeks or months due to the pipeline re-engineering and database modeling required.
  • Data requires pre-aggregation to make it “fit” into a single, monolithic warehouse, so users lose data fidelity and the freedom to explore atomic data.

Why Intelligent Data Virtualization is a Better Solution

The solution: virtualize the data. In other words, take all the good attributes of a physical data warehouse (common interface, one location, multiple sources, curated, secure) but __leave the data where it landed – don’t move it into one database or data platform.__

An intelligent data virtualization platform will:

  • Virtualize all data sources so they can be presented as a single, unified data view to maximize their value to BI tools.
  • Provide a universal interface for BI tools so that departments and business units can get consistent answers to the same question, even when using different BI tools.
  • Accelerate queries, reduce costs, and protect sensitive data from unauthorized access.
  • Improve BI performance on any tool. With the proliferation of excellent BI tools, each department and business unit may have its own favorite. According to Forbes, 60% to 70% of business functions utilize two or more BI tools. Rather than forcing them onto a single BI solution, which would require retraining and almost certainly suffer from user adoption issues, companies need a solution that enables a variety of BI tools to connect to their data and get consistent answers, as the sketch below illustrates.
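
As a minimal sketch of why this yields consistent answers (the names below are hypothetical, and a real semantic layer uses its own modeling language rather than plain SQL views), a governed measure is defined once and every tool resolves against that single definition:

```sql
-- One governed definition of "net revenue", created once in the semantic layer.
CREATE VIEW semantic.sales AS
SELECT s.region,
       s.order_date,
       s.amount * fx.rate_to_usd AS net_revenue_usd   -- the shared measure
FROM   warehouse.sales    AS s
JOIN   reference.fx_rates AS fx
  ON   fx.currency   = s.currency
 AND   fx.as_of_date = s.order_date;

-- Tableau, Power BI, and Excel all query the same definition, so the same
-- question returns the same answer in every tool:
SELECT region, SUM(net_revenue_usd) AS net_revenue
FROM   semantic.sales
GROUP  BY region;
```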

The 10 Requirements of Intelligent Data Virtualization

While it may sound obvious, creating a Universal Semantic Layer on top of your physical data sources and making it work at scale is challenging. For intelligent data virtualization to work, the following capabilities are required:

  1. No data consolidation required. Data is queried in its native data platform.
  2. Compatible with BI and AI/ML tools. Provides both tabular (SQL) and multidimensional (OLAP) interfaces.
  3. No manual data engineering. Data transformations and calculations are applied at query time.
  4. Scales with the data platforms. Delivers interactive query performance, with speedups of as much as 100x, against data sets ranging from hundreds of rows to billions of rows, without loading data into a separate system.
  5. Blends data across multiple data sources. Data can be combined at query time across multiple physical data sources.
  6. Auto-tuned. Query performance is optimized and managed automatically by analyzing user behavior (see the sketch after this list).
  7. Secure at rest and in flight. Inherits the data platforms’ native security and may include additional runtime controls.
  8. Centrally governed. User access and controls are applied at runtime consistently regardless of the tools or applications that access it.
  9. Open and extensible. The semantic layer, controls, and metadata can be machine-created and updated as new data sources are added.
  10. Lightweight footprint. Doesn’t require a large number of servers or expensive (e.g., large-memory) or exotic (e.g., GPU) hardware.
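
To make requirement 6 concrete, here is a hypothetical sketch of the general aggregate-rewrite technique (table names are illustrative; this is not a description of AtScale’s internal mechanics). After observing that users repeatedly aggregate orders by region and day, the layer can materialize a summary table and transparently redirect matching queries to it:

```sql
-- Aggregate created automatically from observed query patterns.
CREATE TABLE accel.orders_by_region_day AS
SELECT region,
       order_date,
       SUM(amount) AS amount,
       COUNT(*)    AS order_count
FROM   sales.orders
GROUP  BY region, order_date;

-- The user still queries the virtual model:
--   SELECT region, SUM(amount) FROM unified.orders GROUP BY region;
-- but the layer rewrites it to scan the much smaller aggregate. This is safe
-- because SUM is additive across the pre-aggregated rows:
SELECT region, SUM(amount) AS amount
FROM   accel.orders_by_region_day
GROUP  BY region;
```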

How AtScale Can Help

AtScale’s A3 can help you to:

  • Seamlessly migrate to the cloud. Avoid business disruption and port analytical workloads without rewriting them.
  • Simplify your analytics infrastructure. Use the best tool and platform for the job without moving data or adding new data stores.
  • Modernize and future-proof your analytics stack. Take advantage of data lakes and the new cloud data warehouses and be prepared for future platforms.
  • Secure and govern data in one place. With a live connection to your data, no more worries about data traveling the world on users’ laptops.
  • Turbo-charge your analytics and machine learning. Instantly integrate new data sources and deliver a single, super-fast API for all your data.
  • See all of your organization’s data in a single, unified view, no matter where it is stored or how it is formatted.
  • Conduct interactive and multidimensional analyses using your preferred BI tools, whether that is Excel, Power BI, Tableau, or something else.
  • Get consistent answers across departments and business units via AtScale’s Universal Semantic Layer, which standardizes queries regardless of BI tool or query language.

Intelligent Data Virtualization is the new, modern paradigm for data management and will drive the next wave of analytics and AI innovation. By not moving data, enterprises can put all their data to work without the trade-offs and compromises required by traditional approaches to data warehousing.

