3 Things About Data Virtualization You Might Not Know

Data virtualization is not new. It was born in the aughts, when data was still small by today’s standards. Back then, most data was stored in relational databases from a few vendors like Oracle, Microsoft and IBM. Oh my, how times have changed! Data is coming from everywhere and everything, and businesses now realize that data is their lifeblood and can’t be thrown away. On top of that, we now have data platforms for every type of data, whether it’s on premises or in the cloud. It’s no surprise that enterprises are now looking for ways to manage this chaos at scale.

Data virtualization is now seen as a critical strategy for managing this new data volatility and variety. In fact, Gartner says that “by 2022, 60% of all organizations will implement data virtualization as one key delivery style in their data integration architecture” (Source: Gartner Market Guide for Data Virtualization). In this article, we’ll look at some of the misconceptions about data virtualization and highlight how intelligent data virtualization is changing the game.

#1: Data Virtualization is Not Query Federation

Database platforms have included varying degrees of query federation for some time now. Query federation allows a single SQL query to combine data from more than one data platform using remote database connections to external data platforms. There are a few problems with this approach. First, users need to understand the local and remote database schemas in order to join the data. Second, queries can’t scale if the remote database has a large data set because the network can’t support the data movement. Third, there’s no semantic layer, so users need to understand SQL, primary and foreign keys, etc. to make it work. In contrast, data virtualization solves these problems by presenting a single consolidated view that hides the underlying data platform location and complexity while managing data flows to minimize data movement.
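To make the contrast concrete, here is a minimal sketch in Python using the standard sqlite3 module. Two attached in-memory databases stand in for separate data platforms; the names crm, billing, customers, invoices and customer_revenue are hypothetical, and a real data virtualization product does far more than a SQL view. The first query is federation-style: the user has to know both schemas and the join key. The second goes through a single consolidated view that hides where the data lives.

```python
import sqlite3

def build_platforms():
    """Attach two in-memory databases that stand in for separate data platforms."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical platform names; ATTACH must run before any data is inserted.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 75.0)])
    conn.commit()
    return conn

def federated_query(conn):
    """Federation style: the user names both schemas and knows the join key."""
    return conn.execute(
        "SELECT c.name, SUM(i.total) FROM crm.customers AS c "
        "JOIN billing.invoices AS i ON i.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()

def add_consolidated_view(conn):
    """Define the join once, so users only ever see 'customer_revenue'."""
    # A TEMP view is used because SQLite only lets temp views span attached databases.
    conn.execute(
        "CREATE TEMP VIEW customer_revenue AS "
        "SELECT c.name AS name, SUM(i.total) AS revenue "
        "FROM crm.customers AS c "
        "JOIN billing.invoices AS i ON i.customer_id = c.id "
        "GROUP BY c.name"
    )

def virtualized_query(conn):
    """Virtualization style: no schemas, no keys, just the business view."""
    return conn.execute(
        "SELECT name, revenue FROM customer_revenue ORDER BY name"
    ).fetchall()

conn = build_platforms()
add_consolidated_view(conn)
print(federated_query(conn))
print(virtualized_query(conn))
```

Both queries return the same rows; the difference is how much the person writing the query has to know about the platforms underneath.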

#2: Data Virtualization Works for Analytics

When first introduced, data virtualization focused on small data sets and operational, not analytical, use cases: for example, combining data from monitoring dashboard A with ticketing dashboard B. Why? Analytical-style queries require lots of I/O because they tend to aggregate and join data across multiple large tables. More recently, data virtualization vendors like AtScale have solved the data movement problem by pushing the work down to the source data platforms and performing large-scale aggregations and joins in the database, not over the network. Now users can have their cake and eat it too: query big data, in multiple data platforms, at scale and at the speed of thought.
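The effect of pushing work down to the source can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how AtScale or any other vendor implements it: two in-memory SQLite databases stand in for remote platforms, the orders and refunds tables and their row counts are made up, and "rows moved" is simply the number of rows fetched from each source.

```python
import sqlite3

def make_source(table, rows):
    """Create an in-memory SQLite database standing in for one remote platform."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (region TEXT, amount REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()
    return conn

# Hypothetical data: 1,000 order rows on one platform, 200 refund rows on another.
orders = make_source(
    "orders", [("east" if i % 2 else "west", 10.0) for i in range(1000)])
refunds = make_source(
    "refunds", [("east" if i % 2 else "west", 5.0) for i in range(200)])

def combine(order_rows, refund_rows):
    """Net sales per region: add order amounts, subtract refund amounts."""
    net = {}
    for region, amount in order_rows:
        net[region] = net.get(region, 0.0) + amount
    for region, amount in refund_rows:
        net[region] = net.get(region, 0.0) - amount
    return net

def federated_net_sales():
    """Naive federation: fetch every row from both sources, aggregate locally."""
    order_rows = orders.execute("SELECT region, amount FROM orders").fetchall()
    refund_rows = refunds.execute("SELECT region, amount FROM refunds").fetchall()
    return combine(order_rows, refund_rows), len(order_rows) + len(refund_rows)

def pushdown_net_sales():
    """Pushdown: each source aggregates in place, only summary rows are fetched."""
    order_sums = orders.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
    refund_sums = refunds.execute(
        "SELECT region, SUM(amount) FROM refunds GROUP BY region").fetchall()
    return combine(order_sums, refund_sums), len(order_sums) + len(refund_sums)

for label, strategy in [("federated", federated_net_sales),
                        ("pushdown", pushdown_net_sales)]:
    net, rows_moved = strategy()
    print(label, net, "rows moved:", rows_moved)
```

Both strategies produce the same totals, but the pushdown version fetches one summary row per region per source instead of every underlying row, and that gap widens as the tables grow.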

#3: Data Virtualization Solves Your Data Governance Woes 

Because the data virtualization platform acts as the single entry point for all queries and presents a unified semantic model, it’s an excellent place to enforce data governance. Rather than implement varying and potentially conflicting data access rules in a myriad of databases, data virtualization can serve as data governance central to drastically simplify data access rules for the enterprise. Combined with a data catalog like Collibra or Alation, a data virtualization platform becomes the “enforcer in chief” for the data catalog’s data governance policies.
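As a rough illustration of the single entry point idea, the Python sketch below checks one central policy before any query reaches the data. Everything in it is an assumption made for the example: the POLICIES table, the role names, the sales table and the governed_query function are hypothetical, and this is not how Collibra, Alation or any data virtualization product exposes its policies.

```python
import sqlite3

# Hypothetical central policy: columns of the unified model each role may read.
POLICIES = {
    "analyst": {"region", "revenue"},
    "finance": {"region", "revenue", "customer_email"},
}

# One in-memory table stands in for the unified semantic model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL, customer_email TEXT)")
conn.execute("INSERT INTO sales VALUES ('east', 100.0, 'a@example.com')")
conn.commit()

def governed_query(role, columns):
    """Single entry point: enforce the central policy, then run the query."""
    if not columns:
        raise ValueError("at least one column is required")
    allowed = POLICIES.get(role, set())
    denied = [c for c in columns if c not in allowed]
    if denied:
        raise PermissionError(f"role {role!r} may not read: {', '.join(denied)}")
    # Every column name was checked against the allow-list above,
    # so building the SELECT list from them is safe here.
    return conn.execute(f"SELECT {', '.join(columns)} FROM sales").fetchall()

print(governed_query("analyst", ["region", "revenue"]))
try:
    governed_query("analyst", ["customer_email"])
except PermissionError as err:
    print("blocked:", err)
```

Because every query passes through the same function, the access rule lives in one place rather than being re-implemented in each underlying database.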

As you can see, data virtualization has come a long way. Enterprises should consider data virtualization instead of, or in addition to, traditional ETL-style data integration. Data virtualization can drastically simplify your data infrastructure, improve your agility in responding to the business’s needs and protect your data by governing its access regardless of the query source.

For more information:

How Data Governance and a Semantic Layer Support Data Mesh