The Centralized Data Repository: Breaking Down Data Silos

When it comes to effective consumption of data for business insights, there are six key capabilities: data, access, model, analyze, consume, and insights. AtScale’s Data and Analytics Maturity Model Workshop explains how organizations can build the skills and knowledge necessary to advance their capabilities in each of these areas.

In this post, covering module one, we’ll discuss how organizations can move from a fractured, siloed data environment to a centralized data platform for analytics. By breaking down data silos, enterprises will be on the path to a more proactive use of data throughout the organization.

The Complexity of Modern Data Environments

Most enterprises today have data locked away in multiple silos. When most people think of these silos, data marts and other old-school data architecture approaches usually come to mind. But the modern cloud environment has made things much more complex. The modern data architecture environment includes:

  • SaaS Applications: Data is now locked into proprietary SaaS applications that have their own APIs, which makes the data difficult to access.
  • On-Premises Data Platforms: Many companies rely on legacy data warehouses like Oracle and Teradata, or on other on-premises data platforms like Hadoop, which store data in ways that are difficult to integrate.
  • Cloud Data Platforms: Enterprises use data warehouses and data lakes in the cloud, oftentimes multiple platforms at once, to store data, which can be difficult for end-users to access when they need it.
  • Data Consumption Tools: Organizations rely on a mix of business intelligence tools and AI/ML platforms for analytics. This often requires users to have a sophisticated understanding of data engineering and data integration to access the data and blend it from multiple sources.

The Data Lakehouse vs. Cloud Data Warehouse Approach

There are two common approaches to overcome data silos: data lakehouses and cloud data warehouses.

Data Lakehouses

One newer approach to solving the complexities of modern data environments and breaking down data silos is the data lakehouse. Championed by Databricks, the data lakehouse architecture moves away from the legacy data warehouse by adding tools and engines on top of files in a data lake.

The lakehouse is a very flexible approach because you don’t add structure to the data until it’s needed, a pattern known as schema-on-read. This enables organizations to make data accessible for business intelligence, data science, and machine learning with minimal preparation and data movement.
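
To make schema-on-read concrete, here is a minimal sketch in plain Python. It is not how Databricks or any particular lakehouse engine works internally; the record layout, the ORDER_SCHEMA mapping, and the read_with_schema helper are all hypothetical names invented for illustration. The point it shows is that raw records are stored untouched, and each reader decides which fields and types it cares about at query time.

```python
import json

# Raw events land in the "lake" untouched; records need not share a shape.
# (Hypothetical sample data.)
RAW_LINES = [
    '{"order_id": "1", "amount": "19.99", "region": "EMEA"}',
    '{"order_id": "2", "amount": "5.00"}',
    '{"order_id": "3", "amount": "12.50", "region": "APAC", "coupon": "X1"}',
]

# Hypothetical schema chosen by the reader, not by the writer.
ORDER_SCHEMA = {"order_id": int, "amount": float, "region": str}


def read_with_schema(lines, schema, defaults=None):
    """Parse raw JSON lines and project them onto `schema` at read time."""
    defaults = defaults or {}
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            # Missing fields fall back to a default; unknown fields are ignored.
            value = record.get(field, defaults.get(field))
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


orders = read_with_schema(RAW_LINES, ORDER_SCHEMA, defaults={"region": "UNKNOWN"})
total = sum(row["amount"] for row in orders)
```

A different consumer could read the same raw lines with a different schema (for example, one that keeps the coupon field) without anyone rewriting the stored data, which is where the flexibility comes from.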

Cloud Data Warehouses

The second approach is almost the exact opposite of the data lakehouse architecture: the cloud data warehouse. Snowflake has championed this approach, where all data is copied or loaded into the cloud data warehouse. This evolution of the legacy data warehouse enforces the data schema when the data is written (schema-on-write), optimizing storage so that queries run as fast as possible.
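
Enforcing the schema at write time can be sketched with Python's built-in sqlite3 module. SQLite stands in here for a cloud data warehouse purely as an assumption for the sake of a runnable example; it is not Snowflake, and the orders table, its columns, and the load helper are hypothetical. What the sketch shows is that rows which do not fit the declared structure are rejected at load time, so every row a query later sees is already well-formed.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        amount   REAL NOT NULL CHECK (typeof(amount) = 'real'),
        region   TEXT NOT NULL
    )
    """
)


def load(rows):
    """Insert rows one by one; collect the ones the schema rejects."""
    accepted, rejected = 0, []
    for row in rows:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO orders (order_id, amount, region) VALUES (?, ?, ?)",
                    row,
                )
            accepted += 1
        except sqlite3.IntegrityError as exc:
            rejected.append((row, str(exc)))
    return accepted, rejected


accepted, rejected = load(
    [
        (1, 19.99, "EMEA"),           # fits the schema
        (2, "not-a-number", "APAC"),  # fails the CHECK on amount
        (3, 12.50, None),             # fails NOT NULL on region
    ]
)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

The trade-off relative to schema-on-read is visible in the rejected list: bad records have to be fixed or routed elsewhere before they load, in exchange for predictable structure at query time.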

Organizations choose this approach because it has mature tools, it is optimized for analytics, and it is often an easier migration path from traditional on-premises data warehouses.

Best Practice: Leverage a Data Lakehouse and a Cloud Data Warehouse

AtScale doesn’t recommend choosing between a data lakehouse and a cloud data warehouse, but instead suggests leveraging both approaches at the same time. Data lakes are useful as a first landing zone for data and for conducting exploratory analysis for machine learning. Cloud data warehouses are optimized for analytical queries and have all the security and access control capabilities that legacy warehouses had.

The AtScale semantic layer platform can break down the silos between cloud data warehouses, data lakes, data lakehouses, and traditional data platforms. With a semantic layer, data consumers (whether they’re data scientists, business analysts, or even application developers) don’t have to worry about where or how the data is stored. They work with a business-friendly view of the data and leave it to the semantic layer to access the underlying data in an optimized way.
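
The general idea of a semantic layer can be illustrated with a toy Python sketch. This is not AtScale's API or implementation; the two in-memory sources, the SEMANTIC_MODEL mapping, and the query function are hypothetical stand-ins. It shows a consumer asking for a business metric by name while the mapping handles each source's column names and units.

```python
from collections import defaultdict

# Two hypothetical physical sources with different layouts and units.
WAREHOUSE_ORDERS = [
    {"rgn": "EMEA", "amt_usd": 100.0},
    {"rgn": "APAC", "amt_usd": 40.0},
]
LAKE_EVENTS = [
    {"region_name": "EMEA", "revenue_cents": 2500},
    {"region_name": "AMER", "revenue_cents": 9900},
]

# The semantic model: one business metric mapped onto each source's columns.
SEMANTIC_MODEL = {
    "revenue_by_region": [
        {
            "rows": WAREHOUSE_ORDERS,
            "dimension": "rgn",
            "measure": lambda r: r["amt_usd"],
        },
        {
            "rows": LAKE_EVENTS,
            "dimension": "region_name",
            "measure": lambda r: r["revenue_cents"] / 100,  # cents to dollars
        },
    ]
}


def query(metric):
    """Answer a business-level question without exposing source details."""
    totals = defaultdict(float)
    for source in SEMANTIC_MODEL[metric]:
        for row in source["rows"]:
            totals[row[source["dimension"]]] += source["measure"](row)
    return dict(totals)


result = query("revenue_by_region")
```

The caller only knows the name revenue_by_region; if a source moves or its columns are renamed, only the mapping changes. A production semantic layer would translate such requests into queries pushed down to each platform rather than looping over rows in memory.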

This shift from siloed data to a centralized, cloud-based approach advances an organization to the next level on our Data & Analytics Maturity Model, paving the way for a more proactive approach to data analytics.

Watch the full video module for this topic as part of our Data & Analytics Maturity Workshop Series.
