August 26, 2019
Intelligent Data Virtualization: The Silver Bullet in Next-Gen Big Data Governance
As we covered in the first three parts of our data mesh series, a successful data mesh approach empowers domain experts to build their own data products with centrally governed building blocks. To enable this model, organizations need to facilitate the transformation of analytics-ready data into business-ready data. Then, they need to make this data standardized, sharable, and reusable for a variety of data products, such as a decision support dashboard in a BI tool like Power BI or Tableau, an exploratory analysis in Excel or Python, or an AI/ML model.
But there’s another piece to the data mesh puzzle: analytics governance. Governance strikes a balance by empowering self-service data product creation while maintaining a consistent level of security and quality. After all, there are correct and incorrect ways to interact with data, and a data mesh approach focuses on empowering non-data experts to interact with data correctly. Because of this, many users need a way to answer questions like, “Am I using the right data?” or “Is the data I’m using quality data?” Governance plays this role. It empowers data users at every level of expertise to confidently build and maintain high-quality data products, and it paves the way for other innovations such as self-service BI. Analytics governance falls into a few different categories:
- Governance of Data Sources
- Governance of Metrics
- Governance of Dimensions
- Governance of Access Control
- Governance of Cloud Resource Consumption and Analytics Performance
Governance of Data Sources
Data source governance means setting standards for your application data, third-party data, and data lineage. It sets clear rules for how this data gets used, modified, and stored within your organization. This entails cleaning up duplicate or unnecessary data and giving each data source a detailed metadata description that’s visible to users.
Many organizations achieve data governance by implementing active governance, meaning that policies get baked into everyday workflows. As an example, active governance makes data lineage metadata accessible from each team’s day-to-day business applications. This accessibility empowers team members to consume data with trust as they get their various jobs done. By putting individual data users in control, active data governance builds a community of business experts committed to data literacy. This method of data source governance also helps ensure that data is high-quality, trusted, and compliant from the start. It’s more effective than the passive approaches of the past, which focused on ingesting a high volume of data, then applying rigid business rules to it retroactively.
The semantic layer facilitates active data source governance. It empowers business owners with the right data domain context, enabling them to continuously improve data definitions in an agile and flexible way.
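To make the idea of visible metadata and lineage concrete, here is a minimal Python sketch of a catalog entry for a data source. The `DataSource` class, its fields, and the sample source names are all hypothetical, chosen for illustration only; a real catalog or semantic layer will have its own model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataSource:
    """Hypothetical catalog entry for one governed data source."""
    name: str
    owner: str = ""
    description: str = ""
    # Lineage: names of the sources this one is derived from.
    upstream: List[str] = field(default_factory=list)


def governance_gaps(source: DataSource) -> List[str]:
    """Return the metadata fields a source still needs before users can trust it."""
    gaps = []
    if not source.owner:
        gaps.append("owner")
    if not source.description:
        gaps.append("description")
    return gaps


def full_lineage(name: str, catalog: Dict[str, DataSource]) -> List[str]:
    """Walk upstream links so a user can see where a data set came from."""
    seen: List[str] = []
    stack = list(catalog[name].upstream)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        if current in catalog:
            stack.extend(catalog[current].upstream)
    return seen


# Sample catalog (made-up sources): application data, a third-party feed,
# and a derived data set that depends on both.
catalog = {
    "crm_accounts": DataSource(
        "crm_accounts", owner="sales-ops",
        description="Accounts exported from the CRM application"),
    "vendor_firmographics": DataSource("vendor_firmographics"),  # not yet documented
    "enriched_accounts": DataSource(
        "enriched_accounts", owner="analytics",
        description="CRM accounts joined with firmographics",
        upstream=["crm_accounts", "vendor_firmographics"]),
}

for source in catalog.values():
    print(source.name, "missing:", governance_gaps(source))
print("lineage:", sorted(full_lineage("enriched_accounts", catalog)))
```

A check like `governance_gaps` could run whenever a source is registered, which is one way to bake the policy into the workflow instead of auditing after the fact.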
Governance of Metrics
The governance of metrics, whether simple like revenue, cost, and quantities or complex like Annual Recurring Revenue (ARR) and Customer Acquisition Cost (CAC), ensures a single version of the truth. Consistent metrics serve as the connecting language between the analytics layer and the business: the everyday, concrete terms used to explain business-essential concepts across the organization.
When an organization doesn’t set up standard definitions for metrics, each business unit will inevitably create separate definitions and calculation logic for the same metrics. This discrepancy happens because different BI and analytics tools with different calculation pipelines get adopted across decentralized teams.
The problem of inconsistent metrics explains why we’ve seen the rise of metrics stores in recent years. Businesses see the need for a stand-alone layer that sits between data warehouses and downstream data consumers, ensuring consistency in metrics usage. A semantic layer creates the foundation for a metrics store by laying out a “common language” and a single definition for each metric to follow.
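As a rough illustration of "define once, use everywhere," the sketch below keeps each metric's calculation in one registry and refuses a second definition under the same name. The registry, the `metric` decorator, and the CAC formula (acquisition spend divided by new customers) are assumptions made for this example, not the API of any particular metrics store.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]

# Hypothetical central registry: each metric name maps to exactly one calculation.
METRICS: Dict[str, Callable[[List[Row]], float]] = {}


def metric(name: str):
    """Register a metric definition; reject a second definition under the same name."""
    def register(func):
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already defined")
        METRICS[name] = func
        return func
    return register


@metric("revenue")
def revenue(rows: List[Row]) -> float:
    return sum(r["amount"] for r in rows)


@metric("customer_acquisition_cost")
def customer_acquisition_cost(rows: List[Row]) -> float:
    # Assumed definition for illustration: acquisition spend / new customers.
    new_customers = sum(r["new_customers"] for r in rows)
    spend = sum(r["acquisition_spend"] for r in rows)
    return spend / new_customers if new_customers else 0.0


def compute(name: str, rows: List[Row]) -> float:
    """Downstream tools call this instead of re-implementing the formula."""
    return METRICS[name](rows)


# Made-up monthly figures.
monthly_rows = [
    {"amount": 1200.0, "acquisition_spend": 300.0, "new_customers": 3},
    {"amount": 800.0, "acquisition_spend": 200.0, "new_customers": 2},
]

print(compute("revenue", monthly_rows))
print(compute("customer_acquisition_cost", monthly_rows))
```

Because a dashboard, a notebook, and an ML feature pipeline would all go through `compute`, a change to the definition lands in one place rather than in each tool's calculation logic.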
Governance of Dimensions
Dimension governance also plays an important role in facilitating composability for a successful data mesh approach. This means standardizing conformed analysis dimensions such as time, geography, and product. These dimensions serve as the building blocks for data products and the common medium for blending disparate data sets into usable products.
As with metrics governance, the semantic layer can provide a “common language” of dimensions for different domain experts to use. These conformed dimensions serve as the connective tissue for stitching together different data sources: the heart of a true data mesh approach.
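Here is a small Python sketch of how a conformed dimension lets two domains blend their data. The product dimension, the local codes, and the sample rows are invented for illustration; the point is only that both domains map their own identifiers onto one shared key before joining.

```python
from collections import defaultdict

# Hypothetical conformed product dimension: one agreed key per product,
# plus the local code each domain happens to use for it.
PRODUCT_DIM = {
    "P-100": {"name": "Widget", "sales_code": "WID", "support_code": "widget-std"},
    "P-200": {"name": "Gadget", "sales_code": "GAD", "support_code": "gadget-pro"},
}


def conform(rows, code_field):
    """Re-key a domain's rows from its local product code to the conformed key."""
    lookup = {attrs[code_field]: key for key, attrs in PRODUCT_DIM.items()}
    return [{**row, "product_key": lookup[row["product"]]} for row in rows]


def blend(sales_rows, ticket_rows):
    """Join the two domains on the shared product key."""
    combined = defaultdict(dict)
    for row in conform(sales_rows, "sales_code"):
        combined[row["product_key"]]["revenue"] = row["revenue"]
    for row in conform(ticket_rows, "support_code"):
        combined[row["product_key"]]["tickets"] = row["tickets"]
    return dict(combined)


# Made-up data from two domains that name products differently.
sales = [{"product": "WID", "revenue": 500.0}, {"product": "GAD", "revenue": 900.0}]
tickets = [{"product": "widget-std", "tickets": 4}, {"product": "gadget-pro", "tickets": 1}]

blended = blend(sales, tickets)
for key, values in sorted(blended.items()):
    print(key, PRODUCT_DIM[key]["name"], values)
```

Without the shared `product_key`, each new data product would need its own one-off mapping between "WID" and "widget-std", which is exactly the duplication that conformed dimensions are meant to avoid.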
Governance of Access Control
Access governance is a security function as well as a tool for facilitating quality data products. It ensures that only the appropriate users get granted access to a critical asset. As the medium between data sources and stores, the semantic layer is the natural place to enforce access control policies, ensuring that the right user has access to the right data assets at the right time. It should provide row- and column-level security and integrate with the access controls of the source data.
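Row- and column-level security can be sketched in a few lines of Python. The roles, the `Policy` class, and the sample rows below are hypothetical, and a real semantic layer would typically push these filters down to the source rather than filter in memory; this only shows the shape of the rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which columns a role may see, and which rows."""
    allowed_columns: FrozenSet[str]
    row_filter: Callable[[dict], bool]


POLICIES: Dict[str, Policy] = {
    "emea_analyst": Policy(
        allowed_columns=frozenset({"region", "revenue"}),
        row_filter=lambda row: row["region"] == "EMEA",
    ),
    "finance_admin": Policy(
        allowed_columns=frozenset({"region", "revenue", "customer_email"}),
        row_filter=lambda row: True,
    ),
}


def query(role: str, rows: List[dict]) -> List[dict]:
    """Apply the role's row filter, then drop columns the role may not see."""
    policy = POLICIES.get(role)
    if policy is None:
        # Deny by default: an unknown role gets nothing.
        raise PermissionError(f"no access policy for role {role!r}")
    return [
        {col: val for col, val in row.items() if col in policy.allowed_columns}
        for row in rows
        if policy.row_filter(row)
    ]


# Made-up order data.
orders = [
    {"region": "EMEA", "revenue": 100.0, "customer_email": "a@example.com"},
    {"region": "APAC", "revenue": 250.0, "customer_email": "b@example.com"},
]

print(query("emea_analyst", orders))
print(len(query("finance_admin", orders)))
```

Keeping the policy in one place means every BI tool and notebook that queries through the same layer sees the same restricted view.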
Governance of Cloud Resource Consumption and Analytics Performance
Most (if not all) enterprises are rushing to move data to the cloud. But with all of its benefits, the cloud also brings new challenges. Most recently, one of the biggest hurdles to cloud growth is a push for cost governance. Because cloud storage gets priced as a service, rather than bought once as a physical asset (as with on-premises storage), mounting cloud costs can catch an organization off guard.
A new study uncovered that 81% of IT leaders have been directed to reduce or take on no additional cloud spending. So, it only makes sense that financial governance comes into play, alongside other forms of analytics governance.
Because all analytics consumption naturally passes through the semantic layer, it can provide greater visibility into usage and can be a platform for implementing usage controls. A semantic layer can also optimize consumption. A universal, stand-alone layer can monitor all of the analytics queries, detect patterns in the most commonly asked business questions, then create aggregates for those questions while keeping the data at its source. If a similar question gets asked later, the semantic engine answers it from the aggregated data instead of rescanning the detailed data, which reduces execution time. This added efficiency saves time and resources, leading to lower overall cloud costs for the organization.
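The monitor, detect, aggregate, and reuse loop described above can be sketched as a toy Python class. Everything here is an assumption made for illustration: the `SemanticEngine` name, the `build_after` threshold, and the in-memory rows standing in for a warehouse table. It also assumes the underlying rows do not change, so a stored aggregate never goes stale.

```python
from collections import Counter, defaultdict


class SemanticEngine:
    """Toy engine: counts query patterns and builds an aggregate for popular ones."""

    def __init__(self, fact_rows, build_after=2):
        self.fact_rows = fact_rows
        self.build_after = build_after  # hypothetical popularity threshold
        self.pattern_counts = Counter()
        self.aggregates = {}            # (dimension, measure) -> {value: total}
        self.rows_scanned = 0           # stand-in for compute cost

    def _scan(self, dimension, measure):
        """Full scan of the detailed rows (the expensive path)."""
        totals = defaultdict(float)
        for row in self.fact_rows:
            totals[row[dimension]] += row[measure]
            self.rows_scanned += 1
        return dict(totals)

    def total_by(self, dimension, measure):
        pattern = (dimension, measure)
        if pattern in self.aggregates:
            # Cheap path: answer from the pre-built aggregate.
            return dict(self.aggregates[pattern])
        self.pattern_counts[pattern] += 1
        result = self._scan(dimension, measure)
        if self.pattern_counts[pattern] >= self.build_after:
            # The question is popular enough: keep the aggregate for next time.
            self.aggregates[pattern] = result
        return dict(result)


# Made-up fact rows.
facts = [
    {"region": "EMEA", "revenue": 100.0},
    {"region": "EMEA", "revenue": 50.0},
    {"region": "APAC", "revenue": 250.0},
    {"region": "AMER", "revenue": 75.0},
]

engine = SemanticEngine(facts)
for _ in range(3):
    answer = engine.total_by("region", "revenue")
print(answer, "rows scanned:", engine.rows_scanned)
```

In this sketch the first two identical questions each scan the detailed rows, and the third is served from the stored aggregate, so the scan count stops growing. In a cloud warehouse, fewer rows scanned generally translates into less compute consumed.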
The Power of Analytics Governance to Support Data Mesh
Each level of analytics governance (data sources, metrics, dimensions, access control, and cloud resource consumption/analytics performance) plays a significant role in enabling an effective data mesh. These guardrails lead to a “building block assembly line” of sorts, empowering users across the entire organization to create high-quality data products. And a well-built data mesh approach sets the stage for a variety of new innovations, from fully realized data democratization, to cost and resource optimization, to stronger data-driven decision-making.
Learn more about how the semantic layer facilitates analytics governance for data mesh success. Check out our white paper, “The Principles of Data Mesh and How a Semantic Layer Brings Data Mesh to Life.”