May 12, 2022
The Semantics of the Semantic Layer Part 4: Data Preparation
This is the third post in my blog series, The Semantics of the Semantic Layer, where I describe the seven core capabilities of a semantic layer. In this post, I will dive deeper into the semantic data model that serves as the critical underpinning of a business-friendly view of data. The data model is the canvas that a subject matter expert (SME) paints on to create a digital map of the business for everyone in the organization.
As a reminder, the following diagram shows the seven core capabilities of a semantic layer. This post will focus on “Semantic Modeling”, highlighted in red:
For a semantic layer to function, it must map the physical data objects to the logical business constructs, creating a digital twin of the business while serving as the graph-based query planner and optimizer.
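To make the mapping from physical objects to logical constructs concrete, here is a minimal sketch in Python. The table names, column names, and the `to_sql` helper are all hypothetical, and the sketch only covers the translation step. It does not attempt the graph-based planning or optimization a real semantic layer performs.

```python
# Logical business terms mapped to physical expressions.
# Every table and column name here is a hypothetical example.
MEASURES = {"Total Revenue": "SUM(f.order_amount)"}
DIMENSIONS = {"Customer Segment": "c.segment"}
FROM_CLAUSE = "fact_orders f JOIN dim_customer c ON f.customer_id = c.customer_id"


def to_sql(measure: str, dimension: str) -> str:
    """Translate a request phrased in business terms into physical SQL."""
    dim_expr = DIMENSIONS[dimension]
    return (
        f'SELECT {dim_expr} AS "{dimension}", {MEASURES[measure]} AS "{measure}" '
        f"FROM {FROM_CLAUSE} GROUP BY {dim_expr}"
    )


# A consumer asks for business terms only; the physical layout stays hidden.
sql = to_sql("Total Revenue", "Customer Segment")
print(sql)
```

The consumer never sees `fact_orders` or `dim_customer`. If the physical tables change, only the mapping needs to be updated.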
Write Once, Re-Use Many
The days of centrally managed, monolithic data pipelines are over. Data moves and changes too fast for a single team to keep up with the demands of the business. At the other end of the spectrum, having business users create their own data pipelines has also proved problematic. After all, business users aren’t data engineers, but they are business domain experts.
New approaches, like the data mesh or a hub and spoke model, seek to create a modern, distributed architecture for analytical data management that alleviates the traditional bottlenecks while putting business definitions in the hands of business domain experts.
The illustrations below show an example of how domain-oriented data ownership works using a data mesh approach to analytics data management.
The semantic data model is a key component for delivering this decentralized strategy for analytics data management because it gives the domain experts the environment to create their data products and share them across the organization. With a semantic data modeling platform that promotes object-oriented model definitions, reusability and sharing, business domain experts or data engineers can free themselves of manual data engineering tasks and combine models and components to create new analytics products. To support sharing and reuse, a semantic data platform must support the role-based security and sharing of model components for creating data products from multiple data domains.
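As a rough illustration of object-oriented, shareable model components with role-based access, consider the Python sketch below. The `Dimension` and `Model` classes, the role names, and the column path are hypothetical and do not reflect any particular product's modeling language.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimension:
    """A reusable business dimension (hypothetical structure)."""
    name: str
    source_column: str                    # physical column the dimension maps to
    owner: str                            # data domain that owns the component
    shared_with: frozenset = frozenset()  # roles allowed to reuse it


@dataclass
class Model:
    """A data product assembled from owned and shared components."""
    name: str
    owner: str
    dimensions: list = field(default_factory=list)

    def add_dimension(self, dim: Dimension, role: str) -> None:
        # A domain can always use its own components; other domains need a grant.
        if dim.owner != self.owner and role not in dim.shared_with:
            raise PermissionError(f"{role} may not reuse {dim.name}")
        self.dimensions.append(dim)


# The finance domain defines a calendar dimension once and shares it.
calendar = Dimension(
    "Fiscal Calendar",
    "dw.dim_date.date_key",
    owner="finance",
    shared_with=frozenset({"sales_modeler"}),
)

# The sales domain reuses it in its own data product instead of rebuilding it.
sales = Model("Sales Pipeline", owner="sales")
sales.add_dimension(calendar, role="sales_modeler")
```

A modeler whose role has not been granted access would get a `PermissionError`, which is the "write once, re-use many" idea with a security check attached.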
Key Takeaway: A semantic data model must support a hub and spoke style for creating data products using an object-oriented data modeling language that allows subject matter experts to own and share data model components across teams.
Server-Side Data Blending
Data was already siloed before the cloud revolution, but with the proliferation of cloud data lakes, cloud data warehouses and SaaS applications, data lives in more places than ever before. A web of proprietary and incompatible data APIs makes matters even worse for users looking to create data products.
A semantic layer platform can break down these data silos and abstract away the location and format of the data. At the same time, modelers can create mashups of the data that span multiple locales and platforms to create new, composite data products that can serve as the digital twin of the business. By creating these logical views server-side instead of in the consumption tools, these blended data views can be shared across multiple user personas, whether it’s business analysts, data scientists, or application developers.
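The idea of blending once on the server and sharing the result can be sketched in a few lines of Python. The two in-memory lists stand in for a cloud data warehouse and a SaaS application. The data, field names, and the `revenue_by_segment` function are invented for illustration.

```python
from collections import defaultdict

# Stand-ins for two silos; in practice these would be live queries.
warehouse_orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 30.0},
]
crm_accounts = [
    {"customer_id": 1, "segment": "Enterprise"},
    {"customer_id": 2, "segment": "SMB"},
]


def revenue_by_segment(orders, accounts):
    """Blend both sources once, server-side, into a single logical view."""
    segment_of = {a["customer_id"]: a["segment"] for a in accounts}
    totals = defaultdict(float)
    for order in orders:
        # Orders with no matching account land in an explicit bucket.
        segment = segment_of.get(order["customer_id"], "Unknown")
        totals[segment] += order["amount"]
    return dict(totals)


# Analysts, data scientists, and applications would all read this same view.
view = revenue_by_segment(warehouse_orders, crm_accounts)
print(view)  # {'Enterprise': 150.0, 'SMB': 80.0}
```

Because the join logic lives in one place, every consumption tool gets the same answer instead of each re-implementing the blend.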
Key Takeaway: A semantic data model must break down data silos by blending data sources on the server side to create rich, composite views across multiple business domains.
Graphical & CI/CD Friendly
The people who understand their business domain and the data that feeds it are usually the best authors of the semantic data model. By leveraging their knowledge of the physical (how the data is stored and structured) and the logical (business terms and calculations), they can create data products that are consumable by a wide range of downstream users.
The modeler persona is not always a business analyst, though. Sometimes, a data engineer is the best person to own a particular data domain and thereby own the semantic data model. To support both types of model authors, business users and data engineers, a data modeling platform must support both visual and code-based model definitions. And to support a decentralized style of analytics data management, CI/CD is critical: it provides a scalable workflow for multiple data domain owners and model contributors.
Key Takeaway: A semantic layer platform should support both code-based and graphical data modeling to allow engineers and non-engineers alike to build and collaborate on data models and data modeling components.
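When models are defined as code, a CI pipeline can check them before they are merged. The Python sketch below shows one possible check, assuming a model definition is available as a plain dictionary. The dictionary layout and the `validate_model` function are hypothetical, not a real platform's schema.

```python
def validate_model(model: dict) -> list:
    """Return a list of problems; an empty list means the model passes."""
    problems = []
    columns = set(model.get("columns", []))
    seen = set()
    for measure in model.get("measures", []):
        name = measure.get("name")
        if not name:
            problems.append("measure is missing a name")
            continue
        if name in seen:
            problems.append(f"duplicate measure name: {name}")
        seen.add(name)
        # Every measure must point at a column the model declares.
        if measure.get("column") not in columns:
            problems.append(
                f"{name} references unknown column {measure.get('column')!r}"
            )
    return problems


model = {
    "name": "sales",
    "columns": ["order_amount", "order_date"],
    "measures": [
        {"name": "Total Revenue", "column": "order_amount"},
        {"name": "Total Cost", "column": "cost_amount"},  # not declared above
    ],
}

problems = validate_model(model)
for problem in problems:
    print(problem)
```

A CI job could fail the build whenever `problems` is non-empty, so a contributor in one domain cannot merge a change that breaks a shared component.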
The Power of the Semantic Model
Besides serving as a metrics hub, a semantic layer, powered by a semantic data model, creates a digital twin of your business. By harnessing the power of the subject matter expert, the semantic data model serves as the Rosetta Stone for the enterprise, providing a business-friendly access layer to data for everyone. In my next post, part four of eight, we’ll dive into the importance of data preparation virtualization.
In the meantime, if you are looking to skip ahead, I encourage you to read the white paper, The Semantics of the Semantic Layer.