The data landscape has changed significantly in the last few years due to the increased adoption of big data, cloud data warehouses, self-serve analytics, data virtualization, semantic layers, and more.
Semantic Layer Definition
A semantic layer is a business representation of data and offers a unified and consolidated view of data across an organization. With a semantic layer, different data definitions from different data sources can be quickly mapped for a unified, consistent, and single view of data for analytics and other business purposes.
Types of Semantic Layers
There are many different approaches to implementing a semantic layer. In this discussion, we are focused on semantic layers for analytics use cases — i.e. BI and AI. The term semantic layer is sometimes also used to describe knowledge graphs that support data exploration in large complex data sets.
Here’s a quick breakdown of the different approaches for implementing a semantic layer for data and analytics:
- Semantic Model Implemented in BI Tool: Traditionally, semantic models were built by dashboard creators within the BI tools they were using (e.g. PowerBI, Tableau). This approach works for data products built within the same BI instance, but rapidly breaks down as different instances or different tools are adopted by new workgroups. Inevitably, this approach leads to semantic layer sprawl with inconsistent definitions across different data products.
- Semantic Model Implemented in Data Warehouse: One of the primary goals of building centralized data warehouses was to simplify and standardize enterprise analytics. It can be tempting for data warehouse architects to think of the data assets they manage as the single source of truth for all use cases. While this may be true at some level, data assets comprised of multi-billion record fact tables with completely de-normalized table structures are not “business-ready.” Inevitably, business users will extract portions of this data into BI tools and necessarily create a localized semantic layer — again leading to semantic layer sprawl.
- Semantic Layer within Data Pipelines: With the rise of the data engineer, armed with flexible transformation tools like dbt, we have seen the rise of purpose-built data pipelines for data products supporting both BI and AI audiences. When data engineers construct data pipelines sourced from raw data assets, they are in effect embedding semantic layer logic into the code that defines their transformation. This approach can be hard to manage and ensure consistency as it scales.
- Universal Semantic Layers: The term universal semantic layer means an independent layer between raw data assets (e.g. data warehouse or lakehouse) and data consumers (e.g. BI tools or AI/ML platforms). In this case, semantic models are pre-defined views of raw data that abstract complexity and apply business-oriented definitions. Metrics like revenue or cost are defined. Dimensional hierarchies like time, product, and geography are defined. The universal semantic layer manages the translation between business users (working in BI or AI platforms) and the database.
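As a rough illustration of what "metrics are defined" and "dimensional hierarchies are defined" can mean in practice, here is a minimal Python sketch of a semantic model. The class names, the `fact_orders` table, and the column expressions are all hypothetical and do not reflect any particular product's modeling language.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metric:
    name: str        # business-facing name, e.g. "revenue"
    expression: str  # aggregation over physical columns (hypothetical)


@dataclass(frozen=True)
class Dimension:
    name: str
    levels: tuple    # hierarchy levels, coarsest first


@dataclass
class SemanticModel:
    name: str
    fact_table: str
    metrics: dict = field(default_factory=dict)
    dimensions: dict = field(default_factory=dict)

    def add_metric(self, metric):
        self.metrics[metric.name] = metric

    def add_dimension(self, dimension):
        self.dimensions[dimension.name] = dimension

    def drill_down(self, dimension_name, level):
        """Return the next finer level in a hierarchy, or None at the leaf."""
        levels = self.dimensions[dimension_name].levels
        position = levels.index(level)
        return levels[position + 1] if position + 1 < len(levels) else None


# Assumed example model: table and column names are made up.
sales = SemanticModel(name="sales", fact_table="fact_orders")
sales.add_metric(Metric("revenue", "SUM(order_amount)"))
sales.add_metric(Metric("cost", "SUM(unit_cost * quantity)"))
sales.add_dimension(Dimension("time", ("year", "quarter", "month", "day")))
sales.add_dimension(Dimension("geography", ("country", "region", "city")))

print(sales.drill_down("time", "quarter"))  # month
```

Because the definitions live in one model rather than in each dashboard, every consuming tool would see the same meaning for "revenue" and the same time hierarchy.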
The use of a universal semantic layer has the power to transform not only the world of DataOps but also turn all users into data-driven decision-makers.
Why Do Organizations Need a Semantic Layer?
Organizations today have the technical capabilities to capture enormous amounts of data for improved operations, compliance, and analytics. In addition, globalization, regulations, competition, and other factors have pushed organizations to decentralize and to become faster and nimbler.
This decentralization has resulted in increased complexities, including:
- Multiple data definitions
- Multiple data formats
- Multiple datatypes
A marketing team, for example, may refer to a business as a “prospect” by managing the leads in Salesforce. The sales team might call that same business a “client” as orders and deliveries are managed in SAP ERP, and the finance team calls the same business entity a “counter party” as the invoicing process is managed in Oracle EBS. In this complex environment, how do you get a report that aligns all three data elements to one? In the current siloed data landscape, it is not possible to get a single “Lead to Cash” report due to different data definitions originating from multiple source systems.
The solution lies in having one standard and consistent definition for this business entity, where “prospect,” “client,” and “counter party” are mapped to one data entity. With the semantic layer, different data definitions from different sources can be quickly mapped for a unified and single view of data. A semantic layer maps business data into familiar business terms to offer a unified, consolidated view of data across the organization and meet the growing analytics needs of an enterprise. The semantic layer manages the relationships between the various data attributes to create a simple and unified business view that can be used for querying and deriving insights quickly and cost-effectively.
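The mapping described above can be sketched in a few lines of Python. This is only an illustration: the canonical entity name `customer`, the system keys, the sample records, and above all the assumption that all three systems share a common `business_id` are hypothetical. In a real deployment, matching the same business across systems usually takes more work.

```python
# Hypothetical mapping from (source system, local term) to one canonical entity.
TERM_TO_ENTITY = {
    ("salesforce", "prospect"): "customer",
    ("sap_erp", "client"): "customer",
    ("oracle_ebs", "counter party"): "customer",
}

# Made-up sample records, one per source system, describing the same business.
records = [
    {"system": "salesforce", "term": "prospect", "business_id": "B-100", "stage": "lead"},
    {"system": "sap_erp", "term": "client", "business_id": "B-100", "stage": "order"},
    {"system": "oracle_ebs", "term": "counter party", "business_id": "B-100", "stage": "invoice"},
]


def unify(rows):
    """Group rows by canonical entity and business id, collecting the stages seen."""
    unified = {}
    for row in rows:
        entity = TERM_TO_ENTITY[(row["system"], row["term"])]
        key = (entity, row["business_id"])
        unified.setdefault(key, []).append(row["stage"])
    return unified


lead_to_cash = unify(records)
print(lead_to_cash)  # {('customer', 'B-100'): ['lead', 'order', 'invoice']}
```

Once the three local terms resolve to one entity, the lead, order, and invoice stages line up under a single key, which is the shape a "Lead to Cash" report needs.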
At the same time, the semantic layer does not hold or store the actual data. The semantic layer is a metadata and abstraction layer built on the source data (e.g., data warehouse, data lake, or data mart). The metadata is defined so that the data model gets enriched and becomes simple enough for the business user to understand.
Sample Semantic Layer Use Cases
The use of a semantic layer has the power to benefit companies across industries, as organizations strive to become truly data-driven:
- Retail: We’re far past the days of strictly brick-and-mortar sales. Retailers are collecting, processing, and analyzing more data than ever thanks to the expansion of eCommerce. A universal semantic layer helps these businesses consolidate their data from disparate sources – like POS systems, customer service touch points, and online stores – to make data-driven campaigns that increase conversions and meet consumer expectations.
- Healthcare: The pandemic made it clear that all industries, even those with strict privacy regulations like healthcare, need robust digital data literacy. A semantic layer can help analysts predict when and where ailments might happen, and who will be affected by them. That helps providers know where to allocate their time and resources, improving overall patient care.
- Financial Services: In a highly regulated industry like financial services, it can be tough for businesses to see their big picture all at once. Disparate resources, restricted access, and legacy systems make it hard to access all the necessary data. Semantic layers help aggregate and contextualize this siloed data so finserv leaders can make decisions with confidence and accuracy.
How Does a Semantic Layer Platform Work?
The semantic layer sits between the canonical data store (such as the data warehouse, data lake, or data mart) and the analytics tools, making it easier for business users to access data for their analytics needs with reports, dashboards, and ad-hoc queries. It links the analytics platform with the data warehouse using facts (data values), dimensions (data attributes), and hierarchies (taxonomies).
The semantic layer platform is integrated into the consumption platform: the analytics tools such as Power BI, Tableau, Python, Business Objects, Looker, Jupyter Notebook, and even Microsoft Excel. Queries from business users can be written in SQL, DAX, MDX, and so on, and reach the semantic layer over tool-native protocols and interfaces such as XMLA, JDBC, ODBC, SOAP, and REST. By abstracting the physical form and location of data, the semantic layer platform makes data stored in the data warehouse, data lake, or data mart accessible with one consistent and secure interface for business users.
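To make the translation step more concrete, the following Python sketch turns a business-level request (a metric plus dimensions) into SQL and runs it against an in-memory SQLite database standing in for the warehouse. The table names, columns, and the `build_sql` helper are assumptions made for illustration. A real semantic layer platform would also handle dialects, security, and optimization.

```python
import sqlite3

# Hypothetical semantic definitions: business names mapped to physical SQL.
METRICS = {"revenue": "SUM(o.amount)"}
DIMENSIONS = {"country": "c.country", "year": "o.order_year"}
FROM_CLAUSE = "orders o JOIN customers c ON c.id = o.customer_id"


def build_sql(metric, dimensions):
    """Translate a business-level request into SQL (expects at least one dimension)."""
    dim_columns = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select_list = ", ".join(dim_columns + [f"{METRICS[metric]} AS {metric}"])
    group_by = ", ".join(DIMENSIONS[d] for d in dimensions)
    return (
        f"SELECT {select_list} FROM {FROM_CLAUSE} "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )


# In-memory SQLite database with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (customer_id INTEGER, order_year INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'DE'), (2, 'US');
    INSERT INTO orders VALUES (1, 2023, 100.0), (1, 2023, 50.0), (2, 2023, 70.0);
    """
)

sql = build_sql("revenue", ["country"])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)  # [('DE', 150.0), ('US', 70.0)]
```

The business user only names "revenue" and "country"; the join and the aggregation come from the shared definitions rather than from each user's own query.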
What are the Business Benefits of a Semantic Layer?
The semantic layer maps business data into familiar business terms to offer a unified, consolidated view of data across the organization. At its core, the semantic layer offers a single standard for consuming and driving enterprise-wide analytics.
Benefits of a semantic layer include:
- Democratization of data analytics and machine learning (ML)
- Single source of truth
- Seamless model development and sharing
- Improved query performance and reduced computing costs
- Reduced data cleaning effort
- Better security and governance
1. Democratization of Data Analytics and Machine Learning (ML)
As data analytics spreads within organizations, relying on one monolithic BI (Business Intelligence) or ML (Machine Learning) platform to meet everyone’s needs becomes less realistic. A semantic layer platform is needed to connect and work with diverse data platforms, protocols, and consumption tools. This will decouple the data from consumption, thereby enabling the democratization of data analytics and ML in the enterprise.
2. Single Source of Truth
The semantic layer excels at generating sophisticated SQL, and often multiple SQL statements, in response to a very simple set of user gestures. To do so, it must understand how to deal with database loops, complex objects, complex sets (union, intersection), aggregate table navigation, and join shortcuts. Because rules that resolve database complexity and ambiguity are applied when the SQL is generated, two users who ask for the same information will get the same results. This is the most important aspect of the semantic layer: it allows an organization to define a single version of the truth.
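One simple way to see why centrally generated SQL yields consistent answers is to normalize each request before building the statement. The sketch below is an assumption-laden illustration in Python (the definitions, the `sales` table, and `canonical_sql` are hypothetical), not a description of how any specific engine works.

```python
# Hypothetical shared definitions for metrics and dimensions.
METRICS = {"revenue": "SUM(amount)", "cost": "SUM(cost)"}
DIMENSIONS = {"region": "region", "year": "order_year"}


def canonical_sql(metrics, dimensions, table="sales"):
    """Build SQL from a request after normalizing names and ordering."""
    metric_names = sorted({m.strip().lower() for m in metrics})
    dim_names = sorted({d.strip().lower() for d in dimensions})
    columns = [f"{DIMENSIONS[d]} AS {d}" for d in dim_names]
    columns += [f"{METRICS[m]} AS {m}" for m in metric_names]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if dim_names:
        sql += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in dim_names)
    return sql


# Two users phrase the same question differently.
first = canonical_sql(["Revenue", "cost"], ["year", "region"])
second = canonical_sql(["cost", "revenue "], ["Region", "year"])
print(first == second)  # True
```

Both requests resolve to the same statement because the metric expressions come from one place and the request is put into a canonical order before SQL is emitted.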
3. Seamless Model Development and Sharing
Data scientists rely on raw and granular data for deriving insights from their models. This raw data has very little business value from the data and analytics perspective. Businesses need insights to make decisions, not raw data. Adding a data model to the raw data makes it very valuable because data models create a visual description to help the business analyze, understand, and clarify the data and associated relationships. The semantic layer, with its data modeling capabilities, enables easy authoring and sharing of, and collaboration on, data models and insights.
4. Improved Query Performance and Reduced Computing Costs
The limited scalability and the high costs of on-premises data warehouses are forcing companies to leverage the power of the cloud for enhanced scalability, flexibility, and elasticity. While cloud computing, including cloud data warehouses, offers many benefits, these benefits can come at the expense of performance and cost. In today’s big data environment, a good semantic layer platform includes a comprehensive performance management system beyond simple caching techniques. At its core, the semantic layer facilitates better query performance (and time to insights) and reduced computing costs.
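One technique that goes beyond caching is aggregate table navigation, mentioned earlier: routing a query to the smallest pre-aggregated table that can still answer it. The Python sketch below illustrates the idea under simplifying assumptions. The table names and row counts are invented, and it only holds for additive metrics such as sums.

```python
# Hypothetical tables: name -> (dimensions available, approximate row count).
AGGREGATES = {
    "fact_orders": ({"day", "month", "year", "product", "city", "country"}, 2_000_000_000),
    "agg_month_country": ({"month", "year", "country"}, 50_000),
    "agg_year": ({"year"}, 20),
}


def choose_table(requested_dimensions):
    """Pick the smallest table whose dimensions cover the request."""
    needed = set(requested_dimensions)
    candidates = [
        (rows, name)
        for name, (dims, rows) in AGGREGATES.items()
        if needed <= dims
    ]
    if not candidates:
        raise ValueError(f"no table covers dimensions {sorted(needed)}")
    return min(candidates)[1]


print(choose_table(["year"]))             # agg_year
print(choose_table(["year", "country"]))  # agg_month_country
print(choose_table(["product"]))          # fact_orders
```

Scanning a few thousand pre-aggregated rows instead of billions of fact rows is where both the speed-up and the lower compute bill would come from.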
5. Reduced Data Cleaning Effort
It is often reported that data cleansing accounts for a large share of the effort in data and analytics projects, over 70% by some estimates. A common and consistent data definition using the governance-enabled semantic layer will ensure business analysts, data analysts, and data scientists have the same definition and context on the data. In addition, the semantic layer offers pre-built controls for managing data access, integration, and feature creation, and it provides a logical schema with views, stored procedures, functions, and more. All this will not only reduce the data cleaning effort, but will also produce more reliable insights.
6. Better Security and Governance
As the semantic layer sits between the data platform and analytics tools, it secures the digital infrastructure with the right levels of authentication and authorization. The semantic layer can authenticate users with single sign-on solutions through Active Directory, LDAP (Lightweight Directory Access Protocol), OAuth, or any other user authentication platform. In addition, the semantic layer offers RBAC (role-based access control), including the ability to protect sensitive data attributes, limit data access according to each user’s business role, and more.
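A minimal Python sketch of role-based access control at the semantic layer might look like the following. The roles, columns, and sample rows are hypothetical, and authentication (SSO, LDAP, OAuth) is assumed to have already happened before `query_as` is called.

```python
# Hypothetical role policies: visible columns and an optional region filter.
POLICIES = {
    "finance_analyst": {"columns": {"customer", "region", "revenue", "tax_id"}, "region": None},
    "emea_sales": {"columns": {"customer", "region", "revenue"}, "region": "EMEA"},
}

# Made-up sample rows; "tax_id" stands in for a sensitive attribute.
ROWS = [
    {"customer": "Acme", "region": "EMEA", "revenue": 120.0, "tax_id": "T-1"},
    {"customer": "Globex", "region": "APAC", "revenue": 80.0, "tax_id": "T-2"},
]


def query_as(role, rows):
    """Return only the rows and columns the given role is allowed to see."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"unknown role: {role}")
    visible = []
    for row in rows:
        # Row-level restriction: skip rows outside the role's region, if any.
        if policy["region"] is not None and row["region"] != policy["region"]:
            continue
        # Column-level restriction: drop attributes the role may not see.
        visible.append({k: v for k, v in row.items() if k in policy["columns"]})
    return visible


print(query_as("emea_sales", ROWS))
# [{'customer': 'Acme', 'region': 'EMEA', 'revenue': 120.0}]
```

Enforcing the policy in the layer between the tools and the data means the same restrictions would apply no matter which BI or notebook client issues the query.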
AtScale: Building a Universal Semantic Layer for the Future
Implementing a universal semantic layer is a great idea. But to stay great, great ideas must evolve. Today, businesses have more data flowing through their systems than ever. On the one hand, that means there’s more data available for analysis and reporting – which is great news. On the other hand, that means accurate, efficient analysis of incoming data requires more resources and firepower than ever – which can be a strain.
Future data workloads are only going to increase in volume and complexity, and that increase is going to happen at a much faster rate.
Businesses need a tool that can create abstractions of mountains of data from disparate sources, contextualize it, and glean actionable insights for data-driven decisions – and they need a tool that can do that every day. How can enterprises prepare for these rapidly approaching (and growing) needs for handling future data workloads? By building with the right kind of universal semantic layer – one that opens the gates for data literacy for any and all users.
AtScale believes the value of a modern BI platform is in addressing known shortcomings and innovating on what originally made the semantic layer great: it is a force multiplier for data consumers. Our universal semantic layer stands independent from other BI tools by:
- Centralizing governance and control, allowing you to get a big picture of all your data at once
- Decentralizing analytics consumption and data product creation, letting you contextualize what you need to know
- Granting a business-oriented view of your data, fostering a culture of self-service and data literacy
Learn more about AtScale’s universal semantic layer.