There are six key areas for effectively consuming data for business insights: storing the data, accessing it, modeling it, analyzing it, consuming it, and generating insights from it. For organizations looking to advance in each of these areas, AtScale’s Data and Analytics Maturity Model Workshop explains how teams can build their skills and knowledge.
In this blog post covering module six of the workshop, we’ll discuss data modeling within business intelligence (BI) tools as well as AI and machine learning (AI/ML) tools. We’ll also explain the challenges with consumption layer modeling, the different data modeling approaches, and the benefits of dimensional modeling using a semantic layer.
Challenges with Data Modeling in BI Tools
Most of the data modeling at today’s enterprises takes place in the consumption layer, within the BI tools themselves. For example, Tableau can connect directly to a data source such as a Snowflake data warehouse, so that users can browse all of its raw tables and schemas.
However, this approach is too advanced for many users: they need database credentials, and they must understand what each table and view means before they can do any reporting or analysis. It also requires substantial data preparation and an understanding of how the data is structured within the data warehouse.
Challenges with Data Modeling in AI/ML Tools
Similarly, AI/ML tools like DataRobot enable users to connect to data sources and see different lookups, splits, transformations, and more. As with BI tools, working this way demands a high level of sophistication: users must connect to the data source, understand how the data is stored, and determine what the data means from a business perspective. A lot of knowledge needs to be in place before users can begin any analysis.
The Drawbacks of Consumption Layer Modeling
Whether it happens in BI tools or AI/ML tools, modeling data directly within the consumption layer has several drawbacks:
- It’s Hard: Modeling data directly within the consumption layer requires advanced knowledge of SQL and databases to understand raw data structures. With a high level of skill needed, fewer data consumers will have enough expertise to generate insights from the data.
- It Wastes Time: Data models at the consumption layer aren’t typically shared, so each user has to reinvent the wheel for every new analysis. As a result, data consumers spend more of their time on data preparation instead of analyzing the data and generating insights.
- It Creates Inconsistency: Because different users define the data models and calculations separately within each tool, there are likely to be differences in terminology and query results, making it difficult to have consistent and trusted data insights across the organization.
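To make the inconsistency concrete, here is a minimal Python sketch. The order data, the status values, and the two revenue rules are hypothetical; the point is only that two reasonable definitions, written separately in two tools, produce two different numbers from the same rows.

```python
# Hypothetical raw order lines that two analysts both pull from the warehouse.
ORDERS = [
    {"order_id": 1, "amount": 120.0, "status": "complete"},
    {"order_id": 2, "amount": 80.0, "status": "returned"},
    {"order_id": 3, "amount": 200.0, "status": "complete"},
]


def revenue_in_bi_workbook(rows):
    """Analyst A's calculated field: sums every order line, returns included."""
    return sum(row["amount"] for row in rows)


def revenue_in_ml_notebook(rows):
    """Analyst B's feature: excludes returned orders."""
    return sum(row["amount"] for row in rows if row["status"] != "returned")


if __name__ == "__main__":
    print("BI workbook revenue:", revenue_in_bi_workbook(ORDERS))  # 400.0
    print("ML notebook revenue:", revenue_in_ml_notebook(ORDERS))  # 320.0
```

Neither definition is wrong on its own; the problem is that nothing forces the two tools to agree.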
Data Modeling Approaches
Three of the most common approaches to data modeling are tabular, logical, and dimensional.
The tabular approach gives data consumers access to the physical data sources. With this approach, they need to know how to work with the raw data directly within the data warehouse or data lake. The primary interface is a physical table or view, which consumers must map to business concepts before they can analyze it.
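As a rough illustration of the tabular approach, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a warehouse. The table names (`fct_ord_ln`, `dim_cust`) and column names are hypothetical, chosen to resemble terse physical schemas. Answering even a simple question such as "revenue by region" means knowing the join key and which column holds revenue.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for physical warehouse tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_cust (cust_sk INTEGER PRIMARY KEY, rgn_cd TEXT);
        CREATE TABLE fct_ord_ln (ord_id INTEGER, cust_sk INTEGER, net_amt REAL);
        INSERT INTO dim_cust VALUES (1, 'NA'), (2, 'EU');
        INSERT INTO fct_ord_ln VALUES (10, 1, 120.0), (11, 2, 80.0), (12, 1, 200.0);
        """
    )
    return conn


def revenue_by_region(conn: sqlite3.Connection) -> dict:
    # The consumer must already know that cust_sk is the join key, that net_amt
    # holds revenue, and what the rgn_cd codes stand for.
    rows = conn.execute(
        """
        SELECT c.rgn_cd, SUM(f.net_amt)
        FROM fct_ord_ln AS f
        JOIN dim_cust AS c ON f.cust_sk = c.cust_sk
        GROUP BY c.rgn_cd
        ORDER BY c.rgn_cd
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(revenue_by_region(build_warehouse()))  # {'EU': 80.0, 'NA': 320.0}
```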
The logical approach relies on data virtualization to give data consumers a logical view of the data rather than physical access. Through data virtualization, data consumers can have access to data that’s more up-to-date and has undergone additional processing to improve its usefulness without any data movement.
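Data virtualization platforms do far more than this, but a database view captures the core idea: consumers query a logical object, the work happens against the underlying sources at query time, and nothing is copied. In the hypothetical sketch below, `sqlite3` attaches a second in-memory database to stand in for a separate source, and a view joins, filters, and renames across both. It is a `TEMP` view because SQLite does not let a persistent view reference tables in a different attached database.

```python
import sqlite3


def build_sources() -> sqlite3.Connection:
    """Two hypothetical sources: an orders table in 'main' and a CRM table in 'crm'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.executescript(
        """
        CREATE TABLE main.orders (order_id INTEGER, customer_id INTEGER, amount REAL, status TEXT);
        CREATE TABLE crm.customers (customer_id INTEGER, region TEXT);
        INSERT INTO main.orders VALUES (1, 1, 120.0, 'complete'), (2, 2, 80.0, 'returned');
        INSERT INTO crm.customers VALUES (1, 'North America'), (2, 'Europe');

        -- The logical view: joined, filtered, and renamed, but never materialized.
        CREATE TEMP VIEW completed_sales AS
        SELECT o.order_id, c.region, o.amount AS sale_amount
        FROM main.orders AS o
        JOIN crm.customers AS c ON o.customer_id = c.customer_id
        WHERE o.status = 'complete';
        """
    )
    return conn


def completed_sales(conn: sqlite3.Connection) -> list:
    """What a consumer queries: the logical view, not the physical tables."""
    return conn.execute(
        "SELECT order_id, region, sale_amount FROM completed_sales ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    conn = build_sources()
    print(completed_sales(conn))
    # A new row lands in the source; the view reflects it with no copy or refresh step.
    conn.execute("INSERT INTO main.orders VALUES (3, 2, 50.0, 'complete')")
    print(completed_sales(conn))
```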
The dimensional approach combines the benefits of data virtualization – or direct access to data without moving it – with a dimensional semantic layer. This means there are measures, dimensions, hierarchies, and business-friendly constructs that eliminate the need for data consumers to understand SQL or have any advanced data engineering skills.
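The dimensional approach can be sketched as a small model that names measures, dimensions, and a hierarchy once, then generates SQL for any measure-by-dimension request. Everything here (the `SemanticModel` class, the schema, the data) is a hypothetical illustration in Python with `sqlite3`, not AtScale's modeling language or API; a production semantic layer does considerably more.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measure:
    name: str  # business-facing name, e.g. "Revenue"
    sql: str   # aggregate expression over the fact table (aliased as f)


@dataclass(frozen=True)
class Dimension:
    name: str    # business-facing name, e.g. "Region"
    column: str  # physical column, qualified by its table alias


@dataclass
class SemanticModel:
    """Hypothetical model: one fact table joined to one dimension table."""
    fact_from: str  # FROM/JOIN clause, written once by the modeler
    measures: dict = field(default_factory=dict)
    dimensions: dict = field(default_factory=dict)
    hierarchies: dict = field(default_factory=dict)  # name -> ordered level names

    def to_sql(self, measure: str, dimension: str) -> str:
        m, d = self.measures[measure], self.dimensions[dimension]
        return (f"SELECT {d.column}, {m.sql} FROM {self.fact_from} "
                f"GROUP BY {d.column} ORDER BY {d.column}")

    def drill_down(self, hierarchy: str, level: str) -> str:
        """Return the next, more detailed level (raises IndexError at the lowest level)."""
        levels = self.hierarchies[hierarchy]
        return levels[levels.index(level) + 1]


model = SemanticModel(
    fact_from="fct_ord_ln AS f JOIN dim_cust AS c ON f.cust_sk = c.cust_sk",
    measures={"Revenue": Measure(
        "Revenue", "SUM(CASE WHEN f.status <> 'returned' THEN f.net_amt ELSE 0.0 END)")},
    dimensions={"Region": Dimension("Region", "c.region"),
                "Country": Dimension("Country", "c.country")},
    hierarchies={"Geography": ["Region", "Country"]},
)


def build_demo_warehouse() -> sqlite3.Connection:
    """Hypothetical physical tables that the model above points at."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_cust (cust_sk INTEGER, region TEXT, country TEXT);
        CREATE TABLE fct_ord_ln (ord_id INTEGER, cust_sk INTEGER, net_amt REAL, status TEXT);
        INSERT INTO dim_cust VALUES (1, 'North America', 'US'), (2, 'Europe', 'DE'),
                                    (3, 'North America', 'CA');
        INSERT INTO fct_ord_ln VALUES (10, 1, 120.0, 'complete'), (11, 2, 80.0, 'returned'),
                                      (12, 3, 200.0, 'complete');
    """)
    return conn


def query(conn: sqlite3.Connection, measure: str, dimension: str) -> dict:
    """What a consumer asks for: business names in, results out, no SQL required."""
    return dict(conn.execute(model.to_sql(measure, dimension)).fetchall())


if __name__ == "__main__":
    conn = build_demo_warehouse()
    print(query(conn, "Revenue", "Region"))
    print(query(conn, "Revenue", model.drill_down("Geography", "Region")))
```

The consumer only ever names "Revenue", "Region", or a hierarchy level; the join and the rule for excluding returns live in one definition.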
Best Practice: Leverage a Universal Semantic Layer
Along with investing in a cloud data platform to scale out data processing, implementing a semantic layer for dimensional data modeling can help you deliver data that’s useful to everyone in the organization, not just data engineers.
By modeling data once at the semantic layer rather than the consumption layer, you can also eliminate inconsistencies across different teams and consumption tools. In addition, connecting BI and AI/ML tools to the semantic layer can help promote self-service data analytics while ensuring there are still guardrails around the access and use of data.
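The "model once" and "guardrails" points can be shown together in a short, hypothetical Python sketch: a single revenue definition and a single access policy live in one place, and two different consumers (a stand-in for a BI dashboard and one for an ML feature pipeline) both go through it. The user names, regions, and deny-by-default policy are illustrative assumptions, not a description of any product's security model.

```python
# Hypothetical order lines behind the semantic layer.
ORDERS = [
    {"region": "North America", "amount": 120.0, "status": "complete"},
    {"region": "Europe", "amount": 80.0, "status": "returned"},
    {"region": "Europe", "amount": 50.0, "status": "complete"},
    {"region": "North America", "amount": 200.0, "status": "complete"},
]

# Hypothetical row-level policy: None means "all regions".
REGION_ACCESS = {"analyst_global": None, "analyst_eu": {"Europe"}}


def revenue(user: str) -> float:
    """The one shared definition of Revenue, with access rules applied centrally."""
    if user not in REGION_ACCESS:
        raise PermissionError(f"{user} has no access to this model")  # deny by default
    allowed = REGION_ACCESS[user]
    return sum(
        row["amount"]
        for row in ORDERS
        if row["status"] != "returned" and (allowed is None or row["region"] in allowed)
    )


def bi_dashboard_tile(user: str) -> str:
    """Stand-in for a BI tool rendering the measure."""
    return f"Revenue: {revenue(user):,.2f}"


def ml_features(user: str) -> dict:
    """Stand-in for an ML pipeline pulling the same measure as a feature."""
    return {"total_revenue": revenue(user)}


if __name__ == "__main__":
    print(bi_dashboard_tile("analyst_global"))  # Revenue: 370.00
    print(ml_features("analyst_eu"))            # {'total_revenue': 50.0}
```

Because both consumers call the same function, they cannot drift apart, and a user outside the policy is refused in one place rather than in each tool.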
In short, a semantic layer can help data consumers be productive faster, without having to understand the physical structure of the data. The semantic approach also future-proofs your data architecture, so you can introduce new data sources or platforms more easily and make the most of your data going forward.
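One way to picture the future-proofing claim: consumers ask for business names, and only the semantic layer's bindings know which physical columns back them. In this hypothetical sketch, migrating from a legacy source to a new cloud table means rebinding the layer, while the consumer code and its result stay the same. The `SemanticLayer` class, source rows, and column names are made up for illustration.

```python
# Hypothetical rows from an old on-premises table and its cloud replacement,
# holding the same facts under different physical column names.
LEGACY_ROWS = [{"amt": 120.0, "rgn": "NA"}, {"amt": 80.0, "rgn": "EU"}, {"amt": 200.0, "rgn": "NA"}]
CLOUD_ROWS = [
    {"net_amount": 120.0, "region_name": "NA"},
    {"net_amount": 80.0, "region_name": "EU"},
    {"net_amount": 200.0, "region_name": "NA"},
]

# Bindings: business name -> physical column, one mapping per source.
LEGACY_BINDINGS = {"Revenue": "amt", "Region": "rgn"}
CLOUD_BINDINGS = {"Revenue": "net_amount", "Region": "region_name"}


class SemanticLayer:
    def __init__(self, rows, bindings):
        self.rows, self.bindings = rows, bindings

    def rebind(self, rows, bindings):
        """Point the same business model at a new physical source."""
        self.rows, self.bindings = rows, bindings

    def revenue_by_region(self) -> dict:
        totals = {}
        for row in self.rows:
            region = row[self.bindings["Region"]]
            totals[region] = totals.get(region, 0.0) + row[self.bindings["Revenue"]]
        return totals


def consumer_report(layer: SemanticLayer) -> dict:
    """Consumer code: written once against business names, untouched during migration."""
    return layer.revenue_by_region()


if __name__ == "__main__":
    layer = SemanticLayer(LEGACY_ROWS, LEGACY_BINDINGS)
    before = consumer_report(layer)
    layer.rebind(CLOUD_ROWS, CLOUD_BINDINGS)  # the migration: one change in the layer
    after = consumer_report(layer)
    print(before == after, after)  # True {'NA': 320.0, 'EU': 80.0}
```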
Watch the full video module for this topic as part of our Data & Analytics Maturity Workshop Series.