The Semantics of the Semantic Layer Part 6: Performance Optimization

Performance optimization with a semantic layer - diagram

This is the sixth post in my blog series, The Semantics of the Semantic Layer, where I discuss the seven core capabilities of a semantic layer. In this post, I will dive deeper into the autonomous performance optimization capabilities that make a semantic layer a “speed of thought” query serving layer.

As a reminder, the following diagram shows the seven core capabilities of a semantic layer. This post will focus on “Performance Optimization”, highlighted in red:

Seven core capabilities of a semantic layer - diagram

For a semantic layer to function, it’s critical to deliver a live, interactive query experience to discourage users from bypassing the semantic layer by moving data into external, ungoverned caching layers.

Autonomous & Adaptive

For a semantic layer to be useful, it must be the source of truth for all queries, which means data needs to be queried “live” regardless of where the data lives. Creating copies of data to improve performance introduces data latency and inconsistency, and thereby undermines the core values of a semantic layer.

It follows, then, that a semantic layer needs to deliver extract-level performance natively against cloud data platforms. As such, the semantic layer must include automated performance management to deliver queries at “speed of thought” over a live connection to those platforms. Since data is always growing and evolving and user query behavior is far from predictable, attempts to manually tune queries are futile.

How a Semantic Layer Learns

As illustrated above, the semantic layer platform must be capable of rewriting queries and creating aggregates on-demand using the data model, end user query patterns, data statistics and machine learning to automatically manage performance for every query.
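To make the idea of learning from query patterns concrete, here is a minimal sketch in Python. It is not how any particular product works: it swaps the machine learning model for a plain frequency count, and the log format, the `recommend_aggregates` function, and the `min_hits` threshold are all hypothetical names chosen for illustration.

```python
from collections import Counter

def recommend_aggregates(query_log, min_hits=2):
    """Return dimension sets queried at least `min_hits` times, most frequent first.

    A simple frequency heuristic standing in for a learned model (assumption).
    """
    # Treat each query's dimensions as an unordered set, so that
    # ["region", "month"] and ["month", "region"] count as the same pattern.
    counts = Counter(frozenset(q["dimensions"]) for q in query_log)
    return [set(dims) for dims, hits in counts.most_common() if hits >= min_hits]

# Hypothetical query log: which dimensions each user query grouped by.
query_log = [
    {"dimensions": ["region", "month"], "measure": "sales"},
    {"dimensions": ["month", "region"], "measure": "sales"},
    {"dimensions": ["product"], "measure": "sales"},
    {"dimensions": ["region", "month"], "measure": "sales"},
]

candidates = recommend_aggregates(query_log)
print(candidates)
```

With this log, the only pattern seen often enough to justify an aggregate is the region-and-month grouping. A real system would presumably also weigh data statistics, such as the cardinality of each dimension, before deciding what to build.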

Key Takeaway: A semantic layer must autonomously tune query performance to support interactive, live connections to data platforms. Semantic layer solutions that do not automatically manage query performance are unsuitable for supporting direct (live) queries. 

In Situ: No Data Movement Required

When working with cloud data platforms, most data analysts using BI tools like Tableau and Power BI create data extracts (e.g., Hyper) or import data into their BI tools before creating their dashboards and reports. They have no desire to add another data management task to their workflow, but they must create these data copies to get the performance and interactivity they need when querying cloud data platforms. Besides the extra work involved, creating data copies outside of the data platform adds costs and introduces data inconsistency and security risks.

To avoid these pitfalls, a semantic layer should improve performance without moving data outside of the native cloud data platform, and without requiring you to create and manage a separate query acceleration infrastructure.

Semantic Layer Model

As illustrated above, ideally, the semantic layer should create aggregate tables, or materialized views, on-demand using a machine learning model that is informed by user query patterns and data statistics. By rewriting queries to target smaller aggregates instead of raw data, an optimized semantic layer can also substantially reduce your cloud data platform’s operating costs. 
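The rewrite step described here can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not AtScale's implementation: an in-memory SQLite database stands in for the cloud data platform, and the `sales` table, the `AGGREGATES` registry, and the `pick_table` and `run_query` helpers are hypothetical. The measure is a `SUM`, which can safely be re-aggregated from a coarser table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the cloud data platform
conn.executescript("""
    CREATE TABLE sales (region TEXT, month TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', '2024-01', 'A', 10.0),
        ('East', '2024-01', 'B', 5.0),
        ('East', '2024-02', 'A', 7.0),
        ('West', '2024-01', 'A', 3.0);
    -- The aggregate lives in the same database: no data leaves the platform.
    CREATE TABLE agg_sales_region_month AS
        SELECT region, month, SUM(amount) AS amount
        FROM sales GROUP BY region, month;
""")

# Hypothetical registry: aggregate table name -> dimensions it preserves.
AGGREGATES = {"agg_sales_region_month": {"region", "month"}}

def pick_table(dimensions):
    """Return an aggregate that covers the requested dimensions, else the raw table."""
    covering = [
        (len(dims), name)  # fewer dimensions is used as a rough proxy for fewer rows
        for name, dims in AGGREGATES.items()
        if set(dimensions) <= dims
    ]
    return min(covering)[1] if covering else "sales"

def run_query(dimensions):
    """Rewrite a SUM(amount) query to the chosen table and execute it."""
    table = pick_table(dimensions)
    cols = ", ".join(dimensions)  # trusted identifiers only; this is a demo
    sql = f"SELECT {cols}, SUM(amount) FROM {table} GROUP BY {cols} ORDER BY {cols}"
    return table, conn.execute(sql).fetchall()

print(run_query(["region"]))   # answered from the aggregate
print(run_query(["product"]))  # falls back to the raw table
```

The query by region is served from the three-row aggregate rather than the raw table, while the query by product falls back to `sales` because no aggregate keeps that dimension. On a real platform the same routing decision is what reduces the volume of data scanned.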

To see just how much a semantic layer can accelerate queries and reduce costs, check out AtScale’s TPC-DS 10TB Benchmark Reports.

Key Takeaway: A semantic layer must deliver query performance at “speed of thought” with a live connection to data platforms without the need to create tool-specific extracts or imports or moving data to separate caching subsystems.

Powerful Autonomous Performance Management

Besides serving as a metrics hub, a semantic layer powered by autonomous performance management provides a live, interactive connection to your data without data movement. By avoiding external caching layers, data stays where it landed and queries scale with your cloud data platform. By avoiding redundant data scans, an optimized semantic layer can also substantially reduce your data platform’s operating costs. In my next post, part seven of eight, we’ll dive into the importance of the semantic layer’s analytics governance features.

In the meantime, if you are looking to skip ahead, I encourage you to read the white paper “The Semantics of the Semantic Layer”.
