How Semantic Layers Make GenAI 3X More Accurate Than Direct SQL

This blog was written by Mark Palmer. In addition to being an AtScale Board Member, Mark Palmer is a seasoned entrepreneur and technologist. With extensive advisory experience, Palmer offers strategic insights to startups and Fortune 500 companies alike, helping them leverage emerging technologies and navigate market complexities. As the founder and CEO of Techno-Sapien, Palmer continues to drive innovation and empower businesses to thrive in the ever-evolving tech landscape. This article was originally published on Techno Sapien.

A semantic data layer might be the secret to more effective enterprise GenAI applications. Here’s what semantic layers are, why they matter, how they help, and why my favorite, AtScale, rocks 🙂

New research shows that a semantic layer can make GenAI answers 3X more accurate than direct SQL database queries (from 16% to 54%) [1]. You may wonder: What the heck is a semantic layer, and how does it improve GenAI accuracy?

A semantic layer provides a bridge between the language of business and the language of data. But that’s a pretty abstract notion. Let’s explore what a semantic layer is in practice, how it provides business value, and why it’s a new, essential part of the emerging GenAI data fabric.

[Figure: A Semantic Layer and GenAI. Source: “A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model’s Accuracy for Question Answering on Enterprise SQL Databases,” Juan F. Sequeda, Dean Allemang, Bryon Jacob.]

Semantics are a bridge to business meaning

As a young computer science graduate, one of my first job titles was “Semantics and Syntax Manager” for Fidelity. At first, I didn’t even understand my title, let alone the difference between data syntax and semantics.

Essentially, I was the liaison between traders and software developers. When the traders wanted an app built, my job was to understand what they meant (semantics) and what data they needed (syntax). I was like a librarian who helps you find just the right book in the library.

For a computer scientist, the syntax was the “easy” part. For example, “CU837291 0, 9, G” represents an individual customer—a two-letter prefix, six digits, and three codes. This syntax is stored in a database and used by apps.

Semantics is tricky. The word comes from the Greek “sēmantikos,” which means “of giving signs and markers.” So data semantics is the art of understanding what data actually means. For example,

  • 0 represents marketing contact status (for example, 0 might mean “no contact”)
  • 9 represents the health rating for the account (for example, 9/10 NPS)
  • G represents status level (G means “Gold”)
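
To make the syntax-to-semantics step concrete, here is a minimal Python sketch that decodes the example record above. The lookup tables and the `Customer` field names are assumptions made for illustration; only the meanings of 0 (“no contact”), 9 (health rating), and G (“Gold”) come from the example.

```python
from dataclasses import dataclass

# Hypothetical code tables. Only "0" and "G" are described in the article.
CONTACT_STATUS = {"0": "no contact"}
STATUS_LEVEL = {"G": "Gold"}


@dataclass
class Customer:
    customer_id: str        # e.g. two-letter prefix plus six digits
    marketing_contact: str  # decoded marketing contact status
    health_rating: int      # account health rating
    status_level: str       # decoded status level


def decode(record: str) -> Customer:
    """Turn the raw syntax into named, human-readable fields."""
    # Syntax: "<id> <contact>, <health>, <status>"
    customer_id, codes = record.split(" ", 1)
    contact, health, status = [code.strip() for code in codes.split(",")]
    return Customer(
        customer_id=customer_id,
        marketing_contact=CONTACT_STATUS.get(contact, "unknown"),
        health_rating=int(health),
        status_level=STATUS_LEVEL.get(status, "unknown"),
    )


customer = decode("CU837291 0, 9, G")
print(customer)
```

The point of the sketch is that the meaning lives in one shared place (the lookup tables) instead of in each developer’s head.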

I think of a semantic layer as the bridge between human beings and computers.

How a semantic layer forms a syntax-to-semantics bridge

Mapping syntax to semantics was so crucial at Fidelity that they hired a full-time person to do the job — me. Today, technology makes it easier, more scalable, and more convenient to build that bridge.
Semantic layer tools provide graphical interfaces and APIs that make it easier to catalog, translate, and query both the semantics of data and its syntax.

The state of the art is to store meaning as a graph—a network of concepts and their relationships with each other. A semantic layer for an insurance company might have a map of customers (Policy Holders), their Policy, the Agent who sold the Policy, the Coverage that the Policy Holder has, and the Premiums they pay (below).

[Figure: Store meaning in a graph. Source: Sequeda, Allemang, and Jacob [1].]

Analysts and programmers can use this semantic graph to answer questions like, “What should the payout for claim XYZ be?” or “Who sold this customer their policy, and what does it cover?”
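
As a rough illustration of querying such a graph, the sketch below stores the insurance concepts as (subject, relationship, object) triples and answers the second question. The relationship names and every instance (alice, bob, P-100) are invented for this example; a real semantic layer or knowledge graph would use its own model and query language.

```python
# Each triple is (subject, relationship, object). All names are hypothetical.
TRIPLES = [
    ("policy_holder:alice", "holds", "policy:P-100"),
    ("agent:bob", "sold", "policy:P-100"),
    ("policy:P-100", "has_coverage", "coverage:collision"),
    ("policy:P-100", "has_coverage", "coverage:liability"),
    ("policy:P-100", "has_premium", "premium:1200"),
]


def objects(subject, relation):
    """Everything the subject points to through the given relationship."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def subjects(relation, obj):
    """Everything that points to the object through the given relationship."""
    return [s for s, r, o in TRIPLES if r == relation and o == obj]


def who_sold_and_what_is_covered(policy_holder):
    """Answer: who sold this customer their policy, and what does it cover?"""
    answers = []
    for policy in objects(policy_holder, "holds"):
        answers.append({
            "policy": policy,
            "agents": subjects("sold", policy),
            "coverage": objects(policy, "has_coverage"),
        })
    return answers


result = who_sold_and_what_is_covered("policy_holder:alice")
print(result)
```

The question is answered by walking named relationships, not by knowing which tables and join keys hold the data.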

Without a semantic layer, developers waste time guessing at the meaning of data, and they make mistakes when they guess wrong. For example, analysts waste 80% of their time because, without a semantic layer, they dive right into the syntax of data, which is really hard to decipher.

Semantic layers hide that syntactic complexity. Application developers, analysts, and even an LLM can query the meaning of data instead of getting bogged down in the details of its encoding.

Why a semantic layer makes GenAI more accurate

When an LLM generates predictions, understanding the semantics of data helps it generate more accurate answers. Researchers at data.world benchmarked the difference between semantic-first and syntax-first approaches.

They tested a range of queries with varying complexity, like:

  • What is the average time to settle a claim by policy number?
  • What is the total loss of each policy, where loss is the sum of Loss Payment, Loss Reserve, Expense Payment, and Expense Reserve?
  • What are the Loss Payment, Loss Reserve, Expense Payment, and Expense Reserve amounts by claim number?
  • Return all claims by claim number, open date, and close date.

The team expressed these queries in SQL and semantic layers and measured the accuracy of the results.
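
To see why business definitions help with a question like “total loss of each policy,” consider the following sketch. It is not the benchmark’s setup: the table `clm_amt`, its cryptic column names, and the sample rows are all hypothetical, and the “semantic layer” here is just a pair of Python dictionaries that map business terms to SQL expressions, run against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical physical schema with cryptic names that are hard to guess.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clm_amt (pol_no TEXT, lp REAL, lr REAL, ep REAL, er REAL)"
)
conn.executemany(
    "INSERT INTO clm_amt VALUES (?, ?, ?, ?, ?)",
    [
        ("P-100", 100.0, 50.0, 10.0, 5.0),
        ("P-100", 200.0, 0.0, 20.0, 0.0),
        ("P-200", 300.0, 100.0, 0.0, 0.0),
    ],
)

# Semantic definitions: business terms mapped once to physical expressions.
DIMENSIONS = {"policy number": "pol_no"}
MEASURES = {
    "loss payment": "lp",
    "loss reserve": "lr",
    "expense payment": "ep",
    "expense reserve": "er",
    # Total loss is the sum of the four amounts, as in the question above.
    "total loss": "lp + lr + ep + er",
}


def query(measure, by):
    """Translate a business-level request into SQL and run it."""
    dim = DIMENSIONS[by]
    expr = MEASURES[measure]
    sql = f"SELECT {dim}, SUM({expr}) FROM clm_amt GROUP BY {dim} ORDER BY {dim}"
    return conn.execute(sql).fetchall()


totals = query("total loss", by="policy number")
print(totals)
```

A caller, human or LLM, only needs the terms “total loss” and “policy number”; the mapping to `lp + lr + ep + er` is defined once and reused.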

[Figure: Question Answering System for Knowledge Graph. Source: Sequeda, Allemang, and Jacob [1].]

They found that GenAI answers are about three times more accurate when you use a semantic layer (or knowledge graph). When GPT-4 answered questions by querying a SQL database directly, its accuracy was 16%. With a semantic representation of the same data, accuracy rose to 54%.

Choosing a semantic layer is confusing

A practical challenge in choosing a semantic layer is that there are so many ways to express meaning. At one end of the spectrum, you can hire a person: a librarian. But there are also dozens of tools and options that make it easier to manage meaning, including:

  1. Semantic Layer
  2. Knowledge Graph
  3. Data Catalog
  4. Master Data Management (MDM)
  5. Metadata Management
  6. Data Virtualization
  7. Embedded in Business Intelligence. Most modern business intelligence tools have dedicated data catalogs that help business analysts explore available data, including Microsoft Power BI, Spotfire, Business Objects, Tableau, and Looker.

Software vendors have created a labyrinth of options, including, in alphabetical order:

  • Alation (Data Catalog, Metadata)
  • Ataccama ONE (Data Quality, Metadata, Data Governance)
  • AtScale (Semantic Layer, Data Virtualization)
  • AWS Glue (Data Catalog, Data Preparation, ETL)
  • Collibra (Metadata, Data Quality, Data Governance)
  • Data Virtuality (Data Virtualization, Data Catalog)
  • dbt (Data Transformation, Semantic Layer)
  • Denodo (Semantic Layer, Data Virtualization, Data Catalog)
  • Dremio (Data Lakehouse, Semantic Layer)
  • Informatica (Multidomain MDM, MDM, Data Catalog)
  • Microsoft Fabric and Purview (Data Virtualization, Semantic Layer)
  • Riversand MDMCenter (MDM, Metadata Management)
  • Semarchy xDM (MDM, Metadata Management)
  • TIBCO EBX & Data Virtualization (Data Virtualization, MDM, Data Catalog)

Yikes! So many options to manage semantics!

The most elegant approach to semantic layers I’ve seen: AtScale

Although there are many ways to implement a semantic layer, I’m thrilled to have recently joined the board of AtScale, a leader in semantic layers. Why AtScale?

For me, it starts with customers. For years, technology leaders have told me, “AtScale does it right.” After getting to know the product and team, I have to say I agree: it’s the most elegant solution I’ve seen. They’ve earned the right to CEO Chris Lynch’s claim that AtScale is THE semantic layer leader, with customers like:

  • Wayfair, whose AtScale semantic layer gives hundreds of business analysts live access to cloud data to better understand their online business and customers.
  • Toyota North America, whose AtScale semantic layer allows its 35+ constituent North American companies to reduce infrastructure costs and find insights 21 times faster than before.
  • Tyson Foods, whose self-service analytics semantic layer helps its 144,000 employees respond quickly to changes in the market, understand their supply chain, and democratize access to data in Google BigQuery, Hadoop, and Amazon Web Services (AWS) Redshift.

Then there’s AtScale’s approach. They aimed to make it easy for business analysts and programmers to work together to express data semantics. This is an essential part of a solution that works — when you’re building a bridge, it must be accessible to all.

The next generation of semantic layers and generative AI

AtScale’s next-generation platform is a remarkable piece of engineering, including:

  • Generative AI and LLM support, with the semantic layer acting as a metadata hub and language for enhanced reliability and accuracy.
  • A container-based architecture for new deployment options via Kubernetes or Docker and seamless integration with Snowflake, Databricks, and Google BigQuery.
  • Enterprise integration and authentication through open-source packaging for Keycloak, OpenTelemetry, OpenAPI, and KeyGen.
  • dbt Metrics Translator that allows dbt semantic models to integrate seamlessly with AtScale’s live query support for Tableau, Power BI, and Excel.
  • Semantic Integrated Development Environment (IDE) with no-code and code-first data modeling within the same user experience, YAML-based modeling, and Git for full CI/CD support.

If you haven’t looked at AtScale recently, check out the details of their next-generation semantic layer.

[Figure: AtScale Design Center, AtScale Semantic Layer Platform]

The rising importance of semantic layers for GenAI

Semantic layers significantly enhance the accuracy and efficiency of Generative AI (GenAI) models. They bridge raw data and its business context, allowing GenAI models to interpret and generate more accurate responses.

Moreover, the semantic layer mitigates misinterpretation and hallucinations by providing a structured and contextual understanding of data. This ensures that GenAI models can deliver reliable and relevant insights.

So join the semantic layer party on your journey to more effective software development, data analytics, and innovation with generative AI!

  1. A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model’s Accuracy for Question Answering on Enterprise SQL Databases. Juan F. Sequeda, Dean Allemang, Bryon Jacob.