AI governance discussions used to center on frameworks and theoretical risk scenarios. This month’s TDWI webinar with Cal Al-Dhubaib, head of AI and data science at Further, revealed the many ways in which theory has moved into practice. As organizations move AI pilots into production, enterprise leaders are addressing AI governance with greater rigor.
The number of AI incidents continues to rise. The AI Incident Database, an open-source project tracking major AI failures at global and enterprise scale, shows incidents accelerating year over year. These range from credit card companies issuing discriminatory credit limits to computer vision systems failing to detect certain skin tones, and more recently, business consequences from hallucinations and misinformation generated by GenAI systems.
Risk is increasing as it becomes easier to build AI systems, even as more people recognize the inherent challenges of these technologies. An MIT study found that while most enterprises have experimented with general-purpose AI tools such as Gemini, Copilot, and Claude, few have successfully deployed homegrown GenAI solutions to production. The majority of executives cite concerns about accuracy, navigating regulatory compliance, and security as primary obstacles.
Perhaps most telling: an Informatica survey found that 97% of business leaders struggle to articulate the business value of GenAI.
The unique governance challenges of GenAI
Cal made an important distinction about why GenAI presents different challenges than traditional analytics. “Garbage in, garbage out” has been a data professional’s mantra for decades. But as Cal put it: “Garbage data in plus AI gives you garbage with some glitter on it.”
Most organizations have achieved reasonable trust levels in their structured data assets. But when you look at unstructured data, such as the content increasingly used in GenAI applications, TDWI research shows trust drops dramatically. Only 43% of data professionals are satisfied with the level of trust in their unstructured data. When you automate decision-making on top of that foundation, you get automated chaos.
The next logical question is: which governance frameworks actually work with AI?
The toolkit most enterprises use includes observability (asking “is there anything weird here?”), red teaming (exposing models to edge cases before deployment), guardrails (filters between users and models), and documentation standards like model cards and system cards. These matter. But at AtScale, we’ve learned from twelve years of production experience that these governance techniques only work when you have the right data foundation underneath them.
A universal semantic layer solves that problem: a logical view on top of data that acts as both a data plane and a control plane. It allows you to define business metrics once so they’re computed the same way every time, regardless of whether they’re consumed by Tableau, Excel, or any other tool. Revenue is revenue. A click is a click. An impression is an impression.
That architecture is exactly what LLMs need. The same trust problem that plagued BI at human speed now operates at machine speed, with open-ended questions that can’t be constrained to dashboards. Humans are no longer in the room to apply intuition and catch obvious errors. LLMs are confident even when they’re wrong, and they treat whatever data is furnished to them as ground truth.
The semantic layer makes the LLM’s job simpler by handling what it’s bad at (complex joins, aggregation logic, and metric computation) so it can focus on what it’s good at: understanding natural language intent and reasoning about analytical workflows. To the LLM, a semantic model looks like a single table with all your metrics and dimensions already defined. No joins to figure out. No calculations to guess at.
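To make that concrete, here is a minimal sketch in plain Python (all names are hypothetical, not AtScale's actual API) of how a semantic model can present itself to an LLM as one flat logical table, keeping the joins and metric formulas behind the interface:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a semantic model exposes metrics and dimensions
# as one flat logical "table" so the LLM never sees joins or physical schema.
@dataclass
class SemanticModel:
    name: str
    dimensions: dict = field(default_factory=dict)  # name -> description
    metrics: dict = field(default_factory=dict)     # name -> governed formula (opaque to the LLM)

    def describe(self) -> str:
        """What the LLM is shown: column names only, no physical tables."""
        cols = list(self.dimensions) + list(self.metrics)
        return f"table {self.name}({', '.join(cols)})"

sales = SemanticModel(
    name="sales",
    dimensions={"order_date": "day grain", "region": "sales region"},
    metrics={
        "revenue": "SUM(net_amount)",
        "gross_margin": "(SUM(net_amount) - SUM(cogs)) / SUM(net_amount)",
    },
)

print(sales.describe())
# table sales(order_date, region, revenue, gross_margin)
```

The LLM only needs to pick columns from this flat view; the formulas stay server-side, where they are governed.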
Testing upfront, monitoring in production, and having a response plan for inevitable mistakes: Cal is right that these require infrastructure, not just process. That infrastructure is a universal semantic layer that provides governed definitions, consistent business logic, and deterministic query execution before an LLM is involved.
Six questions that show where governance meets AI production
During the webinar, audience questions focused on specific implementation problems and the gaps between what governance frameworks promise and what actually works in production.
These questions came up repeatedly from data leaders across industries. They reveal where enterprises are stuck and what’s preventing them from advancing their AI projects.
Question 1: How is GenAI governance different from BI governance?
In the BI era, governance was often embedded in the tools themselves. When you built a dashboard, you knew what you were exposing, who your audience was, and what questions it could answer. You had a very purpose-built report with governance baked in.
GenAI is fundamentally different. When someone asks a chatbot, “Tell me why gross margin is declining,” that’s an open-ended investigation that might require five, ten, or thirty queries to answer. You can’t lock that down at the endpoint level.
Governance must now happen earlier in the stack and at query time. The system needs to know what gross margin is, whether the user is authorized to view it, and how to compute it consistently. You’re moving from a static application of governance to something that must be continuous and applied dynamically as queries are generated.
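A sketch of what "governance at query time" can look like, using hypothetical metric names and roles: every generated query is resolved against the one governed definition and checked against the caller's entitlements before anything executes.

```python
# Hypothetical sketch of query-time governance: each LLM-generated query
# is resolved against governed definitions and the user's entitlements
# before it is allowed to run.
METRIC_DEFINITIONS = {
    "gross_margin": "(SUM(net_amount) - SUM(cogs)) / SUM(net_amount)",
}
ENTITLEMENTS = {
    "finance_analyst": {"gross_margin"},
    "support_agent": set(),
}

def authorize_query(role: str, metric: str) -> str:
    """Return the governed formula, or refuse the query."""
    if metric not in METRIC_DEFINITIONS:
        raise ValueError(f"{metric!r} has no governed definition")
    if metric not in ENTITLEMENTS.get(role, set()):
        raise PermissionError(f"{role} is not entitled to {metric}")
    # Every authorized caller gets the same formula, every time.
    return METRIC_DEFINITIONS[metric]

print(authorize_query("finance_analyst", "gross_margin"))
```

The point is that the check runs per query, dynamically, rather than being baked into a fixed dashboard endpoint.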
Question 2: Why do semantics matter for governance?
Data warehouses secure raw tables. But business metrics aren’t stored in raw tables; they’re derived constructs. Gross margin is made up of multiple fields across multiple tables. It’s a logical construct that requires definition, rules for calculation, and context.
Without a semantic layer, an LLM doesn’t know what gross margin is. It will attempt to traverse your data platform tables to determine it. Your warehouse security settings may provide some protection, but the LLM is still guessing how to compute the metric. You need a single place to define and secure both your raw data and your derived business logic.
That’s what a semantic layer does. It defines metrics so they’re computed the same way every time, regardless of whether the consumer is Tableau, Excel, or an LLM.
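As an illustration of a derived construct, here is a toy Python sketch (hypothetical table and field names) showing gross margin spanning two tables, defined once so every consumer calls the same function instead of re-deriving the join and the math:

```python
# Hypothetical sketch: gross margin is not stored in any one table.
# It spans an "orders" table and an "order_costs" table, so it is
# defined once and shared by every consumer.
orders = [
    {"order_id": 1, "net_amount": 100.0},
    {"order_id": 2, "net_amount": 50.0},
]
costs = [
    {"order_id": 1, "cogs": 60.0},
    {"order_id": 2, "cogs": 20.0},
]

def gross_margin(orders, costs) -> float:
    """The one governed definition: (revenue - COGS) / revenue."""
    cogs_by_order = {c["order_id"]: c["cogs"] for c in costs}
    revenue = sum(o["net_amount"] for o in orders)
    total_cogs = sum(cogs_by_order[o["order_id"]] for o in orders)
    return (revenue - total_cogs) / revenue

print(gross_margin(orders, costs))  # ~0.4667, i.e. about a 46.7% margin
```

Without a shared definition like this, each tool (or each LLM query) would reinvent the join and the formula, and the results would drift.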
Question 3: Does a semantic layer actually improve accuracy?
AtScale tested this using TPC-DS, a standard retail benchmark schema. We compared LLM accuracy with and without a semantic layer. The results were striking.
With a semantic layer providing governed definitions and business context, we achieved 100% accuracy. Without it, the best we could do was 22.5% accuracy. The LLM was wrong nearly 80% of the time when left to navigate the schema independently.
We see the same pattern in the market. When organizations use chatbots to traverse data directly, the results are consistently wrong. Some respond by building endless prompts to patch edge cases, essentially creating an unmaintainable semantic layer through prompt engineering. That doesn’t scale. Data changes constantly. You can’t rely on throwing another prompt at each new edge case.
Question 4: How do semantics strengthen guardrails and explainability?
Red teaming, guardrails, and observability require a semantic contract underneath to be effective.
Without a semantic layer, red teaming might find surface-level errors, but it can’t catch business-critical reasoning errors. If gross margin isn’t formally defined, someone could compute their own version and get access to insights they shouldn’t have. With a semantic layer sitting between the LLM and the physical data, there’s no way to bypass the governed definition.
The same principle applies to observability and guardrails. You’re not just blocking words; you’re blocking bad decisions. You’re not guessing whether an answer is wrong. You can tell why it’s wrong because the semantic layer provides lineage, allowing you to trace exactly where the data came from to satisfy a question.
Explainability matters for building trust. If you can’t explain how a metric was calculated, users won’t trust it. The semantic layer makes that possible.
Question 5: Who owns governance and semantic models?
This requires shared ownership between IT and the business, typically structured as a center of excellence with a hub-and-spoke model.
IT manages the tooling, security, and platform, ensuring the semantic layer software is properly configured, that people have appropriate access, and that onboarding is completed correctly. But the business builds the semantic models itself. They understand the context and what they’re trying to achieve. It’s difficult to translate those requirements to an IT person who doesn’t understand the business.
At the same time, IT prevents semantic drift by ensuring the business doesn’t create competing definitions of the same metrics. When you manage semantic models as code with CI/CD workflows, IT can review pull requests. If marketing attempts to redefine gross margin when it already exists in finance, IT can reject the pull request and point them to the canonical definition.
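A hedged sketch of what that pull-request review could look like when semantic models are managed as code (the file paths and check logic are hypothetical, not a specific CI product):

```python
# Hypothetical sketch of a CI check on semantic models managed as code:
# reject a pull request that redefines a metric already owned elsewhere.
CANONICAL = {"gross_margin": "finance/margin_model.yml"}

def review_pull_request(new_metrics: dict) -> list:
    """Return review failures: metrics that already have a canonical home."""
    failures = []
    for metric, path in new_metrics.items():
        owner = CANONICAL.get(metric)
        if owner and owner != path:
            failures.append(
                f"{metric} defined in {path} but canonical definition is {owner}"
            )
    return failures

# Marketing tries to redefine gross_margin in its own model -> flagged.
print(review_pull_request({"gross_margin": "marketing/campaign_model.yml"}))
```

A check like this turns "IT can reject the pull request" from a manual policy into an automated gate that also points the author at the canonical definition.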
The hub governs. The spokes invent. That model scales because it aligns with human motivations. The business is motivated to create these assets to solve its problems. IT is motivated to ensure those assets are supportable and won’t create governance nightmares.
Question 6: How do we set boundaries on what AI can answer?
Model Context Protocol (MCP) has become essential here. MCP enables tools to expose their data and functionality to LLMs in a standardized way, giving the LLMs more context for reasoning while maintaining governance guardrails.
The more context you provide through MCP interfaces, the better the LLM performs within appropriate boundaries. You’re defining scope at the system level rather than through prompt engineering, which doesn’t scale.
If your tooling vendors don’t provide MCP interfaces, you should demand that they do. That’s how you enable shared knowledge with LLMs while maintaining the right guardrails to produce accurate, trustworthy results.
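To illustrate "defining scope at the system level," here is a toy Python sketch. This is not the actual MCP SDK; the tool names and dispatch logic are hypothetical, but the idea carries over: the boundary is the set of tools the server exposes, not a prompt asking the model to behave.

```python
# Hypothetical illustration (not the actual MCP SDK): scope is defined
# by which tools the server exposes, not by prompt engineering.
ALLOWED_TOOLS = {
    "query_semantic_model": {
        "description": "Run a governed query against the semantic layer",
        "parameters": {"metric": "str", "dimension": "str"},
    },
    # Note what is absent: no raw-SQL tool, no table-browsing tool.
}

def dispatch(tool_name: str, **kwargs):
    """Refuse any call outside the system-level scope."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool {tool_name!r} is outside the defined scope")
    return f"executing {tool_name} with {kwargs}"

print(dispatch("query_semantic_model", metric="gross_margin", dimension="region"))
```

Because the LLM can only invoke tools that exist, an out-of-scope request fails structurally rather than depending on the model choosing to comply.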
Moving from pilot to production requires end-to-end governance
You can’t have headless agents taking autonomous action on data that isn’t governed or isn’t right.
End-to-end governance must be baked in before any of this works. Edge cases are endless, especially when queries are being generated at machine speed. You can’t cover those bases with manual intervention anymore.
A foundation of trusted data, governed definitions, and distributed ownership between IT and the business will make the difference between pilots that stall and production systems you can trust.
Learn how a universal semantic layer supports AI governance in practice: Schedule a demo.