In this episode of the Data-Driven Podcast, AtScale CTO Dave Mariani sits down with Coginiti CTO Matthew Mullins to explore a growing divide in enterprise AI strategy: data access vs. data understanding. While most organizations focus on connecting AI to more data sources, the real challenge is ensuring consistent meaning across that data. Without shared definitions, governed metrics, and clear relationships, even the most advanced AI models produce inconsistent results. This conversation examines why the semantic layer is becoming foundational infrastructure for both analytics and AI.
Transcript
Dave Mariani: Hi everyone, and welcome to another episode of our Data-Driven Podcast. I'm Dave Mariani, CTO and co-founder of AtScale, and today's special guest is Matthew Mullins, the CTO of Coginiti. So Matthew, welcome to the podcast.
Matthew Mullins: Hey, thanks for having me, Dave.
Dave Mariani: Great to have you. Well, the reason why Matthew and I got in contact with each other is that there was a really well thought out post about the status of OSI, which stands for the Open Semantic Interchange, which is an open source project sponsored by Snowflake to promote a semantic layer language that can be used as a standard. And so we’ll get into that later, but first things first, Matthew, talk to me a little bit about and tell the listeners a little bit about what your path was to where you got to today.
Matthew Mullins: That was a very generous treatment of that little article, so I'm looking forward to talking about it. My route has been circuitous; it's definitely not been straight. I sometimes look at people who have these straight career arcs and think how novel that is, because it's nothing like mine. I've worked in the tech sector for about 30 years, mostly in startups. I started off at a small regional ISP when the internet was still new to the public; we were connecting bulletin board systems to the internet. So I started in a startup literally out of someone's basement, did that for a few years, then rode out the dot-com bust by working for the federal court system, which was really my first introduction to working with data.
We implemented the electronic record system for the federal courts, and I happened to be at the court where we did it, the Eastern District of Arkansas, during Clinton v. Jones. So we were trying to figure out how to release court records to the press and the public, before CDNs existed, without crashing everything we had. I spent about six years working for the federal courts, and then I took a hiatus and went to grad school to study for a PhD in philosophy. I went to Northwestern, studied philosophy with an emphasis in cognitive science, did a lot of work in formal semantics with people like Stefan Kaufmann and Montague grammar, and then came back into data at a small boutique data warehousing company called Aginity. At that time we were a big partner of IBM Netezza, and we were building boutique data warehouses for customer 360; this was 2011. We did that at Best Buy and Bass Pro and Kroger and places like that, all these big enterprise retail customers. We did that for a few years, then pivoted into being a product company. That was my route out of services. We had this product that was a rapid data warehouse builder, and a lot of the idea was around making analytics modular and repeatable, a very repeatable process for analysts so they could come in and build marketing campaigns and things like that. We had built an audience builder for ExactTarget as a custom software development project, and we brought some of that idea into the product we were building. It was very tied to IBM Netezza at the time. This was pre-Hadoop, kind of a two-horse race between Teradata and IBM in the MPP space. We spent a few years doing that, and then we started to replatform. That's close to the time of my early introduction to AtScale.
So we came out with our next-generation product, built around this idea that we could have a semantic layer where users could connect their tools. We did a lot of work with things like Pardot for sending campaigns, but we also wanted to connect BI. We used to have these slides, the hamburger slides we called them, right? There's the data platform, there are all your tools, and we were the meat in the middle that you could connect through. And so that was my first introduction when AtScale first launched. You probably didn't know who we were, but we knew who AtScale was, because you guys were the first to launch a universal semantic layer. I think our thinking was a little bit different at the time, but conceptually what we were trying to achieve was very similar. Unfortunately, we ran out of runway while you guys were pushing that rock uphill for everybody else.
So we ended up going through a transition. The original engineering team and some of the IP went into a new company called Coginiti. And so that’s how I ended up here today. And you know, this is now 12 years later from the start of leaving grad school and coming back into data. And all the things that I thought where I’d studied semantics and cognitive systems — that I thought maybe didn’t apply as much to data — have now become very relevant again in the age of AI and where we’re sitting today. So that’s kind of a long story, but not direct.
Dave Mariani: It's a great background. So we have a lot of similarities beyond our love of semantic layers. My first job was with a consulting firm that transitioned into a product company, and that was very early days too. You're obviously a practitioner when it comes to helping companies actually build out their data estate, so similar backgrounds there. It's always great to have that firsthand experience of actually building data products: when we go and build software to help people build data products, we have some empathy for what it takes to get there. You definitely felt the pain. And it really helps when you talk to customers and prospects, because they can feel that empathy and they know you've suffered the pain. That was really the inspiration for starting AtScale for me: somebody needed to solve this problem, because I felt a lot of pain not having a semantic layer while having to service all these masters who needed data to do analytics. So similar inspiration. That was a great story.
Matthew Mullins: Yeah, I remember very early on, some of the inspiration was like we went into a customer, a large retail customer, and we did a bit of analysis and they had 29 different definitions of customer across their organization. And then they wonder why they have inconsistent analytics. Right. And it’s like, well, you’ve defined this all over the place — different teams have defined it different ways, they’ve got multiple different BI tools where they’ve defined those in different ways. And so you just get this drift all over the organization and there’s no way you can keep three different BI tools in sync. And so that was a real serious problem. It’s one that I hope the OSI initiative is going to help with as well.
Dave Mariani: Yeah, I mean, look, I had the same experience at Yahoo. Multiple BI tools: we had one of everything. We had QlikView, we had MicroStrategy, we had Tableau. And for us it was clicks and impressions. We had multiple definitions of a click and an impression, and that made it impossible to compare the homepage to Sports to Yahoo Finance, because they all had different metrics for measuring their ad performance and who was looking at what. So you mentioned OSI. The way we got acquainted with each other was when OSI was announced. We had of course worked for the past 13, 14 years on a semantic layer platform, and in doing so, about four or five years ago, we really saw that a more code-first approach was needed for creating a semantic layer.
When we started, it was all about UIs and giving business analysts a drag-and-drop approach to creating those semantic models. And if you can't create a semantic model, there is no semantic layer platform; there's no value there. So making it easy to create semantic models is really the keys to the kingdom. Otherwise, nothing can happen. And so we introduced SML, the Semantic Modeling Language, and we open sourced it. We really hoped that by open sourcing it, we could build some groundswell in industry adoption. Then Snowflake, through Josh Klahr, who was AtScale's founding head of product and is now at Snowflake, sponsored OSI. And there was just a tremendous groundswell. There's obviously a lot of need and a lot of interest in defining an open semantic standard; that's clear. I've been on calls with 70 people from 50 different companies, all wanting to contribute and adopt this thing. But there was some constructive criticism, I think, in the note you wrote, Matthew. So talk to us a little bit about how you see OSI, and how do you think OSI could be better?
Matthew Mullins: Yeah, so we saw the announcement from Snowflake and we were very excited about that as an initiative. And Snowflake had this ability to pull in some of their Snowflake venture partners as well as early participants.
We're small; we sell mostly into highly regulated, highly secure environments. Our customers include organizations like the Navy or Cyber Command. You may not know us, but our customers know us. We're not as well known in the Snowflake community, although we have some customers there. So we saw the announcement and were excited about it. We didn't know at the time: is this a marketing thing? Is it going to be real? That was the big open question for us. But we were excited, and we went and signed up, because some of our customers had asked us about it as well. And I thought it was an important initiative, because if you've been in the data space for half a minute, you're familiar with inconsistent standards. The SQL standard is a standard that follows what platforms actually implement, right? Which is why there's such diversity; there's not really one SQL standard, there's the ANSI core and then all these extensions to it. Or I think about the way Java-based platforms implemented JDBC, where Oracle put out a standard and then left it up to everyone to do whatever, so it was very inconsistently implemented and we struggle with driver implementations all over the place. So you approach the announcement of a new standard with some uncertainty about how it's going to go.
Because a lot of times either they just follow the industry and are just stamping what the big players have done, or one person’s put it out and a bunch of other people were forced to adopt it. And so we kind of wanted to get in on the ground floor because it’s important. I do think that the stated aims of OSI are very important — that how we define meaning across our data layer is incredibly important to an organization and we don’t want that locked into a vendor. We want to be able to move that across platforms and smoothly move it between tools. If an organization has three different BI tools, can they easily move those definitions between those tools? Can we easily import things from Power BI into Tableau? Users should have the ability to use the tool of their preference — if a particular user experience makes them more productive, can they stick with the tools they have but share that meaning across tools?
So we thought that was very important. We submitted the form. We didn’t hear anything back. And so we were kind of locked on the outside. We were like, well, maybe Snowflake needs to invest in us. Snowflake, we’re still open to that.
So we followed what was coming out on the blogs and noticed that most of the content was appearing on the Snowflake marketing blog as press releases. You're trying to find out: who owns this, who are the participants? I do a lot of things with Apache Software Foundation projects, and you can always see the chair, the committers, the contributors in the repo, and the issues people are opening. There was just no way to see any of that on the OSI project. I contacted someone at Snowflake that I knew and they said, I can introduce you, I can tell you who the guy is. So it was one person. I did eventually get in touch with Josh. Then they announced they were going to have a GitHub repo where you could go into the discussions, and we went to the GitHub repo and it was basically five people from Snowflake, one of them an attorney. So we contributed to some of the discussions, but I didn't know what else to do. So I wrote the Substack piece saying these are the things I would like to see. I would like to see OSI in a foundation, not owned by a single organization. I think Snowflake was great in that they have the institutional power to bring a lot of people together, to draw in AtScale, ThoughtSpot, Cube and all these other players. But I also think that having a home with open governance standards is really important. And I know from talking to Josh after I wrote the piece that putting it into a foundation is part of their plan, because it gives it that open governance and it gives people the trust that it's not under the direction of one company. I think that will help drive more visibility around the governance and how decisions are made.
Dave Mariani: Yeah, and I think that's a big piece that's lacking. I know that's always been the plan, to contribute it to the Apache Software Foundation, so a lot of this is just laying the groundwork. But Snowflake is not necessarily known for its open source chops, right? So there's a learning curve you can see in action there.
So when it comes to OSI, Matthew, there’s even a really fundamental discussion — is OSI an interchange format? So it’s just a format to basically transfer from one semantic layer platform to another, including BI tools. Or is it a language standard for defining a semantic model? What’s your opinion on that?
Matthew Mullins: Well, my hope is that it’s an interchange format because we’ve already got a standard for defining. You guys have your SML. We already have existing standards in our tools for defining that. So my hope is that it’s an interchange format. I’ve seen conversations that blur that. And I can imagine new tools could emerge in the market that adopt something like the standard just out of the gate. I think the important part is — can we account for all the semantic objects that different tool vendors might provide? There’s all the standard things like dimensions and hierarchies and domains and things like that, that you want to capture. But I was excited to see that they’re also catching things like synonyms. I’m on the fence about calling that “AI context” — I would like it to be a little less tied directly to AI. But the idea that you can have synonyms is great.
Dave Mariani: Yeah, to me it’s just context. The fact that it could be leveraged by AI is just another use case.
Matthew Mullins: Right. And the use case for people that don’t know is like — if you only have “orders” in your system and someone wants to ask a question about “sales,” sales is a synonym for orders. And so an individual or a model can pick up and say they meant orders by that. I think being able to capture all that is great. But then you see people want to have language-specific things in there, and that’ll be the trick. But I do hope that it’s an interchange.
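The synonym lookup Matthew describes can be sketched in a few lines of Python. The terms and governed object names below are invented for illustration; they are not from any real semantic model.

```python
# Hypothetical synonym table a semantic layer might expose alongside its model.
# Keys are business terms a person (or model) might use; values are the
# governed object names actually defined in the semantic layer.
SYNONYMS = {
    "sales": "orders",
    "revenue": "net_revenue",
    "clients": "customers",
}

def resolve(term: str) -> str:
    """Map a user's word onto the governed semantic object, if a synonym exists.

    Unknown terms pass through unchanged, so governed names still resolve
    to themselves.
    """
    return SYNONYMS.get(term.lower(), term)
```

With this in place, a question about "Sales" resolves to the governed `orders` object before any query is generated, which is exactly the disambiguation step described above.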
Dave Mariani: Well, I think there is a lot of interest in it. A lot of people want to see it succeed. When I talk to customers and prospects, they’re all very much behind this effort — no customer wants to be locked into a single system and have all their IP locked in. They want portability. And so it’s a really important project from that perspective. But let’s talk about why people are even interested in semantics. You and I have been at it for a long time now. Primarily the chief consumer of semantic layers has been business intelligence tools. But obviously when OpenAI launched ChatGPT, the world changed in so many ways, and it became pretty evident to us that the semantic layer is the bridge — the deterministic bridge — for enterprises and LLMs and those frontier models. And so we’re seeing a lot of interest in being that adapter for enterprise AI and LLMs. Are you seeing the same thing, Matthew, in terms of interest in investing in semantic layers now because of AI?
Matthew Mullins: We are. That’s a primary driver because it’s the consumption layer for that. It’s the guardrails. It prevents the hallucinations. It gives the standard naming, the relations, the standard way to do the measures. You know, I don’t know if you were following Gartner Data and Analytics last week —
Dave Mariani: I was there.
Matthew Mullins: Major, major endorsement from Gartner. I mean, Gartner got on stage and said that by 2030 there are three key components: cybersecurity, your data platform, and your universal semantic layer. That reminded me of a few years ago when they said data catalogs were the new black. And Gartner laid out the roadmap for justifying the budget: the universal semantic layer is a key capability every CDO needs budget for, to cover their data estate for agents. I think that was a very clear signal to the entire industry. It's not a nice-to-have. MCP, the Model Context Protocol, is not going to solve these kinds of problems. You're not going to be able to do this directly against the data.
When I look at the way that models work — models don’t actually ingest your data, right? They’re language models. They reason about the language about your data. They reason about the column names and the table names and things like that, and they can generate SQL, but they need that strong semantic content. They need the context and descriptions. They need the defined relationships to know how to join that data, and then they can be really powerful tools. The semantic layer is a key component of that. And I think it’s just going to drive the market larger. I saw that you guys made a really key hire on the basis of this from DBOS, which I thought was really exciting — we’re going to get so much pressure. And we had this internal discussion about whether the semantic layer is a system or a platform — it’s firmly a platform. People are going to build tools on top of the semantic layer. You’re going to see more agents driving on top of it. It’s going to be that key layer of context. The semantic layer is a semantic graph — it has edges and nodes, the relationships are join relationships. You can represent it like a graph and the models know how to traverse that.
Dave Mariani: Yeah, I try to say this all the time: we actually compile our semantic layer into a knowledge graph, and that becomes the center point for our query planner. I think a lot of people don't realize it's not just about the metadata. Data catalogs are out there, and they're nice glossaries and nice documentation, but a semantic layer is operational: it actually takes a logical query and turns it into a physical query. That's where the determinism comes into play. So it's really the semantic query engine, together with that semantic metadata, that is the key to making those LLMs do their jobs.
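The logical-to-physical step Dave describes can be sketched as a deterministic lookup against governed definitions. The metric, dimension, and join definitions below are hypothetical stand-ins, not AtScale's actual implementation.

```python
# Hypothetical governed definitions a semantic layer might hold.
METRICS = {"revenue": "SUM(fact_orders.amount)"}
DIMENSIONS = {"region": "dim_region.name"}

def compile_query(metric: str, dimension: str) -> str:
    """Turn a logical request into physical SQL using only governed definitions.

    Because the expressions come from the model rather than from the caller,
    every consumer (a BI tool, an LLM, an agent) gets the same SQL for the
    same metric, which is where the determinism comes from.
    """
    expr = METRICS[metric]        # raises KeyError for ungoverned metrics
    dim = DIMENSIONS[dimension]   # likewise for ungoverned dimensions
    return (
        f"SELECT {dim} AS {dimension}, {expr} AS {metric}\n"
        "FROM fact_orders JOIN dim_region"
        " ON fact_orders.region_id = dim_region.id\n"
        f"GROUP BY {dim}"
    )
```

Asking for `revenue` by `region` always yields the same governed `SUM` expression, so two users (or two agents) can never disagree on what the metric means.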
Matthew Mullins: Yeah. I saw one catalog company tried to rebrand as “semantic intelligence.” And yeah.
Dave Mariani: Yeah, there are so many posers now. Look, we’ve been working on Gartner for years to tell them the semantic layer is really important and they’re like, yeah, yeah, yeah. Well, they finally listened. But then everybody else is now — just like everybody slammed AI into their name or value prop as a modifier — now everybody has a semantic layer. And it’s like, I’m sorry, but you didn’t have a semantic layer two months ago. You don’t have a semantic layer now. So let’s get real.
Matthew Mullins: And it’s the wrong kind. The metadata is interesting and it’s important for observability, traceability, and lineage and things like that. But it’s not the right semantics about your business. And the thing I oftentimes talk to businesses about is — this is not a thing that LLMs can solve for you. A model can be an accelerator sometimes, but your humans know your business. This is key for analysts because the semantic layer sits in that space between the business and the technical. And you need the people that understand the business to define what’s going to be in the semantic layer — what is your business naming, what are these KPIs, how do we measure those things. And that doesn’t come out of your pipelines.
Dave Mariani: Yeah, I always use the analogy of self-driving cars — that’s the LLM, right? There’s amazing smarts there, but without a map, without a mapping interface, they’re useless. And so the semantic layer is the map. It’s the map of your enterprise data, and the LLM is the self-driving car. You combine those together and you have an amazing piece of technology — but one without the other, you can’t do it. You can’t do conversational BI. You can’t let agents roam freely on your data without some kind of business meaning and context.
Matthew Mullins: And the car can’t build the map.
Dave Mariani: And the car can’t build the map. Thank you, Matthew. That’s a really good way to add on to that analogy. I love it. So we only have a couple of minutes left, but a lot of the really interesting chatter — there were even sessions on this at the Gartner Conference — was about the future of BI with this new context of AI. There were even debates between two of the analysts about whether AI would subsume BI or whether it would make BI better and they would continue to coexist. So Matthew, what’s your opinion about the future of business intelligence in this new landscape?
Matthew Mullins: I mean, if you think that business intelligence is dashboards, then that era is probably a bit dead. If you think that business intelligence is about making business decisions and being able to come in and get answers, I think that’s going to explode. The AI for BI — the ability to come in and talk to an agent, have it generate graphics for you — I mean, when you have the defined semantic layer, an individual can come in and do that natural language query and ask “what were fourth quarter sales for our northeast region, can you give me that in a scatter plot, can you change the colors to black and gold” — and they can iterate over that using natural language and get exactly what they want. And a lot of times those things are just things they need to drop into a PowerPoint or use for a report or they’re trying to answer an ad hoc question in the moment. And those are the kinds of things that never need to live in dashboards to begin with. So I think we’ll still have some dashboards for key metrics that are defined across the organization — and I think most orgs probably need just a handful of dashboards for key things they measure. But most of it is going to be ad hoc AI for BI. And these agents have the ability to generate those charts, do all the colorways and things like that, and when they’re operating against a semantic layer, they can get an excellent degree of accuracy.
Dave Mariani: Yeah, I've heard that as a good example: dashboards are great for common metrics you need every day, so rather than having a thousand people ask the same question, a dashboard can answer that question once for everyone. But when it comes to exploratory analysis, that's where LLMs and their creativity really shine. What I've started to do is just ask open-ended questions to Claude, something like "why are my sales trends changing?", just like that. And it will come up with the questions it wants to ask of the semantic layer, get the right answers because the semantic layer handles navigating the raw schemas, and then analyze and ask deeper questions. And it can do that much faster than I ever could as a human.
Matthew Mullins: Sometimes after I’ve been in a session, I’ll also ask “what questions could I be asking that I’m not?”
Dave Mariani: There you go, that's another great one. It's really a new world. And I think as a business analyst it can feel scary, but it just means that rather than dragging and dropping until the end of time to find some nugget of value, you can use tools like LLMs to navigate that data for you, find the value, and answer a true business question, instead of building charts and dashboards forever. So exploratory analysis, I think, is a game changer.
Matthew Mullins: Yeah, I mean, it puts you much more closely in service to the business than to the data. The point of this was never the data — it was to serve the business, right? That’s the end. We’re trying to serve the outcomes of the business. And I think it just allows analysts to go deeper on that, and they get increasingly closer to the business in that way.
Dave Mariani: So Matthew, it’s been a great discussion. Thank you for your thoughtful piece on Substack. What is your Substack so people can go and check it out?
Matthew Mullins: It’s Field Notes on Data.
Dave Mariani: Field Notes on Data — check it out. You’re a great writer and a great technologist. Thanks for joining in on the semantic layer revolution here and being a pioneer. We appreciate it.
Matthew Mullins: Thanks for having me on. I really enjoyed the opportunity to talk.
Dave Mariani: Likewise. All right, everybody, this has been another episode of AtScale's Data-Driven Podcast. I want to thank Matthew Mullins for participating. And you guys go out there and be data driven. Thanks for listening.