In this episode of AtScale’s Data-Driven Podcast, host Dave Mariani, Co-Founder & CTO at AtScale, sits down with Jens Kröhnert, Principal Solution Architect at Oraylis, to explore the AI Frontier: the next generation of analytics where autonomous agents operate on governed semantic layers to deliver insights without hallucinations or drift.
Key Takeaway: The semantic layer isn’t just infrastructure for better BI. It’s the foundation that enables AI agents to operate autonomously without hallucinating, drifting, or violating governance policies. Independent benchmarks show LLMs paired with semantic layers achieve 4-5x better accuracy than raw LLMs. Learn how AtScale’s Universal Semantic Layer and Model Context Protocol are making agentic AI enterprise-ready.
Meet our Guest
Jens Kröhnert
Principal Solution Architect, ORAYLIS
Transcript
Dave Mariani
Hi everyone, and welcome to another AtScale Data-Driven Podcast. Today’s topic is the next generation of empowering AI and bots. We’re going to talk about moving beyond traditional BI, but we’re not going to leave it behind. We’re going to talk about how we got here, and I have a really special guest who has followed a very similar path to mine; we’ve crossed paths over the years.
So I want to welcome Jens Kröhnert, who is a Principal Solution Architect at Oraylis. Jens, welcome to the podcast.
Jens Kröhnert
Correct.
Jens Kröhnert
It’s such a pleasure to be here, Dave, it’s really a pleasure. I’ve been following you for many years, and as you said, our paths have crossed a few times. So it’s really an honor to be on your podcast.
Dave Mariani
I really appreciate that, Jens. You know, look, we’re working in the same area, and as we’ve talked online and in person, there are some very clear trends that we both see. We want to share those with the listeners here. Normally on this podcast we do a lot of talking and no slides or demos, but I think it’s worth it this time to
talk a little bit about those trends and show you for real why this is such an important new development when it comes to GenAI with semantic layers and the like. So, Jens, let’s just start: share your background with the listeners, talk a little bit about your career in BI and the analytics space, and where our paths crossed.
Jens Kröhnert
Thank you for the opportunity. Yes, I started by studying engineering. I’m a mechanical engineer with a diploma in automation, and that topic of automation is the meta-topic that has followed me up to today. After that I started out as an IT consultant and worked my way up the stack. In the beginning, I deployed operational systems like production planning systems and ERP systems,
and enhanced them with mobile solutions. We called them real-time enterprise solutions at the time. Then one day I read a book by Ray Kurzweil and was so fascinated by the future of AI that I thought maybe I should get some of that knowledge and dive into the topic. That was when, 20 years ago, I changed companies and started at Oraylis. It has been a great development: when I started here, we were 10 people; now we are about 160.
And we grew with the topic of Microsoft Business Intelligence. It all started with SQL Server 2005, where Microsoft put in everything you need to build a full-stack business intelligence system without the license costs others charged. So that was the path we followed. We became one of the most important Microsoft partners for that topic in Germany and grew with it. At first, it was small companies that tried it out,
and after some time we reached enterprise level. I deployed that classical Microsoft BI stack based on SQL Server for about 10 years myself in the field, using the full stack. I know you worked with that stuff too. We used Integration Services to connect different source systems, a staging DB, as it was called at the time, and then a dimensional data warehouse, still relational.
And SQL Server had a quite fancy feature called Analysis Services, which serves multidimensional cubes. We used that as the semantic layer at that point in time. After 10 years in the field, I switched roles: I moved into a more strategic position, built up technical pre-sales, and became innovation lead. And it was the right time. Innovation exploded at that time.
This was also the time when I first got a glimpse of who Dave Mariani was, because it was the time the backend scaled out. Suddenly it was not only the very famous SQL Server; there were open source stacks like Hadoop, and I visited my first Hadoop conferences. There was a shining example of one of the biggest clusters in the world, and someone brave enough to build a 20-terabyte Analysis Services cluster on top of it.
And I guess that was you, Dave.
Dave Mariani
Yeah, Jens is referring back to my work at Yahoo. Like Jens said, at Yahoo I was running analytics and looking for a way to introduce a semantic layer so that my business partners could self-serve. I knew that a multi-dimensional model, especially with fast queries where you could pivot any way you want,
was going to be popular, and boy, was it ever. I mean, the business users loved it. And of course at Yahoo we invented Hadoop; that was another team that was innovating there. But somehow I had to go from the Hadoop cluster, which is where all the data lived, and make that feed an Analysis Services cube.
Which, again, is right: it was about 23 terabytes, and that was one calendar quarter of data. Three months of data was 23 terabytes. So while it was popular, man, it was really hard to scale. And that was the inspiration for founding AtScale: I knew what works and what business users want. They want that multi-dimensional engine, that ease of use,
but we needed to find a way to scale it in this data lake and big data world. So I think we ran into each other at the Microsoft conference where I presented that white paper about Analysis Services on Hadoop.
Jens Kröhnert
Yeah, and we were quite far out. Not that far out, but we also had very big enterprises as customers. We reached single-digit terabyte numbers with Analysis Services, but we also found out it is difficult to scale with that old legacy technology. So for me, your vision was very clear. And as the innovation lead for a company that had switched its back end to something that scales, it was totally logical
to find a semantic layer that also scales, and this is exactly what AtScale went for. This was also the time, 10 years ago, when I wrote my first article (I guess you read it), where I already mentioned that I thought AtScale could be the future for that scale-out semantic layer technology.
Dave Mariani
Yeah, I think we both saw how important a semantic layer was. We both used off-the-shelf parts to try to achieve that. And like we discussed, magic happens when you give users easy-to-access, speedy, and semantically consistent data and let them use tools like Excel and
Tableau, or custom applications in our case at Yahoo, to explore it. But you know, we all started in that dashboardy space. And I think with Analysis Services, we definitely encouraged ad hoc analysis, which was harder to do at the time because a lot of the tools really required SQL expertise.
Even Tableau allowed you to do that sort of exploration, but you still needed to model your data. And the semantic layer, powered by Analysis Services in that case, allowed users to skip that modeling exercise, where they may not have had the skill sets to do it. But we’ve moved beyond that now, haven’t we, from dashboards. Now we’re really creeping into
not just the descriptive but the predictive realm for analytics.
Jens Kröhnert
Yeah, that’s totally true. The scene has changed. AI is on the scene, and people imagine fantastic things about AI. And it is possible. But if you don’t give AI the context to interpret the data correctly, it does anything. It hallucinates. And I find it quite funny, to be honest,
that now, for AI, the spotlight falls very heavily on the semantic layer, while for over 10 years we tried to explain how important it is for humans too. For humans the budgets weren’t big enough to implement a semantic layer, but now that there’s AI, people see the need, and the budgets are rising to implement semantic layers. And this is also why we keep talking to each other.
We’ll come to that later. We also have an automation tool to deploy all those data into AI platforms. We can currently generate all the layers except the semantic layer. And this is why I was very happy to see that you are going a somewhat open-source way, defining a semantic modeling language, which could be the bridge for us, for our Databricks-based lakehouses, to also generate a semantic layer, metadata-driven. I think this would be the way to go,
because in the old classical systems every cube is handcrafted. If you have one measure like margin, you have to implement it in several cubes, several times. In the Microsoft world there is also AI that can connect to a cube, but it can only connect to one cube. Now there are variants, and the AI chooses which cube it uses without the user’s authorization.
So it’s quite random what comes back, and I think you have the answers for that with your product.
Dave Mariani
Well, we’re trying to get there, but you covered a lot of ground. It’s really important what you mentioned about context. With humans, of course, a semantic layer is much easier to interact with. But as you can imagine, we’re now asking machines to interact with raw data, and a semantic layer, like you mentioned, is even more important for a machine,
because machines don’t have the context that a human has in their head. It’s just not there. We’re still pretty good at some things: with context and memory, a human working with data could probably still do better than an AI bot can do without a semantic layer today. So as you mentioned, for these AI agents to work, it’s really important to have a semantic layer that makes it easy for them to understand
the idiosyncrasies of the business and the business-specific terminology, but also to use the semantic engine to make it really easy for them to get deterministic answers. Let the semantic layer do all the nasty joins and table relationships and many-to-many relationships and calculations, and just let the agent do the analysis part. That’s really important, and you called that out.
And then you moved on to the semantic models themselves. They still need to be built to enable this capability, and I think that’s the real next frontier. I want to give you some time to talk about the AI Frontier, because it’s a really interesting concept that you’ve pioneered. It’s about not just using a semantic layer to query your data with an agent,
but how we actually build those semantic models themselves using those same agents, those same LLMs, so we can take the human effort out of the equation, or at least lessen the involvement, to create the semantic model. Because you can’t get value unless you have a semantic model, so we have to solve the problem of constructing those semantic models.
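To make Dave’s point concrete, here is a rough sketch of the division of labor he describes: the agent asks for measures and dimensions by business name, and the semantic layer, which owns the join paths and formulas, compiles that into SQL. All model, table, and column names here are illustrative, not AtScale’s actual engine or schema.

```python
# Illustrative sketch (not AtScale's engine): the agent requests business
# names; the semantic layer resolves joins and calculations deterministically.

MODEL = {
    "measures": {
        # The margin formula lives in ONE place instead of in every report.
        "total_sales": "SUM(f.sales_amount)",
        "margin": "SUM(f.sales_amount - f.product_cost)",
    },
    "dimensions": {
        # name -> (joined table, join condition, column to group by)
        "country": ("dim_geography g", "f.geo_key = g.geo_key", "g.country"),
        "product": ("dim_product p", "f.product_key = p.product_key", "p.product_name"),
    },
}

def compile_query(measures, dimensions, fact="fact_internet_sales f"):
    """Turn a business-level request into SQL; joins come from the model."""
    select = [MODEL["dimensions"][d][2] + f" AS {d}" for d in dimensions]
    select += [MODEL["measures"][m] + f" AS {m}" for m in measures]
    joins = [f"JOIN {table} ON {cond}" for table, cond, _ in
             (MODEL["dimensions"][d] for d in dimensions)]
    group = [MODEL["dimensions"][d][2] for d in dimensions]
    sql = f"SELECT {', '.join(select)} FROM {fact}"
    if joins:
        sql += " " + " ".join(joins)
    if group:
        sql += f" GROUP BY {', '.join(group)}"
    return sql

print(compile_query(["total_sales"], ["country"]))
```

The agent never sees `geo_key` or the join condition; it only names "total_sales by country", which is exactly what keeps the answer deterministic.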
Jens Kröhnert
Absolutely. In this time when AI can help us nearly anywhere, we really have to define what we want to commit to in order to have a conformed view, not only within a group of people, but also within the extended group of people and agents that have to work together. You can generate lots of stuff today, but at some point there has to be a human in the loop. They have to define:
What is our understanding of the world? What are the goals we have to follow? What are our guardrails? When we let AI loose, what are the ethical guardrails it should follow? All that stuff is essential for working together with AI if you still want to be in control. And this is why the semantic layer is a crucially essential part. Nearly everything else, I would say, is mostly automatable.
At that point, a human has to say: What is the context? What is my ontology? What are my most important measures? Which are the goals I want to lower or raise? Then you have the right goals and the right context for the AI to operate. If you don’t have that, the AI will guess and will hallucinate and...
Dave Mariani
We don’t, we do not want it to guess.
Jens Kröhnert
and potentially drive in a different direction than you wanted. And if you don’t have a semantic layer in which all the actions being taken are reflected, you potentially won’t even see that something is happening that goes against your intentions.
Dave Mariani
Yeah, Jens, you know, the thing about these LLMs is that they want to please you, and they sound so confident when they give you an answer. It’s really misleading, because that answer could be completely wrong. And it’s not just that it can be completely wrong all the time; it can be completely wrong some of the time, which is even scarier, because the answer can change from prompt to prompt, even for the same prompt, given that these models are probabilistic.
So Jens, you’ve written about the AI Frontier. Could you talk a little bit about that and share it with our listeners? Feel free to use some slides or diagrams. We don’t normally do this on the podcast, but today we need show-and-tell, because this is too important to leave to talking alone.
Jens Kröhnert
Happy to have the chance. I will share a slide with you.
Jens Kröhnert
This should be the right slide. Yeah. Where does this name come from, AI Frontier? I didn’t invent it. It’s from Microsoft Research. As I said, I’m the innovation head at Oraylis. We work very closely with Microsoft as a partner, and we also have another shared partner, Databricks. But in this case, this AI Frontier model was invented by Microsoft,
in the context of AI innovation. It comes from Microsoft Research, and as a Microsoft partner we were included in these ideas very early. The left side of this slide is the high-level overview of the AI Frontier operational model from Microsoft Research. It consists of three phases: plan, develop, operate.
And you see develop and operate in a square, which means humans don’t operate here anymore. It can be developed by AI, it can be operated by AI. You have to do the plan as a human, and then you should govern and monitor what is happening through automation. This is a very high-level understanding, but I guess everyone who has played around with AI
knows that you can develop software just with vibe coding. That would be, for example, the develop part, based on the metadata we give it: we want a website that does X and Y, and vibe coding can do that with AI today. And of course, if you define some measures, a FinOps agent can operate the solution. So the human is there
to plan such things and then govern, monitor, and potentially optimize and mitigate those solutions. This is the high-level idea, and it has a very big impact for companies like us, because those parts, develop and operate, are how we used to earn money. So we have to adapt. We have a quite good basis for this, because we have already worked on automation
for developing data and AI platforms, which is our core competency. I will have another slide on that. So it’s more and more important to have a good plan. Here we still see humans in the lead, because I guess for a long time humans will be legally and commercially responsible for companies.
I don’t see that changing any time soon. So it’s a human who does the plan. In the early days, it was only some Word documents, but now we go down to the level of defining metadata. What is the context of your company? What entities do you have? What is the hierarchical aggregation of those entities? What are your goals? Do you want to maximize sales, or do you want to minimize costs?
All that stuff should be laid out in the plan down to the level of metadata. And if you have that fine-grained metadata, we get close to the situation where you just push a button and, based on that metadata, we can develop such a platform. We can already develop it. There’s a big discussion in the market about the names of layers, but let’s just say there are four layers, whatever they are named.
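One way to picture the "plan as metadata" Jens describes is a small, machine-readable structure that names the entities, hierarchies, measures, goals, and guardrails. The field names and values below are purely illustrative; they are not a real Datamate or SML schema.

```python
# Hypothetical shape of a machine-readable "plan" -- illustrative names only.
plan = {
    "context": {"company": "AdventureWorks", "domain": "retail"},
    "entities": {
        "customer": {"hierarchy": ["country", "region", "city", "customer"]},
        "product":  {"hierarchy": ["category", "subcategory", "product"]},
    },
    "measures": {"sales": "sum", "costs": "sum", "margin": "sales - costs"},
    "goals": [
        {"measure": "sales", "direction": "maximize"},
        {"measure": "costs", "direction": "minimize"},
    ],
    "guardrails": ["no PII leaves the platform", "human sign-off on spend"],
}

def validate_plan(plan):
    """Minimal check that a plan is complete enough to drive generation."""
    required = {"context", "entities", "measures", "goals", "guardrails"}
    missing = required - plan.keys()
    if missing:
        raise ValueError(f"plan incomplete, missing: {sorted(missing)}")
    for goal in plan["goals"]:
        # Every goal must point at a declared measure with a clear direction.
        assert goal["measure"] in plan["measures"], f"unknown measure {goal['measure']}"
        assert goal["direction"] in ("maximize", "minimize")
    return True

print(validate_plan(plan))
```

The point of validating the plan up front is exactly Jens's argument: everything downstream can be automated only if this human-authored layer is complete.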
We can already develop three of the layers based on the source systems and the plans. What we are currently thinking about is developing the semantic layer too, the last layer, so we would have it complete. And here the combination of our two tools could play an essential role, especially for Databricks-based lakehouses. Then we would have the chance to do the complete plan, push a button, and have the complete
generation and development of the platform, and then operate it. And this leads to the idea that in the end those agents will be in the lead. For our customers, the right side is where we thought further: how could this be implemented with our customers? Of course not with a big bang; you start with a first iteration, reach the milestone of an MVP, and humans can train with the system.
The second big milestone would be that the agents you build to perform functions like maximizing sales or minimizing costs reach such good quality that they can run in shadow mode, while humans are still the ones executing. And in the end, when the agents have good enough quality, people can give them the driver’s seat; the humans go into the back seat and just control those agents. So this would be the meta
view of the AI Frontier model and how we see it being implemented. Building up such data and AI platforms is quite complex. Here you see our four names for the layers. There’s a big discussion about the naming. As I said, in the early days it was staging and data warehouse. We just give names which have
an indication of what each layer is doing. The first layer, from our perspective, has the core idea of replication. It was Bill Inmon who originally had the idea that no report should go directly against operational systems. You had him on your podcast lately; I have seen it. The father of the data warehouse! It’s a great honor to be the next one to talk about this, so I will mention him as the father of the data warehouse here.
But he was not the only one. After that came Mr. Kimball, who said: why do we have to implement the business logic redundantly in every report? Just store the business logic in a dimensional way. And this is why we invented the name business layer. Here, the business logic and the view of the business people are implemented, with the names the business people use. And of course, on top of that comes a semantic layer, which is
often another technology as well, like in-memory technology or the like. And after some time there was a third person who invented something: Mr. Linstedt, with Data Vault, for long-term storage with very good historical persistence, so that you can trace any change historically. So this is how we name them: replication layer, technology layer, business layer, and serving layer.
The first three layers we can already generate, and we are now working on generating SML for the serving layer as well, so that we could then connect AtScale as the serving layer for Databricks-based lakehouses. The tool we built is called Datamate; you see it on the right. It’s an open-source tool. We have worked with automation since the early days of BI. The first person who had the idea of not writing any SQL by hand was Mr. Kimball:
back in the day he distributed an Excel sheet, one sheet per table, where you put in the metadata, and from it you could generate the SQL for the tables. That was the basic idea. We developed this automation tool over about 10 years. And two years ago, we followed the wish
of our customers, who said: what tool did you use to generate all that? We want to have that too. So we said, okay, let’s go fully open source. It’s called Datamate, and you can download it from GitHub. It works this way: we can put an adapter on source systems like SQL Server or Oracle, and then we don’t have to write the basic metadata by hand; we can generate it. What source tables do we have?
Because we have that, we can generate the first two layers, the replication layer and the technology layer, because they are driven by the metadata of the source system. Then comes the part where humans come in and have to define how they want to work with the data. What are the entities? What are the ontologies? What is the context data we want to have? This defines the business and the serving layers. And once we have this metadata collected, we again push a button and have the full stack
deployed. We have also included variants so we can target different platforms like Microsoft Fabric, Databricks, or even Synapse, so the customer can choose the best fit. So this is a high-level overview of what the AI Frontier model is and how we as a company are trying to adapt to this idea and embrace all the AI possibilities and their effectiveness,
but still keep this very important part of defining metadata that will also be persisted in the serving layer.
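The "push a button" generation Jens describes can be sketched very simply: read the source system's table metadata through an adapter, then emit the DDL for the first layer mechanically. This is only a toy in the spirit of that idea, not Datamate's actual code; the schema, table, and audit-column names are made up.

```python
# Sketch of metadata-driven generation (not Datamate itself): source-system
# metadata in, replication-layer DDL out.

source_metadata = [
    {"schema": "erp", "table": "orders",
     "columns": [("order_id", "INT"), ("customer_id", "INT"),
                 ("amount", "DECIMAL(18,2)")]},
    {"schema": "erp", "table": "customers",
     "columns": [("customer_id", "INT"), ("name", "VARCHAR(100)"),
                 ("country", "VARCHAR(50)")]},
]

def generate_replication_layer(tables):
    """One CREATE TABLE per source table, plus an audit column --
    the 'push a button' step for the first layer."""
    ddl = []
    for t in tables:
        cols = [f"  {name} {dtype}" for name, dtype in t["columns"]]
        cols.append("  _loaded_at TIMESTAMP")  # audit column added by convention
        ddl.append(
            f"CREATE TABLE replication.{t['schema']}__{t['table']} (\n"
            + ",\n".join(cols) + "\n);"
        )
    return "\n\n".join(ddl)

print(generate_replication_layer(source_metadata))
```

Because the replication and technology layers are pure functions of source metadata like this, they need no human input; the human effort concentrates on the business and serving layers, exactly as Jens says.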
Dave Mariani
I love it. That’s a really great overview of how you can construct all this and make it work, right? So Jens, I want to switch gears for a second. Once we have this system, what can you really do with the tools that are available today, with LLMs and the like? So if you don’t mind, I’m going to go ahead and share
and take the listeners through a demonstration of how this all can work when it comes to actually using a semantic layer with GenAI. As you can see here, this is AtScale’s Design Center. This is the semantic model that we have, and
here are my measures and dimensions, which you see over on the right. They’re all made up of semantic objects. You mentioned our open-source SML, which stands for Semantic Modeling Language. These models can all be created programmatically, or you can create them using an interface like this with drag and drop. And we can also use AI to generate these models.
So for example, I can come to my Databricks warehouse, since we’re talking about Databricks, look at my data sources here, and generate a new model from them really quickly. Now, once I have these models deployed, I have some exposed through Model Context Protocol; you can see I have Claude here. Model Context Protocol is like a JDBC connector for your LLM.
So here in Claude, you can see I have an AtScale MCP server connected to my Claude session. And you can see that MCP has tools, prompts, and resources. Here are the different tools through which we’ve exposed the AtScale semantic models, the semantic layer, to Claude: you can authorize, describe models, list models, and run queries.
So all I need to do in Claude is say, for example, “Show me the AtScale models I can query.” And Claude, using MCP, is going to authorize into the AtScale semantic server and list the models I can explore. So there are the models I’ve deployed.
And then I’m going to focus on the Internet Sales model. The Internet Sales model, Jens, is AdventureWorks, and what I’m about to show you builds on the fact that you and I have worked with Microsoft’s AdventureWorks sample data for years and years, right? We understand it well. So I can say, “Show me the metadata for the Internet Sales model.”
And you can see that this Internet Sales model has a whole lot of dimensionality involved: lots and lots of customer attributes, product attributes. So at this point, the LLM, which is Claude Sonnet 4, knows about all these different metrics and dimensions. You can see the context has been shared
with Claude. So now, if I was querying data the old-school way, like I would in Excel or Power BI, I might say something like, “Show me sales by product for Germany,” since we’re talking to Jens, who’s based in Germany.
And what this is going to do is use the semantic layer to run a query, and it’s going to figure out the right way to run it. You can see the first one it tried was incorrect, but it’s smart enough to use the semantic layer metadata to correct it. And then here are all my products and my sales by product for Germany. Now,
Jens Kröhnert
It doesn’t have to guess which Berlin it is. Is it Berlin in Germany, or is it Berlin in the USA? This is all included in the semantic model.
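The self-correction Dave points out, where the first query fails and the second succeeds, works because the failure is answered from the model's metadata rather than by guessing. A minimal sketch of that loop, with illustrative dimension names and a stand-in query function:

```python
# Sketch of metadata-grounded retry: an unknown name fails fast, and the
# correction comes from the semantic model's dimension list, not a guess.
import difflib

KNOWN_DIMENSIONS = ["country", "product_name", "product_category", "order_date"]

def run_query(dimension):
    """Stand-in for the semantic layer: reject names the model doesn't know."""
    if dimension not in KNOWN_DIMENSIONS:
        raise KeyError(f"unknown dimension: {dimension}")
    return f"rows grouped by {dimension}"

def query_with_correction(dimension):
    """Try the query; on failure, match against metadata and retry once."""
    try:
        return run_query(dimension)
    except KeyError:
        candidates = difflib.get_close_matches(dimension, KNOWN_DIMENSIONS, n=1)
        if not candidates:
            raise  # nothing plausible in the metadata -- don't invent one
        return run_query(candidates[0])

# The agent asked for "product", but the model calls it "product_name":
print(query_with_correction("product"))
```

The key design point is the `raise` branch: when the metadata offers no plausible match, the system surfaces the error instead of hallucinating an answer.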
Dave Mariani
It is, because we have our hierarchy for location. We also have our hierarchy for products: you can see that product name rolls up into category and product line, so it understands that. But Jens, look what it did. I only asked for my sales by product, right? But it showed me, first of all, the top-performing products, and it also gave me some key insights.
It says that road bikes dominate my high-value sales, and it noticed some patterns here and delivered them for me. But the question I asked, “Show me sales by product for Germany,” is the old-school way of doing it. That’s a BI-type query. What people can do with these LLMs instead, Jens, is just say, “Tell me something
about sales.” So now I’m letting Claude generate, figure out, and run its own set of queries, then analyze the results of those queries and tell me something I don’t know. It’s doing the job of the analyst without the analyst having to hunt and peck and drag and drop and pivot and do all that for potentially days.
Jens Kröhnert
Totally fascinating.
Dave Mariani
So look at what Claude is doing at this point. Again, this is not AtScale doing this; this is not an AtScale interface. This is just an off-the-shelf chatbot. And it says it found some fascinating sales insights. So what did it find? It found that the United States leads by volume, that Australia has the highest efficiency per unit, and that Australia is punching above its weight:
38 fewer units sold, but much higher-value purchases. And it shows a Goldilocks effect: road bikes are the premium-segment king, and they draw in other purchases. It did it by time, it did it by demographics, showing me my split between males and females. And ultimately the big picture says premium bikes and accessories are what you should focus on;
you should be building a global presence across markets. And you mentioned KPIs. I can just ask it: what are the best KPIs I can use for sales? I’ll let Claude come up with that for me and generate its own KPIs. And here they are. I was blown away again, because
what people don’t realize, and I didn’t either, is that I think of natural language query as a faster way of doing drag and drop in my BI tool. But pairing an LLM and a chatbot like this with a semantic layer goes way beyond that.
Jens Kröhnert
Definitely. Definitely. And you can even think further. You could ask strategic questions, like which products would be beneficial to push, for example with a marketing landing page in a given region, and you will get an answer. Now think further.
Dave Mariani
way beyond it.
Jens Kröhnert
Today I ran an internal innovation lab again here at Oraylis, where we tried out vibe coding with Copilot, and we were all blown away. People who could code were blown away, and people who had never really coded were blown away. They all went from zero to a running system in one hour. So let’s put that in here. You get hints about what could be beneficial; let’s stick with marketing and landing pages, because that’s an easy one.
Now you give the complete system some freedom. Again, based on the semantic layer, based on your context, your understanding, and your measures, you say: okay, you get some budget and some freedom. Here is a cloud space; you can set up landing pages yourself, measure the results, and check whether it’s working. If it’s working, go further; if not, try the next best idea. So this is…
Dave Mariani
Mm-hmm.
Jens Kröhnert
the AI Frontier model thought one step further. You can then automate the operation, but only in a good manner if you have a conformed understanding of your surroundings, your entities, and your goals. And if you don’t have that, think about all the other companies having something like this in place. In the future,
you will only be visible if you have such an agent. Only agents will communicate; you won’t use the internet as a human anymore, only your agents will. You will not be visible if you don’t do that. So there are clearly two steps: define your world, and write it down. What is important for you? What are your goals? If you don’t do that, you will not be visible as an entity in the future. And this is why it’s so important. If you take this thinking one step further: in the future,
AI will operate on that stuff. And if we don’t enable it, others will, and we will have a competitive disadvantage.
Dave Mariani
So, Jens, I couldn’t finish any better than that. That’s fantastic advice, and it is a new frontier for AI. I hope you all took something away here. This is a pivot point; this is not just a regular innovation. It really is an inflection point for analytics.
Jens, thank you so much for helping deliver that message, because it is a new frontier. So Jens, thank you for joining me today. And to the listeners out there, thank you for listening, and stay data-driven. Thanks, everyone.
Jens Kröhnert
It was a pleasure Dave.