Semantic Layer, LLMs and Data-driven Insights

In this episode, Dave Mariani interviews Ryan Janssen, the CEO and founder of Zenlytic, and Paul Blankley, the CTO and founder of Zenlytic. They discuss the importance of Generative AI, Natural Language Query, and analytics in the data industry. The conversation delves into the role of analysts in the evolving landscape and the potential of LLMs and the semantic layer to revolutionize data consumption. Overall, they explore the opportunities and considerations in leveraging LLMs and the semantic layer for data-driven insights.

Meet our Guests

Ryan Janssen

Founder and CEO at Zenlytic

Ryan is the CEO of Zenlytic, a business intelligence tool you can talk to. Zenlytic has raised two rounds of funding; investors include Bain Capital Ventures, Primary Ventures, and the Sequoia Scout Fund.

Paul Blankley

Founder and CTO at Zenlytic

Paul is the Co-founder / CTO of Zenlytic, a self-serve BI tool that uses LLMs to provide a simple chat interface to complex business data. He’s worked in data for 7+ years, managed dozens of data stacks, and is passionate about all things data and AI. He has a master’s from Harvard in Data Science and he lives in Denver, CO, where he spends his free time snowboarding, running, and rock climbing.

Meet our Host

Dave Mariani

Chief Technology Officer, Founder, AtScale

Dave is the founder of AtScale and is the Chief Technology Officer. Prior to AtScale, he ran engineering and data at Klout and Yahoo! where he built the world’s largest multi-dimensional cube.

The semantic layer is inherently just the ability to translate an end user’s needs, a commercial business user’s regular-speak type of request, into the complexities of a modern data warehouse.

– Ryan Janssen

My general take on what I’ve seen so far, playing around with all these models from the open-source ones to the closed-source ones, is that generally the people with tons of money will probably always have the highest and most generally effective models. So my sense is that the Amazons, Anthropics, OpenAIs, Microsofts, and Googles of the world will in general have the best general-purpose large language models. I think as toolchains and open-source models become easier and easier to train, we have more tooling for creating your own data.

– Paul Blankley

Transcript

Dave Mariani: Hi everyone, and welcome to another episode of the Data-Driven podcast. I’m Dave Mariani, the CTO and co-founder of AtScale. And I’ve got another couple of founders as guests today on the podcast. So I’d like to introduce you to Ryan Janssen, who’s the CEO and founder of Zenlytic. Welcome, Ryan.

Ryan Janssen: Thanks, Dave. Pleasure to be here. And thank you so much for hosting.

Dave Mariani: And Paul Blankley, who’s the CTO, like me, and founder of Zenlytic. So welcome to the podcast, Paul.

Paul Blankley: Thanks, Dave. Excited to be here and excited for our conversation today. We’ve got a lot of great topics queued up.

Dave Mariani: Yeah. So, hey, there’s so much talk about Gen AI and Natural Language Query. We had conferences just last week from Snowflake and Databricks. It was all the talk of the conferences, with lots of excitement around it. And you two have started a company that’s right in the heart of Gen AI, analytics, and natural language queries. So I would love for you guys, and maybe Ryan, you could start, to talk a little bit about yourselves and what you guys do.

Ryan Janssen: Yeah, for sure. Well, I’ll start with Zenlytic. We like to say Zenlytic is the world’s first self-serve BI tool. Saying first raises eyebrows, but we say that because in our experience, and we can explain our origin story after this, a lot of the BI tools out there are really by analysts, for analysts. There are little SQL windows popping up where you have to know one or more programming languages to really get the full value out of them. If you don’t have that ability, if you don’t know how to think in terms of data and things like that, quite often you’re limited to a fairly static dashboard experience. So our goal broadly has always been to be the world’s best self-serve BI tool.

Ryan Janssen: And that’s been accelerated, I suppose. We’ve embraced some of the developments in LLM technology to really make that happen. Paul and I actually originally met studying AI in master’s degrees, and we have always had some sort of language component in the tool. But things changed a lot around December, when the capabilities of LLMs, especially the ones led by OpenAI, just started to skyrocket, and that allowed us to unlock a lot of new functionality. It went beyond just better comprehension. It actually allowed us to add this sort of always-on, instant AI data analyst chatbot, which is kind of a new paradigm for working with data. But it turns out that’s actually one of the necessary conditions for having a really great self-serve experience.

Dave Mariani: Yeah. And so, tell the audience a little bit about your background and how you got to Zenlytic.

Ryan Janssen: Yeah. Do you wanna go first, Paul?

Paul Blankley: Yeah, totally. So Ryan and I met in grad school. We met at Harvard doing master’s degrees in AI. This is right around when the first transformer paper was published. So we got to see a lot of the current LLM technology go from early, you know, everyone’s like, wow, what is this transformer thing? This is pretty crazy, into its current incredible instantiation. So we worked on a bunch of projects together there, and then right after that we started a consultancy setting up a lot of these data stacks, working with a bunch of the existing BI tools. That experience led us directly into starting Zenlytic, where it was like, self-serve is not solved, not even close. And we need this next wave of large language models to be able to enable true self-serve in any organization.

Ryan Janssen: Yeah. And my background is I was an engineer in my native Canada for a couple of years, but I actually ended up going to get an MBA in the UK, and then becoming a venture capitalist for about six years. I invested in all sorts of different types of technology deals. Then I finally came to my senses and crossed the table back to the technologist side again. I went back to grad school, which is where Paul and I met. Paul’s very humble about it. He says we went to grad school together; in actuality, he carried me through it, and a lot of other people in the class. One of my favorite stories of our misadventures at Harvard was one of our major projects, with an organization called the Minor Planet Center.

Ryan Janssen: This was done in conjunction with the people who run it, actually. The project was to identify if there are any asteroids that are gonna hit the earth. That group does a great job identifying that. They’ve been congressionally mandated to do that. But there’s a bunch of computationally intractable trajectories from these partial glimpses of asteroids that they basically have to throw in a big drawer and say, okay, where are these asteroids going? We can’t figure this out. And the original plan was that they were building a data center, like a cluster of computers, that could actually handle this over the course of several months. And Paul and the head of the Minor Planet Center worked together to build this really amazing ML algorithm that solved it on Paul’s MacBook over the course of a weekend, and substantially matched like 99% of these unidentifiable trajectories. It was so successful that it was actually published, and so successful after that that it was actually implemented. And that algorithm is now in telescopes around the world, actually watching for asteroids to protect the earth as we record this podcast. So, long story short, yeah, Paul carried us through that program.

Dave Mariani: Well, yeah. That’s pretty amazing, Paul. So we have you to thank. I hope you’re right, dude, ’cause if we get hit with an asteroid, it’s all your fault now, right?

Paul Blankley: Yeah.

Ryan Janssen: Don’t make any plans for February 21st, 2032.

Dave Mariani: Man, this is when AI better be right. I don’t wanna have any skew in that data that you’re generating there. But that’s actually pretty amazing. And so, obviously, you guys grew up in the AI era. So did you gravitate to large language models, or was it more the traditional AI, and you got interested in LLMs later on?

Paul Blankley: So, the transformer paper was published when we were actually in school. That was kind of a time where professors would rock up and say, hey, what I was gonna show you today is actually not really relevant anymore, so we’re gonna talk about this other thing that was just published. So it was a really fun time to be in school. You kind of get to see the field forming, you know, galaxies forming and all that. So we’ve always been interested in NLP, and our product has always had an NLP component, back when it was called NLP and not large language models. But what that used under the hood was a lot of the predecessors to ChatGPT. So if you’ve heard of Google’s BERT model before, that underlay a lot of our technology before. So we’ve been using this stuff for a while, but it just hasn’t been at its current level of ability until recently.

Ryan Janssen: Actually, a fun fact about that is that Zoe, the name of our chatbot, is a nod to the old, old, old LLMs of yore, because, and a lot of people don’t know this, they were originally named after Sesame Street characters. So there was BERT, and then there was ERNIE, and Zoe is a character from Sesame Street. So that’s just kind of an homage to the original large language models.

Dave Mariani: I love it. I love it.

Ryan Janssen: And there’s actually one other interesting point about that general AI versus task-specific question. At least for me, I had a really transformational moment in our early consulting days. For one of our earlier consulting clients, we were building some generalized ML models: predict this, predict that. And we basically built this really, really amazing XGBoost-style predictive model that could identify which leads were gonna convert. Lead scoring is a classic use case for this, right? And we gave them a big list of emails. We were like, look, here’s your 10,000 emails. These are the ones that are probably gonna convert. And they looked at us and they’re like, yes, but we’re gonna email all of them anyways, right?

Ryan Janssen: And we’re like, yes, but now you know which ones are gonna convert. And the vibe was kind of like, so what? And then they kind of looked over to the left and they were like, what’s that? And we’re like, well, we pulled together all your data. We cleaned it up. We made it human-understandable. And then that’s what we trained the model on. And that’s what made them really excited. They’re like, wow, we’ve never had this access to clean, accessible data before. This is amazing. And we were like, yeah, but this model can predict the future.

Dave Mariani: 99% accurate.

Ryan Janssen: They’re like, no, but look at this clean data, you know? And that was a big realization for me, really about two things. The first was just the importance of what’s happening with data tools now. It actually reemphasizes the importance of a semantic layer, especially. The other thing is that AI’s super cool, but it’s important to focus on what problems it solves as opposed to how cool the tech is, you know? And I think LLMs are getting better at that, and we’re starting to see some real-world applications now. That’s always been the lens that I’ve looked at this stuff through. So LLMs have always felt like one of the earliest opportunities for us to really make good use of AI technology. So I guess it’s always had a special place in my heart, I suppose.

Dave Mariani: Yeah. Look, it’s kind of the same thing. I have the same sort of feeling about real time and streaming. Most of my customers are like, I could have it updated in one day? Wow, that’s fantastic. And it’s like, forget about streaming, guys. Streaming is great for very specialized use cases, but most customers out there just want to get access to clean data daily. And it’s so sad, because we’ve done so many things in our product to make it so that you can ingest data so much faster, and they’re like, eh, a day is okay. So sometimes we just don’t realize how most customers are really at the starting line, and we’re trying to take them into the end zone. We’ve gotta help them out. So that gives us some really good context about where the whole idea for Zenlytic came from. It sounds like you saw the importance of data, and you applied your love of AI technology to actually make that experience of exploring data much easier. Does that sound about right?

Ryan Janssen: Yeah, I think so. There’s actually an important thing too that’s highly relevant to this discussion, which is that I think there are two really important pieces that are necessary conditions for good self-serve. One is LLMs, which we talked about. The other is actually the semantic layer, which is inherently just the ability to translate an end user’s needs, a commercial business user’s regular-speak type of request, into the complexities of a modern data warehouse. That feels essential to self-serve in general. But then the overlap of those two things, I think, is very powerful.

Dave Mariani: Yeah. And you know that I’m a big proponent of the semantic layer, so that’s obvious. But yeah, having that business lens, which I think of like a translator: you’ve gotta translate the technical data into business terms, and you need a human to kind of curate that process, right? But once you have that, then it opens up all kinds of interesting things, especially with LLMs, ’cause now you’ve got some great training data to really help people to be able to ask questions of that data in a much more natural fashion. Pointing and clicking and using a mouse is kind of a really crude interface when you think about it as the computer-human interface, right? And so what’s so exciting is that we’re actually using English as our programming language now, rather than SQL. And that’s pretty exciting. It sounds like that’s what you guys are really making a reality.

Paul Blankley: Oh, yeah. And I would even say for most business users, English has been the programming language for a while. It’s just that English has had to be intermediated by someone with knowledge of the business, whether that company has a semantic layer or not. That question comes in English to an analyst and gets thrown into Jira so it doesn’t get lost. And then a few weeks later, that person takes their tribal knowledge of how do we do things, how do we calculate active users, how do we do revenue, and then they go get the answer and bring it back. With a semantic layer and a large language model, you can actually take that human out of the loop for most questions, where you’ve got the definitions, you’ve got the business context, and you’ve got the language model that’s able to take that loosely formed input and, with some follow-up questions, map it onto the real definitions that the person’s asking about. And that’s super exciting.

Dave Mariani: So Paul, you bring up a good question, which is, you have your data teams, and you’ve got your end users or your consumers. So who handles what in this new world? How do you deal with that?

Paul Blankley: Yeah, I think one of the best questions to ask is, how does the analyst role change, right? For all the analysts listening to this, this is really important. This is how your role’s gonna change as AI comes in. And I think it’s pretty exciting, actually. You spend more time doing high-quality, high-context things that large language models can’t do, things where you’re writing custom code or custom SQL outside of the semantic layer. But then the other part of your job that’s gonna increase is maintaining that semantic layer, making sure descriptions and definitions are up to date. Because what you do by maintaining that semantic layer is you enable the LLM to answer all these ad hoc questions. For pretty much all data teams right now, at least a third of their time, and for most teams more, is spent just answering these ad hoc questions. And I think that number is going way, way down, to the point where the ad hoc questions are pretty much just, can we start tracking this data? instead of, can you just go create the SQL query for me?

Dave Mariani: Yeah. I love that. The semantic layer was really the missing piece, wasn’t it? I mean, we had the semantic layer. Look, you guys are young. I’ve been around since OLAP started, and OLAP really was revolutionary for me, ’cause I learned SQL, and wrote SQL, and became a SQL expert. But when I saw OLAP, I saw how the end user, the business user, just completely loved it and was so much more productive and did incredible things. And then OLAP as a technology failed, right? Because you can’t pre-calculate everything. The data got to be too big and too complex for that. And then we just forgot about it for a whole generation. And literally, it was me when I was at Yahoo.

Dave Mariani: It’s like, you know what? Gosh, I just need this semantic layer part of OLAP, because that was really the gem, the semantic layer part. And so here we are talking about that again, but there was a whole generation of analysts and data engineers who just never really understood what the value of the semantic layer was. And so all of the visualization tools that you guys are now competing with, and blowing away, I’m sure, are all about, how can I make that analyst a better SQL jockey? And that’s just seemed to be completely backwards and wrong to me. And so it seems like we’re bringing it back to reality, where we’re saying, hey, anybody should be able to ask questions of data, not just somebody who knows how to write SQL.

Ryan Janssen: Yeah, absolutely. Something really magical happens when you combine a domain expert’s expertise with the ability to quickly, effortlessly have access to data. You just can’t quite get that when you’re doing the quick data pulls and you’re going back and forth. First, that’s a much slower process in general. But the sort of insights that come up when you have a really, really fat pipe into what’s going on, and you already have that human expertise, whether you’re a performance marketer or in sales ops, or whatever it is that you’re an expert in, that’s where really, really effective data usage happens, in my opinion. And the BI tool thing is really interesting, ’cause you covered something that’s always been my worldview about BI tools, which is that two things have always happened, right?

Ryan Janssen: So the first is that it’s been sort of a monotonic progression, or at least a long arc of progression, towards better capabilities for self-serve users, right? In the beginning it was just static dashboards, then it was filterable dashboards, then you could drill a little bit, then you could slice by a few things, and then you could slice by everything with a semantic layer. So it’s been a progression of increasing self-serve capabilities. And then, what’s possible has always been dictated by the availability of compute, right? So OLAP existed because there wasn’t really enough computing hardware to actually run those queries live, for example. So whatever was possible at the time was dictated by the compute. And we’re rapidly now entering a world where the cost of compute is essentially free, right? So what gets me really excited is, what new paradigms for consuming data are gonna be unlocked as the cost of compute goes down to zero? So that’s a very top-down sort of VC point of view. I love it. I’m excited about it.

Dave Mariani: I love it. I love how you’ve still got that VC hat on.

Dave Mariani: You know, I’m sure your fundraising is much easier, being that you’ve been on the other side of the table, isn’t it?

Ryan Janssen: Yeah, well, it’s been interesting, actually. I think it’s a good tool set to have, for sure. I’ve learned a lot on the other side of the table. Even though I was a VC, I saw thousands and thousands of pitches, and I thought I kind of knew how it worked, there’s just interesting stuff that you don’t anticipate learning on the other side of the table that ends up being really, really true. So I feel like I’ve learned a lot about the pitching process by being on both sides, and I still continue to learn how to do it as a founder now.

Dave Mariani: Yeah. And, well, you know what they’re looking for too, so it always helps to be able to be at the ready there. So look, Natural Language Query has been around for a while, right? ThoughtSpot has been out there for as long as AtScale has. And the BI tools themselves started to add natural language query features, and it’s always been sort of nichey. It hasn’t really exploded. So what is the difference, guys, in today versus those early versions of natural language query?

Paul Blankley: Yeah, that’s a great question. I think there are two important lenses to look at here. The first one is just the capabilities of the language models. That’s the most basic one. Instead of trying to parse a string using some parts-of-speech tagging or something, and basically just mapping that to existing columns, you’re able to pass this to a large language model that can actually look at this and have a way deeper understanding of English. If someone asks about New York, it can see, oh, in the database this is NY instead of New York, I should do the NY filter. Whereas previously that just would’ve been impossible. So the aggregation of all of that made this stuff just not really possible before. The other really important thing, which I think most people miss, actually, is that you need a semantic layer to make this work.

Paul Blankley: So look at Tableau’s Ask Data, for instance. If you ask it, hey, how much revenue did we have for people in Oregon? then it’ll go and sum up some column. For most organizations, the revenue calculation is not a simple sum on a clean table. It is a lot more complicated than that. You’re filtering out canceled orders, you’re getting rid of gifts, you’re taking friends-and-family and employee stuff off. This is a really complicated calculation, and you’re not just gonna guess at getting the sum right. Which means that it’s kind of like any neat tool: people try it, they’re like, oh, that demo works great. And then they try it on their own data, which is really complicated, with sophisticated calculations for active users and revenue and these other things, and it just doesn’t get anywhere, because it’s just trying to sum up columns and you’re not able to encode these metric definitions that really matter.
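To make the point about encoded metric definitions concrete, here is a minimal Python sketch of a revenue metric stored once with its business rules and compiled to SQL the same way every time. It is only an illustration under stated assumptions: the `Metric` class, the `orders` table, and every column name and filter are hypothetical, and this is not how Zenlytic or Tableau actually implement anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A metric definition stored once in a (hypothetical) semantic layer."""
    name: str
    table: str
    expression: str       # SQL aggregate, e.g. SUM(amount)
    filters: tuple = ()   # business rules that are always applied


# Hypothetical definition: table, columns, and rules are invented for illustration.
NET_REVENUE = Metric(
    name="net_revenue",
    table="orders",
    expression="SUM(amount)",
    filters=(
        "status <> 'canceled'",
        "is_gift = FALSE",
        "customer_type NOT IN ('employee', 'friends_and_family')",
    ),
)


def compile_metric(metric: Metric, extra_filters: tuple = ()) -> str:
    """Build SQL from the stored definition; the same inputs give the same SQL."""
    where = " AND ".join(metric.filters + tuple(extra_filters))
    sql = f"SELECT {metric.expression} AS {metric.name} FROM {metric.table}"
    return f"{sql} WHERE {where}" if where else sql


# The user's question only contributes the Oregon filter; the rules come from the definition.
print(compile_metric(NET_REVENUE, ("state = 'OR'",)))
```

The design choice worth noticing is that the question-specific part (the Oregon filter) is the only thing that varies, while the canceled-order, gift, and employee exclusions travel with the metric itself.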

Dave Mariani: You know, Paul, you’re getting at some of the risks of relying on large language models. Or, well, is that a risk, right? I guess with the semantic layer, you’re lowering that risk. But what are some of the things that can go wrong in this new model?

Paul Blankley: So I think using a semantic layer is absolutely crucial here. If you don’t use a semantic layer, you run the risk both of it getting a definition wrong and of it just changing a definition. You can’t have non-determinism in metric definitions. You can’t just change your definition of active users, especially as you look at different things. You could be looking at feature X, feature Y, you get to feature Z, and the LLM decides to change how it counts active users. And you think feature Z is doing great. It’s not. Nobody’s using it. It just changed the definition. That’s scary. That’s really dangerous. And you have to have determinism in these metric definitions. The other thing is, even with a semantic layer, you can run into some of these problems, which means the way you have to get around that is you have to show the user everything that’s going on in the query all the time.

Paul Blankley: Because if you do that, if it picks, say, net revenue instead of gross, as long as you make that abundantly clear to the end user, and that’s the thing they see in the result, then it’s clear: oh, okay, well, I asked for revenue, it picked net, I was thinking gross, I should just clarify and ask it to do gross. That’s a simple fix. If you just gave a number without saying exactly what filters are applied, what time period we’re looking at, and what metric this is in your semantic layer, with the description, it’s still gonna be misleading. So even with a semantic layer, it’s not fully a silver bullet. You’ve still gotta show the user all the metadata, everything that’s involved in the query, so that if it does pick something that they weren’t expecting, that’s abundantly clear to them and they can correct it.
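One way to picture "show the user all the metadata" is a result object that never travels without its context. The Python sketch below is a hypothetical illustration only: the `Answer` class, its field names, and all the values are assumptions made for the example, not a description of any real product’s output.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    """A query result bundled with the context needed to interpret it."""
    value: float
    metric: str
    description: str
    filters: list
    time_period: str


def render(answer: Answer) -> str:
    """Format the number together with the metric, filters, and time period."""
    filters = ", ".join(answer.filters) if answer.filters else "none"
    return "\n".join([
        f"{answer.metric}: {answer.value:,.2f}",
        f"Definition: {answer.description}",
        f"Filters: {filters}",
        f"Time period: {answer.time_period}",
    ])


# All values below are made up for illustration.
print(render(Answer(
    value=125000.0,
    metric="net_revenue",
    description="Gross revenue minus canceled orders, gifts, and employee orders",
    filters=["state = 'OR'"],
    time_period="2023-01-01 to 2023-03-31",
)))
```

With output like this, a user who meant gross revenue can see at a glance that net was chosen and ask for the other metric.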

Ryan Janssen: One interesting approach: there are a lot of people who are focused on this right now, and I think a lot of those people are trying to use LLMs to write SQL. And I don’t think that’s the right approach. For as much confidence as I have in the cognition abilities of LLMs, when you’re doing public company reporting or board reporting or HIPAA-sensitive reporting or anything like that, you can’t trust the non-deterministic solution, no matter how good it is, right? And these LLMs, if you have them writing SQL, are just a risk. They’re gonna write bad SQL, and it’s either not gonna work at all, which is the best case, or, in the worst case, it fails silently, right?

Ryan Janssen: So again, that’s why the semantic layer is so important: because the SQL generation can be done in the semantic layer in a much more deterministic fashion. And that actually is a big unlock. The other really big unlock, which I think is also important for your question, Dave, about what’s happening now versus people who have been adding this forever, and this is where LLMs really shine, is that in the last six months they’ve really unlocked chat versus search, is how I think about it. We’ve always had LLMs inside Zenlytic. Before the revolution happened, it was a single-query search, right? You’d type in a sentence and it would try and build it into a query for you. The fact is, any analyst will tell you that those quick data pulls pretty much never end in a single question, right?

Ryan Janssen: There are very poorly formed questions, right? You have to guide the end user: what do you mean by sales here? Do you want gross or net? You forgot to include whatever it is. You fill in the gaps of the structure of that question to make sure that it’s an explicit data question. People want to follow up, and they want to go back and change things. They want it daily instead of weekly, or they want to be able to drill into something here, or, actually, tell me more about this. And those quick data pulls are always a back and forth. There’s a really interesting open question right now. I think we’re still figuring out collectively, as the software industry, how LLMs should be integrated into conventional tech.

Ryan Janssen: Some people are building chat windows, some people are sending pre-canned prompts into an LLM, and there are all sorts of different things that are emerging. I think in our case, we have a really clear use case where chat is the right answer. And people ask us a lot, is this chat gonna replace dashboards? How are you gonna get this to replace dashboards? And I think people misunderstand: we’re not trying to replace dashboards. We’re trying to replace the follow-up conversation with the data analyst. And we have such a clear North Star with that. We’ve been in those conversations so many times, we know what those look like, and those are a chat, right? So as a North Star for us, we have a clear indication that chat is the right paradigm. And it just so happens that that’s what LLMs got really, really good at about six months ago. So we’re really excited.

Dave Mariani: Yeah. So what I think I hear you saying is that it’s not a feature, it’s really an application. In other words, all the big BI players have been treating natural language query as a feature, as an alternative, quicker way of building that dashboard. But you’re saying that there’s actually much, much more. It’s what happens after that initial question where chat can become a more natural interface to data.

Ryan Janssen: It’s how you deeply integrate... sorry, go ahead, Paul.

Paul Blankly: No, I was just gonna confirm that, basically. Dashboards are super useful, and using a large language model to make creating dashboards easier is great, but dashboards are viewed a lot more than they’re created. So you only get so much efficiency from helping make creating dashboards easier. The thing that takes all the time for the data team is these ad hoc questions. And these ad hoc questions come in from someone who just doesn’t quite know which thing to use, who wants the best of this thing but isn’t sure quite what they mean by best. And it’s all of that question answering that takes iteration, that takes, you know, hey, what do you mean by this?

Paul Blankly: Hey, we’ve got gross and net here, which one do you wanna look at? You know, just walking through these definitions, basically taking that business context and taking that user’s question and saying, hey, we’re not quite there. What time period do you wanna look at? You probably don’t want this for the history of the entire business. And just helping ask those clarifying questions to make sure you get down to what the person’s actually thinking. And the amazing thing with the AI analyst is that this is happening in seconds. This is happening live. This isn’t, you know, a comment on a Jira ticket that then has another comment on it like five days later. So you can really increase the speed at which organizations use data. And that’s...
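The clarifying back-and-forth Paul describes (gross versus net, which time period, daily versus weekly) can be pictured as a check for missing pieces in a data question. The following is only a minimal sketch under stated assumptions: the `DataQuestion` fields, the `AMBIGUOUS_METRICS` catalog, and the metric names are hypothetical illustrations, not Zenlytic’s implementation or any semantic layer’s API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical catalog: a business term mapped to the metric definitions it could mean.
AMBIGUOUS_METRICS = {"sales": ["gross_sales", "net_sales"]}


@dataclass
class DataQuestion:
    """A partially specified data question (all field names are illustrative)."""
    metric: str                         # term the user typed, e.g. "sales"
    time_period: Optional[str] = None   # e.g. "last 90 days"
    granularity: Optional[str] = None   # e.g. "daily" or "weekly"


def clarifying_questions(q: DataQuestion) -> List[str]:
    """Return the follow-up questions needed before the question is explicit."""
    questions = []
    options = AMBIGUOUS_METRICS.get(q.metric, [])
    if len(options) > 1:
        questions.append(f"What do you mean by {q.metric}: {' or '.join(options)}?")
    if q.time_period is None:
        questions.append("What time period do you want to look at?")
    if q.granularity is None:
        questions.append("Do you want this daily or weekly?")
    return questions


vague = DataQuestion(metric="sales")
explicit = DataQuestion(metric="net_sales", time_period="last 90 days", granularity="weekly")
print(clarifying_questions(vague))     # three follow-up questions
print(clarifying_questions(explicit))  # nothing left to clarify
```

In a chat setting, each answer would fill in one field and the check would run again until no questions remain.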

Dave Mariani: Yeah. I mean, that’s super exciting. You’ve convinced me that it’s definitely not a feature. It should be a whole new way of asking questions. So, look, our vision has always been that the semantic layer will help more people use the tools they already know. In most of our customers’ cases, Excel is the interface, but even that’s a pretty clumsy interface versus what you’ve been talking about here. So that’s an exciting future that you guys are painting and delivering. So, what else should we be looking out for? Look, six months ago, I had never heard of the acronym LLM, honestly, even though I’ve been tangential to AI for most of my career.

Dave Mariani: So, first of all, just back to that: was it just a killer interface? Was that what was missing? With ChatGPT, did we just need an interface, or have these large language models been lurking behind the scenes and just not been able to see the light of day until there was an interface that somebody put out there that allowed people to use it? Why now? Why six months ago did the world change, guys? You guys have a front row seat to answering that kind of question.

Paul Blankly: Yep. I would say that it’s a combination of two things. One, the interface is crucial, right? Previously, when you would see a paper come out, you would have to be someone like me who goes and pulls this thing into SageMaker or some other interface to load this gigantic model into memory and test it out on a few things. It’s very, very involved. And even for a technical person, it’s a lot of work to go and test this thing out. So the interface definitely makes a difference. But the other thing I would say is it’s really fundamentally more about the availability of just massive, massive amounts of compute and massive amounts of data. As the cost of compute, like Ryan was saying, just gets cheaper and cheaper and cheaper, big vendors like OpenAI are able to train on ever larger amounts of data with longer training time, and spend more compute optimizing ever-increasing numbers of parameters, it seems. And that’s kind of what has unlocked this behavior, whereas before, all the previous models were quite literally small in comparison to the things that people are training now. So a lot of advancements in infrastructure and available compute, I’d say, is even the bigger driver.

Dave Mariani: So, a question about where things go from here. Are large language models gonna become a service of the major cloud vendors because of what you mentioned, that the cost of compute and the requirements of compute are so large? Or will it filter down to individual enterprises, where they’ll be building their own large language models? A lot of the debate that we saw at the Snowflake and Databricks conferences was, where does this go from here? Does it consolidate to Amazon, Microsoft, and Google and their clouds? Or do you see a future where enterprises are actually using this technology specifically to solve their own use cases?

Ryan Janssen: Yeah. Well, you go ahead. You go first, Paul, I’ve got some thoughts too.

Paul Blankly: So I was gonna say, I definitely don’t have a silver bullet here. I do see both futures kind of happening to an extent. My general take on what I’ve seen so far, playing around with all these models from the open source ones to the closed source ones, is that generally the people with tons of money will probably always have the highest and most generally effective models. So my sense is that the Amazons, Anthropics, OpenAIs, Microsofts, and Googles of the world will in general have the best sort of general purpose large language models. I think as tool chains and open source models become easier and easier to train, we have more tooling for creating your own data.

Paul Blankly: Enterprises will be able to come in and say, hey, I don’t really need the thing to know about the Calvin and Hobbes comic from forever ago. I just need it to know about this specific thing, and know that really, really well. And that’s where you’re gonna see a lot of the open source applications. With that high level of specificity, you might be able to get better performance out of an open source model, because you’re fine-tuning on a specific data set to do a specific thing. So I think we’ll see a lot of open source models that are smaller, for specific applications, but I think general purpose applications will probably still be the purview of the major cloud players.

Dave Mariani: What do you think, Ryan?

Ryan Janssen: So I have a two-part thought on this. The first one is, I kind of agree with Paul, where generally it seems like the biggest war chest has been winning so far, the people that invest most in the compute. I would say I’m more balanced on whether or not that will continue. I look ahead by looking back. And the closest example we have for predicting this is actually image generation, I think. If you recall, image gen was solely the domain of, you know, Google, for instance, for a long time. They published all these models, and OpenAI had their own, also proprietary, model, but they lost that lead to either smaller tools like Midjourney or completely open source tools like Stable Diffusion.

Ryan Janssen: And it feels like those are now the go-to tools for using image generation. And I think it’s becoming increasingly democratized, I suppose. We haven’t seen a lot of really specific image gen inside a vertical or inside a company, for instance, but we could certainly see models coming up from that. The technology is capable of that. So I would say it’s certainly possible for us to see more of the open source community and third-party models gaining traction here. The other two factors in that, by the way: first, I don’t know enough to debate the technical validity of this, but the big models are bootstrapping the open source models now. So GPT-4 outputs are being used to both rank and train the other ones.

Ryan Janssen: Which may or may not be a good thing. And the other thing is there’s just a lot more open source people out there, right? You know, there are millions of open source researchers working, and there always will be. And they’re working hard to work around the cost-of-compute problems. So there’s all these technologies now, like LoRA, for instance, emerging that allow you to train models with much less compute, much, much faster. In a way, it kind of feels intuitive: the big models have a multi-year training cycle now, and it just feels like we’re gonna have to get to a world where that’s a lot faster in the long run. So that would kind of be a case for democratization. The second part, though, is that I don’t think that’s what’s gonna be the most important thing that happens over the next year in LLM technology.
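To give a rough sense of why LoRA (low-rank adaptation) reduces training cost: instead of updating a full d × k weight matrix, it freezes that matrix and trains two small matrices of shapes d × r and r × k, so the trainable parameter count drops from d·k to r·(d + k). The sketch below only does that arithmetic; the dimensions and rank are illustrative assumptions, not figures from the conversation, and a smaller trainable parameter count is only one part of the overall compute picture.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters with LoRA: a d x r matrix plus an r x k matrix."""
    return r * (d + k)


# Illustrative sizes (assumed): one 4096 x 4096 projection matrix, rank 8.
d, k, r = 4096, 4096, 8
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(f"full fine-tune: {full:,} trainable parameters")
print(f"LoRA (r={r}): {lora:,} trainable parameters")
print(f"reduction: {full // lora}x")
```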

Ryan Janssen: I think that that’s gonna keep steaming ahead. But I think the real opportunity right now is in the application layer. We have not even started to use the capabilities of current tech. Even if we pressed the pause button on development now and just used what the current tech has, there are probably multiple generations of startups, an entire generation of startups and new tools, that are gonna be built with this LLM technology that can be done today. It’s just slightly harder from a UX perspective. And a great example of that was my wife, who is currently in Japan right now. She’s Japanese, and she had to give a presentation for work, and she said, Ryan, I gotta make this presentation tomorrow.

Ryan Janssen: Can you help me? And I was like, yeah, sure. And I went on to ChatGPT, I loaded up GPT-4, and I said, okay, here’s how I’d structure this presentation. And I said, give me a summary of these three books, and it gave me a little text, you know, title, text, text, text, for like 10 slides. And I said, okay, great. Now make it Japanese. And I said, okay, I don’t read Japanese, but I trusted it. And then I used this little trick that not a lot of people know about, where I said, okay, now can you write me a PowerPoint VBA macro that will generate this? And it said yes, and it gave me the macro. You can go and paste that into PowerPoint, click the run button as a Visual Basic macro, and it will make the whole slide deck, too.

Ryan Janssen: It goes from this text into an actual slide deck. And then it was like three clicks for design. And it wasn’t even just the planning for the slides, it was actually the deck itself. And I just sent my wife the PPTX file in Japanese, and she was like, how did you do this in 10 minutes? And she was just blown away by it. And, you know, that could be an entire business, right? People don’t realize that you can actually build the PowerPoint directly, just by making an intermediate step. And I think as we see more and more of those tools start to grow in popularity and more people start to embrace those, this is gonna be the biggest Cambrian explosion of tools in the history of Cambrian explosions. There’s so many use cases that we could use this tech for.
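The intermediate step Ryan describes, going from a text outline to a macro that builds the deck, can be sketched as plain text generation. This is a hypothetical illustration rather than what GPT-4 actually produced: the `build_macro` helper, the macro name `BuildDeck`, and the sample outline are assumptions, and the emitted VBA relies on the default title-and-text layout (`ppLayoutText`), where shape 1 is the title and shape 2 is the body.

```python
from typing import List, Tuple


def vba_string(text: str) -> str:
    """Quote text as a VBA string literal (embedded quotes are doubled)."""
    return '"' + text.replace('"', '""') + '"'


def build_macro(slides: List[Tuple[str, List[str]]]) -> str:
    """Render (title, bullets) pairs as the source text of a PowerPoint VBA macro."""
    lines = ["Sub BuildDeck()", "    Dim sld As Slide"]
    for index, (title, bullets) in enumerate(slides, start=1):
        # Bullets become one body string separated by paragraph breaks (vbCr).
        body = " & vbCr & ".join(vba_string(b) for b in bullets) or '""'
        lines.append(f"    Set sld = ActivePresentation.Slides.Add({index}, ppLayoutText)")
        lines.append(f"    sld.Shapes(1).TextFrame.TextRange.Text = {vba_string(title)}")
        lines.append(f"    sld.Shapes(2).TextFrame.TextRange.Text = {body}")
    lines.append("End Sub")
    return "\n".join(lines)


# Hypothetical outline standing in for the generated slide text.
outline = [
    ("Book one", ["Main idea", "Key takeaway"]),
    ('The "second" book', ["Main idea"]),
]
macro = build_macro(outline)
print(macro)
```

The printed text would then be pasted into PowerPoint’s Visual Basic editor and run, as in the anecdote.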

Dave Mariani: Yeah. You know, at least the initial applications we’re seeing have been sort of copilots, right? Some kind of assistant. But what you just talked about goes way beyond an assistant, to something that’s, you’re right, a full-blown application. That’s pretty amazing. Well, guys, we’ve talked for a long time, longer than we initially planned for, but this has been a fantastic discussion. I love what you guys are doing. And I think that really reinventing BI, which is what you guys are doing, has been a long time coming. I mean, we’ve been dealing with the current generation for more than 10 years now, so the time is ripe for a new paradigm. So I’m really looking forward to what you guys have up your sleeves, and can’t wait to see what you come up with. So, thanks for joining the podcast today. And for everybody who’s listening, go check out Zenlytic and see what these guys are working on, because it’ll blow you away.

Ryan Janssen and Paul Blankly: Thanks a ton, Dave. Thanks for having us, Dave. Great chat.

Dave Mariani: All right. Have a great day, everybody. And thanks. And stay data driven.

