Trends in AI & ML, Feature Stores and more with Miles Adkins at Snowflake

Data-Driven Podcast

Listen to Miles Adkins talk about Snowflake’s evolution over the last five years and how the company intends to be the one-stop-shop data platform for all the data personas in your organization. You’ll hear about running data science workloads on Snowflake and what’s so great about using Snowpark, Snowflake’s client-facing SDKs, to work on top of Snowflake and push compute down to the Snowflake engine.


My unique experiences fit perfectly into the direction Snowflake is going. If you go back five years, it was all about winning over the historical analytics workload, giving customers a new engine for populating their BI dashboards. Snowflake is now taking the same journey as our customers, going from what happened in the past to what’s going to happen in the future, by investing more in data science workloads.

Snowflake is approaching this as you need one data platform, but we are thinking about tailoring it to different personas out there in the market. The data science persona has gotten a lot less blurry, and much more rigid and clear in terms of what are the types of tools they like to use, what are the typical processes that they go through for taking raw data to a model. You can think of Snowpark as client-facing SDKs that we built to work on top of Snowflake, and then all that work is getting pushed down to the Snowflake engine.

Transcript

Dave Mariani: Well, hi everyone, and welcome to AtScale’s Data-Driven podcast. Today’s special guest is Miles Adkins, the senior partner sales engineer for AI and ML at Snowflake. So welcome to the podcast, Miles.

Miles Adkins: Yeah. Hey, Dave. Good to be back. Happy to be here.

Dave Mariani: Yeah. So Miles and I go back a ways. We’ve done LinkedIn Live sessions together, joint sessions with Snowflake and AtScale. So Miles, today I would just love to really pick your brain about where AI and ML are going. You became part of Snowflake, which traditionally focused more on analytics workloads, so you’re bringing some expertise here on the AI and ML side. So let’s just start, Miles. For the listeners out there, can you tell us a little bit about how you got to where you are, how you ended up landing at Snowflake, and really what your path was into machine learning and data science?

Miles Adkins: Yeah, definitely. I would say my path is not one that I really charted out early in my youth. I went to college and got a bachelor’s in finance, and that was really a pivot from wanting to be a mechanical engineer. After a few math classes freshman and sophomore year, I had some of the math background, but I was like, yeah, mechanical engineers don’t seem to make as much as a finance person these days, let’s go check that out. And so I graduated with a finance degree and started working at an investment firm for, I think, the first four or five years of my career. But the good thing there was that the firm I was working for was very much quantitatively driven.

Miles Adkins: So at the time, the question was, how do we make decisions with nothing other than raw data? How do we remove biases and all that stuff when it comes to making investment decisions? And so that’s where I got my chance to just dig into learning Python. Back then, the only package available for doing predictive modeling (we didn’t really call it ML at the time) was scikit-learn. You could use linear regression, and people were still kind of like, what’s a tree model, right? And so from there, just putting in the work and the learning in that first job of my career, I was able to really develop the foundations of being a data scientist.

Miles Adkins: Once I was done trying to predict where markets were going to go, I really wanted to develop my people skills, my client-facing skills. And so I had the fortunate opportunity to go into the technology industry, joining the company DataRobot. And then again, of course, there was a bunch of new stuff to learn. How do we communicate with clients? How do we sell them data science platforms? And so a few of those unique experiences fit perfectly into the direction Snowflake was going. Obviously, Snowflake, if you go back five years, was all about just winning over that regular, historical analytics workload, right? How do we just give someone a new engine for populating their BI dashboard?

Miles Adkins: And so I think Snowflake is really taking the same journey that customers are taking today, going from what happened in the past to, okay, now what’s going to happen in the future? And so they were going to invest in the data science workload, and obviously, with me having a good background working for one of their top ISV partners at the time, it was kind of a perfect fit for me to now start taking this to their entire ecosystem. And so that’s what I’ve been working on today.

Dave Mariani: Wow. That’s a cool journey. So you were a practitioner to start with, really applying machine learning to actually make a difference in finance. I was also a finance guy. I have a BA in economics, so go figure, right? We’re both here in technology now. So let’s talk a little bit about how Snowflake is investing when it comes to data science. You’re right that the perception most people have is that Snowflake is a data platform built for analytics, which it is, and it’s awesome. So what kinds of investments is Snowflake making to help data scientists be smarter?

Miles Adkins: Yeah. So I think the way that Snowflake really looks at this is, one, we want to remain a data platform. We’re not going to come out with a whole new product that says, here’s your data science platform that we now want you to work with. Everything we do, from a design principle, is about how we make the new features we launch first class with everything else that Snowflake is built for. So of course, scalable warehouses, governance, security, replication, all that kind of stuff. The way that Snowflake is really approaching this is that you need one data platform, but how do we think about tailoring it to the different personas out there in the market? Obviously, the data science persona has really gotten a lot less blurry, and much more rigid and clear in terms of what types of tools they like to use and what typical processes they go through for taking some raw data to a model.

Miles Adkins: And so it’s really about creating what I would call last-mile features that really draw that core audience into the main Snowflake platform. And so you saw us spend a lot of time over the last, I believe, 18 months here building out Snowpark, and specifically Snowpark for Python, to really allow data scientists to develop and program in their language of choice but still be able to tap into the core data platform that their organization is giving them. And I think you’re just going to see us expand from here, right? We have this concept of a UDF, right? You can now take any type of arbitrary code and run it inside of Snowflake as well. How do we think about building abstractions on top of UDFs to where we have a model registry for you, for example? So, continuing to build on the core, low-level, atomic features that we’ve been building out for the last several years. I think Snowflake just had its 10th birthday, so a decade, and it’s just adding a few little wrappers on top of that to tailor to the specific personas that we’re trying to chase in the marketplace.

Dave Mariani: So, Miles, with Snowpark you’re keeping the workloads close to the data, is that right? That seems to be a common trend, where traditionally, even with tools like DataRobot, you would extract data and then process it on separate infrastructure. So if somebody is using Snowpark, are they using the Snowflake data platform as the compute infrastructure when they’re building and training those models?

Miles Adkins: Yeah. That’s exactly right. So you can think of Snowpark as all these client-facing SDKs that we built. You just do your work locally on top, and then all that work is getting pushed down into the Snowflake engine. And so, again, you’re not having data coming across the wire. I think one interesting thing that people don’t really point out is that data scientists, for the most part, don’t care where the computation lives, right? They would much prefer, in their perfect world, a laptop the size of a supercomputer, with memory and GPUs, and they can just forget about the fact that they have to worry about scalability limits, right?

Miles Adkins: What matters with keeping that data inside of Snowflake is all the other stakeholders that have to do things like protect the data, right? It has really changed our mindset from data as a cost to data as an asset. Well, you wouldn’t leave the gate open on your gold, on your safe at your house, right? It’s adopting that same mentality. And so that’s where we need to win over that data governance person as well. It’s why we hear, hey, this is really valuable to us, not from a data scientist level but from a much higher executive level, on why we need all this work to happen. We’re trying to enable these things not just for a Python developer in their notebook, but also for partners, so they can just run on top of us and mitigate all these very comprehensive security and governance reviews that any ISV would have to go through.
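The pushdown idea Miles describes can be sketched in a few lines of Python. This is a minimal illustration and not Snowpark’s API: the `LazyTable` class, its methods, and the `trades` table are all hypothetical, and an in-memory SQLite database stands in for the Snowflake engine. The client only records operations and compiles them to SQL; the engine does the filtering and aggregation, and only the small result crosses the wire.

```python
import sqlite3


class LazyTable:
    """Hypothetical client-side handle: records operations, runs nothing locally."""

    def __init__(self, name, filters=(), group=None):
        self.name = name
        self.filters = tuple(filters)
        self.group = group  # (key column, column to average) or None

    def filter(self, condition):
        # Return a new handle; no data has been touched yet.
        return LazyTable(self.name, self.filters + (condition,), self.group)

    def group_avg(self, key, column):
        return LazyTable(self.name, self.filters, (key, column))

    def to_sql(self):
        # Compile the recorded operations into one SQL statement.
        where = (" WHERE " + " AND ".join(self.filters)) if self.filters else ""
        if self.group is None:
            return f"SELECT * FROM {self.name}{where}"
        key, column = self.group
        return (
            f"SELECT {key}, AVG({column}) AS avg_{column} "
            f"FROM {self.name}{where} GROUP BY {key} ORDER BY {key}"
        )

    def collect(self, conn):
        # Only here does work happen, and it happens inside the engine.
        return conn.execute(self.to_sql()).fetchall()


# SQLite is only a stand-in for the remote engine (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (ticker TEXT, price REAL, volume INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAA", 10.0, 100), ("AAA", 20.0, 300), ("BBB", 5.0, 50), ("BBB", 7.0, 500)],
)

query = LazyTable("trades").filter("volume >= 100").group_avg("ticker", "price")
sql = query.to_sql()
result = query.collect(conn)
print(sql)
print(result)
```

The design point is that `filter` and `group_avg` are cheap bookkeeping on the client, so the raw rows never leave the platform where governance and security controls apply.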

Dave Mariani: Yeah. And ultimately, right, a data scientist’s job is not just to read data, but to write data. They’re going to generate predictions and new features, and they’ve got to write them somewhere. So it’d be great to integrate that right back into the same data platform that’s being used for historical analytics. Now you’ve got historical and predictive in one platform, which we’re a big proponent of at AtScale. So that’s awesome. There’s some really good work going on over there, so congrats on your success so far. Let’s shift gears a little bit and just talk about some trends. We’ve been talking technology, and I’ll talk technology all day when it comes to these kinds of podcasts, and sometimes we neglect some of the people or organizational challenges. So, Miles, can you talk a little bit about the state of AI and ML in terms of the customer view, and the organizational and people challenges that some of those customers might be struggling with?

Miles Adkins: Yeah, that’s a good question. I would say, at the very high level, I still think we’re in the very early innings of trying to figure out what the real value of ML specifically is to a company. In certain cases, it’s mind-blowing, right? If you look at the top technology companies out there, Google, Apple, it seems like AI is everywhere, and they’re these massive companies. But a lot of people in, let’s call it, the midmarket are still scratching their heads, like, well, I’m not really seeing a whole lot of return from my data science hires, from the data that I have. Why is that the case? So I think there’s still a pretty good split between people on the question: is there value here?

Miles Adkins: And I think what that split culminates in is that we can’t say we’ve moved from the phase of tinkering with ML to ML in production being the main path that the world is going, right? And part of that is education gaps, right? Most executives don’t understand the numbers and the algorithms and all that stuff, right? I think there are a lot of great programs that are continuing to elevate the base of data-literate people out there, which is good, but there’s still a long way to go to where an executive will even trust a model. And this really, I think, systemically happens at the top layer, right? You have your CDO and your CFO come into a room, and one shows one number for revenue and the other guy shows another number for revenue, and the CEO can do nothing but say, why don’t the numbers match up? Right? How can I trust any of this if you’re not giving me a concrete, specific number from a field that’s supposed to be heavily scientifically backed, right? And so I think there’s just always a lot of doubt as soon as the data, the numbers, don’t line up, and that, I think, is really what is holding back the entire industry for the most part.

Dave Mariani: Yeah. You know, hey, Miles, that’s one of the reasons why I started AtScale. Same kind of problem. Without a semantic layer, it’s really tough to make those numbers tie out, because you’ve got humans involved, and humans are going to create calculations in different ways, in different contexts and different tools, and at the end of the day, they’re not going to match up. It’s just almost impossible. So, I hate to pivot back to technology, but we talk about metric stores, which is more the semantic layer, and we also talk about feature stores when it comes to AI and ML. So do feature stores help in terms of reconciling those differences? And can you talk a little bit about what a feature store is, why customers might want to use it, and where it’s going?

Miles Adkins: Yeah. So I think a feature store is, again, a pretty last-mile concept if you go back to what the root of the problem is. And Snowflake has also been trying to solve this problem from a little bit of a different angle, right? We’re trying to, again, knock down data silos. We’re trying to understand that the reason you’re having all these downstream discrepancies in your numbers is because up top, historically, you’ve been copying data to three different places for three different teams. They’re not using the same data. One person might make a change, and that propagates through the entire system, right? And so that’s where a semantic layer will come in. And then feature stores, again, are really specific to your ML engineers and your data science practitioners, but they try to solve the same problem as that upstream issue, which is that we just need to have one consistent view, in this case of features, for the data scientists that want to consume them. That’s as opposed to them, in a vacuum, creating the same feature name with four different calculations of how they do it, and then trying to reconcile this later with the ML engineer as they go to put it into production.

Miles Adkins: And so a feature store, in my opinion, is very similar to any sort of standardized metrics technology that’s really just looking to let you define it once and reuse and reuse and reuse again, much like a car factory, for example. It’s a very nice, standard process of materials moving through the floor. We want data to follow that same path, as opposed to a person coming in and writing their own custom code to tweak one little thing that can never merge back into a standard development process. Again, models today are built like a custom car, right? Assembled piece by piece. We’ve got to figure out how to develop that Model T factory for data scientists specifically. That’s where feature stores come in. But again, this can apply to the BI side, right? The historical analytics side of the house too, when we think about just standard reporting. So it all funnels back to that same perspective: we just want to define it once, and we want it to be shared everywhere by any business group or data science group or backend group that wants to consume that data. And we have that consistency from end to end through the pipe.
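The “define it once, reuse it everywhere” idea behind a feature store can be sketched as a small registry. This is a toy illustration under stated assumptions, not any product’s API: `FeatureRegistry`, the `avg_order_value` feature, and the customer record are all hypothetical names invented for the example. The registry refuses a second definition under an existing name, so training and serving cannot silently diverge.

```python
from typing import Callable, Dict, List


class FeatureRegistry:
    """Hypothetical minimal feature store: one definition per feature name."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[[dict], float]] = {}

    def register(self, name: str):
        def decorator(fn: Callable[[dict], float]):
            # Reject a second calculation under the same name.
            if name in self._features:
                raise ValueError(f"feature {name!r} is already defined")
            self._features[name] = fn
            return fn
        return decorator

    def compute(self, names: List[str], row: dict) -> Dict[str, float]:
        # Every consumer gets values from the same registered definitions.
        return {name: self._features[name](row) for name in names}


registry = FeatureRegistry()


@registry.register("avg_order_value")
def avg_order_value(row: dict) -> float:
    # Defined once; guarded against customers with no orders.
    return row["total_spend"] / row["order_count"] if row["order_count"] else 0.0


customer = {"total_spend": 300.0, "order_count": 4}

# The data scientist (training) and the ML engineer (serving) call the same code.
training_features = registry.compute(["avg_order_value"], customer)
serving_features = registry.compute(["avg_order_value"], customer)
print(training_features, serving_features)
```

A real feature store adds storage, versioning, and point-in-time lookups; the sketch only shows the single-definition contract that removes the “same name, four calculations” problem.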

Dave Mariani: Yeah. You know, you picked up on something there about the problem with really operationalizing machine learning. You talked about building that Model T factory, and you’re right. It seems like it’s a very bespoke activity, even in some of the largest, most sophisticated companies out there. So what is MLOps? What is operational ML? What kinds of things do you think, Miles, that companies can do to get out of the bespoke mode of data science and machine learning?

Miles Adkins: Can I say fire all their data scientists?

Miles Adkins: And so this is where I think I’ve evolved. If I look back at what I was doing just five years ago, I was the cause of this problem, right? If you look at my laptop, I have Untitled notebook 1, 2, 3, 4, up to 22, right? And then I come in, put this into a single script, and say, here you go, let’s put this in there, right? And I’ve got another guy next to me doing the same thing, and it just never reconciled. So the allure of data science, I get it. Let’s take some raw data. We’ve scoped out a use case where, if I can just get 5%, 10% lift on my model, that’ll generate, you know, 2 million for the company.

Miles Adkins: I look like a genius, right? But there again, it’s a guy behind the curtain pulling the strings. We need to figure out a way to put all the building blocks around that person first for him to be successful in a business setting, right? We’ve got to be able to think about scale and production and producing those models on a factory line, as opposed to just one person. And so I do think, instead of trying to bridge this gap by asking how we surround our data scientist, we should ask how we actually have our data scientists forget some of the processes they’re so used to doing and plug them into a much more operational type of workflow, to where, again, we can get that production chain going.

Miles Adkins: I have this idea of getting rid of this data scientist title. We should have what I would call ingest engineers, people who just worry about ingesting data into the platform from their systems. From there, we have feature engineers, where all they’re doing all day long is generating specific features that matter for the ML use case. Then you have your modelers, who all they do is build models, and then you have your production people, where all they do is put models into production, right? You’re trying to get a data scientist to do that end to end, when in reality, as opposed to having four people each doing this end to end on their own, if you just compartmentalize each one to a specific task and give them a nice standardized contract for how they talk to each other, moving that data, that model, that artifact from beginning to end, you’ll get a lot more stuff into production. The data scientists are just spread way too thin. It’s not good to have a unicorn in your company. You want a very compartmentalized, systematic process. Unicorns, yeah, they cost a lot and they don’t produce a lot.
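The four roles and the “standardized contract” between them can be sketched as typed hand-offs between small functions. Everything here is an illustrative assumption rather than a prescribed design: the `RawBatch`, `FeatureSet`, and `TrainedModel` types, the `ad_spend` and `revenue` fields, and the one-variable least-squares model are all invented for the example. Each stage only needs to honor the type it receives and the type it returns.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


# Contracts: the only thing each role has to agree on with its neighbors.
@dataclass(frozen=True)
class RawBatch:
    rows: List[dict]


@dataclass(frozen=True)
class FeatureSet:
    x: List[float]
    y: List[float]


@dataclass(frozen=True)
class TrainedModel:
    slope: float
    intercept: float


def ingest(records: List[dict]) -> RawBatch:
    """Ingest engineer: land only complete records."""
    complete = [
        r for r in records
        if r.get("ad_spend") is not None and r.get("revenue") is not None
    ]
    return RawBatch(rows=complete)


def build_features(batch: RawBatch) -> FeatureSet:
    """Feature engineer: turn raw rows into model inputs."""
    return FeatureSet(
        x=[float(r["ad_spend"]) for r in batch.rows],
        y=[float(r["revenue"]) for r in batch.rows],
    )


def train(features: FeatureSet) -> TrainedModel:
    """Modeler: fit a one-variable least-squares line."""
    mx, my = mean(features.x), mean(features.y)
    sxx = sum((x - mx) ** 2 for x in features.x)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(features.x, features.y))
    slope = sxy / sxx
    return TrainedModel(slope=slope, intercept=my - slope * mx)


def deploy(model: TrainedModel) -> Callable[[float], float]:
    """Production engineer: expose the model as a callable scoring function."""
    return lambda x: model.slope * x + model.intercept


records = [
    {"ad_spend": 1, "revenue": 2},
    {"ad_spend": 2, "revenue": 4},
    {"ad_spend": 3, "revenue": 6},
    {"ad_spend": None, "revenue": 9},  # incomplete, dropped at ingest
]
predict = deploy(train(build_features(ingest(records))))
print(predict(4.0))
```

Because each stage depends only on its input and output types, any one of them can be swapped or scaled without the others changing, which is the factory-line property Miles is arguing for.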

Dave Mariani: Yeah. You know what, I love that idea. That’s pretty controversial, but think about it in terms of how we build software. Not every engineer is a full-stack engineer. You have people who specialize in the back end, the front end, and data. And if you think about your data pipelines, like you said, you obviously have sources that you’re capturing and ingesting, but then the data scientist is responsible for generating data from data, and that should be part of that data pipeline, right? If you really think about it, you’re delivering enhanced data to the business. We’ve been so focused these days on just delivering historical data to the business. I mean, that’s been the focus, and that’s hard enough, right?

Dave Mariani: But then, like you said, you’ve got the data scientists off to the side who are just sort of throwing some softballs into the arena, and you’ve got to figure out, okay, what are you going to do with them? It would be great if they could be integrated into that data pipeline, that manufacturing chain, really, if we thought about it. Is that what MLOps is, Miles? Is that what people are talking about with MLOps, or is it different?

Miles Adkins: Yeah, I don’t think that’s what they’re talking about with MLOps. I think MLOps is more like, how do I get a data scientist to still do this whole end-to-end thing, right? How do I think about doing feature drift tracking, and then also model evaluation and hyperparameter tuning? Again, they’re trying to do this whole end-to-end thing, as if one person can manage it all and go deep enough in any chair to be an expert everywhere. When in reality, right, software, I think, figured this out probably 25 years ago: we need the guy for the back end, we need the guy for the database, we need the guy for APIs, we need the guy for the front end. Oh, and by the way, we need a designer for the front end too, because that guy’s not an artist, right?

Miles Adkins: Right. I still think this is because we have a lot of shoes to fill and not enough bodies, right? This is where we get to this idea of, there are not enough data scientists in the world, so let’s try to start automating everything. That is kind of what has come to the market in the last five years. But no one has stepped back and said, hey, look, let’s rethink how we’re doing this from, again, that lower, atomic level that people didn’t think about five years ago. This is kind of like how Snowflake came about. Databases have been around for 30 years. How was Snowflake able to do so well? Its founders took a step back and said, wait a minute, this whole way we do this is broken. Let’s try to introduce a new way to the market here. And fortunately for Snowflake, they were right on how they thought about doing that.

Dave Mariani: Yeah. I think there’s another trend possibly here, where you see machine learning being built into the applications themselves. And so for me, the success of AI would be when you have consumers using tools that are using AI, and they don’t even know that they’re actually using AI. Think about Yelp, for example. When somebody’s using Yelp and looking for a restaurant, they don’t realize that they’re basically doing analytics, right? I mean, at the end of the day, they’re searching, they’re aggregating and drilling, and they don’t even realize they’re doing it. It’s just an application that is a means to an end. So I’m hoping that we see a lot of data science get out of this bespoke mode, where they’re one-offs, and we see a lot more of that being baked into making our applications smarter.

Dave Mariani: So we can all be smarter. I think the obvious first part of that is making the BI tools smarter and adding more features there, where you can do outlier analysis and just do simple things for the customer to help them find what they need a lot faster, rather than asking 50 queries. Being able to generate that and find the needle in the haystack, that’s where I think machines could really help, and Snowflake and platforms could really help to make that a reality.

Miles Adkins: Mm-hmm. And yeah, I think if you just look at some of the apps that you use today, right, these companies have gone from nothing to multi-billion dollar IPOs in the last 10 years. You’ll notice that a lot of them are doing AI in the app without you even knowing they’re doing AI in the app, right? Every time you order a pizza from, you know, Seamless, it’ll tell you, here’s what it’s going to cost, here’s what your driver costs, here’s when we think it’ll get delivered to you. All these predictions are being made on the fly, and you’re just like, all right, cool, 20 minutes, that sounds great, but with no real appreciation for the investments they’ve made into getting these pipelines right. And that’s not to say that no one’s doing this, right? Again, there are these select one-percenters that have figured out what it takes to do AI well, and they’re gobbling up market share, because at the end of the day, it’s a better user experience for people looking to order food, or people looking to get analysis out of their business data. And so it’s still very much a green field.

Dave Mariani: Yeah. And like you said, it’s part of software, right? It’s part of the pipeline. It’s all integrated, and it’s that integration that delivers the power and the ease of use. So I love all that. Miles, I always love to ask my guests to predict the future here. So where do you see things going from here? What should the listeners be keeping their eyes on for the near and medium term?

Miles Adkins: Yeah, that’s a good question. If I could just predict something that would make society better, it would be, you know, a personal robot chef every day.

Dave Mariani: Jetsons style, right?

Miles Adkins: That would help a lot of people, right? I do give credit to Elon for just understanding that if we can remove having to drive a car everywhere, that would give a lot of people their time back in life. I think not having to make dinner for themselves every night would be another good way for people to get time back. And I think the good thing is the future is bright for everybody, right? The cost of education in this space is, I think, near zero, which is fantastic, right? I think a lot of the people you see in industry today who have been able to climb the knowledge ladder have really done probably most of it through open, free education. And so that’s fantastic.

Miles Adkins: I know that’s definitely how I’ve learned most of this stuff. And then, medium to long term, I think you’re going to see a lot more innovation happening, faster than we even saw in the last 10 years. Hopefully one day we can get to the spot where people’s jobs are being taken away because the technology has gotten so good. Obviously, businesses and governments will need to figure out how people pay for stuff if they never have to work. So I do see a lot of issues ahead that will need to be overcome, but ultimately, I think the progress of civilization is about living more fulfilled, happier lives. And if you don’t have to work 40 hours a week because it costs you near nothing for a robot to make you dinner, that’s a society overall that you would love to live in.

Dave Mariani: Hey, you know what, technology is supposed to automate away the grunt work, right? And some people wouldn’t think cooking is grunt work. I do. And so I’m with you on that. I’d rather have a robot making me dinner. So with that, Miles, this has been super fun. Thanks for all your thoughts and thanks for spending your time with us. And to the listeners out there, stay data driven.

Miles Adkins: Yeah, absolutely. Thank you, Dave.