Deepak Prasad, Principal at NSW

Listen to Deepak Prasad share his thoughts on embracing automation and why the semantic layer is on the rise, driven by the persistent problem of inconsistent BI reporting that affects enterprises across the board.


Meet our Guest

Deepak Prasad

Principal at NSW

A strong believer in cloud, data, and agility. Happy to follow and chat about anything that can bring programmability to data, which has huge potential to increase transparency and reduce technical debt in the data space. Expert and certified consultant in designing dashboards, metrics, and end-to-end solutions using Snowflake, Tableau, and Qlik, with well-honed expertise in working with global teams executing strategic projects and data analysis. My background includes around 14 years of experience in data science, cleansing, and transformation. I am most enthused when given the opportunity to develop strategic, substantial, and pertinent dashboards encapsulating large volumes of data.

Meet our Host

Dave Mariani

Chief Technology Officer, Founder, AtScale

Dave is the founder of AtScale and is the Chief Technology Officer. Prior to AtScale, he ran engineering and data at Klout and Yahoo! where he built the world’s largest multi-dimensional cube.

People have options to create their own structures and formulas with the data that they have. That’s a problem. The main issue is that the formulas for calculating the same insights can be different, so the results don’t match even though the data itself is perfectly correct. One person on one team is downloading the data and working out all their own formulas, using lookups in Power BI or Qlik or Tableau or Excel. And another person is downloading the data and doing the analysis in another way. Everyone’s realizing this now, and that’s why all the semantic layer talk is starting to pick up.

Data engineering, modern data analytics, everything can be orchestrated by a piece of code. You don’t need a server. I have all my data in Snowflake. I think a semantic layer is a terrific tool in that stack. Don’t be afraid of automation. Please embrace the change that we are all going to go through together. Let the things that can be automated be automated.

Transcript

Dave Mariani: Hi, everyone. Welcome to the AtScale Data-Driven Podcast. Today’s special guest is Deepak Prasad. Deepak is a BI consultant currently working with the Family and Community Services organization at NSW in Australia. So Deepak, welcome to the podcast.

Deepak Prasad: Thanks for having me, Dave. Yeah, that was a good intro to start with. I am Deepak Prasad. I’m currently engaged as a principal consultant on the BI side, technically managing the program of all the BI initiatives for Family and Community Services, which is a department within the Department of Planning, Industry and Environment. And how I reached here is a 14- or 15-year story. Let me try to shorten it to five sentences, maybe.

Dave Mariani: Well, Deepak, you’ve got an incredible background and deep experience in a bunch of BI platforms. So tell the listeners a little bit about yourself, what you’re currently working on, and what your path was into data and analytics.

Deepak Prasad: Yeah. So when it comes to BI, my experience is credited well today because of my previous experience in databases and working at the OS level, on the Unix side, and things like that. So whatever I am today is all because of my foundation. This is what I tell each and every one. Whenever people ask how it is possible for someone to know things at the architecture level, it’s because I worked in Unix. I used to write all the sed scripts and so on. So I know the scheduling; the fancy scheduling is all black and white and cron jobs that you run. I started from that. So whatever I see in the fancy screens, and the sophistication that’s been layered over them, I always think from the black screen and how it’s being orchestrated behind the beautiful, elegant UI and UX that we have now.

Deepak Prasad: So when it comes to my BI journey specifically, let me go through the story: Unix, Oracle. Then I did a small bit of ETL using Informatica. Then I moved to Cognos, MicroStrategy, then Qlik and Tableau around 2011. I think Tableau was at version six or seven by then, and many functionalities that we have today in Tableau weren’t there. Same with Qlik: many functionalities that we have today weren’t there, meaning that Qlik’s latest innovation towards an API-driven platform wasn’t even there. We had QlikView and Tableau. Many people talk about LODs nowadays, and LOD wasn’t there in version seven or six. So it was a journey, and that’s what gives you SME-level knowledge. The journey is very important. Reaching mastery level in anything and everything needs a strong foundation. That’s what I believe in. And if you can explain your technology to your kid, and especially your mom, and they can understand it, that’s when I feel you’ve reached mastery in whatever you’re doing.

Dave Mariani: I love your sort of starting with the black screens. I think it’s easier for people today who start with all the nice fancy GUIs and tools and the like, but sometimes those hide what’s really going on behind the scenes. So having that sort of background, where you’ve seen it from the other direction, gives you a real appreciation of what’s happening behind the scenes. I think it makes you a better engineer overall if you know where you came from.

Deepak Prasad: Same, same. When talking about ETL, right, there’s also a tool called, I’m not sure whether you

Dave Mariani: Heard it right.

Dave Mariani: Part of IBM and then IBM bought by an issue, I think. Yeah,

Deepak Prasad: Yeah. Maybe during my early part, when I dealt with Ab Initio, I felt I was still working in Unix, like, the ETL was happening completely in a piece of code, which is what we are talking about now: data engineering, modern data analytics, the data stack, everything can be orchestrated by a piece of code. You don’t need a server. People nowadays say cloud; 20 or 30 years ago, we called it mainframe. Same principle. You have the server and everything taken care of by someone else, and they bill you by the hour or by average consumption per day, and then you pay them. They called it mainframe then. Now they call it cloud.

Dave Mariani: It’s really interesting, Deepak, because you look like a young guy. But I started in the same kind of era where, if you’ve been around long enough, what’s old is new again. I’ve always likened it to bell-bottom jeans: they’re going to come back around and come back in fashion. It seems like the same thing is happening with the cloud data warehouses. I mean, data warehouses like Oracle and the like had their day, with lots of difficulties getting those to be successful. They seemed to be replaced by data lakes and Hadoop for a while, and now we’re back to cloud data warehouses. And I like your analogy with the mainframe and the cloud: we’re back to shared compute again. So what other parallels do you see between the beginnings and what we’re seeing now?

Deepak Prasad: So, again, being a Unix fan, what you do in AWS is Linux: writing all the shell scripts and automation. When you do it at the architecture level, people learn Linux on the go now and then do it. And when I see all this, I still connect back, look back to the old days. And the other thing that I compare between then and now would be SQL. One thing that didn’t age at all is SQL; SQL is always the talk of the town. One of the things that I have to talk about is big data. People were talking so much about big data, and everyone was saying, oh, it’s when you have lots of data. So then I used to ask, what is a lot of data? Someone would say 30 million, someone would say 600 million.

Deepak Prasad: So then people started to curate what they say, and they say, no, it depends on the three V’s. Okay, what are the three V’s? Volume, variety, veracity. Okay. So if I can orchestrate, with what we have now, forget about big data, everything end to end in a database, would you agree that this is a big data solution? No. Then they expected everything to be in flat files, and they wanted a MapReduce algorithm to run over the flat files to segregate the data in batches and then connect them all together and put it in the final bucket. Then, okay, so if I do the same thing with Unix and SQL, would you agree? No, you still can’t, because we want to process unstructured data. So what do you mean by unstructured data? They finance sector data. To me, log files and emails are all unstructured. Then how would you classify audios and videos? So the conversation just goes on, right? So when you start,

Dave Mariani: Oh, so Deepak, I had some interesting experiences with my own engineering team at Yahoo because we were incubating Hadoop. And so, running analytics, I had to use Hadoop as our core platform. We had Oracle, but we were busting Oracle. We had Oracle RAC, and we had as many nodes on Oracle RAC as Oracle RAC could manage, and that was pretty tough. But I asked for SQL on Hadoop, because it was MapReduce and Pig. I don’t know if you remember Pig, but that was what you did. Hive was actually not invented at Yahoo; it was invented at Facebook. And I was telling my team, why can’t I use Hive on top of Hadoop at Yahoo? Because SQL, like you said, is the one interface that’s been around.

Dave Mariani: It’s been consistent through all these transitions. And they’re like, well, why would you want to use SQL? You can use MapReduce. It’s like, no, no, no. And so you’re absolutely right. I think that during that big data revolution, we took Hadoop, which was really a data lake architecture, and conflated it with a data warehouse, thinking that it could replace a data warehouse. And they really are two different things, aren’t they? A data lake and a data warehouse are two different things used for two different purposes. So I completely agree with you on that.

Deepak Prasad: Well, for a person who asks for the basic difference between a data lake and a data warehouse, these are the words I use: a data lake is ad hoc and flexible, and it can accept ongoing changes, but a data warehouse is the well-mannered one. It says, if you come with these qualities, I’ll let you enter. If not, please don’t enter; I will send you to the rejection side, so you will be sitting outside. That’s how I see the data lake and the data warehouse. But when we talk about big data and SQL, the one thing I’m very, very interested in is how the communication layer, which is the translation layer, can literally disrupt everything, right?

Deepak Prasad: So nowadays, if some recruiters call me in Sydney, Australia, and they ask for a big data engineer, I ask them, what do you expect in a big data engineer? They say, we have to process lots of data. But two years ago, Snowflake was already there; people just weren’t talking much about it. In a Snowflake kind of environment, it’s all schema on read, not the other way around, and you have a translation layer, which is our favorite, SQL. You can deal with unstructured data, be it a JSON document, Avro, Parquet, or anything for that matter, and still use SQL to flatten the data. It can be an array, a nested array, or nested nodes, anything that’s loaded. Then you have SQL to communicate with it. When you asked your friends at Yahoo why Hive couldn’t be used, when you asked for SQL, that’s what I see: it’s like why we are all talking in English. I can talk in my own language that I’m comfortable with, and you can talk in the language that you are comfortable with, but if something needs to scale, we all need to accept that we have to use a framework that is widely accepted and widely understood. That’s what it is.
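
The idea of flattening a nested document (arrays, nested arrays, nested nodes) into a single row can be sketched outside of SQL as well. The following is a minimal Python illustration, not Snowflake’s own flattening syntax; the `flatten` helper and the sample record are hypothetical and exist only for this example.

```python
import json

def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Leaf value: store it under the path accumulated so far.
        flat[prefix] = value
    return flat

# Hypothetical semi-structured record, loaded as-is (schema on read).
raw = '{"customer": {"name": "Ana", "orders": [{"id": 1, "items": ["a", "b"]}]}}'
row = flatten(json.loads(raw))

for column, cell in row.items():
    print(column, "=", cell)
```

Each leaf of the document becomes one addressable column, which is the shape a tabular query layer needs before it can filter or aggregate.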

Dave Mariani: So you’re diving into one of the topics I wanted to talk with you about, because there’s this whole argument in the industry around lakehouse versus data warehouse. You touched a little bit on it, but what’s your take on that argument of lakehouse versus warehouse? What would you do?

Deepak Prasad: So, according to me, business drives everything. At the end of the day, whether you want a lakehouse or you can run with a data warehouse depends on what your business wants. You shouldn’t be chasing targets here. There is a famous data engineer, I can’t recall his name right now, who works at Airbnb. He writes about the architectures that they implement at Airbnb for things to happen very fast, almost CDC: as soon as it happens in the database, it should be there in the data lakehouse, and then it moves to the data warehouse right away. But my question to all of them who want to implement the Yahoo infrastructure, or the Airbnb or Uber infrastructure, in their organization is: is their organization’s turnaround time

Deepak Prasad: the same as Airbnb’s? I watch a movie today on Netflix, and tomorrow I open Netflix and it gives me multiple recommendations. So just take your own business, be it banking, insurance, or running a stock market. Do you want to record each and every tick on the screen in your database, and then send it to your lakehouse, and then curate it? No. We have to understand where we stand and what we do. What does our end user want? What is our actual turnaround time? That’s why I always say, don’t start with the solution. Start with the problem. What is the problem you’re trying to solve? When you come to the architecture, whether you want a data lakehouse or a data warehouse, start with the statement: okay, I am going to build a database, it can be a lakehouse or a warehouse, where if a person in my organization wants data, I want him to get the data in one hour without any friction.

Deepak Prasad: So that’s the statement. You have to lock it in. And then you say, okay, if a person wants to request data, how is he going to request it? If he wants to get the data in one hour, how do I have to orchestrate my solution so that he gets the data in one hour, filtered, curated, enriched, so that he just has to do a select, download the data, and get out, or in Excel just click refresh and get out? So we have to start with the mission statement of what we want to implement and then go backwards. We cannot start the other way, with a data lakehouse, which is a mix of the data warehouse and the data lake, which is what you’re asking about. According to me, I see this as a heterogeneous place where you have both unstructured and structured data. Some people want the unstructured data in raw form to train and test on the data.

Deepak Prasad: The data scientist would touch the raw format, and then you have the business, who don’t even want to write a giant query connecting the dimensions and facts. What they want is to run a single query that gets all the attributes they want, in the aggregation they want, in one go. So in my place, wherever I am, they had three layers, an extract, transform, and load layer, and people weren’t using them at all. So I changed the names, honestly, and that’s when people started using them. I changed them to silver, gold, and diamond, or whatever you want to call them. Then people started asking, what does silver mean? It’s just a direct replica from the data warehouse, the layer inside the BI that I had. And then I had the transformation layer, where you have proper transformation rules implemented that say, oh, zero means male, one means female; one means small, two means medium, three means large. So you have all the transformation logic in the transformation layer.
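
The decoding rules mentioned here (zero means male, one means female; one, two, three mean small, medium, large) are the kind of logic that sits in a transformation layer. A minimal Python sketch, assuming hypothetical field names `gender` and `size` and a hypothetical `transform` helper:

```python
# Lookup tables for the coded values described in the conversation.
GENDER_CODES = {0: "male", 1: "female"}
SIZE_CODES = {1: "small", 2: "medium", 3: "large"}

def transform(record):
    """Return a copy of a raw record with coded fields decoded.

    Codes that are not in the lookup tables become "unknown"
    instead of raising, so bad rows stay visible downstream.
    """
    decoded = dict(record)
    decoded["gender"] = GENDER_CODES.get(record.get("gender"), "unknown")
    decoded["size"] = SIZE_CODES.get(record.get("size"), "unknown")
    return decoded

# Raw replica rows (the "silver" layer in the speaker's naming).
silver_rows = [
    {"id": 1, "gender": 0, "size": 3},
    {"id": 2, "gender": 1, "size": 9},  # 9 is not a valid size code
]

# Decoded rows, ready for business users to query.
gold_rows = [transform(row) for row in silver_rows]
print(gold_rows)
```

Keeping the lookup tables in one place means every report decodes the same code the same way.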

Dave Mariani: Yeah. They call that the medallion architecture, right, where you do bronze, silver, gold. Databricks talks about that all the time. And so I’m with you; I don’t think it’s an either/or. I think you need to have a data lake, because that’s your landing zone for your data, and then your data warehouse is for your curated data, where it needs to be right and it needs to have a certain SLA. So it’s not like you have one without the other. You need both; you need to have the right tool for the job. So I like what you’re saying, that you start with the use case, you start with what the business needs, and then you work your way backwards. You figure out what technology is going to work for you. So you’re not just always going to go plug in Snowflake and Qlik, right? You’re going to figure out what’s best for that business use case.

Deepak Prasad: And I’m very sure, if we do another podcast next year, we will be talking about Kafka. Mark my words today, everyone. I mean, everyone wants to implement Kafka in their organization, not only for ingestion but also on the transformation side. The thing is, Kafka was always on the books from the moment Apache released the framework, but what’s happening now is Confluent coming and saying, we will make sure the way you implement Kafka is all augmented properly. And they are giving a governance layer that says, we will take care of the streaming, the producers and consumers, and how you scale. You just write a piece of code; we will take care of the containerization and how it scales. So I’m very sure Kafka is coming to take over data streaming, which is like CDC, basically. When it comes to transformation, though, transformation is always batched. We have to agree that in most places it’s batched, because we need the historical data some of the time to compare and run an aggregation and things like that. Imagine if Kafka could do streaming in the transformation layer as well. That would be a big thing. I’m looking forward to that.

Dave Mariani: So Deepak, when a lot of people think of Kafka, they think about real time. Is real time really a thing, or is it a specific edge case? In other words, is it an edge case, or is it right there in the center, in your opinion?

Deepak Prasad: That’s very true. That’s how I started the call, right? When I said, if Netflix is implementing it, fine, let them implement it, because they have a use case. Do you have the use case? That’s what it is all about. We have to see the use case. We have to see the turnaround time and which industry we are sitting in. Let’s say insurance. If I submit a claim, no one is going to ask for it all to be done in the next five minutes. The turnaround time is 24 hours, and you can still make it. So if it is 24 hours, and my operational team can see my data in one or two hours, that should be fine. They can prioritize based on the risk rating and all of that, and we are still fine.

Dave Mariani: Yeah, I’m with you on that. I think that trying to do everything in real time creates additional burdens and complexity that you probably don’t need and can’t justify. Of course there are use cases where you need to. At Yahoo, we had what we called an off-ramp. We had our data pipeline, and the off-ramp was real time. So you could basically sip from that stream, but that was data that wasn’t guaranteed, and it was definitely not going to be of the quality that you would get from the data pipeline once it got to its final destination. So I like that idea. So let’s talk a little bit about the modern data stack, Deepak, and get your opinion on this.

Dave Mariani: You know, a lot of data engineers are just dbt freaks. They love dbt; they love the declarative ETL, writing SQL and using that as a framework. And they just recently embraced the monster monster plant, run monster round. And they’re now talking about introducing a semantic layer. And of course a semantic layer is definitely something close to my heart. So why do you think people are talking about this? And why do you think it’s taken so long for people to start to have this conversation about semantic layers?

Deepak Prasad: So it’s two things, right? People thought they could fix data and that would fix everything. The problem is they fixed the data, but they couldn’t fix people. People started taking all the Legos that they had access to, and they started building their own buildings. Someone built a Porsche out of their Lego, someone built something like the Empire State Building out of the Lego that they got, someone built a dam, a bridge, whatever. People still had options to create their own structures with the data that they have. That’s one of the problems. But the business wants people to consume data in a way that benefits the business, with proper tight control and governance over it. That’s when the semantic layer talk started. We spent the past 20 years trying to solve a big problem in data. We said, oh, we will bring up the quality, because this is all based on the data, and if the quality is not there in the data, decisions will go wrong.

Deepak Prasad: So let’s fix the data, let’s fix the data. And they were fixing the data, and still they are not seeing the adoption, because one guy in the sales team is downloading the data and working out all the formulas and lookups in his own Power BI or Qlik or Tableau or Excel. And the other guy is downloading the data and doing his analysis his own way. And the main problem is that the formula for calculating sales used by the guy sitting in sales and the formula for calculating the same sales used by the guy sitting in marketing aren’t matching, although the data was perfectly correct. Because if you run the query here, 3000 rows; if you run the query there, 3000 rows. People thought they had solved the problem: oh, I’m getting 3000 rows on both platforms, because I’ve clearly articulated my consumption layer, which is the last place to get data from.

Deepak Prasad: So we solved the problem, but again, there’s the adoption, right? Even if you build a beautiful road with no potholes, if it’s a long road, people always take shortcuts. People are good at taking shortcuts. So you have to come up with something that says, this is the right way. This might not be the right way for you, but this is the right way for the business, for us to operationalize things. So that’s been the thought: why can’t we have good control of how my data is utilized? If I take my kid to the beach and put him on the sand, and I want him to come up with a shape that looks like a crab, and I give him a free hand, he can make the crab a hundred different ways.

Dave Mariani: If you give him a, if you give him a mold,

Deepak Prasad: You can give him a mold. And if you say that is the crab shape, imagine the hundred crabs that he’s going to come up with: all going to be uniform, looking the same, with the same number of legs, the same number of eyes, and the same number of claws. That’s what the semantic layer is going to do.
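
The "same data, different formulas" problem and the shared mold can be illustrated with a small sketch. This is a toy Python example under stated assumptions, not how any particular semantic layer product works: the rows, the `METRICS` registry, and the `query` function are all hypothetical.

```python
# The same three rows, as both teams would download them.
rows = [
    {"region": "NSW", "amount": 100.0, "refund": 10.0},
    {"region": "NSW", "amount": 50.0, "refund": 0.0},
    {"region": "VIC", "amount": 80.0, "refund": 5.0},
]

# Two teams, same data, different private formulas for "sales".
sales_team_total = sum(r["amount"] for r in rows)               # gross
marketing_total = sum(r["amount"] - r["refund"] for r in rows)  # net

# The shared mold: one named metric with one agreed formula.
METRICS = {"sales": lambda r: r["amount"] - r["refund"]}

def query(metric, rows, group_by=None):
    """Aggregate a registered metric, optionally grouped by one column."""
    formula = METRICS[metric]
    totals = {}
    for r in rows:
        key = r[group_by] if group_by else "all"
        totals[key] = totals.get(key, 0.0) + formula(r)
    return totals

print(sales_team_total, marketing_total)       # the two teams disagree
print(query("sales", rows))                    # one governed answer
print(query("sales", rows, group_by="region"))
```

Both teams read identical rows, yet their totals differ because the formula lives in each person’s tool. Once the formula lives in the registry, anyone asking for `sales` gets the same number.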

Dave Mariani: I love that. That’s a great analogy, the crab analogy, the kid at the beach. But back to the primary question, Deepak. Why has it taken so long? Is it because we tried different approaches? We thought the BI tool might do it. We thought the data warehouse might do it. And now we’ve sort of realized that neither place is the right place for the semantic layer to live, that it needs.

Deepak Prasad: All right. So that is also an understanding problem, which is what we were talking about. If you say, hey, deploy the semantic layer for your organization, people will do it. But the way I define a semantic layer and the way you define a semantic layer might be different. I would say, aggregate based on the region, the country, the state level, the postcode level, get all the aggregations and get the cube built. That might be the semantic layer for me. And you would say, no, don’t build the cubes. So there’s no way people can be controlled and told, this is the right way. And one way we can achieve it is by putting what we are talking about into technology.

Dave Mariani: So

Deepak Prasad: Take Uber. Someone said, this is the way to book a cab so that it arrives on time. So you have a technology that just asks you to click a button at the time that you want a cab. It was looking very good on paper, but when it goes into the technology and it’s implemented in a mobile app, it forms the industry standard. And then Ola came, and then DiDi came, and then many local cab apps came. That becomes the standard. Someone has to come to the table and say, guys, someone, build a semantic layer. If you want to challenge it, let’s challenge it and move on and build something new. So for example, you at AtScale came up with a semantic layer, and dbt now, I think they’re going to orchestrate everything using the Jinja framework.

Deepak Prasad: So when they write SQL queries, they are leaving an impression in each SQL query, which says ref this, ref that, ref test. That will give them a clear lineage of how a thousand tables became my 200 final data warehouse dimension and fact tables. If they have a clear lineage of how this happened, then come to this problem: the semantic problem can be solved only if we go to the field level, not the table level. All the lineage that we had two or three years ago was table level. Now our client is coming to the party and saying that we have done the lineage, but at the field level. Okay, now I have the clarity of how my

Deepak Prasad: sales is calculated from my source to my consumption layer, clear cut. But what about the reverse side? Once I have my data and there is no governance around it, if I change the definition, is there an approval mechanism? Say, according to me, margin is calculated as sales minus cost price, and I’ve calculated it that way. Suddenly the business is coming and saying, guys, we are getting rid of the cost price, the price to make the final product, because we are buying from someone else, so take that away. So if someone adjusts a formula after the business direction changes, who is there to approve the numbers, approve the calculation? That’s what the semantic layer is going to address now. I wouldn’t say the semantic layer is done yet. We are all going to make it better together. That’s how I see it. But dbt getting into the semantic layer is the right thing, and AtScale doing the semantic layer is also the right thing. We need competitors; that’s when we grow together.
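
The point about each query leaving a `ref` marker that yields lineage can be sketched in a few lines. This is an illustrative Python example that scans dbt-style `{{ ref('...') }}` markers with a regular expression; it is not dbt’s implementation, it only covers table-level lineage, and the model names and SQL are hypothetical.

```python
import re

# Matches markers such as {{ ref('stg_orders') }} in a model's SQL text.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

# Hypothetical models: name -> SQL text.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_costs": "select * from raw.costs",
    "fct_margin": (
        "select o.id, o.sales - c.cost as margin "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_costs') }} c on o.id = c.id"
    ),
    "rpt_margin": "select sum(margin) from {{ ref('fct_margin') }}",
}

def direct_parents(sql):
    """Model names referenced directly by one SQL statement."""
    return REF_PATTERN.findall(sql)

def upstream(model, models):
    """All models that `model` depends on, directly or transitively."""
    seen = []
    stack = list(direct_parents(models[model]))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(direct_parents(models[parent]))
    return sorted(seen)

print(upstream("rpt_margin", models))
```

Field-level lineage, which the conversation argues is what the semantic problem really needs, would require parsing the select list of each query rather than only the table references.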

Dave Mariani: Yeah, I agree. So Deepak, you brought something up here about control, control over definitions. To me, the semantic layer is a foundation where you can consolidate your definitions. You have one place to put them and serve them, and that’s really important. That’s obviously what we do, and that’s an important piece of the equation, but then there’s the workflow around it. There’s been a lot of talk about hub-and-spoke models and data mesh, all about decentralizing, creating these domain owners of data, and being able to decentralize your analytics management. So what’s your take on this talk about distributing the role of who gets to define those definitions and those data products, decentralizing that and putting it into other parts of the business?

Deepak Prasad: So people always tend to relate to success, right? Success is something that drives the next move. If people are seeing success in something, they try to implement it in a place where they haven’t tasted success. This is the mindset of people. Getting an organization into a product mode has given success for the past three years. Everyone was working in an agile way, everyone was building a product, they were able to see the product in agile, and it was all working. Then they thought, okay, why can’t we bring the same product mindset to the data side? Instead of saying data, why can’t we say data product, to make sure we deliver analytics that are connected to an action, for the business to make some decision and perform some action? So instead of just saying, oh, give it to the dashboard, give an insight to the people to make decisions,

Deepak Prasad: take all the dimensions we have, take all the measures you have, run all the permutations and combinations, and tell them which state and which product is doing better, and which state and which product is not doing better, in one go, because people can’t just click through all the dimensions and measures and get to the final insight. So this is what we started the meeting with, right? Come with the mission statement. I, as an organization, want to give data to any person in my organization or an external organization in this timeframe. This is fixed. Then you spin up a product. You say, my product, once it’s done, is going to curate data with the mindset of the people in my domain who understand their data, and we will set up proper ownership. Ownership is an abstract concept. According to me, when it comes to data mesh, it’s not technology; it’s an abstract concept which says you own the data. So you tell me what the quality is, you define the quality measures on it, you define the calculations for the data, and you drive it. Let’s say there is an organization; let’s take Airbnb. Instead of saying one team takes all the data, the sales team will be taking theirs and forming their own domain,

Deepak Prasad: Although our

Dave Mariani: Advertising, advertising,

Dave Mariani: Yeah, billing would be a separate domain. The website experience would potentially be a separate domain, driving user engagement. So yeah, you can definitely see that for those different organizations the data is very different, and they obviously understand the data. So they’re probably the best to create those data products, but they do need standards, don’t they? Standards for how to actually create those data products. And so there still needs to be some sort of a central team to actually define the standards and maybe pick the tools that people are creating products in.

Deepak Prasad: Yeah, that’s what the data stewards from the business do. I was blessed to work with some of the best business people. Trust me, we as ETL engineers or BI consultants think we are smart. They are 10 times smarter than us. Business people are 10 times

Dave Mariani: Smarter. I completely agree with you on that.

Deepak Prasad: So once upon a time, I had an experience where I said, this number is correct, because I ran the query, I put the filter, I wrote the transformation, and then I ran the same query in multiple ways and I’m getting the same answer. So this is correct. That guy is sitting cool in his chair, laid back, looking at the number and saying, no, your numbers are off. You’re wrong. And I was puzzled. I have the data, I’ve got all the requirements correctly, and I’ve applied the requirements as is. It’s been tested and verified, but this business guy is telling me it’s wrong.

Dave Mariani: So, yeah, like I’ve said, Deepak, before on this podcast, it’s easier to teach a business person about the technology than it is to teach a data engineer about the business. I mean, there’s just too tall of a wall to climb there. So you’ve got to find a way to empower the business to be able to create their data products, in my opinion, but in a way that’s going to work and that’s going to be shareable, so that you can combine your advertising data with your website data and your web traffic data, correlate them together, and then create composite products. So I think it’s really important to have those data stewards, but still have those sort of technology underpinnings that allow those stewards to combine their efforts to create all-new data products that are composed of each other.

Deepak Prasad: So this is what happens when you change the name. You said data, and no one was listening. You changed the name to big data, and everyone started listening. You said data warehouse and data lake, and no one was listening. You changed the name to cloud data platform, and everyone is listening. You asked people to introduce the business to data well before you start the journey, and no one listened. And then you give it a name, saying data mesh, data fabric. And then you said, what is data mesh? That means you have to domain your data and you have to involve the business, get the rules from them, make them the custodians, set up product ownership there, and then you do it. And then, yeah, everyone wants to build a data mesh. I think this is another way of getting people on board and doing the right things in the right way, but you need some fancy names around to support all this theory.

Dave Mariani: You know, I can tell you, I definitely don’t like the name data mesh, because to me it sounds like federation. I try to use hub and spoke. But it’s the concept of decentralized management, where you have domain owners and data stewards. That’s a powerful concept. And I think that’s a people concept as opposed to a technology concept, and data mesh sounds like a technology concept, and it’s really not. And so that’s my one beef.

Deepak Prasad: It’s

Dave Mariani: People. It’s people, not technology. Okay.

Deepak Prasad: That’s why I started by defining data mesh as abstract. It’s an abstract concept, a framework that makes sure you walk on the rails correctly to reach your destination. It has its own rules: keep your data, own your data. It’s your data; you have to take care of the quality. It brings ownership into the equation and it takes you to the next level. It’s just a concept. And again, data mesh happens when you do it with the proper technology, and the framework comes along with the technology, right? So just imagine the Uber app is given to you and you can customize it: you can customize the Book Now button, you can move the cars around on the map the way you want. Do you think it will work the way you want it to? It wouldn’t. You move the car on the map from one place to another, but the actual car will be moving in some other place. You’re basically disturbing the system. So this is what

Dave Mariani: Let’s bring it back to the crab example there.

Deepak Prasad: that I always say to many people when they talk about the semantic layer or anything.

Dave Mariani: No, I love it. did you pop, we could, we could, we could continue on for, for another, couple hours here. but, I’ve taken up enough of your time, but, I wanted to start, I always like to ask my guests a closing question, which is, is, put your future hat on and think about, tell us a little bit about what you see coming up and the way you think data and analytics are going to change. And, you know, in the next five to 10 years, what do you, what do you, what do you think w what should we expect

Deepak Prasad: So, I’m expecting the boring part will all be automated. When I say the boring part, there are certain quality rules. Let’s start with the data quality rules, right? Many technologies, many companies are trying to scale that part, because 60 to 70% of the data quality rules, according to me, can be automated. 60 to 70% of things can be automated in ingestion as well. It’s always a build versus buy decision. No one wants to go to the buy decision, because they think they have to continuously pay for the technology. But what they are not understanding is that there is a technical debt that you create when you build your own technology by yourself and keep fixing it again and again. It’s better if you buy something that is solidly, perfectly working.
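As a rough illustration of the point above about automating data quality rules, here is a minimal Python sketch that treats rules as small declarative checks and runs them over incoming rows. The `Rule` class, the two rule names, and the sample rows are hypothetical, made up for this example, and do not come from the conversation or from any specific product.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """One data quality rule: a named check applied to a single column."""
    name: str
    column: str
    check: Callable[[Any], bool]


def run_rules(rows: List[Dict[str, Any]], rules: List[Rule]) -> Dict[str, int]:
    """Return a mapping of rule name to the number of rows that fail it."""
    failures = {rule.name: 0 for rule in rules}
    for row in rows:
        for rule in rules:
            if not rule.check(row.get(rule.column)):
                failures[rule.name] += 1
    return failures


# Hypothetical rules a domain owner might define for their own data.
RULES = [
    Rule("order_id_not_null", "order_id", lambda v: v is not None),
    Rule("amount_non_negative", "amount",
         lambda v: isinstance(v, (int, float)) and v >= 0),
]

# Hypothetical sample rows standing in for an ingested batch.
ROWS = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": None, "amount": 5.0},
    {"order_id": 3, "amount": -1.0},
]

report = run_rules(ROWS, RULES)
print(report)
```

Because the rules are plain data, a domain team could add or change a rule without touching the code that runs them, which is one way to read the ownership idea discussed earlier.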

Deepak Prasad: So when it comes to build versus buy, I’m thinking many people will move from build to buy and start paying for XaaS. That’s what I call it: everything as a service. It is better in that way, but at the same time, you need to have portability as well. That will make sure the guy you buy from will not create uncertainty. For example, let’s say I have all my data in Snowflake. And suddenly someone from Google is pitching to me and saying, you have everything in Snowflake, but trust me, I can create something in between with which you can swap from Snowflake to Google Cloud with zero disruption to your business on day two. So portability is one of the important factors in the coming 10 years or 30 years. The one who promises portability is going a long way.

Deepak Prasad: That’s how I see it. Point number one, portability. Point number two is scalability. When we talk about scalability, many people talk about horizontal scaling, and many people also talk about the on-demand scaling that Snowflake has, because everyone has now successfully decoupled compute and storage in the way they wanted. There should be enough transparency in that. Everyone is trying to bring transparency. Someone is still saying, I can charge you by the hour; I can’t charge you by the second, I don’t have that. So people need transparency in scalability, and they should have the lowest level of detail in scalability: okay, how many computes are running? How many nodes are running? For non-technical people, it’s okay to talk about medium, large, extra-large, and extra-small sizes. For technical people, it should be programmable. That’s another thing too.

Deepak Prasad: I mean, just imagine, five years ago, when we wanted to spin up a node in a cloud or anything, what we would do is choose what RAM we wanted, 32 GB, and how many cores we wanted, and make a tick; then what VM we wanted, whether Windows or Linux, and you make a tick, tick, tick, and apply. And after five minutes, you would have a VM that you can log into and do things. But imagine, now a piece of code is creating a VM in real time. And if you still think in the old ways, that’s absurd. If infrastructure as a service can be done completely in a piece of code, then why would you want someone to click and do things? So you have to change the way of thinking. One such example that I can give you is technology designed around storage.

Deepak Prasad: I mentioned modeling. One of the myths that people always talk about, or maybe it’s not a myth, one thing that people always talk about is storage: the amount of storage you will take if you flatten your data, and that you would have to pay a big bill for the storage and things like that. Now, storage is peanuts. You don’t have to do any optimization for storage. If you have to do some optimization, do your optimization in the compute layer, where you will pay more, not in the storage layer. So that’s another piece.
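To make the earlier point about creating a VM from a piece of code a little more concrete, here is a minimal Python sketch in which the choices that used to be ticked in a console (RAM, cores, operating system) are written down as a spec and turned into a request payload. `VmSpec`, `build_request`, and the field names are assumptions for illustration only; no real cloud provider API is called, and a real setup would hand a spec like this to the provider’s own tooling.

```python
import json
from dataclasses import asdict, dataclass

# Operating systems this sketch accepts; an assumption for the example.
ALLOWED_OS = {"linux", "windows"}


@dataclass(frozen=True)
class VmSpec:
    """The choices that used to be ticked by hand in a console."""
    name: str
    ram_gb: int
    cores: int
    os: str


def build_request(spec: VmSpec) -> str:
    """Validate the spec and serialize it as a JSON request payload."""
    if spec.os not in ALLOWED_OS:
        raise ValueError(f"unsupported os: {spec.os}")
    if spec.ram_gb <= 0 or spec.cores <= 0:
        raise ValueError("ram_gb and cores must be positive")
    return json.dumps(asdict(spec), sort_keys=True)


# The 32 GB figure echoes the example in the conversation; the rest is made up.
request = build_request(
    VmSpec(name="analytics-node", ram_gb=32, cores=8, os="linux")
)
print(request)
```

Since the spec is code, it can be reviewed, versioned, and re-run, which is the practical difference from clicking through a form each time.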

Dave Mariani: Yeah, that’s awesome. So I hear portability and scalability are really key there, and a focus on compute. I couldn’t agree with you more there. And that’s something that Snowflake did pretty well, right? They were really the first to separate compute and storage. For real, not just clustering with a shared-nothing architecture where you combine disk and compute on the same node. I mean, they truly did separate it. And a lot of people thought that was crazy, but it’s a model that works pretty well.

Deepak Prasad: And for the hub-and-spoke we are talking about, Dave, I’m not able to think of anything other than the compute and storage separation. Decoupling them makes hub-and-spoke work perfectly. So when we are talking about data mesh and what technology can make data mesh work, I can think of Azure Synapse, I can think of Snowflake, because they have successfully decoupled. If I take my data warehouse and give it to 10 different departments that have 10 different cost centers, they have to pay for what they use, because I can show them the billing. I can show exactly: sales literally used 12,000 credits, marketing used 13,000 credits. Here is your bill. Let’s cross-charge. And the other thing in Snowflake that I think is good, it’s kind of a silent revolution. According to me, that’s one of the biggest innovations that Snowflake did, which is sharing. So when your data sits in the cloud, that means it’s somewhere outside, but it’s safe and secure: data security at rest and in motion, that’s all fine. But imagine all the compute that you have done to move your data from bronze status to silver and gold. When you share data from the gold layer, just remember the amount of compute that you’re saving for an organization that is getting data from you.

Deepak Prasad: Just that piece, right? That was a remarkable piece. That’s going to change the way people think about data. Instead of that, I’m getting the raw data from organization one and doing everything from scratch. It’s a waste of time.
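The cross-charging idea above can be sketched in a few lines of Python: sum the credits each department consumed and multiply by a price per credit. The usage records and the price of 2.0 per credit are made-up assumptions; the totals are chosen to match the 12,000 and 13,000 credit figures mentioned in the conversation. In practice the usage figures would come from the platform’s own metering data rather than a hard-coded list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cross_charge(usage: Iterable[Tuple[str, float]],
                 price_per_credit: float) -> Dict[str, float]:
    """Sum credits per department and convert them to a bill amount."""
    totals: Dict[str, float] = defaultdict(float)
    for department, credits in usage:
        totals[department] += credits
    return {dept: round(credits * price_per_credit, 2)
            for dept, credits in totals.items()}


# Hypothetical metering records: (department, credits consumed).
USAGE = [("sales", 7000), ("marketing", 13000), ("sales", 5000)]

# The price per credit is an assumption for this example.
bills = cross_charge(USAGE, price_per_credit=2.0)
print(bills)
```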

Dave Mariani: What

Deepak Prasad: What I need is gold-layer data, basically shared from the other organization. And if I do my compute with whatever shared data people shared with me, I’m going to pay for the compute. That was the best innovation, according to me. Trust me, when I explain a concept, I always explain the concept to my mom, because my mom

Dave Mariani: Do your mom’s eyes glaze over when you start to try to explain that? Is she really that interested, or are you talking at her?

Deepak Prasad: No, no. I’ll go to the main thing. So my mom seriously asked me, what is Snowflake? And then I told her, imagine someone meets with an accident on the road. Usually what happens is, when someone has met with an accident, he’s taken to the hospital. He goes through the pain, everything, and then after 10 days he will reach out to the insurance company and say, please process my claim, otherwise the hospital is not letting me out. And then insurance starts the paperwork on day 10 and finishes it on day 12. And then they have to make lots of calls. They say, no, I only agreed to this. And all the mess goes on for the patient, who is already going through the pain and has to take a call and speak to insurance. Just imagine the same scenario with Snowflake. What if the hospital that is treating the patient is sharing data with the insurance company? That’s exactly what would happen: on day ten, when the patient picks up the phone, the insurance company says, hey Dave, I know you are in hospital. Don’t worry, your claim is all processed and taken care of. You can be discharged tomorrow. That’s the experience we are talking about here.

Dave Mariani: So I love that, Deepak. It’s portability, scalability, data sharing, and the confidence to share that data because you have data quality and those data stewards. I’m completely with you there. I think,

Deepak Prasad: That’s,

Dave Mariani: I think a semantic layer is a terrific tool in that stack to deliver on that value prop. So thank you, Deepak, this has been awesome, and you are a wealth of information. This has been a great conversation. We’ve got to keep talking. And is there anything you want to say to the audience before we sign off?

Deepak Prasad: Yeah. So don’t be afraid of automation. Please embrace automation. Please embrace the change that we are all going to incur together. If automation is coming, that means that we have wiser things to do. If something can be automated, we have been lifted up to things that we are wiser to do. So please take the second path and say, let the things that can be automated be automated. Let me do the things that I can do better than the automation. Choose that path. We will all be growing together.

Dave Mariani: I love it. I love it. Deepak Prasad, thank you so much. To the audience out there, be data-driven, stay data-driven, and thanks for listening.

Deepak Prasad, Principal at NSW on Data-Driven Podcast
