Data Lakehouse with Bill Inmon, “Father of the Data Warehouse”

Data-Driven Podcast

Listen to Bill Inmon discuss the future of data warehousing and the importance of the data lakehouse as a self-service architecture for operationalizing valuable information that has historically been locked up in enterprises. He shares his thoughts on Ralph Kimball, the difference between data marts and data warehouses, and the data lakehouse as the next step in the evolution of data architecture.

My perspective is that data architecture is like an ever-evolving river. It’s like the Mississippi River: the Mississippi River from one day to the next is never the same. It’s always changing. The same goes for data architecture. What’s happened is that data warehousing applies to structured, transaction-based data. That’s really the heart of data warehousing, but there’s other data in the corporation that’s viable and important as well.

I’ve never had one bad word with Ralph Kimball. I’ve known Ralph personally and worked with Ralph on many occasions. Now, I will say this much: Ralph understands data marts, and what Ralph was writing about was data marts, not a data warehouse. From a structural standpoint, there is an architectural difference between a data mart and a data warehouse. I’m looking at the data lakehouse as merely the next step in the evolution of data architecture.

Transcript

Dave Mariani: Hi everyone. And welcome to AtScale’s Data-Driven Podcast. Today, we have a very special guest, Bill Inmon, the father of the data warehouse. Bill, thanks so much for joining the podcast today.

Bill Inmon: It’s my pleasure to be here.

Dave Mariani: Great. Well, Bill, everybody who’s been in data and analytics knows of Bill Inmon: all your writings, all your research, and what you’ve done for data warehousing over the years. Me included. I’ve read your books and have lived and built companies around your concepts. So, Bill, I think the audience and I would just love to hear how you got involved in data and analytics. How did it all begin for you?

Bill Inmon: Well, Dave, it’s a strange story, but I’ll be happy to share it with you. When I was 10 years old, I picked up a golf club, fell in love with golf, and wanted all my life to be a professional golfer. I was a good golfer. I was never a great golfer, but I was a good golfer. When I got out of college, I tried for a year to make it on the PGA Tour, and I found out then that I was not good enough. I tell people I had my own midlife crisis very early in life, which has a certain amount of truth to it, because when I was 28 years old, I quit golf and I never looked back. I had the honor and distinction of playing with golfers you would know.

Bill Inmon: I played at the same golf course as a guy named Lee Trevino, and I knew Lee Trevino before he was anywhere near famous. He was just an assistant professional at a place called Horizon Hills Country Club in El Paso, Texas. I was a high school student, and my best friend’s father was friends with Lee. When I was in high school, I used to play with Lee on a regular basis. First off, he’s a really nice guy. I mean a genuinely nice guy. I was a good golfer, not a great golfer, but a good golfer. And I’ll tell you one quick little Lee Trevino story.

Bill Inmon: The last time I played with him was at a place called El Paso Country Club. We were playing the 18th hole, which is surrounded by water and has a very high hump on the green. Lee was, I don’t know, about 150 yards off the green, and he said, “I’m going to hit this shot. It’s going to have a lot of backspin, it’s going to hit the green, it’s going to come down the green, and it’s going to break to the left.” And this is from 150 yards out. He almost made the shot. I thought about it, and in my wildest imagination I could never, ever hit a shot like that. His level of skill was beyond anything I could even imagine.

Bill Inmon: So anyway, I tried for a year. I was a professional at a little golf course in Louisiana, and I got tired of starving to death. One day I was looking in the newspaper and there was an ad for programmers. Back in that day and age, there weren’t many people who were programmers. I had taken a couple of courses in college on programming, and I had very basic programming skills. I went and applied for the job, and that’s how I got in. I just plain got tired of starving to death, and that’s how I got into computers. I got into computers as a COBOL programmer. COBOL: you don’t hear that word much anymore.

Dave Mariani: I know COBOL. COBOL was one of the first languages I learned.

Bill Inmon: Yeah. So I was a COBOL programmer, and I started in Shreveport, Louisiana, at Western Electric. And then I took it from there.

Dave Mariani: You sell yourself short. If you were on the PGA Tour, Bill, you were pretty good at golf. That’s an elite group of people, so that’s pretty amazing.

Bill Inmon: Yeah, we can sit here and talk all day long about it. I was a good golfer. I was never a great golfer. Let me tell you something: those guys you see on TV, every one of them is great. I mean, every one of them is great. And I was never a great golfer.

Dave Mariani: Well, Bill, did you have a background in software, computers, or engineering? How did you get a job as a COBOL programmer if your goal was to be a pro golfer?

Bill Inmon: Well, when I was going to college, I took a couple of courses in technology and computers. I took them because they were interesting, and I just kind of wandered into it. Back then, and I mean this was in the 1960s, there weren’t many computer courses. There were a few, but there wasn’t very much in those days. So it was an accident.

Dave Mariani: Yeah. I also ended up in software by accident. I was an economics major and just thought that writing BASIC on my Apple II Plus was a hobby, and somehow it turned into a job. So that’s amazing. So you were 20 years old, you got your job as a COBOL programmer, and obviously there were no databases at the time. What was the first aha moment when it came to relational databases, and how did you get involved in being a pioneer in that space?

Bill Inmon: Well, I was working at the time for a company in El Paso, Texas, El Paso Natural Gas, and they brought in IBM’s IMS. I learned IMS, and for whatever reason I have an aptitude for data structures. To me, understanding data structures was as simple as falling off a log. We all have different aptitudes, and my aptitude happens to be that I’m good at understanding data.

Dave Mariani: At the same time, a lot of us also know about Ralph Kimball. You two were sort of concurrent and had different approaches to data warehousing at the time: similar, but different. So I’m just curious, Bill: did you and Ralph know each other? You were in the same ecosystem. Was there any kind of relationship there?

Bill Inmon: I’ve known Ralph personally and worked with Ralph personally on many occasions. I used to have a conference, and I invited Ralph to speak at it on many occasions. Ralph and I have never had one bad word between us, not one. I read some of these articles about me and Ralph Kimball, and you’d think that there was something awful. There’s nothing awful. Ralph is a decent, intelligent, nice person. Now, I will say this much: Ralph understands data marts. What Ralph was writing about was data marts, not a data warehouse. And from a structural standpoint, there is an architectural difference between a data mart and a data warehouse. But I’ve never had one bad word with Ralph Kimball.

Dave Mariani: I love that. And obviously we grew up during the data warehouse boom, and a lot of companies spent a lot of money on data warehousing. I happened to be working for Yahoo at the time, and we invented Hadoop and data lakes, and all of a sudden the data warehouse was sort of out of favor. Now we had data lakes, and we threw all that out, and now we’re back to data warehouses and data warehousing concepts again. It’s really kind of crazy. Given that focus, and with companies like Snowflake and Databricks, Bill, what do you think went wrong during that first phase of data warehousing? And what do you think people are doing differently today to avoid repeating the mistakes, or the disillusionment, of the past with the new generation of technology?

Bill Inmon: That’s actually an easy answer. At the time when data warehousing first started, the large consulting companies were just ending their Y2K experience, and they were looking around for the next big thing to take all of those consultants and keep their billing rates up. That’s when the data warehouse came along. I can tell you firsthand that those large consulting companies, and I’m not going to mention them by name here, but you know who they are, didn’t know a data warehouse from Kleenex. But they would tell people, “Oh yeah, we’re going to build you a data warehouse.” And the things they built were no more data warehouses than a cow could fly. I blame the consulting companies, but more than anything, I blame the people who bought into the consulting companies.

Bill Inmon: What makes you think that a consulting company understands what’s right? So there were all of these monstrosities that were built that were not data warehouses. And what’s interesting is that data warehousing got the blame. People said, “Oh, we tried to build a data warehouse and it failed.” They didn’t build a data warehouse. Dave, it’s like your next-door neighbor robbing a bank, and then the police come and arrest you for robbery. Wait a minute, it wasn’t me. So that’s what happened. The hunger that drives the data warehouse is believable data. As long as people want believable data that they can rely upon to make decisions, that’s what is behind the data warehouse. And that’s why the data warehouse has lived to this day.

Dave Mariani: It’s popular again. So what approach can enterprises take that’s not hiring the consultants of the past? What kinds of things have you seen, Bill, that seem to be working for this new generation?

Bill Inmon: God bless the people. Enough people took the time to understand what we were talking about and the transformation that has to occur to data. Thank goodness enough people took the time to understand how to build a data warehouse properly, so that there are a lot of great success stories out there. If it hadn’t been for those people, I don’t know what would have happened. It was no thanks to the large consulting companies, and I’m not going to mention their names, that raped and pillaged their customers and took their customers’ money without knowing what they were doing. But there were enough people who took the time to understand the fundamentals behind the data warehouse that it thrives today.

Dave Mariani: Yeah. I’ve seen a lot of projects that have failed, and it always boils down to trying to boil the ocean: trying to do everything at once and have it be the be-all and end-all. Without getting a win, you spend two years working on something with no feedback. At the end of two years, people are tired and they’ve moved on to other things, and those other things are often worse than what they could have done if they had just been more agile to begin with.

Bill Inmon: I never said boil the ocean. Never. I said do it incrementally, one iteration at a time. The people who were successful did that, and it worked well for them. But these people... I’m sorry, don’t get me started on all the failures.

Dave Mariani: Yeah, I’m with you. So don’t boil the ocean: that is the message to the audience here. Bill, you’re not even close to being done in this industry. You’re still a prolific writer, and you just published a new book called Building the Data Lakehouse. Listeners of the pod, you should look it up. So tell me, what made you write this new book? Talk to us about what the lakehouse means and should mean to the world and to the audience.

Bill Inmon: Well, let me tell you my perspective. My perspective is that data architecture is like an ever-evolving river. It’s like the Mississippi River: the Mississippi River from one day to the next is never the same. It’s always changing; something’s always happening on the Mississippi. And the same goes for data architecture. This is kind of involved, but I’ll try to describe it. What’s happened is that data warehousing really applies to structured, transaction-based data. That’s really the heart of data warehousing, but there’s other data in the corporation that’s viable and important as well. And so our friends, mostly at Databricks, but our friends in the world have said, “Gee, there’s other data out there. Let’s start to be able to use it analytically.” And so the evolution of data architecture now includes the data lakehouse, which includes a whole different type of data. I’m looking at the data lakehouse as merely the next step in the evolution of data architecture.

Dave Mariani: Yeah. You and I have talked a lot about this: being able to incorporate semi-structured and unstructured data with what has typically been very structured data in a data warehouse, very much rows and columns. And it actually introduces a new challenge, because if you have semi-structured or unstructured data, you do have to apply some structure. We always talk about the semantic layer, and you can’t have a semantic layer if you don’t have some structure to the data. So, Bill, you’ve also been talking a lot about how to take that semi-structured or unstructured data and make it structured. I think you call it textual ETL. Talk to us a little bit about what that means.

Bill Inmon: Surely. If you look at the data in the corporation, there are really three kinds: structured data, textual data, and machine-generated data. There’s a lot of really valuable information tied up in text that has been overlooked by the industry for a variety of reasons. What we have is technology that allows you to read the text and turn it into a standard database. Once you’ve done that, you can start to do amazing analytical processing against it. That’s what textual ETL is all about.

Dave Mariani: Yeah, and I’ve noticed that it’s particularly useful when looking at medical records. Medical records, meaning doctors’ notes, are absolutely unstructured, but there are nuggets of gold in there that are incredibly important. I think your textual ETL technology is able to turn that into something useful for analytics and machine learning.

Bill Inmon: And I’m happy to report that we’re doing some really major-league projects right now on exactly that. We have every intention of changing the way medical research is done.

Dave Mariani: Yeah. That’s amazing how far we’ve come. Well, on a related but slightly different topic: we have a war going on between the new cloud data warehouse and the data lake. Bill, it would really help to hear from somebody who knows both and is a pioneer in both. What should people take away from this? Is this a phony argument, or is it a real architectural choice, data warehouse versus data lakehouse? What’s your take on this little war of vendors going on right now?

Bill Inmon: It’s a war that’s going to break into a bigger war. I don’t mean to insult or offend anybody, but I’m going to tell you right now: if you have a mess and you put your mess on the cloud, what do you have? You have a mess on the cloud. There are a lot of people who think, “Oh, we put it on the cloud and that makes it okay.” No, it doesn’t make it okay. All you have done is transfer the mess elsewhere. So I’m not a big fan of putting something on the cloud just for the sake of putting it on the cloud. If you’re going to put it on the cloud, you need to clean the data up. And so I’m very reluctant to get behind people who put things on the cloud for the sake of putting it on the cloud.

Bill Inmon: Now, the other people out there who are building the data lakehouse recognize that, and they want to clean up that mess. So if I had to choose a horse in the race, I’d be betting on the people who are facing the problem of cleaning up the mess first. Because you can put stuff on the cloud, and it doesn’t matter: if it’s a mess when it goes on the cloud, it’s going to be a mess on the cloud. I hate to say it, but I think our industry has seen so many silver bullets of the “well, you just do this” kind, and those silver bullets never work out. I remember “just go big data, man. Just put it in big data and that makes it right.” I remember, way back when, people said, “Well, gee, now we have COBOL. Even secretaries are going to be writing programs.”

Bill Inmon: We laugh at it now. And not to pick on IBM and DB2, but I remember people saying, “Gee, guys, if you just put it in DB2, all your problems go away.” It’s laughable now that we actually took those people seriously.

Dave Mariani: Yeah. I think we always overestimate the capacity of regular folks, business folks who just have their job to do, and the amount of time they want to spend learning technology to make data useful to them. I think we’ve done a really poor job over the years, and that’s my passion and why I started AtScale: data needs to be usable by everyone, not just by a data engineer or somebody who, like you, understands the data structures. If we require that, we raise the bar to the point where only a few people can actually use data to make decisions, and we don’t want that. We want to make it approachable for everyone. Hopefully, people are doing analysis and don’t even realize they’re doing it, because it’s part of their job, as opposed to doing what we traditionally call data analysis. That’s what I think would be a good goal for people in our industry. Get rid of the title “data analyst.” Everyone is a data analyst.

Bill Inmon: I think you are absolutely correct. I support what you said 100%.

Dave Mariani: I love it. We’re working towards that end. So, Bill, for some of the young folks out there: you started out on the PGA Tour and ended up being the father of the data warehouse. What advice would you give to young people getting started in an industry like ours, about how they can really make an impact in data and analytics?

Bill Inmon: I actually teach a little class in college, and I love working with young men and young women. I really do. Here’s what I tell them. I don’t want to demean doing the nuts and bolts of technology, being an IT person, but as far as I’m concerned, the exciting place in the corporation is where the decisions are being made: doing analytics, not doing database design. Look, we’ve got to have people to do those things, and they’ve got to be done right. But at the end of the day, analytics is where the important corporate decisions are being made. I’m a big, big fan of analytics. So if I had a young man or young woman sitting here talking to me, I’d say, “Get to where you’re doing analysis: about products, about new customers, about new revenue.” Those are all good places to be.

Dave Mariani: That’s terrific advice. So, Bill, one last question, which is really about predicting the future, so I’ll put you on the spot. What do you think is next for data and analytics? What should we be on the lookout for?

Bill Inmon: Well, now I have to admit that I have a prejudiced opinion. When you ask me the question, I’ll give you my prejudice. The world of text in the corporation is untapped. It’s like California in 1848. There is this land sitting there. I’ve heard stories, and I don’t know if they’re true or not, that you could literally walk into the streams of California and start to pick up gold. Now, it may not have been that easy, but there was gold there. That’s where the gold was, and you could find it. To me, when you take a look at text, there are medical records, there are corporate contracts, there’s data on the internet, there’s the voice of the customer, there’s email. There’s all of this land out there, this opportunity that’s never been touched, and believe me, there is a great opportunity there. So that’s where I think the future is going to be. That’s my humble opinion.

Dave Mariani: And that’s exciting, because you’re right. If you look at the amount of data that we collect, it’s mostly text; it’s mostly unstructured. That’s really what’s driving all the data we’re collecting, and it’s just sitting there on servers, on disk, doing nothing. So turning that data into information: I love that. That’s a great area to explore. So, listeners, you heard it from Bill, the man himself. Let’s go tackle that problem. Bill, I want to thank you so much for spending time with me and talking about everything you’ve done for the industry and what you’re doing right now and in the future. Thanks so much for joining us, Bill.

Bill Inmon: My pleasure. Have a great day.

Dave Mariani: Thanks. Bye-bye.
