Business Intelligence (BI) is a term that means many different things to many different people. One definition I came across, from cio.com, is “the activity of leveraging software and services to transform data into actionable insights that shape and inform an organization’s strategic and tactical business decisions.”
Self-service BI is defined by Gartner as end users designing and deploying their own reports and analyses within an approved and supported architecture and tools portfolio.
Benefits of Self-Service BI
The benefits of self-service BI over traditional BI include the following:
- Reduced lag time, because users can answer their own questions and follow-up questions
- A crucial step in creating a data culture in the organization
- Reduced reliance on overworked IT functions
How does Self-Service BI work?
Organisations have been striving for self-service BI for decades. I would posit that the objective most organisations had when designing and implementing traditional data warehouse architectures was really to support BI activities. More specifically, it was to let business users access data easily, without having to go off and find the data in operational data sources and then figure out how to join it together in a sensible way.
Over time, various normalization techniques were adopted in the data warehouse to eliminate data redundancy, avoid data anomalies, and ensure referential integrity, amongst other things. This led to data models that, whilst super efficient, became unwieldy and unnecessarily complex for the average business user who just wanted to understand how their business was doing. Furthermore, once a business user had spent time and effort crafting the correct query, they would then experience performance issues when they ran a report or viewed a dashboard, because of the complex queries that had to be processed.
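To make the complexity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table names, and figures are all hypothetical and invented for illustration. It shows how a simple question like "what is revenue by region?" can need four joins once the data is normalized.

```python
import sqlite3

# Hypothetical normalized schema; every table and column name is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region     (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE customer   (customer_id INTEGER PRIMARY KEY,
                         region_id INTEGER REFERENCES region(region_id));
CREATE TABLE product    (product_id INTEGER PRIMARY KEY, unit_price REAL);
CREATE TABLE orders     (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(customer_id));
CREATE TABLE order_line (order_id INTEGER REFERENCES orders(order_id),
                         product_id INTEGER REFERENCES product(product_id),
                         quantity INTEGER);

INSERT INTO region     VALUES (1, 'North'), (2, 'South');
INSERT INTO customer   VALUES (10, 1), (11, 2);
INSERT INTO product    VALUES (100, 2.5), (101, 10.0);
INSERT INTO orders     VALUES (1000, 10), (1001, 11), (1002, 10);
INSERT INTO order_line VALUES (1000, 100, 4), (1001, 101, 1), (1002, 101, 2);
""")

# "Revenue by region" touches five tables, so the user must know four joins.
revenue_by_region = conn.execute("""
SELECT r.region_name, SUM(ol.quantity * p.unit_price) AS revenue
FROM order_line ol
JOIN orders   o ON o.order_id    = ol.order_id
JOIN customer c ON c.customer_id = o.customer_id
JOIN region   r ON r.region_id   = c.region_id
JOIN product  p ON p.product_id  = ol.product_id
GROUP BY r.region_name
ORDER BY r.region_name
""").fetchall()

print(revenue_by_region)  # [('North', 30.0), ('South', 10.0)]
```

No data is stored twice in this model, which is the point of normalization, but the business user pays for that efficiency in query complexity.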
This then led to the creation of data marts, where essentially a subset of the data in the data warehouse was brought together to support answering a set of business questions. This was done by the IT or data function at the request of the business users, so they could be freed up from crafting complex queries and actually do their day job.
This approach was great until a business user asked a question that could not be answered by the data mart. After all, the data mart is only a subset of the data. At this point the business user would make a request to the IT function to modify the existing data mart or create a new one.
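The trade-off can be sketched in a few lines of Python with sqlite3. The `sales_mart` table and its contents are hypothetical: I am assuming a mart that IT has pre-joined down to region, month, and revenue. Questions inside that scope become trivial, while a question about products simply cannot be asked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data mart: one pre-joined, denormalized table built by IT.
CREATE TABLE sales_mart (region_name TEXT, order_month TEXT, revenue REAL);
INSERT INTO sales_mart VALUES
  ('North', '2020-05', 30.0),
  ('South', '2020-05', 10.0),
  ('North', '2020-06', 12.5);
""")

# A question the mart was designed for: no joins required.
north_revenue = conn.execute(
    "SELECT SUM(revenue) FROM sales_mart WHERE region_name = ?", ("North",)
).fetchone()[0]

# A question it was not designed for: product was never brought into the mart,
# so SQLite rejects the query with "no such column".
try:
    conn.execute(
        "SELECT product_name, SUM(revenue) FROM sales_mart GROUP BY product_name"
    )
    mart_can_answer = True
except sqlite3.OperationalError:
    mart_can_answer = False

print(north_revenue)    # 42.5
print(mart_can_answer)  # False
```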
This is where the concept of a ‘cube’ comes into view: the idea that we have a set of dimensions that map to the business structure, and a set of metrics that map to the things a business user would want to measure. This cube would contain aggregated data that could answer many business questions without relying on the IT or data function within an organization. As the cube is pre-built, the answers to those questions are available almost instantaneously. But again, what if we want to ask a business question that requires access to the atomic-level data (that’s not in the cube), or what if we have a question that the cube was not originally designed to answer? Then it’s back to calling the IT function.
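As a rough illustration of the idea (not how any particular OLAP product implements it), a cube can be thought of as a lookup table of pre-computed aggregates, keyed by every combination of dimension values. The rows, dimension names, and the `ALL` marker below are assumptions made up for this sketch.

```python
from collections import defaultdict
from itertools import product

# Hypothetical atomic sales rows: (region, product, month, revenue).
rows = [
    ("North", "Widget", "2020-05", 10.0),
    ("North", "Gadget", "2020-05", 20.0),
    ("South", "Gadget", "2020-05", 10.0),
    ("North", "Widget", "2020-06", 12.5),
]

ALL = "*"  # marker meaning "aggregated over every value of this dimension"


def build_cube(rows):
    """Pre-compute revenue for every combination of dimension value and ALL."""
    cube = defaultdict(float)
    for region, prod, month, revenue in rows:
        # Each row contributes to 2 * 2 * 2 = 8 aggregate cells.
        for key in product((region, ALL), (prod, ALL), (month, ALL)):
            cube[key] += revenue
    return dict(cube)


cube = build_cube(rows)

# Answering a question is now a dictionary lookup, with no joins or scans.
total = cube[(ALL, ALL, ALL)]
north_widgets = cube[("North", "Widget", ALL)]

print(total)          # 52.5
print(north_widgets)  # 22.5
```

The limitation is visible here too: the cube only knows about region, product, and month. A question about individual customers or individual orders cannot be answered from `cube`, because that detail was aggregated away when it was built.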
The advent of self-service tools attempts to address this. Self-service tools will typically generate the SQL code that needs to run, so there is an argument that we have removed the complexity of accessing the data in its atomic format. That may or may not be true, because our business users will now need a deep understanding of the data model. But running complex queries that execute many joins and perform aggregations whenever a business user clicks a button will not result in the instantaneous query responses that users of cubes are used to, no matter how much hardware is purchased or how much money we are willing to spend with our cloud provider.
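A toy version of that SQL generation might look like the following. This is only a sketch under my own assumptions: `DIMENSIONS`, `MEASURES`, `FROM_CLAUSE`, and `generate_sql` are hypothetical names, and real tools are far more sophisticated. It does show why somebody still has to understand the data model, since the mapping from business names to joins has to be defined somewhere.

```python
# Hypothetical semantic model: business-friendly names mapped to SQL expressions.
DIMENSIONS = {
    "region": "r.region_name",
    "customer": "c.customer_id",
}
MEASURES = {
    "revenue": "SUM(ol.quantity * p.unit_price)",
    "units": "SUM(ol.quantity)",
}
# The joins the tool hides from the user; someone must still define them.
FROM_CLAUSE = (
    "order_line ol "
    "JOIN orders o ON o.order_id = ol.order_id "
    "JOIN customer c ON c.customer_id = o.customer_id "
    "JOIN region r ON r.region_id = c.region_id "
    "JOIN product p ON p.product_id = ol.product_id"
)


def generate_sql(dimensions, measures):
    """Build a SELECT statement from the names a user picked in the tool."""
    dim_exprs = [DIMENSIONS[name] for name in dimensions]
    select_items = [f"{DIMENSIONS[name]} AS {name}" for name in dimensions]
    select_items += [f"{MEASURES[name]} AS {name}" for name in measures]
    sql = f"SELECT {', '.join(select_items)} FROM {FROM_CLAUSE}"
    if dim_exprs:
        sql += f" GROUP BY {', '.join(dim_exprs)}"
    return sql


sql = generate_sql(["region"], ["revenue"])
print(sql)
```

Every click in the tool produces a statement like this one, with all of its joins and aggregations run against the atomic data at query time, which is where the response-time problem comes from.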
Some self-service tools try to avoid this by running a query on the underlying data warehouse, dragging the results to a local BI server or cache, and then querying from there. But to move this data we need the IT or data function, for various reasons, not least governance and security. And again, what happens when we have a question that cannot be answered by the data that we have moved?
I would suggest that for business users to have self-service BI, they need the ability to use the tools they want and to access data the way they want, without needing to move it and without needing to pick up the phone to a member of the IT or data function, all in a cost-efficient and performant way.
This is where AtScale comes in.