June 28, 2022Improving Data Analytics: Key Insights from Fifth Third Bank, Stanley Black and Decker, and Snap Inc.
This blog is part of a series on Microsoft’s BI and Analytics tools from Greg Decker, Vice President at a global IT services firm who leads a practice specializing in Microsoft technologies such as Dynamics 365, Power BI, and Azure. Greg is a Microsoft MVP for Data Platform. He is the author of five books on analytics and Microsoft Power BI. Follow Greg on LinkedIn, Twitter and on GregDeckler.com.
Power BI is a powerful business intelligence tool that allows users to connect to various data sources, visualize and analyze data, and create interactive reports. It also enables them to share content such as datasets, reports, dashboards, and scorecards. One of the key features of Power BI is the ability to import data into the tool for analysis, or use DirectQuery to connect directly to live data sources without the need to import data into a separate data model or dataset.
If you have worked with Power BI for any length of time, you are probably already aware of these two primary storage modes, import and DirectQuery. Yes, there is also live connections, but this is really a kind of special case that works similar to DirectQuery. The fourth mode, dual, is simply a model that contains both import and DirectQuery tables. But what are the advantages and disadvantages in using Power BI DirectQuery vs. import mode?
To quickly review, import mode means that data is imported into Power BI. It’s moved from source systems into a tabular data model stored within a Power BI Desktop file (PBIX). Import mode can be visualized as follows:
When users interact with an import mode report, the local tabular data model is queried directly. This enables fast performance for data retrieval and measuring calculations. However, import mode datasets require that data from source systems are refreshed at set intervals such as every day or every hour.
Conversely, with DirectQuery, the data remains within the source systems. When users interact with a report built using DirectQuery, Power BI dynamically retrieves data from the source system(s), translating the data queries into native source system queries, such as SQL queries. DirectQuery can be visualized as follows:
If you have been around long enough, you probably recognize certain parallels between import, DirectQuery, dual storage modes, and the different modes of On-Line Analytical Processing (OLAP), multidimensional processing (MOLAP), relational processing (ROLAP), and hybrid processing (HOLAP). In fact, all that has changed is a move from multidimensional cubes to tabular cubes. There’s also a direct correlation with:
- Import ~ MOLAP
- DirectQuery ~ ROLAP
- Dual ~ HOLAP
Power BI Direct Query vs. Import Mode: The Perks
In theory, leaving the data within source systems — or within a single “golden” dataset — and having multiple reports connect using DirectQuery sounds like a great idea.
First, there is no need for scheduled data refreshes since DirectQuery always brings back the latest data. This means that DirectQuery allows you to work with data in near real-time.
Second, since the data stays within source systems or a single dataset, storage costs are decreased (as duplicate copies of data are not created). This also aids in data governance and security.
Finally, DirectQuery is ideal for very large datasets since import mode datasets are limited to the amount of memory available. In contrast, DirectQuery has no such limitations with regard to data size.
With all of these advantages, one might expect that DirectQuery would be the recommended storage mode, especially for enterprise-level data and reporting scenarios. However, Microsoft’s official guidance on the subject recommends using import mode. In addition, when Brett Powell wrote the first edition of Mastering Microsoft Power BI in 2018, he too recommended import mode and when Brett and I collaborated on the second edition of Mastering Microsoft Power BI in 2022 we again recommended the default choice be import mode datasets. Why?
Power BI with DirectQuery vs. Import Mode: The Challenges
While DirectQuery sounds great in theory, it suffers from a number of disadvantages as well. Chief among these is query performance. These performance issues mirror the performance issues that ROLAP also suffers. The reason is that each interaction with a report requires a round trip to the data source and back again in order to retrieve the data.
Plus, the data query must be translated into a native source system query, retrieved, and then transformed into a model understandable to Power BI. All of this querying, transformation, and retrieval takes time.
A second big issue with DirectQuery models is that DirectQuery models support limited data modeling and this can hinder advanced data analytics. DirectQuery models only support extremely limited DAX equations for calculated columns and even DAX measures cannot employ the full range of DAX functions. Perhaps even worse, the exact DAX functions supported vary depending on the data source.
A third issue involves data transformation limitations. Unlike import mode where you can apply complex data transformations once during a data refresh, DirectQuery data transformations must be performed as part of every query to the source system and these transformations must translate into a native, source system query.
There are many more issues with Power BI DirectQuery vs. import, such as DirectQuery not supporting query caching, date time limitations to only seconds and not milliseconds, offline access restrictions, the ability for visuals on the same page to show different results, and a hard limit of one million rows returned from the source per query. There are also numerous reporting limitations and considerations such as using filters, totals, multi-select slicers, and even the use of median can all lead to degraded report performance due to the additional queries generated.
Are There Any Use Cases for DirectQuery in Power BI?
At this point, you may well be wondering whether anyone should ever use DirectQuery in any situation. What perhaps you don’t know is that the above list of limitations is significantly shorter than it was four or five years ago!
This is not to say that there are not valid use cases for Power BI DirectQuery vs. import — there are. However, with the current feature set of Power BI these use cases are severely limited. In addition, even in valid use cases, the use of DirectQuery in a Power BI solution adds a tremendous amount of complexity and requires careful planning.
Even with Power BI features like aggregation tables and, recently, automatic aggregations, the performance limitations of DirectQuery still remain.