September 20, 2023
Bridging the Gap: PowerBI on Databricks
As I hinted in my last blog post, “Your Questions Answered: TPC-DS Benchmark for Power BI/Direct Lake”, and due to popular customer demand, I’m happy to announce that we just published our Power BI/Direct Lake benchmark comparison for another data platform: Databricks. You can now download the full report here.
The timing of this report coincides with Microsoft’s 2024 Fabric Community Conference. I’m sure they will make some big announcements. Based on my test results, they have a lot of work to do to make Fabric work for real enterprise workloads.
In this post, I’ll summarize the results comparing Power BI on Databricks to Power BI on Fabric using the Direct Lake interface.
Power BI + Big Data: Not So Fast
Similar to our findings on Snowflake, we found that Power BI’s Direct Lake interface does not hold up at even modest data sizes when compared to mature data platforms like Databricks. As you can see in the chart below, once the data exceeds 100 GB, Power BI performance suffers due to a fallback to DirectQuery mode.
More concerning than query performance, however, is the high number of query failures at the 1 terabyte and 10 terabyte levels. As you can see in the chart below, most of these query failures are due to query timeouts on Fabric.
The timeout error message, “The XML for Analysis request timed out before it was completed…”, is also revealing. It’s clear that the decades-old SQL Server Analysis Services (SSAS) engine is working behind the scenes here. It should not be surprising that when data can’t fit in memory or falls outside SSAS’s data thresholds, the system behaves inconsistently.
Further Thoughts
The lesson here is that there’s no free lunch. It’s all about the architecture. While Microsoft has promised that the Direct Lake interface can replace Power BI import mode, these results show that simply loading data on demand into a processing engine born in the 1990s cannot scale to even modest data sizes.
After all, business users have a job to do. They don’t want to scale down their data or their analysis to make it work for their BI tool. They will not tolerate erratic query performance and they definitely will not tolerate failed queries.
It’s time for Microsoft to really address modern data workloads without resorting to marketing tricks.