Bridging the Gap: AtScale and Power BI in the Modern Data Stack

Bridging the Gap: AtScale and Power BI in the Modern Data Stack

AtScale has a strong history of collaboration and innovation with Microsoft, and was the first semantic layer platform to introduce native support for Microsoft’s Data Analysis Expressions (DAX) query language. This support lets Power BI users connect to AtScale in live connection mode, establishing high performance, live query access to datasets managed on cloud data platforms, including Databricks and Snowflake. AtScale serves as the critical bridge, connecting best-in-class analytics platforms such as Power BI and Excel to the world’s most powerful cloud data platforms.

The Modern Data Stack

The data landscape has shifted, with centralized cloud data platforms becoming essential due to their ability to support high speed and cost effective analytical workloads in the cloud. However, data teams face challenges in how to efficiently stitch together their ecosystem of services and utilities and make data assets accessible to decision-makers. The term the “Modern Data Stack,” describes architectures that address these challenges, ensuring seamless integration of business users with their organization’s data in the cloud.

Andreessen Horowitz’s “Emerging Architectures for Modern Data Infrastructure” provides a clear representation of modern data stacks, illustrating the flow of data from various sources through ingestion and transport services into core data platforms. These platforms manage storage, query and processing, and transformation, ultimately making the data available for analysis and output modalities.

diagram - Emerging Architectures for Modern Data Infrastructure

Power BI’s Role in the Modern Data Stack

While Databricks and Snowflake are the centerpiece of most organizations’s Modern Data Stack, PowerBI has become a frontrunner as the tool of choice for data analysis. Its ease of use and popularity amongst business users lead to its consistent recognition as a Leader in the 2023 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms. A report published by Databricks further confirms PowerBI as the most popular data and AI product of 2023 on top of the Databricks Lakehouse.

volume of data managed, by storage format

The Power BI Disconnect

Despite its popularity, Power BI’s architecture and position in the Modern Data Stack are not so straightforward. Although PowerBI provides a “DirectQuery” option to connect directly to cloud data sources without the need to import data into a separate data model or dataset, it suffers from a number of disadvantages:

  • The Power BI query must be translated into a native source system query (i.e Databricks SQL) retrieved, and then transformed into a model understandable to Power BI.
  • Due to its multidimensional architecture, Power BI often needs to issue dozens of SQL queries against the native source system to construct a simple visualization.
  • DirectQuery is real time, but slow and unable to scale to large data volumes.

To address these limitations, Microsoft recommends adopting “Import Mode” where data is imported into Power BI and stored within a tabular semantic model (Azure Analysis Services). However, this approach comes with its own set of challenges. It creates a data silo outside the centralized cloud data platform, hampering the full utilization of elastic compute capabilities from Databricks and Snowflake. Additionally, any changes in the source data necessitate dataset refreshes, adding complexity to the process.

  • When users interact with an Import Mode report, the local tabular data model is queried directly in-memory and users are working on a data silo outside of their centralized cloud data platform.
  • Once the data is stored and processed locally in Power BI, end users experience a low latency BI experience on subset of their data, but are restricted from making use of the elastic compute provided by Databricks and Snowflake

Power BI import mode diagram

If you choose to use Import Mode, the Power BI engine must first copy the data into the dataset during refresh before being made available to the business user. Any changes at the source in Databricks or Snowflake require a dataset refresh.

This incohesive experience doesn’t align with the Unified Data Infrastructure as defined in the “Modern Data Stack” as organizations are not able to leverage their investment in Databricks and Snowflake. While attempting to bridge the gap between the Data Warehouse and business users, Import Mode has instead turned these elastic data platforms into data sources and bypassed their elastic compute in favor of in-memory processing which is limited to 400GB of in-memory data.

Direct Lake to the Rescue?

Microsoft’s recent announcement of Fabric, which aims to simplify the Power BI architecture making and create a more cohesive data story. At the heart of the announcement Microsoft introduced “Direct Lake,” an approach that tries to bypass the need for Databricks and Snowflake. With Direct Lake, Power BI users can load data directly from OneLake, without having to query a Data Warehouse or Lakehouse.

Direct Lake diagram

Direct Lake promises the speed of Import Mode combined with near real time data availability similar to DirectQuery. However, using Direct Lake requires a provisioned a Fabric Lakehouse with One Lake as the storage layer. This introduces known limitations regarding the size of data that can be analyzed and manual configuration in PowerBI. For larger data volumes, Microsoft’s official documentation recommends using Direct Query as a fallback mode. This reintroduces Synapse as a SQL endpoint to process PowerBI queries on One Lake data. While Direct Lake is still not Generally Available, Microsoft will continue to build native capabilities around OneLake that bypass organizations existing investments in Databricks, and Snowflake as analytics platforms.

diagram

With this latest announcement, Microsoft has embedded a multidimensional interface / semantic layer to their Fabric Lakehouse. This enables Power BI to query an XMLA endpoint, which is the communication protocol of Azure Analysis Services, while using DAX as the query dialect. It appears that Microsoft has embedded a semantic layer deeply into Fabric and in doing so, they have validated the approach that AtScale has been offering for the last 10 years.

Power BI  + AtScale + Modern Data Stack

Recognizing Power BI’s widespread adoption among the Global 2000, AtScale remains committed to enhancing its Microsoft Analytics cloud integrations. With AtScale’s native DAX support, Power BI can harness the richness of its native formula expression language in exactly the same way as with Azure Analysis Services. This unique capability empowers Power BI users to navigate metadata and use visualizations in their reports without requiring Power BI needing to translate DAX into a SQL dialect or other query languages, thus avoiding associated performance impacts.

Power BI + AtScale + Modern Data Stack - diagram

AtScale offers a compelling solution that delivers on the promises of Direct Lake, while leveraging your existing investments in Databricks and Snowflake. By using AtScale’s live connection mode, Power BI users can access up-to-date data at full fidelity, without the need for pre-aggregation or data engineering work. This ensures a seamless and high performance experience for decision makers and information workers who demand the the speed-of-thought performance that Power BI offers.

AtScale + Power BI integration - diagram

The seamless integration of AtScale and Power BI within the Modern Data Stack represents a significant milestone in the world of data analytics. As organizations seek to leverage the potential of cloud data platforms, this collaboration serves as a key bridge, facilitating real-time, high-performance data access and empowering businesses to make informed decisions swiftly and effectively.

Power BI/Fabric Benchmarks
TPC-DS Benchmark Result Report Download Now