August 5, 2021AtScale in Action: How to Make Power BI Perform on Snowflake
AtScale has a strong history of collaboration and innovation with Microsoft, and was the first semantic layer platform to introduce native support for Microsoft’s Data Analysis Expressions (DAX) query language. This support lets Power BI users connect to AtScale in live connection mode, establishing high performance, live query access to datasets managed on cloud data platforms, such as Snowflake. AtScale serves as the critical bridge, connecting best-in-class analytics platforms such as Power BI and Excel to the world’s first Data Cloud.
The Modern Data Stack
The data landscape has shifted, with centralized cloud data platforms becoming essential due to their ability to support high speed and cost effective analytical workloads in the cloud. However, data teams face challenges in how to efficiently stitch together their ecosystem of services and utilities and make data assets accessible to decision-makers. The term the “Modern Data Stack,” describes architectures that address these challenges, ensuring seamless integration of business users with their organization’s data in the cloud.
Andreessen Horowitz’s “Emerging Architectures for Modern Data Infrastructure” provides a clear representation of modern data stacks, illustrating the flow of data from various sources through ingestion and transport services into core data platforms..The Data Cloud platform provides the storage, query and processing capabilities necessary to build a world class analytics infrastructure, while Business Intelligence, Artificial Intelligence and Machine Learning tools are the typical chose for creating analytical products and outputs.
The Power BI disconnect
Despite its popularity, Power BI’s architecture and position in the Modern Data Stack are not so straightforward. Although PowerBI provides a “DirectQuery” option to connect directly to cloud data sources without the need to import data into a separate data model or dataset, it suffers from a number of disadvantages:
- The Power BI query must be translated into a native source system query, retrieved, and then transformed into a model understandable to Power BI.
- Due to its multidimensional architecture, Power BI often needs to issue dozens of SQL queries against the native source system to construct a simple visualization.
- DirectQuery is real time, but slow and unable to scale to large data volumes.
To address these limitations, Microsoft recommends adopting “Import Mode” where data is imported into Power BI and stored within a tabular semantic model (Azure Analysis Services). However, this approach comes with its own set of challenges. It creates a data silo outside the centralized cloud data platform, hampering the full utilization of elastic compute capabilities from Snowflake. Additionally, any changes in the source data necessitate dataset refreshes, adding complexity to the process.
- When users interact with an Import Mode report, the local tabular data model is queried directly in-memory and users are working on a data silo outside of their centralized cloud data platform.
- Once the data is stored and processed locally in Power BI, end users experience a low latency BI experience on subset of their data, but are restricted from making use of the elastic compute provided by Snowflake
If you choose to use Import Mode, the Power BI engine must first copy the data into the dataset during refresh before being made available to the business user. Any changes at the source in Snowflake require a dataset refresh.
This incohesive experience doesn’t align with the Unified Data Infrastructure as defined in the “Modern Data Stack” as organizations are not able to leverage their investment in Snowflake. While attempting to bridge the gap between the Data Warehouse and business users, Import Mode has instead turned these elastic data platforms into data sources and bypassed their elastic compute in favor of in-memory processing which is limited to 400GB of in-memory data.
Direct Lake to the Rescue?
Microsoft’s recent announcement of Fabric, which aims to simplify the Power BI architecture making and create a more cohesive data story. At the heart of the announcement Microsoft introduced “Direct Lake,” an approach that tries to bypass the need for Snowflake. With Direct Lake, Power BI users can load data directly from OneLake, without having to query a Data Warehouse or Lakehouse.
Direct Lake promises the speed of Import Mode combined with near real time data availability similar to DirectQuery. However, using Direct Lake requires a provisioned a Fabric Lakehouse with One Lake as the storage layer. This introduces known limitations regarding the size of data that can be analyzed and manual configuration in PowerBI. For larger data volumes, Microsoft’s official documentation recommends using Direct Query as a fallback mode. This reintroduces Synapse as a SQL endpoint to process PowerBI queries on One Lake data. While Direct Lake is still not Generally Available, Microsoft will continue to build native capabilities around OneLake that bypass organizations existing investments in Snowflake as their analytics platform.
With this latest announcement, Microsoft has embedded a multidimensional interface / semantic layer to their Fabric Lakehouse. This enables Power BI to query an XMLA endpoint, which is the communication protocol of Azure Analysis Services, while using DAX as the query dialect. It appears that Microsoft has embedded a semantic layer deeply into Fabric and in doing so, they have validated the approach that AtScale has been offering for the last 10 years.
Power BI + AtScale + Modern Data Stack
Recognizing Power BI’s widespread adoption among the Global 2000, AtScale remains committed to enhancing its Microsoft Analytics cloud integrations. With AtScale’s native DAX support, Power BI can harness the richness of its native formula expression language in exactly the same way as with Azure Analysis Services. This unique capability empowers Power BI users to navigate metadata and use visualizations in their reports without requiring Power BI needing to translate DAX into a SQL dialect or other query languages, thus avoiding associated performance impacts.
AtScale offers a compelling solution that delivers on the promises of Direct Lake, while leveraging your existing investments in Snowflake. By using AtScale’s live connection mode, Power BI users can access up-to-date data at full fidelity, without the need for pre-aggregation or data engineering work. This ensures a seamless and high performance experience for decision makers and information workers who demand the the speed-of-thought performance that Power BI offers.
The seamless integration of AtScale and Power BI within the Modern Data Stack represents a significant milestone in the world of data analytics. As organizations seek to leverage the potential of cloud data platforms, this collaboration serves as a key bridge, facilitating real-time, high-performance data access and empowering businesses to make informed decisions swiftly and effectively.