A Business-Oriented Semantic Layer for Your Databricks Lakehouse

Delivering Deep Insights

A semantic layer strategy lays the foundation for a scalable business intelligence and enterprise AI program and complements the power of modern cloud data platforms.  Key benefits include:

  • Business metrics stay consistent across the organization. 
  • Analysts can access a broader range of data. 
  • Business-friendly semantics promote self-service. 
  • Insights are more easily disseminated through preferred BI tools. 
  • Security, compliance, and governance policies are enforced.

With a scalable analytics infrastructure in place, organizations can begin benefiting from a flywheel effect, where new insights enable a deeper understanding of data and the potential to leverage AI/ML to an even greater extent. Making analysts and data scientists more productive and shortening time to insight is fundamental to the roadmap of any data-driven team.

AtScale has written in the past about the potential for leveraging a semantic layer with data lakes to deliver powerful business intelligence (BI) capabilities on highly efficient cloud architectures. Gartner Magic Quadrant Leader Databricks has pioneered the notion of a lakehouse that brings the benefits of data lake and data warehouse architectures together to support enterprise data science and analytics programs. Many organizations are adopting the lakehouse approach as the foundational data architecture supporting digital transformation. A core capability of Databricks is centralizing disparate data sets in a highly scalable, cloud-based infrastructure, which serves as the foundation for democratizing data across the organization.

An analytics semantic layer that establishes a single view of critical business metrics and common analytics vocabulary across all data consumers is a strong complement to lakehouse architectures. The combination has the potential to bridge business intelligence and data science teams through a clean representation of key business metrics and important analysis dimensions that ensures consistency across all users – even if underlying data sources change. The key benefit is that teams spend more time delivering insights, and less time manipulating and prepping data. This is achieved by establishing an integration layer within the enterprise data fabric.
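To make the idea of a single, shared definition of metrics and dimensions concrete, here is a minimal sketch in Python. It is not AtScale's modeling format or API; the `SemanticModel`, `Metric`, and `Dimension` classes, the `sales.orders` table, and its column names are all hypothetical, chosen only to show how business terms can be defined once and compiled to the same SQL for every consumer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str        # business-facing name, e.g. "total_revenue"
    expression: str  # SQL aggregate over physical columns


@dataclass(frozen=True)
class Dimension:
    name: str    # business-facing name, e.g. "region"
    column: str  # physical column in the underlying table


class SemanticModel:
    """Hypothetical registry mapping business terms to one physical table."""

    def __init__(self, table, metrics, dimensions):
        self.table = table
        self.metrics = {m.name: m for m in metrics}
        self.dimensions = {d.name: d for d in dimensions}

    def to_sql(self, metric_names, dimension_names):
        # Unknown business terms raise KeyError instead of yielding ad hoc SQL.
        dims = [self.dimensions[n] for n in dimension_names]
        mets = [self.metrics[n] for n in metric_names]
        select = [f"{d.column} AS {d.name}" for d in dims]
        select += [f"{m.expression} AS {m.name}" for m in mets]
        sql = f"SELECT {', '.join(select)} FROM {self.table}"
        if dims:
            sql += " GROUP BY " + ", ".join(d.column for d in dims)
        return sql


# Illustrative table and column names (assumptions, not from a real schema).
model = SemanticModel(
    table="sales.orders",
    metrics=[Metric("total_revenue", "SUM(order_amount)")],
    dimensions=[Dimension("region", "ship_region")],
)
revenue_by_region_sql = model.to_sql(["total_revenue"], ["region"])
print(revenue_by_region_sql)
```

Because consumers ask for `total_revenue` by `region` rather than writing their own SQL, renaming `ship_region` in the underlying table would require a change in one place only, which is the consistency property described above.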

In a prior post, I discussed the general benefits of a semantic layer for modern data architectures.

AtScale delivers a business-oriented semantic layer for the Databricks Lakehouse that provides live, high-performance query access to data stored in the platform while forming a single source of governed analytics for all data consumers. It accelerates end-to-end analytics performance with a unique approach that dynamically orchestrates aggregate creation while pushing all queries down to the lakehouse to leverage the powerful compute of the Databricks platform.
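The general idea behind aggregate-aware query routing can be sketched as follows. This is an illustration of the technique in its simplest form, not AtScale's actual algorithm: the `AggregateTable` class, the `choose_table` function, and all table names are hypothetical, and the sketch assumes additive metrics (such as sums and counts) that can be safely rolled up from a finer-grained aggregate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateTable:
    name: str
    dimensions: frozenset  # dimensions the aggregate is grouped by
    row_count: int         # used as a rough cost estimate


def choose_table(query_dimensions, aggregates, fact_table):
    """Return the smallest aggregate covering the query, else the fact table."""
    needed = frozenset(query_dimensions)
    # An aggregate can answer the query only if it keeps every needed dimension.
    candidates = [a for a in aggregates if needed <= a.dimensions]
    if not candidates:
        return fact_table
    return min(candidates, key=lambda a: a.row_count).name


# Hypothetical aggregates living alongside the fact table in the lakehouse.
aggregates = [
    AggregateTable("agg_region_month", frozenset({"region", "month"}), 1200),
    AggregateTable("agg_region", frozenset({"region"}), 50),
]

by_region = choose_table(["region"], aggregates, "fact_orders")
by_region_month = choose_table(["region", "month"], aggregates, "fact_orders")
by_customer = choose_table(["customer"], aggregates, "fact_orders")
print(by_region, by_region_month, by_customer)
```

In this sketch every candidate table lives in the same platform as the fact table, so the query is always executed by the lakehouse engine regardless of which table is chosen.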

AtScale provides a no-code modeling environment that supports business users and data teams in designing views of data, complicated metrics, and analysis dimensions. This approach radically accelerates modeling and feature engineering while encouraging the use of broader sets of data, collaboration across different teams, and reusability of existing model components – all without relying on complicated SQL.

AtScale analytics can be presented to a broad range of BI tools including Excel, Power BI, and Tableau. Additionally, data in the semantic layer can be moved and manipulated with Python scripts. This helps data scientists move data from AtScale into their models or AutoML platforms, simplifying feature engineering and supporting consistency for production models. Further, the ability to write back model results through the semantic layer lets BI teams publish those results to analysts and managers using existing dashboard and reporting tools.
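The read-score-write-back loop described above might look roughly like the sketch below. It does not use any AtScale library: an in-memory SQLite database stands in for the semantic layer's SQL endpoint, and the `customer_metrics` and `customer_scores` tables, their columns, and the `score` function are all hypothetical placeholders for governed metrics and a real model.

```python
import sqlite3

# In-memory SQLite stands in for the semantic layer's SQL endpoint (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_metrics "
    "(customer_id INTEGER, total_revenue REAL, order_count INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_metrics VALUES (?, ?, ?)",
    [(1, 600.0, 5), (2, 90.0, 3), (3, 1200.0, 4)],
)

# Step 1: pull governed metrics to use as model features.
rows = conn.execute(
    "SELECT customer_id, total_revenue, order_count "
    "FROM customer_metrics ORDER BY customer_id"
).fetchall()


def score(total_revenue, order_count):
    """Placeholder model: flag customers whose average order value exceeds 100."""
    return 1 if total_revenue / order_count > 100 else 0


# Step 2: score each customer with the placeholder model.
predictions = [(cid, score(rev, cnt)) for cid, rev, cnt in rows]

# Step 3: write results back so BI tools can query them like any other table.
conn.execute("CREATE TABLE customer_scores (customer_id INTEGER, high_value INTEGER)")
conn.executemany("INSERT INTO customer_scores VALUES (?, ?)", predictions)

high_value_ids = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM customer_scores "
        "WHERE high_value = 1 ORDER BY customer_id"
    )
]
print(high_value_ids)
```

The point of the pattern is that features come from the same metric definitions the BI tools use, and predictions land where those tools can already reach them.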

Key Benefits of AtScale and Databricks 

There are a few key benefits of combining AtScale and Databricks:

  1. Eliminate data movement: there is no need to create a separate query layer such as a data warehouse, data mart, or cubing solution to accelerate analytics performance. No data movement means there are no copies of data to manage and no concerns over presenting stale data or incomplete subsets. All data stays in the Lakehouse infrastructure.
  2. Modernize legacy OLAP: AtScale can deliver the speed-of-thought analytics performance traditionally delivered by OLAP “Cube” architectures like SSAS. AtScale’s modern approach to optimizing analytics performance from cloud data to analytics tools is a powerful complement to Databricks and fully leverages the powerful query engines of the Lakehouse.
  3. Create a “Diamond Layer” for analysis-ready data: AtScale can form a diamond layer on Databricks that is accessible from popular BI, data science, and ML services. Customers can build views of data with customized metrics and dimensions that support high-performance, self-service analytics for different data consumers across the organization.
  4. Extend analytics protocol support for Lakehouse data: AtScale integrates with analytics tools by speaking the dialects they use, including DAX, MDX, Python, and SQL. AtScale translates analytics queries to Databricks Spark or Databricks SQL and manages the delivery of data to data consumers using tools like Power BI, Tableau, and Excel. This simplifies integration and lets users get the full benefit of their investments in BI platforms and their data lake infrastructure.

Where AtScale Fits in the Databricks Stack

System Design Before and After AtScale Implementation

An Architect's View of the Reference Architecture for Implementation

AtScale + Databricks/AWS: Reference Architecture

To learn more about AtScale + Databricks, watch this overview and tech talk.