Data Mesh is a framework and architecture for delivering data products as a service to federated, domain-driven uses and users. It enables decentralized insight creation on top of centralized infrastructure that delivers data product components as microservices, supported by centralized governance.
The purpose of a Data Mesh is to increase the speed, scale and cost effectiveness of delivering insights to business users, with an emphasis on insights created by and for domain-driven use cases and users. The data mesh architecture is a modern evolution of data warehouses and data lakes (as well as the hybrid data lakehouse), delivering data products and their components as microservices to domain-focused users and use cases, and shifting insight creation to the domains.
Principles to Consider When Implementing a Data Mesh
- Enterprise data mesh includes the data and all the end-to-end analytics components as part of the data product
- Data Governance and core technology infrastructure are centralized to ensure that data and applications are governed and synchronized
- Data mesh assigns a data product to a domain (e.g. business function / area)
- Data products are created and owned by each domain. Even data that is surfaced from the existing IT infrastructure is created as a data product by the domain.
- Data products are built and managed by the domain, and may be shared with other domains to avoid duplication and improve productivity and consistency
- Data Products are registered (cataloged), governed and available / interoperable for inspection and reuse, including having all aspects of creation documented / shared – this includes artifacts such as table/schemas/structures, and metrics as well as data pipelines and transformations
- Creating data products by each domain (or a group of domains) requires resources such as data analysts / business analytics experts and BI application engineers with the skills to translate conformed data (from the data lake or warehouse) into actionable insights via data wrangling and BI tools.
- Scale is achieved by expanding the presence of data analysts / business analytics experts staffed within / across the domains
- Technology infrastructure is managed by a small team supporting the basic services required to create a data product
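The registration principle above can be sketched in code. The following is a minimal illustration, not a real catalog API (tools such as DataHub or Collibra each have their own): a data product descriptor carries its domain, owner, schema, metrics and pipeline lineage, and registering it in a central catalog makes it discoverable and reusable across domains. All names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    # Hypothetical descriptor fields; real catalog schemas differ
    name: str
    domain: str          # owning business function / area
    owner: str           # accountable contact within the domain
    schema: dict         # table/column structures
    metrics: list        # business metrics the product exposes
    pipelines: list = field(default_factory=list)  # transformation lineage

# Central catalog: registration makes products discoverable across domains
catalog: dict[str, DataProduct] = {}

def register(product: DataProduct) -> None:
    """Catalog the product under a domain-qualified key."""
    catalog[f"{product.domain}.{product.name}"] = product

# A domain registers one of its products (illustrative values)
register(DataProduct(
    name="weekly_churn",
    domain="customer_success",
    owner="cs-analytics@example.com",
    schema={"churn_by_segment": ["segment", "week", "churn_rate"]},
    metrics=["churn_rate"],
))
```

Because every product is registered with its schema, metrics and lineage, another domain can inspect and reuse it instead of rebuilding it.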
A visual depiction of a typical data mesh architecture is provided below:
Primary Uses of a Data Mesh
Data Mesh is used to increase speed, scale and cost effectiveness of delivering data, insights and analytics across the enterprise. The data mesh framework and architecture is most appropriate when the business domains have diverse needs in terms of the data, insights and analytics that they use, such that centralized data, reporting and analysis is not required or beneficial. The data mesh requires the following capabilities:
- Domain-focused business ownership for insights creation
- Data analysts embedded within the domains with sufficient skills to translate conformed data sources into Business Intelligence / Insights
- Centralized data governance, including centralized data catalogs and semantic layer
- Centralized IT team focused on providing and supporting conformed data availability (e.g. from data lake / data warehouse)
- Centralized IT providing and supporting data product capabilities
A data product is a self-contained dataset that includes all elements of the process required to transform the data into a published set of insights. For a Business Intelligence use case, the elements are data set creation, the data model / semantic model, and published results, including reports and analyses that may be delivered via spreadsheets or a BI application. Examples of data products are shown below.
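The three BI data-product elements named above (data set, semantic model, published results) can be sketched with plain Python. This is an illustrative toy, not a real BI tool's API; all names and values are made up.

```python
from collections import defaultdict

# 1. Data set creation: conformed rows from the lake/warehouse (sample data)
sales = [
    {"region": "EMEA", "units": 120, "revenue": 2400.0},
    {"region": "EMEA", "units": 80,  "revenue": 1600.0},
    {"region": "APAC", "units": 50,  "revenue": 1250.0},
]

# 2. Data model / semantic model: dimension and metric defined once, reused
semantic_model = {
    "dimensions": ["region"],
    "metrics": {"avg_price": lambda t: t["revenue"] / t["units"]},
}

def publish_report(rows, model):
    """3. Published results: metrics computed per declared dimension."""
    totals = defaultdict(lambda: {"units": 0, "revenue": 0.0})
    for row in rows:
        key = row[model["dimensions"][0]]
        totals[key]["units"] += row["units"]
        totals[key]["revenue"] += row["revenue"]
    return {k: {m: fn(t) for m, fn in model["metrics"].items()}
            for k, t in totals.items()}

report = publish_report(sales, semantic_model)
print(report["EMEA"]["avg_price"])  # 20.0
```

Because the metric lives in the model rather than in each report, every consumer of this data product computes `avg_price` the same way.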
Key Business Benefits of Data Mesh
The main benefit of Data Mesh is increased speed, flexibility and scale to deliver data, insights and analytics to business owners at the domain level. The key is for the business domain to embed the skills necessary to develop actionable insights as data products.
Challenges Associated with Implementing a Data Mesh Framework and Architecture
- Ensuring each domain owns the insight creation and usage process
- Staffing sufficient skilled resources to create insights from conformed data (i.e. data sufficiently accurate and structured to be useful for data integration and insight creation), including data analysts and BI engineers (where BI work is significant)
- Establishing centralized data governance, including centralized data product catalogs and a semantic model that are available and used to ensure consistency and reusability
- Maintaining a centralized IT team responsible for ensuring that conformed data is available, along with sufficient services to enable self-service insights and analytics creation
- Note that analytics teams typically remain centralized, though they focus on self-service analytics creation with results delivered as data products
Common Roles and Responsibilities associated with Data Mesh
Roles important to Data Mesh are as follows:
Insights Creators – Insights creators (e.g. data analysts) are responsible for creating insights from data and delivering them to insights consumers. Insights creators typically design the reports and analyses, and often develop them, including reviewing and validating the data. Insights creators are supported by insights enablers.
Insights Enablers – Insights enablers (e.g. data engineers, data architects, BI engineers) are responsible for making data available to insights creators, including helping to develop the reports and dashboards used by insights consumers.
Insights Consumers – Insights consumers (e.g. business leaders and analysts) are responsible for using insights and analyses created by insights creators to improve business performance, including through improved awareness, plans, decisions and actions.
BI Engineer – The BI engineer is responsible for delivering business insights using OLAP methods and tools. The BI engineer works with the business and technical teams to ensure that the data is available and modeled appropriately for OLAP queries, and then builds those queries, including designing the outputs (reports, visuals, dashboards) typically using BI tools. In some cases, the BI engineer also models the data.
Business Owner – There needs to be a business owner who understands the business needs for data and the subsequent reporting and analysis. This is to ensure accountability and actionability, as well as ownership of data quality and data utility based on the data model. The business owner and project sponsor are responsible for reviewing and approving the data model as well as the reports and analysis that OLAP will generate. For larger, enterprise-wide insights creation and performance measurement, a governance structure should be considered to ensure cross-functional engagement and ownership for all aspects of data acquisition, modeling and usage (reporting and analysis).
Data Analyst / Business Analyst – Often a business analyst or, more recently, a data analyst is responsible for defining the uses and use cases of the data, as well as providing design input on data structure, particularly the metrics, business questions / queries and outputs (reports and analyses) to be produced and improved. Responsibilities also include owning the roadmap for how the data will be enhanced to address additional business questions and close existing insight gaps.
Common Business Processes Associated with Data Mesh
The process for developing and deploying Data Mesh is as follows:
- Access – Data, often in a structured, ready-to-analyze form, are made available securely to approved users, including insights creators and enablers.
- Profiling – Data are reviewed for relevance, completeness and accuracy by data creators and enablers. Profiling can and should occur for individual datasets and integrated datasets, both in raw form and in a ready-to-analyze structured form.
- Preparation – Data are extracted, transformed, modeled, structured and made available in a ready-to-analyze form, often with standardized configurations and coded automation to enable faster data refresh and delivery. Data are typically made available in an easy-to-query form such as a database, spreadsheet or Business Intelligence application.
- Integration – When multiple data sources are involved, integration involves combining multiple data sources into a single, structured, ready-to-analyze dataset. Integration involves creating a single data model and then extracting, transforming and loading the individual data sources to conform to the data model, making the data available for querying by data insights creators and consumers.
- Extraction / Aggregation – The integrated dataset is made available for querying, including in aggregated form to optimize query performance.
- Analyze – The process of querying data to create insights that address specific business questions. Analysis is often based on queries made with business intelligence tools against a structured database, which automate the queries and present the data for faster, repeated use by data analysts, business analysts and decision-makers.
- Synthesize – Determine the key insights that the data are indicating, and determine the best way to convey those insights to the intended audience.
- Storytelling / Visualize – The data storyline, dashboards and visuals should be designed and then developed based on the business questions to be addressed and the queries implemented. Whether working in a waterfall or agile context, it is important to think through how the data will be presented so that the results are well understood and acted upon.
- Publish – Results of queries are made available for consumption via multiple forms, including as datasets, spreadsheets, reports, visualizations, dashboards and presentations.
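A few of the steps above (profiling, preparation, integration, publishing) can be sketched as a minimal pipeline. This is a toy illustration under assumed data shapes, stdlib only; every function and field name is hypothetical, and real pipelines would use proper ETL tooling.

```python
def profile(rows):
    """Profiling: keep only rows complete enough to be useful."""
    return [r for r in rows if r.get("amount") is not None]

def prepare(rows):
    """Preparation: transform into a ready-to-analyze structure."""
    return [{"id": r["id"], "amount": round(float(r["amount"]), 2)}
            for r in rows]

def integrate(prepared, reference):
    """Integration: conform two sources to a single model keyed by id."""
    regions = {r["id"]: r["region"] for r in reference}
    return [dict(r, region=regions.get(r["id"], "unknown"))
            for r in prepared]

def publish(rows):
    """Publish: an aggregated result made available to consumers."""
    return {"row_count": len(rows),
            "total": sum(r["amount"] for r in rows)}

# Two sample sources: an orders feed and a region reference table
orders  = prepare(profile([{"id": 1, "amount": "10.5"},
                           {"id": 2, "amount": None}]))
regions = [{"id": 1, "region": "EMEA"}]
result  = publish(integrate(orders, regions))
print(result)  # {'row_count': 1, 'total': 10.5}
```

The incomplete row is dropped at the profiling step, which is why only one row reaches the published aggregate.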
Trends / Outlook for Data Mesh
Key trends for the Data Mesh are as follows:
Semantic Layer – The semantic layer is a common, consistent representation of the data used for business intelligence (reporting and analysis) as well as for analytics. The semantic layer is important because it defines data in multidimensional form in one place, so that queries made from and across multiple applications, including multiple business intelligence tools, run through one common definition rather than recreating the data models and definitions within each tool. This ensures consistency and efficiency, including cost savings and the opportunity to improve query speed and performance.
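The "define once, query from any tool" idea can be shown with a small sketch. This is not any vendor's API; the metric definition and tool names are illustrative assumptions.

```python
# A single shared metric definition, analogous to a semantic-layer entry
SEMANTIC_LAYER = {
    "revenue_per_order": {
        "numerator": "revenue",
        "denominator": "orders",
    },
}

def compute(metric_name: str, totals: dict) -> float:
    """Resolve a metric against the shared definition, not a local copy."""
    m = SEMANTIC_LAYER[metric_name]
    return totals[m["numerator"]] / totals[m["denominator"]]

# Two different "BI tools" query the same definition, so they must agree
totals = {"revenue": 500.0, "orders": 25}
tool_a = compute("revenue_per_order", totals)
tool_b = compute("revenue_per_order", totals)
print(tool_a == tool_b)  # True
```

If each tool instead embedded its own formula, the definitions could drift; centralizing the metric removes that failure mode.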
Automation – Increasing emphasis is being placed by vendors on ease of use and automation to improve the ability to scale data governance, management and monitoring. This includes offering “drag and drop” interfaces to execute data-related permissions and usage management.
Observability – Recently, a host of new vendors have begun offering services referred to as “data observability”. Data observability is the practice of monitoring data to understand how it is changing and being consumed. This trend, often called “dataops”, closely mirrors the “devops” trend in software development: tracking how applications are performing and being used in order to understand, anticipate and address performance gaps proactively rather than reactively.
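A data observability check might look something like the following sketch: monitoring a dataset's freshness and volume against expectations. The thresholds, function name and alert wording are all illustrative assumptions, not a real vendor's interface.

```python
import datetime

def check_dataset(last_loaded: datetime.datetime,
                  row_count: int,
                  expected_min_rows: int = 100,
                  max_age_hours: float = 24.0) -> list[str]:
    """Return alert messages for freshness or volume anomalies."""
    alerts = []
    age = datetime.datetime.now(datetime.timezone.utc) - last_loaded
    if age > datetime.timedelta(hours=max_age_hours):
        alerts.append("stale: last load exceeds freshness SLA")
    if row_count < expected_min_rows:
        alerts.append("volume drop: row count below expected minimum")
    return alerts

# A just-loaded dataset with suspiciously few rows trips the volume check
now = datetime.datetime.now(datetime.timezone.utc)
print(check_dataset(now, 5))  # ['volume drop: row count below expected minimum']
```

In practice such checks run on a schedule against pipeline metadata, feeding dashboards and alerts rather than `print`.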
AtScale and Data Mesh
AtScale’s semantic layer improves data mesh implementation by enabling faster insight creation via self-service data modeling for AI and BI, and by improving performance via automated query optimization. The semantic layer enables development of a unified, business-driven data model that defines what data can be used, including supporting the specific queries that generate data for visualization. This eases tracking and auditing, ensuring that all aspects of how data are defined, queried and rendered across multiple dimensions, entities, attributes and metrics, including the source data and the queries used to develop output for reporting, analysis and analytics, are known and tracked.