This is the seventh blog in my blog series, The Semantics of the Semantic Layer, where I discuss the seven core capabilities of a semantic layer. In this blog, I will dive deeper into the analytics governance and security capabilities that make a semantic layer a single control plane for implementing secure data access for all users and groups, both internal and external.
While there are platforms and tools focused on just data governance, it’s impossible to truly secure data without integrating governance rules with the semantic layer’s logical view of data. While the semantic layer must respect and integrate with the underlying data platforms’ native, physical security policies, it must extend those policies to the derived business calculations and constructs defined in the semantic layer data model.
Who’s Who in the Zoo?
Governance and security must start with the user’s identity. Without knowing who is running a query, it’s not feasible to apply data access policies to users or groups, and no governance control can work without user identity management. As a result, integration with enterprise directory services like Active Directory (AD), LDAP, Okta, and more is table stakes for any semantic layer.
With enterprise directory integration, the semantic layer can identify a user and then run queries “as that user” against the native data platform. By doing so, the semantic layer will respect the underlying data platform’s physical security policies — something not possible if the semantic layer uses a proxy or service account to run queries instead. The semantic layer must also synchronize its users and groups with the directory service users and groups to avoid a duplicative, shadow governance infrastructure when applying data access policies.
This requires a semantic layer to support deep integration with IT services and data platforms — nothing else will do.
Key Takeaway: A semantic layer must support deep integration with IT identity management services and respect underlying data platform security policies by running queries with the user’s account.
Enterprises are in constant motion, adding new employees, new business groups and new data sets all the time. For a semantic layer to serve as the governance control plane for the enterprise, it must react to these changes in real time.
Starting with user and group management, a semantic layer must seamlessly integrate and sync with enterprise identity management services. As new employees join, depart or change groups, the semantic layer’s governance policies will instantly reflect those changes.
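As a rough illustration of that synchronization, the sketch below computes the membership changes needed to bring the semantic layer's groups in line with the directory. The function name `plan_group_sync` and the dictionary shapes are assumptions made for this example; a real deployment would read memberships from the directory service rather than from in-memory dictionaries.

```python
def plan_group_sync(directory_groups, layer_groups):
    """Compute the changes that make the semantic layer match the directory.

    Both arguments map a group name to a set of user names (hypothetical shape).
    The directory is treated as the source of truth.
    """
    changes = {"add": [], "remove": []}
    for group in sorted(set(directory_groups) | set(layer_groups)):
        wanted = set(directory_groups.get(group, ()))
        current = set(layer_groups.get(group, ()))
        # Users present in the directory but missing from the semantic layer.
        changes["add"] += [(group, user) for user in sorted(wanted - current)]
        # Users the semantic layer still knows about but the directory dropped.
        changes["remove"] += [(group, user) for user in sorted(current - wanted)]
    return changes


directory = {"sales_west": {"ana", "bo"}, "hr": {"cy"}}
layer = {"sales_west": {"ana", "dee"}}
plan = plan_group_sync(directory, layer)
```

Because the directory is the only source of truth here, the semantic layer never keeps a separate, hand-maintained copy of group membership.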
With policies defined in the semantic layer and user identity up to date, the semantic layer platform can then intercept consumer queries and rewrite them to enforce governance policies in real time. Since the semantic layer can connect to any consumer, data platform or user persona, it can deliver comprehensive and consistent coverage with confidence.
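One way to picture the interception step is a function that resolves the user's groups at query time and wraps the incoming SQL in the matching policy predicates. This is only a sketch under stated assumptions: `make_interceptor`, `lookup_groups` and the policy dictionary are hypothetical names, and the predicates are assumed to be written by administrators, not taken from user input.

```python
def make_interceptor(lookup_groups, policies):
    """Build a function that rewrites each query for the calling user.

    lookup_groups: callable returning the user's current set of groups
                   (hypothetical stand-in for a directory lookup).
    policies:      group name -> admin-defined SQL predicate.
    """
    def intercept(user, sql):
        # Groups are resolved on every query, so membership changes
        # take effect on the very next request.
        groups = lookup_groups(user)
        predicates = [policies[g] for g in sorted(groups) if g in policies]
        if not predicates:
            # Deny by default when no policy applies to the user.
            return f"SELECT * FROM ({sql}) AS q WHERE 1 = 0"
        combined = " OR ".join(f"({p})" for p in predicates)
        return f"SELECT * FROM ({sql}) AS q WHERE {combined}"
    return intercept


directory = {"ana": {"sales_west"}}
intercept = make_interceptor(
    lambda user: directory.get(user, set()),
    {"sales_west": "region = 'West'", "sales_east": "region = 'East'"},
)
before = intercept("ana", "SELECT * FROM sales")
directory["ana"] = {"sales_east"}  # ana changes teams
after = intercept("ana", "SELECT * FROM sales")
```

The same query text produces a different governed query as soon as the user's group membership changes, without redefining any policy.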
Key Takeaway: A semantic layer must enforce data governance in real time for every query in order to provide comprehensive coverage and respond to frequently changing policies.
Who Sees What
With the ability to identify end users by name, the semantic layer can apply the following governance functions that are critical to securing data access in a semantic layer:
- Row-level security
- Column-level security
- Object-level security
The first critical governance feature for a semantic layer is row-level security, or the ability to apply a filter (or WHERE clause) to each outbound query to select only the rows of data that users should see. For example, row-level security can be used to automatically restrict data for a sales team in the West so they only see their data and not the data from other regions. As long as the salesperson is mapped to the Western Region’s group, the semantic layer can automatically generate the proper WHERE clause to restrict data access to just the West region’s data for that user.
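The West-region example could be sketched roughly as follows. The group names, the `region` column and the helper functions are assumptions made for illustration, not any product's API; the point is only that the filter is derived from group membership and attached to the query automatically.

```python
# Hypothetical mapping from a directory group to the region it may see.
GROUP_REGION_FILTERS = {
    "sales_west": "West",
    "sales_east": "East",
}


def row_filter_for(user_groups):
    """Return (where_sql, params) limiting rows to the user's regions."""
    regions = sorted(
        {GROUP_REGION_FILTERS[g] for g in user_groups if g in GROUP_REGION_FILTERS}
    )
    if not regions:
        # No matching group: return a predicate that selects no rows.
        return "1 = 0", []
    placeholders = ", ".join("?" for _ in regions)
    return f"region IN ({placeholders})", regions


def apply_row_security(base_sql, user_groups):
    """Wrap the outbound query so the row filter always applies."""
    where_sql, params = row_filter_for(user_groups)
    return f"SELECT * FROM ({base_sql}) AS governed WHERE {where_sql}", params


sql, params = apply_row_security("SELECT region, amount FROM sales", ["sales_west"])
```

Passing the region values as bound parameters rather than splicing them into the SQL text keeps the generated filter safe even if group-to-region mappings are edited by hand.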
The second critical governance feature for a semantic layer is column-level security, or the ability to hide sensitive data columns or mask their contents. For example, a column-level security rule can hide or mask personally identifiable information (PII) fields for the marketing team while making them visible for the HR team.
|Column|Rule|What HR Sees|What Marketing Sees|
|---|---|---|---|
|Social security number|Visible to HR, masked for everyone else|123-45-6789|XXX-XX-6789|
|Customer address|Visible to HR, hidden for everyone else|123 Anywhere St.|Not visible|
By dynamically adjusting the view of the semantic layer based on user access rules, the semantic layer can be defined for everyone, but appear customized to users and groups.
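The HR and marketing example can be expressed as a small rule table applied to each result row. This is a minimal sketch: the column names, the `COLUMN_RULES` structure and the masking function are hypothetical, chosen only to mirror the two rules described above.

```python
def mask_ssn(value):
    """Keep the last four digits and mask the rest."""
    return "XXX-XX-" + value[-4:]


# Hypothetical rules: column -> (groups with full access, action for everyone else).
COLUMN_RULES = {
    "ssn": ({"hr"}, "mask"),
    "customer_address": ({"hr"}, "hide"),
}
MASKERS = {"ssn": mask_ssn}


def apply_column_security(row, user_groups):
    """Return the view of a row that the given groups are allowed to see."""
    groups = set(user_groups)
    governed = {}
    for column, value in row.items():
        rule = COLUMN_RULES.get(column)
        if rule is None or groups & rule[0]:
            governed[column] = value  # unrestricted column or privileged group
        elif rule[1] == "mask":
            governed[column] = MASKERS[column](value)
        # "hide": the column is left out of the result entirely
    return governed


row = {"name": "A. Customer", "ssn": "123-45-6789", "customer_address": "123 Anywhere St."}
hr_view = apply_column_security(row, ["hr"])
marketing_view = apply_column_security(row, ["marketing"])
```

The underlying row is defined once; only the view returned to each group differs.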
The final key governance function is object-level security. This layer of governance allows users and groups to own and share modeling components (e.g. conformed dimensions, hierarchies, calculations, models and much more). This functionality is critical to supporting the concept of a data mesh, a popular topic of discussion in the data and analytics community. The key principles of a data mesh architecture are domain ownership of data objects and a decentralized system of data stewardship. Without object-level security on modeling components, backed by role-based access control (RBAC), achieving the vision of federating the creation and management of data products just isn’t feasible.
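A minimal sketch of ownership and sharing for modeling objects might look like the following. The `ModelObject` class, the domain group names and the two permission levels ("read" and "write") are assumptions made for illustration; real RBAC schemes are usually richer.

```python
from dataclasses import dataclass, field


@dataclass
class ModelObject:
    """A shareable modeling component, such as a conformed dimension."""
    name: str
    owner_group: str
    # Hypothetical sharing table: group name -> "read" or "write".
    shared_with: dict = field(default_factory=dict)


def can_access(obj, user_groups, action):
    """Check whether any of the user's groups may perform the action."""
    groups = set(user_groups)
    if obj.owner_group in groups:
        return True  # the owning domain has full control
    for group in groups:
        granted = obj.shared_with.get(group)
        if granted == "write" or (granted == "read" and action == "read"):
            return True
    return False


customer_dim = ModelObject(
    name="customer_dimension",
    owner_group="sales_domain",
    shared_with={"marketing_domain": "read"},
)
```

Here the sales domain owns the dimension and can change it, while the marketing domain can reuse it in its own models without being able to modify it.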
Key Takeaway: A semantic layer must apply query governance with dynamic filtering, column-level security and object-level security based on the query user’s identity. Semantic layer solutions that lack row, column and modeling object controls are not suitable for use cases where data access restrictions are required.
Data and Analytics Governance Together
Besides serving as a metrics hub, a semantic layer must apply data access controls and governance policies to every query using the query user’s identity. By avoiding duplicative governance tools and shadow user management, enterprises can apply both physical and logical data access policies to ensure that data is only visible to those who have authorized access. With the trust that data is secure, organizations can confidently share data more broadly both internally and externally.
In my next and final post, part eight of eight, we’ll dive into the importance of the semantic layer’s data platform integration features.
In the meantime, if you are looking to skip ahead, I encourage you to read the white paper, “The Semantics of the Semantic Layer”.