December 8, 2020Five Ways to Improve Your Analytics ROI on Snowflake
Modeling Multi-Level Facts
In our last Advanced Modeling blog post we looked at using bridge tables to resolve many-to-many relationships. In this post, we are going to expand that use case and look into how to model facts that apply differently at various levels in a hierarchy.
This is another of those business rules that we see often that on the surface sounds pretty straightforward and makes logical sense. But trying to implement a “simple” business rule can frequently pose some challenges for users trying to manipulate SQL to do what they want. AtScale solves this challenge by creating that single source of actionable information with the logic already defined, regardless of the BI application that is consuming it. AtScale’s Intelligent Data Fabric models those advanced rules and isolates them from the ultimate consumers, allowing easier consumption of complex rules and information.
This is another of those rules we see often in retail and sales geographic analytics, but it can also generally apply as the “multi-level fact” or “multi-fact” issue. Basically, this is the challenge that arises when you have a fact, say Sales Target, that does not get aggregated up hierarchies, such as Geography. In our use case for this exercise, there are a few notes for the calculation:
- Sales Targets are valid for Territory level only, and can be expressed additionally as a percentage of the target for the Territory.
- Sales Targets should NOT be aggregated to Area level. Therefore, Area is separate dimension. Area Sales Targets should be set to 0 in our use case.
- Sales Targets are set at Product and Territory granularity; the calculation does not make sense at lower granularities i.e. SKU and/or Zip
Representing that via a logical model looks similar to the below diagram. Note that we have a fact table for storing Sales Targets. In order to track Area Sales Targets (which will be 0), Area and Territory are separate dimensions and we don’t use the Area field from the Territory dimension table. Also, the Product dimension table is related to the fact at the ProductID instead of the lowest level, SKUID.
A completed model to support this multi-fact rule looks like the below diagram. The model includes the previous blog’s salesfact dataset and dimensions on the left side of the diagram.
Taking a deeper look at the right side of the diagram:
We have two fact datasets here – salestargets and areasalestargets. The salestargets dataset carries the target fact at the Territory, Product and Month levels. It also has a relationship to the Area dimension that we will use in the calculation later.
The areasalestargets fact dataset is simply defined to set the value of area_salestarget to 0:
select 0 as area_salestarget, productid, monthid, area
It carries the Product, Month and Area levels – note that Territory is not defined.
So the model looks pretty straightforward – one fact table supplies territory-level targets, and the other supplies a 0 for the Area targets. How does AtScale know when to use which fact table in a multi-fact scenario such as this? The “magic” is in how we define the Measures, and the critical setting called Unrelated Dimension Handling.
For Measures, we create two measures, one from each of the Sales Target facts. Area Sales Target comes from the areasalestargets fact, while Base Sales Target comes from salestargets:
Both of these measures have a setting toggled on called Unrelated Dimension Handling. This setting is used to assist the engine when planning the query on how to handle the situation when a business user tries to query Measures and Dimensions that are not “related” to each other.
In our case, an example would be querying area_salestarget by Territory. Note in the model that Territory is not “related” to the area_salestarget fact; if you “follow the arrow directions” you can’t make a relationship between those two entities – they are “unrelated dimensions” to the fact.
The valid values for this setting are:
- Display an error message and return nothing
- Include empty cells in query results
- Include repeating values in query results
Both of the measures will have this set to “Include empty cells in query results” so that we can test for an empty value in our final calculated measures, Sales Target and Sales vs Target.
Sales Target is defined as an empty string test for Area Sales Target and Base Sales Target. This captures what happens what the unrelated dimension setting kicks in and returns empty cells.
IIF(IsEmpty([Measures].[m_area_sales_target_sum]) ,[Measures].[m_target_sum] ,[Measures].[m_area_sales_target_sum] )
Then the final Sales vs Target measure can be defined with the previous Sales Target measure, including the requisite divide-by-0 and empty string checks.
IIF([Measures].[Sales_Target]=0, 0, iif(isempty([Measures].[Sales_Target]),0,[Measures].[m_salesunits_sum] / [Measures].[Sales_Target] ))
Cube in Action
Let’s take a look at some pivot tables in Excel that use this model. For the first example, we select the Time and Product dimensions as filters, and the Area and Territory dimensions as rows. Sales Units, Sales Target and Sales vs Target are selected as the measures. Note that the target measures are both empty for the Area members.
Now if we switch the rows to be Territory and Zip (recall that Sales Targets are not set at the Zip level), the Sales Target measures are both empty.
Lastly, if we switch the filter from the first example to look at a lower level SKU (again, not included in Sales Targets), the Sales Targets are empty.
AtScale allows for complex business rules to be modeled as part of its Design Center interface. For this sample rule, a designer’s initial instinct might be to create some complex SQL in a query dataset to perform a bunch of checks or union together subqueries or some other advanced SQL technique. However, leveraging the power of AtScale’s modeling capabilities such as Unrelated Dimension Handling allows the designer to deliver an efficient, aggregate-aware solution for easy consumption by the end users. They are able to simply drag measures like Sales Units, Sales Targets and Sales vs Targets to a report without having to worry about the complexities of the SQL used to generate those measures.