October 20, 2021AtScale + Excel: Working with Blended Data Sets
Microsoft Excel has been the workhorse analytics tool for generations of business analysts, financial modelers, and data hacks. It delivers the ultimate flexibility to manipulate data, create new metrics with cell calculations, build live visualizations and slice and dice data. It is the ad hoc analysis tool for the 750+ Million Excel users worldwide. It lets teams rapidly prototype new analytics before integrating into traditional BI platforms. It powers the majority of financial models used by corporate finance and investment advisory teams. It is so embedded in the psyche of modern data geeks that, despite its limitations, it is hard to see a reality where it is not part of the enterprise analytics technology stack.
Challenges of Using Excel to Analyze Big Data Sets
Any analyst worth their salt at some point has Control-C=>Control-Ved (i.e. copy-pasted) a data set to an excel workbook to be greeted by the message, “This data set is too large for the Excel grid. If you save this workbook, you’ll lose data that wasn’t loaded.” With a limit of 1,048,576 rows and 16,384 columns, interacting with larger data sets presents a challenge.
The typical data hack technique for getting around this is to do some pre-work on your data before you bring it into excel. This usually means querying the relevant subset of your data from its source and potentially aggregating to some level based on the dimensions relevant to your analysis. This forms an extract of data that can be worked on in excel – creating new calculated metrics, cleaning up categorical dimensions, and summarizing data with formulas or pivot tables.
While techniques like these let analysts quickly zero in on the insight they are looking for, it sets up a few important challenges:
- Your analysis workbook is now disconnected from source data. As new data hits the source, there is no easy way to update the analysis. The analyst must reproduce their pre-processing of the data, create a new extract, and re-load to the workbook.
- Doing complex transformations in excel (like calculating new metrics or refining dimensional categories) need to be reproduced for the new data set. The first time transformations were executed they were innovative analytical techniques. Subsequent reproduction of the same transforms now feel like manual time-sucks. And if you forget how you did the original transforms, you are stuck.
- Finally, if you pioneer a technique for pre-processing and summarizing data, you want to be able to share your techniques with colleagues. If definitions of unique metrics and dimensions are “hardwired” in an excel spreadsheet, it is very difficult to share with other analysts and maintain consistency.
The Power of Excel PivotTables
PivotTables are Excel analysts’ bread and butter BI tool. We use them all the time to quickly summarize data using an intuitive dimensional analysis interface. They let you efficiently slice and dice data and create summary statistics for larger data sets. Once you have a clean tabular representation of your data (e.g. defined field names/column headings, conformed dimensions, and clean categorical values), you can build a PivotTable in seconds and be exploring data sets in your workbook.
PivotTables are also the go-to tool for interacting with Microsoft SQL Server Analysis Services (SSAS) OLAP cubes. SSAS has been the most popular OLAP engine for doing high performance dimensional analysis on large data sets. SSAS analysts spin up cubes for the data sets they are interested in and then analyze with Excel or Power BI. SSAS uses a dimensional query language called MDX and Microsoft provides a native connector for it within the Get Data utility.
AtScale also supports MDX that can be used to connect a PivotTable to a dimensional model within a semantic layer on live cloud data sources without the need to build a physical cube or be constrained by large data.
Simplifying Excel Access to Cloud Data
Creating a PivotTable connection to an AtScale model is the simplest and most performant approach to querying live cloud data within an Excel workbook – hands down. AtScale dimensional models contain the business intelligence needed for analyst to confidently interact with raw data in a cloud data warehouse (e.g. Snowflake, Google BigQuery, Amazon Redshift, Microsoft Azure Synapse) or data lake (e.g. Databricks, Amazon S3, Hadoop). A well designed model has pre-blended disparate data sources (e.g. CRM and ERP data), and clearly defined business metrics (e.g. Revenue and Costs), and hierarchically structured analysis dimensions (e.g. Time, Geography, Product).
The semantic layer exposes the model structure to PivotTables, while maintaining a live link to the cloud data source. As an analyst defines cuts of the data within the PivotTable, Excel sends MDX queries to AtScale. AtScale translates the inbound MDX into the appropriate dialect of SQL for the cloud data platform. AtScale leverages its knowledge of the data structure and understanding of user behavior patterns to intelligently create aggregates that accelerate query performance. While similar in concept and use to SSAS cubes, AtScale does not rely on data extracts or large cube structures. And there are no practical limitations on the size of the data on which AtScale dimensional models are built. This means users can leverage Excel to directly analyze the largest cloud data sets.
Embedding Cloud Data into Excel-based Financial Models
PivotTables are a really useful tool for performing ad hoc analysis in Excel. Excel’s CUBE functions, on the other hand, are the real power tools for linking live cloud data with spreadsheet calculations to create powerful, updatable Excel-based models.
There are analysts across all industries that have built Excel-based models that have become mission critical to their organization. Equity analysts build complex valuation models in Excel. Corporate finance analysts aggregate KPIs from different sources into board-level reports based in Excel workbooks. Inventory analysts maintain supply chain and job scheduling models in Excel. Marketers plan advertising mix in Excel. Most models incorporate summary statistics of historical data with assumptions and modeled relationships encoded into the logic of Excel’s flexible spreadsheet design.
AtScale lets analysts use Excel’s CUBE functions, like CUBEVALUE, to mix cloud-sourced data from AtScale models with local, cell-based formulas. By integrating data at the cell level, analysts can exploit all of Excel’s functionality to build sophisticated, composite models. Even more, when data needs to be refreshed, users can click on Excel’s Refresh menu option to refresh cloud data automatically. By leveraging Excel’s CUBE functions linked to live cloud data through AtScale, users can update Excel-based models without the typical offline data prep and copy-pasting into their live models. This simplifies the process of updating models and reduces the chance of mistakes.
A Unique Solution for Connecting Excel to Live Cloud Data
While there are other solutions for importing data from cloud platforms to Excel, AtScale provides unique capability for executing high performance, multi-dimensional queries against live cloud data platforms. Excel jockeys can more efficiently access data sources without time consuming data prep. The ability to align Excel based analysis with more structured BI programs keeps organizations in sync.
In reality, Excel will be the world’s most popular data and analytics platform for the foreseeable future. AtScale lets organizations provide guardrails on self-service data access that let Excel users maintain their analytics agility while creating more scalable analytics programs.