September 28, 2020

Gaining a Competitive Edge: Business Forecasting with Third-Party Data and Microsoft Excel
In a recent webinar with Ryan Squire, Senior Data Scientist at SafeGraph, we tackled challenges familiar to data scientists, BI analysts, and other analytics leaders. Excel is great for building sophisticated models, but data extracts, joins, splicing, and general data wrangling can be painful and risky. So what can you do with Excel despite these challenges? In this blog post, I'll share how to turn Excel into a crystal ball for your business forecasting by pointing Excel pivot tables at billions of data points for data exploration.
For this webinar, the AtScale team and I were excited to engage our participants with a live poll. Here are the results.
Live Poll Results
In response to the first question, “What is your biggest challenge at the moment with Excel forecasting?”, 46% of our audience answered “scale.” Why? Excel simply can't handle the volume of data they need to analyze. Another 38% of the audience answered “consistency” from report to report.
The results for the second question, “How long does it typically take to get a new measure added to your datasets?”, were better than I expected. Still, even with 38% of respondents answering “a week or less,” that is too long. What was interesting about this question is that 23% of respondents answered “unpredictable.” That is the worst answer of the bunch: if the timing is unpredictable, you probably won't ask at all, because you can't know whether you'll ever get an answer.
Tackling the Problems
So, where do we go from here? For this use case, I used Excel to forecast inventory and quantities of products.
Loading Foot Traffic Data
In this example, we used SafeGraph's third-party data to compare foot traffic between March and August of 2019 with foot traffic over those same months a year later, during COVID-19 (when we witnessed how retail was affected).
*For this use case, we used the “Weekly Patterns” third-party data from the SafeGraph website.
Here are a few of my takeaways:
- The first problem is that Microsoft Excel is not a database. It's limited to 1,048,576 rows of data.
- CSV files are difficult to parse and load.
- Aggregating data is difficult and error-prone. In this example, I'm a retail store with my own retail calendars, and of course this year's holiday calendar is going to differ from last year's.
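To make the row-limit problem concrete, here is a minimal sketch (not from the webinar) of the usual workaround: instead of loading a multi-million-row CSV into Excel, stream it in chunks with pandas and aggregate it down to a summary small enough for a pivot table. The column names (`store_id`, `week`, `visits`) are hypothetical stand-ins for a foot-traffic file.

```python
# Sketch: aggregate a CSV too large for Excel (cap: 1,048,576 rows) by
# streaming it in chunks and summing as we go. Column names are hypothetical.
import io
import pandas as pd

# Stand-in for a huge CSV; in practice this would be a file path.
raw = io.StringIO(
    "store_id,week,visits\n"
    "A,2019-03-04,120\n"
    "A,2019-03-11,135\n"
    "B,2019-03-04,80\n"
    "B,2019-03-11,95\n"
)

# Read in small chunks so memory stays bounded, aggregating each chunk.
totals = None
for chunk in pd.read_csv(raw, chunksize=2):
    part = chunk.groupby("store_id")["visits"].sum()
    totals = part if totals is None else totals.add(part, fill_value=0)

totals = totals.astype(int)
print(totals.to_dict())  # a small summary that fits comfortably in Excel
```

The resulting summary, not the raw billions of rows, is what you would hand to an Excel pivot table.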
Loading Baseline Sales Data
Using Tableau and AtScale, I took a look at my data for the year and then loaded it into Excel.
- The problem is that the data needs to be refreshed, and updating it is a manual, error-prone process. As soon as I paste something into Excel, it's game over: I might make a mistake while pasting, or forget what I did last month. If I'm refreshing monthly and forget where I got the data in the first place, I might pull different data the second time around.
- Data is only as fresh as the last file loaded.
- New data elements may break existing models. When that happens, I have to change the whole process, and it becomes yet another manual process. Splicing in new data means manipulating the raw data directly, which is a risky proposition.
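The fix for all three problems above is the same: make the refresh a repeatable script rather than a copy-and-paste. Here is a minimal sketch (my own illustration, not the webinar's pipeline) of a single function that always rebuilds the summary Excel reads from the source of record; the file names and columns are hypothetical.

```python
# Sketch: a repeatable refresh. Every run pulls from the same source and
# follows the same steps, so results don't drift between refreshes.
import io
import pandas as pd

def refresh_summary(source_csv) -> pd.DataFrame:
    """Rebuild the monthly sales summary from the raw source every time."""
    df = pd.read_csv(source_csv, parse_dates=["date"])
    summary = (
        df.assign(month=df["date"].dt.to_period("M").astype(str))
          .groupby("month", as_index=False)["sales"].sum()
    )
    # In practice: summary.to_csv("sales_summary.csv", index=False) and point
    # Excel at that file, instead of pasting values by hand.
    return summary

# Hypothetical baseline sales data standing in for the real extract.
demo = io.StringIO("date,sales\n2020-03-02,100\n2020-03-09,150\n2020-04-06,90\n")
print(refresh_summary(demo))
```

Because the whole refresh lives in one function, last month's process and this month's process are guaranteed to be identical.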
Building a Sales Forecast Model
In this final part of the webinar, I shared the pivot table I built from my first-party data. Then I created my forecast simply by referencing that pivot table data.
- Trusting your data isn't easy. There is no guarantee that your data will match the “corporate truth.” If I'm manually splicing data, it's almost certain not to match, because manual data preparation steps are error-prone and clumsy. The data may differ or be updated on a different schedule; it may have been revised while I'm still using old numbers.
- Bespoke calculations are difficult and error-prone. This is a really simple model, but things can get complicated fast.
- Conformed dimensions and hierarchies are very important. Using a common set of dimensions and hierarchies across my analysis ensures that folks are rolling up data the same way.
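To illustrate the last point, here is a minimal sketch (my own example, with hypothetical table and column names) of a conformed dimension: both sales and foot traffic join to one shared retail-calendar dimension, so every fact rolls up to fiscal periods the same way.

```python
# Sketch: two fact tables (sales, foot traffic) rolled up through ONE shared
# calendar dimension, so their fiscal-period totals always line up.
import pandas as pd

# The shared calendar dimension: date -> fiscal period.
calendar = pd.DataFrame({
    "date": ["2019-03-04", "2019-03-11", "2019-04-01"],
    "fiscal_period": ["FY19-P03", "FY19-P03", "FY19-P04"],
})

sales = pd.DataFrame({"date": ["2019-03-04", "2019-04-01"], "sales": [100, 60]})
traffic = pd.DataFrame({"date": ["2019-03-11", "2019-04-01"], "visits": [500, 300]})

# Both facts join to the SAME dimension before aggregating.
sales_by_period = sales.merge(calendar, on="date").groupby("fiscal_period")["sales"].sum()
traffic_by_period = traffic.merge(calendar, on="date").groupby("fiscal_period")["visits"].sum()

print(pd.concat([sales_by_period, traffic_by_period], axis=1))
```

If each analyst instead defined their own date-to-period mapping, the two roll-ups could silently disagree; the shared dimension is what keeps everyone's numbers comparable.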