bol.com and Google BigQuery: The Journey, Performance and Cost


In our most recent webinar, Maurice Lacroix, BI Product Owner at bol.com, and Dave Mariani, Co-Founder and Chief Strategy Officer of AtScale, share how Maurice and his team increase the ROI of their Google BigQuery investment while boosting their BI performance.

The Journey to BI on Google Cloud

Maurice recalls bol.com’s journey to Google BigQuery: “Back then, the world was relatively simple for us. We just had one Oracle BI stack and that was it. We also had a Hadoop cluster, but we weren’t using it for BI at the time.” He explains that Oracle wasn’t the right solution for them because “We wanted our organization to become much more data-driven and be self-sufficient when it comes to using data. So ideally without being dependent on IT.” This led the team to adopt Platfora as their data preparation and visualization tool. The team was content with the tool, but soon learned that the product was to be discontinued. Enter Google BigQuery.

Determining Performance Challenges

How do Maurice and his team measure performance? “We measure performance by the load times of user requests, which can be anything. It’s actually how our users interact with the dashboards. So, not just initially viewing the dashboard, but also filtering and drilling down on some specific dimensions.” Lacroix states that the focus is on execution times, which depend on three factors: capacity, concurrency and compute cost. He compares the three to commuting into a big city: “Consider your dashboards your vehicle, your tools the highway to work, and your fellow commuters the concurrency. Your Porsche 911 might get you to work faster than a Volvo. But if there’s lots of traffic, that’s not healthy, you need additional lanes on your highway. So more capacity for them to go faster. If you have enough lanes, you’re not going fast. So you need to replace your precious Porsche 911 with a Formula One car to go even faster.”

Optimizing Performance Challenges

“If you’re in a growing company, your data volumes grow and typically your users want more and more functionality and not less.” Maurice shares the growing challenges and how they are resolved with a scalable platform.

Choosing The Right Pricing For GBQ

“What does pricing have to do with performance? Well, more than you might initially think,” says Lacroix. Lacroix goes into detail about Google’s “On-Demand” and “Flat-Rate” pricing models.
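The trade-off between the two pricing models can be illustrated with a rough break-even calculation. The sketch below is only an illustration: the prices are placeholder assumptions, not Google's current rates or figures from the webinar, and the function names are hypothetical.

```python
def on_demand_cost(tb_scanned: float, price_per_tb: float) -> float:
    """Monthly cost when paying per terabyte scanned by queries."""
    return tb_scanned * price_per_tb


def break_even_tb(flat_rate_monthly: float, price_per_tb: float) -> float:
    """Monthly scan volume above which a fixed monthly fee becomes cheaper."""
    return flat_rate_monthly / price_per_tb


# Placeholder prices, assumed for illustration only.
PRICE_PER_TB = 5.0
FLAT_RATE_MONTHLY = 10_000.0

threshold = break_even_tb(FLAT_RATE_MONTHLY, PRICE_PER_TB)
print(f"Break-even at {threshold:.0f} TB scanned per month")

for tb in (500, 2_000, 4_000):
    cost = on_demand_cost(tb, PRICE_PER_TB)
    # On a tie the fixed fee is preferred, since its cost does not grow further.
    cheaper = "on-demand" if cost < FLAT_RATE_MONTHLY else "flat-rate"
    print(f"{tb} TB/month: on-demand would cost {cost:.0f}, so {cheaper} is cheaper")
```

Cost is only half of the picture. A fixed-fee model also fixes the amount of compute capacity available, so heavy concurrent use can show up as slower queries rather than as a higher bill, which is where pricing and performance meet.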

Getting Your BI Stack In Shape

“In order to make your tools work, you need to work with monitoring to identify any bottlenecks that you have.” Maurice advises that when you are monitoring for bottlenecks, you should look out for resources exceeding thresholds and signs of queuing.
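One way to look for signs of queuing is to compare when a request was submitted with when it actually started running. The following is a minimal sketch under assumed inputs: the `QueryLog` record, its field names and the one-second threshold are hypothetical and would need to be mapped onto whatever your own tools log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueryLog:
    """Hypothetical log record for one query."""
    query_id: str
    submitted: datetime  # when the request was received
    started: datetime    # when execution actually began
    finished: datetime   # when the result was returned

    @property
    def queue_time(self) -> timedelta:
        """Time spent waiting before execution began."""
        return self.started - self.submitted


def queued_queries(logs, threshold=timedelta(seconds=1)):
    """Return the queries that waited longer than `threshold` before starting."""
    return [log for log in logs if log.queue_time > threshold]


# Made-up sample data for illustration.
base = datetime(2024, 1, 1, 9, 0, 0)
sample_logs = [
    QueryLog("q1", base, base + timedelta(milliseconds=200), base + timedelta(seconds=3)),
    QueryLog("q2", base, base + timedelta(seconds=4), base + timedelta(seconds=9)),
]

for log in queued_queries(sample_logs):
    print(f"{log.query_id} waited {log.queue_time.total_seconds():.1f}s before running")
```

A growing share of queries above the threshold suggests that capacity, rather than query design, is the bottleneck.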

Understanding Logs And Compute Costs

“I always like to think the answers are in the logs. The logs give you an accurate recording of what the user has actually seen and the execution times that the user has experienced, but also what happened in all of the systems,” says Lacroix. Understanding all areas of performance makes it easier for Maurice to make queries run faster. BigQuery helps here too: “Our team is comfortable working with BigQuery and writing queries that make it easy for us to analyze.”

Maurice also stresses the importance of visualization, which gives you greater insight into performance and compute costs.
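As an illustration of what a log-based analysis might look like, the sketch below summarizes execution times per type of user interaction (viewing, filtering, drilling down). The record layout, the sample values and the function names are assumptions made for the example, not bol.com's actual log format.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records):
    """Group (action, seconds) pairs by action and report count, median and 95th percentile."""
    by_action = defaultdict(list)
    for action, seconds in records:
        by_action[action].append(seconds)
    return {
        action: {
            "count": len(times),
            "p50": percentile(times, 50),
            "p95": percentile(times, 95),
        }
        for action, times in by_action.items()
    }


# Made-up execution times in seconds, for illustration only.
sample_records = [
    ("view", 2.0), ("view", 4.0), ("view", 3.0),
    ("filter", 1.0), ("filter", 6.5),
    ("drill_down", 8.0),
]

for action, stats in summarize(sample_records).items():
    print(action, stats)
```

Reporting a high percentile alongside the median helps surface the slow interactions that users actually notice, which an average tends to hide.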

A Semantic Layer Is Critical To Success

Dave Mariani shares why you should want a semantic layer, his three reasons being:

  1. Simplicity: According to Dave, a semantic layer “Makes things easy to consume. Your business users can deal with business terms and not have to worry about modeling data every time they want to build a dashboard in Tableau. So you take that modeling burden away from the business users and you let them consume and run analytics.”
  2. Single Source of Truth: Mariani notes that having different teams use different BI tools and report on their own findings creates extreme confusion and wrecks your ability to build confidence in your data. A semantic layer eliminates this confusion and builds trust.
  3. Governance for All: You want everyone to play by the same rules.

See AtScale in Action

In this two-part demonstration, Dave shows us how to build a virtual cube and how AtScale is running live queries on a TPC-DS data warehouse.
