AtScale releases industry’s first comprehensive BI-on-Hadoop Benchmark revealing key insights on performance of major analytical engines for Hadoop
San Mateo, California, February 24, 2016 – AtScale, the first company to provide business users with speed, security and simplicity for BI on Hadoop, released today the results of a comprehensive Business Intelligence benchmark for SQL-on-Hadoop engines. The full benchmark results can be viewed for free at https://www.atscale.com/resource/performance-benchmark-business-intelligence-on-big-data-q4-16
The benchmark tested the industry’s top SQL-on-Hadoop engines over key Business Intelligence (BI) use case queries. It rates the strengths and weaknesses of each engine and shows which ones are best suited to which scenarios.
AtScale’s experience with large enterprise customers helped guide the framework and methodology used for the industry’s first comprehensive BI-on-Hadoop Benchmark. “We used real-world enterprise experience to produce a document that every technical evaluator can use as part of their evaluation process,” says Josh Klahr, VP of Product Management at AtScale.
Some surprising findings that surfaced include:
- While Hive is generally the default for SQL on Hadoop, it does not always perform well on its own across the scenarios tested.
- While Cloudera Impala is known as a strong player when it comes to SQL-on-Hadoop, the benchmark study found “winners” varied depending on the type of query, size of data and other factors. Each engine has its own ‘sweet spot’ and the study reveals which engine is best for different scenarios.
- The upgrades to Spark announced recently made a big difference in performance on smaller data sets. We were surprised to find significant performance improvements between Spark 1.5 and 1.6.
“This benchmark will provide a useful data point for those assessing business intelligence workloads on Hadoop,” said Tom Pringle, Head of Applications Research at Ovum. “We’ve seen an increase in adoption of Hadoop, and most often the focus has been on storage and scale-out capabilities of the new platform. As more organizations consider analytical workloads on Hadoop, it will be important that they assess the capabilities of SQL-on-Hadoop solutions.”
BI on Hadoop: a key workload
As indicated in the latest Hadoop Maturity Survey, Business Intelligence is now a top workload for Hadoop, ahead of Data Science and ETL. The maturation of a number of technologies has enabled Business Intelligence to be deployed broadly, creating a unique opportunity for business users in the enterprise to finally adopt Hadoop.
Until now, the industry has provided little guidance on the performance of Business Intelligence workloads on Hadoop. This has left technology evaluators without a way to measure each engine against their own needs and workloads. The AtScale Benchmark Study is aimed at helping evaluators understand the differences across the leading SQL-on-Hadoop engines.
The BI-on-Hadoop benchmark paper details the methodology and framework used for the study. The full document can be viewed for free at www.atscale.com/benchmark
- Hadoop is primed for Business Intelligence (BI): All of the engines tested passed our tests and are stable enough to support Business Intelligence workloads.
- One engine does not fit all: Depending on their needs (for example, small vs. large data sets, small vs. large numbers of concurrent users), enterprises will find that one engine does not accomplish everything. Each engine has its own ‘sweet spot’ and enterprises will find that a blended usage of all engines might fit their company’s goals best.
- Small vs. Big Data: Engines like Spark SQL or Impala perform best on smaller data sets – i.e. tables with thousands or several million rows of data.
- Few vs. Many Users: Impala showed the best concurrency test results, ahead of Hive and Spark SQL. Companies that anticipate connecting large numbers of business users to Hadoop should look into Impala.
- Constant Innovation: Open source development, as seen in Spark SQL’s improvements, delivers constant innovation. We expect the industry to continue innovating in this space: Cloudera, which has been working on Impala for the last 5 years, proposed to donate the project to the Apache Software Foundation this past November. There is no doubt more innovation will come out of this new development.
To find out more about how AtScale works, simply go to www.atscale.com/try
AtScale makes BI work on Hadoop. With AtScale, business users get interactive and multi-dimensional analysis capabilities, directly on Hadoop, at maximum speed, using the tools they already know, own and love – from Microsoft Excel to Tableau Software to QlikView. Built by Big Data veterans from Yahoo!, Google and Oracle, AtScale is already enabling the BI on Hadoop revolution at major corporations across the healthcare, telecommunications, retail and online industries. To see how AtScale can help you, go to www.atscale.com/try