AtScale releases industry’s most comprehensive BI-on-Hadoop Benchmark and reveals key trends on the performance of major analytical engines for Hadoop
San Mateo, California, October 18, 2016 — AtScale, the first company to provide business users with speed, security and simplicity for BI on Hadoop, released today the results of its reference performance study: The Business Intelligence Benchmark for SQL-on-Hadoop engines.
The benchmark is the world’s most comprehensive test of Business Intelligence workloads on Hadoop. The study reveals the strengths and weaknesses of the industry’s most popular analytical engines for Hadoop – Impala, Spark SQL, Hive and, new in this version, Presto.
“As enterprises adopt Hadoop more broadly, business intelligence (BI) and analytical use cases on Hadoop have expanded from strong, but limited, adoption among data scientists,” says John L. Myers, Managing Research Director at Enterprise Management Associates (EMA). “Now, organizations need to make the data within their Hadoop clusters available and ‘business critical’ to a wider business stakeholder audience. BI on Hadoop is a logical use case to help them accomplish that growth in adoption and acceptance.”
Some surprising findings that surfaced include:
- There is rapid innovation in the open source space, as reflected by Spark SQL improvements, even from 1.6 to 2.0: The open source community continues to drive significant and rapid improvements across the board. All engines tested showed performance gains of 2x to 4x in the six months between the first and second editions of the benchmark. The study shows significant performance improvements between Spark 1.6 and Spark 2.0. Cloudera’s recent decision to donate Impala to the Apache Software Foundation will benefit the community, Cloudera, and any enterprise connecting business users to Hadoop. This is great news for enterprises deploying BI workloads to Hadoop.
- Different engines perform well for different types of queries: For large data sets, Hive, Impala, Presto, and Spark SQL were all able to effectively complete a range of queries on over 6 billion rows of data. There is no single “winning engine” for all query types. Depending on raw data size, query complexity, and the target number of end users, enterprises will find that each engine has its own ‘sweet spot’.
- Presto and Impala scale better than Hive and Spark for concurrent dashboard queries: Production enterprise BI user bases may number in the hundreds or thousands of users. As such, support for concurrent query workloads is critical. Our benchmarks showed that Presto and Impala performed best – that is, showed the least query degradation – as concurrent query workload increased. Presto, new to this edition of the benchmark, showed the best results in our user concurrency testing.
Since the first edition of this study back in February 2016, AtScale researchers have noticed significant improvements to the benchmark results: “The increasing demand for BI-on-Hadoop workloads has truly driven the community to innovate in a short period of time,” says Josh Klahr, VP of Product at AtScale. “The communities supporting the open source SQL-on-Hadoop projects have been working diligently to advance innovation in this field. We’ve aligned our vision with these open-source engines since day one, and we are pleased to see that this bet is paying off: by simply supporting the latest versions of Impala, Spark SQL and Hive, AtScale customers are now querying their Big Data up to 4x faster than six months ago.”
Based on the results of the benchmark, AtScale contends that vendors competing with open source will see any remaining advantages dwindle as the community out-innovates them and vendors like AtScale continue to build on top of open-source innovation.
BI on Hadoop: A Key Workload
As indicated in the latest Hadoop Maturity Survey, Business Intelligence is now a top workload for Hadoop, ahead of Data Science and ETL. The maturation of a number of technologies has enabled Business Intelligence to be deployed broadly, creating a unique opportunity for business users in the enterprise to finally be able to adopt Hadoop.
Until now, the industry has provided little guidance on the performance of Business Intelligence workloads on Hadoop. This has left technology evaluators with a void in measuring each engine against their own needs and workloads. The AtScale Benchmark Study is aimed at helping evaluators understand the differences across the leading SQL-on-Hadoop engines.
Additional Key Findings
- SQL-on-Hadoop engines are well suited for Business Intelligence (BI): All tested engines – Hive, Impala, Presto, and Spark SQL – successfully executed all of the queries in our benchmark suite and are stable enough to support business intelligence workloads.
- Small vs. Big Data: Impala and Spark SQL continue to shine for small data queries (queries against the AtScale Adaptive Cache). New in this edition, the latest release of Hive LLAP (Live Long and Process) shows suitable “small data” query response times. Presto also shows promise on small, interactive queries.
AtScale’s experience with each engine at large enterprises like Comcast, American Express, Aetna, Macy’s, Home Depot, Groupon and many others helped guide the framework and methodology used for the industry’s most comprehensive BI-on-Big-Data Benchmark.
To find out more about how AtScale works, simply go to www.atscale.com/demo
AtScale makes BI work on Hadoop. With AtScale, business users get interactive and multi-dimensional analysis capabilities, directly on Hadoop, at maximum speed, using the tools they already know, own and love – from Microsoft Excel to Tableau Software to QlikView. Built by Big Data Veterans from Yahoo!, Google and Oracle, AtScale is already enabling the BI on Hadoop revolution at major corporations across healthcare, telecommunications, retail and online industries. To see how AtScale can help you, go to www.atscale.com/try