If you're trying to do business intelligence (BI) on big data and the capability to handle large number of concurrent queries is a key issue for you, Google BigQuery may well be the way to go, according to a new Business Intelligence Benchmark released Thursday by AtScale, a startup specializing in helping organizations enable BI on big data.
"Concurrency has been an Achilles' Heel, a challenge for SQL-on-Hadoop," says Josh Klahr, vice president of product management for AtScale.
But AtScale's benchmark found concurrency to be BigQuery's greatest strength. Its serverless model means concurrent query performance on small data sets shows no query degradation, even at query volumes above 25 concurrent BI users.
"Concurrency, I think, was the biggest one," Klahr says. "But the user experience with BigQuery was also really nice. Maybe this isn't a surprise because Google has focused so much on consumer products over the years: Everything about using the product was really nice. The thing that actually took the longest was loading the data from our local network onto the cloud. Once we had the data there, the creation of the tables was really easy."
For its benchmark, AtScale used the same model it deployed last year for its benchmark tests of SQL-on-Hadoop engines on BI workloads. For that test, the idea was to help technology evaluators select the best SQL-on-Hadoop technology for their BI use cases. The goal was the same for the Google BigQuery benchmark.
"The AtScale benchmark provides enterprise leaders with useful comparisons they need to make BI work on big data," Doug Henschen, vice president and principal analyst at Constellation Research, said in a statement Thursday. "As the data grows more complex and diverse, these benchmark stats help enterprises understand leading big data query options and make better decisions critical to supporting BI infrastructure.
AtScale's testing team used the Star Schema Benchmark (SSB) data set, based on widely used TPCH data, modified to more accurately represent a typical BI-oriented data layout. The data set allowed the test team to test queries across large tables: The lineorder table contains close to 6 billion rows and the large customer table contains over a billion rows.
For the Google BigQuery benchmark, AtScale looked at the same three key requirements it used to evaluate the SQL-on-Hadoop engines last year and their fitness to satisfy BI workloads:
- Performs on big data. SQL-on-Hadoop engines must be able to consistently analyze billions or trillions of rows of data without generating errors and with response times on the orders of 10s or 100s of seconds.
- Fast on small data. The engine needs to deliver interactive performance on known query patterns and, as such, it is important that the SQL-on-Hadoop engine return results in no greater than a few seconds on small data sets (on the order of thousands or millions of rows).
- Stable for many users. Enterprise BI user bases consist of hundreds or even thousands of data workers. The underlying SQL-on-Hadoop engine must perform reliably under highly concurrent analysis workloads.
Sign up for Computerworld eNewsletters.