Interactive, Large-Volume AI/ML Queries at Scale and Without Data Movement for Self-Service Analytics

AtScale 2019.2

AtScale’s 2019.2 product release introduces a time-series and time-relative analysis capability for large volumes of data across disparate databases and platforms. It enables data analyst and data science teams to easily access large volumes of time-series data and quickly query and configure it for any business intelligence (BI), artificial intelligence (AI), or machine learning (ML) tool.

Time is relative. Your data is, too.

As Einstein’s theory of relativity concluded, events that occur at the same time for one observer can occur at different times for another. Time analysis of your data that could take hours or days with other solutions is now available at comparably relativistic speeds. AtScale has continued its approach of pushing the work down to the underlying database, which allows complex time-based analysis to be performed at the scale of your database.

Because our query engine understands the multidimensional model, it knows how to interpret time-relative functions. As opposed to the traditional in-memory way of doing these complex calculations, this semantic understanding lets us build queries directly against the data stored in the database, with no extraction or in-memory data manipulation required.

This means we can, and have, scaled time-based calculations to some of the largest datasets in the world.

With your data and your favorite analytics tools, you can answer questions like the following (an example of the kind of query this pushes down appears after the list):

  • How are sales trending so far this month compared to last?
  • What are my year-over-year metrics by month?
  • What has been the rolling average number of unique customers week by week over the last two quarters?
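For instance, the year-over-year question above can be expressed as a single window-function query and pushed down to the warehouse in its entirety. Here is a minimal sketch, assuming a hypothetical star schema with a sales_fact table and a date_dim dimension (these names are illustrative, not AtScale-generated SQL):

```sql
-- Year-over-year revenue by month, computed entirely inside the database
SELECT
  d.calendar_year,
  d.calendar_month,
  SUM(f.sales_amount) AS revenue,
  LAG(SUM(f.sales_amount), 12)
    OVER (ORDER BY d.calendar_year, d.calendar_month) AS revenue_prior_year
FROM sales_fact f
JOIN date_dim   d ON f.date_key = d.date_key
GROUP BY d.calendar_year, d.calendar_month
ORDER BY d.calendar_year, d.calendar_month;
```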

Mo’ Platforms, Mo’ Fun! Oracle, Teradata & Postgres

When we started AtScale, we did what all good startups do: we chose a new market segment where we could differentiate ourselves and build a beachhead. Our Normandy was Hadoop, and we were successful in taking that objective, becoming the biggest analytics vendor in the Hadoop platform space.

As the market developed, the battlefield moved from Hadoop to cloud data warehouses; by some accounts, over 68% of new database growth has been in the cloud. Even though that is the strongest growth the DBMS space has seen in over a decade, there is still roughly $50B worth of traditional on-premises database usage, dominated by the likes of Oracle and Teradata.

As it turns out, supporting interactive BI queries on Hadoop with no data movement is very difficult. We have spent over 150 man-years mastering that beast, and in the process we fundamentally reinvented how to drive performance for multidimensional analytics.

When I get asked, “Why don’t you support 50+ RDBMSs?”, I respond that for a real deployment to perform, a simple dialect translation of the target SQL is not even close to enough for the job. Today’s scale, complexity, and security requirements mean that supporting a new database requires completely restructuring how queries are written while taking advantage of each platform’s native optimizations.
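To make the distinction concrete: dialect translation only handles surface differences like the ones below, the same “last 30 days” filter written for PostgreSQL and Oracle against a hypothetical orders table. It does nothing to restructure the query itself or exploit a platform’s native optimizations:

```sql
-- PostgreSQL: orders placed in the last 30 days
SELECT COUNT(*) FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';

-- Oracle: the same filter in Oracle's dialect
SELECT COUNT(*) FROM orders
WHERE order_date >= TRUNC(SYSDATE) - 30;
```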

Imagine a situation where Tableau generates a (famously suboptimal) SQL query against a large amount of data. It’s not pretty, is it? Now imagine you need to solve that problem. You kick off an IT-led software development project that looks at the dashboard or ad-hoc requirements. You execute the project, test it, and roll it out. Then you probably have to go back to your users and ask them to change their behavior (e.g., stop querying that big raw fact table and use this table our data engineering team built instead). How long does that process take? It’s a good thing data engineers are easy to find (spoiler alert: they aren’t).

The good news is that AtScale does this automatically, and it’s completely transparent to end users. Autonomous data engineering generates acceleration structures in response to signals from natural, user-generated workloads. Using its semantic understanding of each query, AtScale rewrites it to use the new acceleration structures while taking advantage of database platform-specific optimizations. These may be hints, advanced runtime filtering, transitive closure, join elimination, join reordering, the use of estimates for expensive operations, lifted partition inclusion, or even advanced granular elastic scaling.
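As a rough illustration of what such a rewrite looks like (table and column names here are hypothetical, not AtScale output): a query a BI tool aims at the raw fact table can be answered instead from a pre-built monthly aggregate once the engine knows the two are semantically equivalent.

```sql
-- What a BI tool might emit: a full scan of the raw fact table
SELECT d.calendar_month, SUM(f.sales_amount) AS revenue
FROM sales_fact f
JOIN date_dim   d ON f.date_key = d.date_key
GROUP BY d.calendar_month;

-- What the engine could substitute once a monthly acceleration structure exists
SELECT calendar_month, SUM(sales_amount_sum) AS revenue
FROM agg_sales_by_month
GROUP BY calendar_month;
```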

You Be You: End-to-End Impersonation with Tableau Server

Our guiding framework for product development is PSA (Performance, Security & Agility). A big part of security is identity. Last quarter we released Federated Identity Management (FIM) support with SAML 2.0, and this quarter we deliver complete end-to-end secure impersonation with Tableau Server. Impersonation, in the context of Tableau Server, enables one user account to act on behalf of another. This means the Tableau workbook user authenticates against the AD/LDAP domain, maps to the same user on Tableau Server, and then maps directly to an appropriate user on the database system.

I use the term “appropriate” deliberately, because you have options: it could be the same user mapped into AD/LDAP, or you could map to a service account for specific security enforcement. Quoting Bruce Schneier, “complexity is the worst enemy of security,” and this feature’s main benefit is that it lets administrators implement and control security policy in one place, simplifying the deployment. Replicating security and data entitlements is a great way to leak data.
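How that final mapping is enforced varies by platform. As one hedged illustration (not a description of AtScale’s implementation), PostgreSQL allows a trusted service connection to assume an end user’s privileges with SET ROLE, so queries run under that user’s own entitlements:

```sql
-- Connect as a service account that has been GRANTed membership in alice's role,
-- then run the end user's queries under alice's privileges.
SET ROLE alice;
SELECT current_user;   -- returns: alice
-- ... user queries execute here with alice's entitlements ...
RESET ROLE;            -- drop back to the service account
```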

Looking Forward to AtScale’s Q3 Release!

What’s next? AtScale is about to release software we’ve been working on for close to five years, built on the unique insight we’ve gained over years of working with some of the biggest and best brands in the world. We believe it is an evolution of our platform that will mean a significant change in the economics and agility of self-service analytics. Stay tuned!
