10 AtScale Scientific Observations
Five years ago, we had a hypothesis that Business Intelligence (BI) needed a reboot. We planned to take the best parts of the original BI ideas and merge them with modern engineering and data analytics to build a platform that delivers self-serve, secure, curated and fast analysis to the entire business. We believed this strategy would build a bridge from the old world to the new one while giving us a stage to present new concepts that would make Business Intelligence truly intelligent.
“The good thing about science is that it’s true whether or not you believe in it.”
― Neil deGrasse Tyson
While we achieved incredible product-market fit early on, our original hypothesis required some refinement; namely, the concepts around the Universal Semantic Layer (USL) emerged as a critically important part of the solution. Letting every major BI tool connect live to data via the USL turned out to be a major differentiator and scratched an itch that almost every customer had.
I believe science is best in an open community where ideas and observations are shared freely, so I’m going to share the observations we’ve made:
1. AtScale ensures you get the right answer
SQL for real-world analytics can be quite difficult. Subtle errors in GROUP BY queries over hierarchical structures can lead to incorrect answers, and naively aggregating non-additive measures (distinct counts, averages, ratios) will really skew your results. Business users should not have to know relational algebra to answer conceptually simple questions. Quick: what is the last day of each quarter for your company? Are you using a 4-4-5 calendar? Under this method, the company’s fiscal year ends on the final Saturday (or other selected day) of the fiscal year-end month. Details ARE important, and getting them wrong translates into bad outcomes. Abstraction encapsulates behavior and lets your people focus on higher-order problems without requiring them to know the minutiae. One (virtual) schema to rule them all!
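To make the fiscal-calendar question concrete, here is a minimal Python sketch (not AtScale’s implementation) that computes the last day of each quarter in a 4-4-5 calendar. It assumes the fiscal year ends on the last Saturday of the year-end month, as described above, and that any 53rd week is added to Q4; the function names `fiscal_year_end` and `quarter_ends_445` are illustrative only.

```python
import calendar
import datetime as dt
from typing import List


def fiscal_year_end(year: int, end_month: int = 12,
                    weekday: int = calendar.SATURDAY) -> dt.date:
    """Last `weekday` (Monday=0 ... Sunday=6) of the fiscal year-end month."""
    days_in_month = calendar.monthrange(year, end_month)[1]
    last_day = dt.date(year, end_month, days_in_month)
    # Step back from the last calendar day to the most recent matching weekday.
    return last_day - dt.timedelta(days=(last_day.weekday() - weekday) % 7)


def quarter_ends_445(year: int, end_month: int = 12,
                     weekday: int = calendar.SATURDAY) -> List[dt.date]:
    """Last day of each quarter of the fiscal year that ends in `year`.

    Each quarter is 4 + 4 + 5 = 13 weeks. In a 53-week year the extra week
    is assumed to go to Q4, so Q4 always ends on the fiscal year end.
    """
    prior_end = fiscal_year_end(year - 1, end_month, weekday)
    q1_to_q3 = [prior_end + dt.timedelta(weeks=13 * q) for q in (1, 2, 3)]
    return q1_to_q3 + [fiscal_year_end(year, end_month, weekday)]


if __name__ == "__main__":
    for q, end in enumerate(quarter_ends_445(2024), start=1):
        print(f"FY2024 Q{q} ends {end:%a %Y-%m-%d}")
```

For a December year end, this gives FY2024 quarters ending on Saturday 30 March, 29 June, 28 September and 28 December 2024, while FY2022 turns out to be a 53-week year ending 31 December 2022. Encoding a rule like this once in a shared model means nobody has to rederive it inside every query.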
2. AtScale enables self-service
Ah, self-service: a dream we’ve all had and almost given up on. The keys to self-service are Simplicity, Collaboration, and Trust. At more established companies, or for sophisticated use cases, total self-service requires oversight from your data and analytics teams. Simplicity and ease of use have an impact across the board, from modeling to scaling out to new use cases. That means simplifying data modeling (BI ease of use perennially ranks as a critical criterion) and simplifying data lineage (by removing extraction and movement). The AtScale Design Center is a collaborative, multi-user environment that codifies the pipeline from modeling to publishing and querying, while making it easy enough that self-service modeling is possible. The collaboration elements help prevent rework and duplication, while managed deployment combined with the adaptive cache means you can point and click your way to new use cases with no ETL required.
3. AtScale makes you money and saves you money
You don’t need to choose between money and doing something you love! Business Intelligence exists to help companies use the data they have to optimize outcomes: make more money, be more profitable, and/or avoid losing money. Most of the data used in BI systems takes a long time to land in a collection store, get cleansed, get Extracted, Transformed, and Loaded (ETL) into an operational store, and potentially be moved yet again into a data mart. By the time the data is ready for consumption, there is a good chance the optimal time to make the decision has passed. AtScale focused on big data analytics because only with a truly massive database can you store, cleanse, and query data without moving it. Combined with the absence of a formal “cube build” phase, this means data is queryable as soon as it lands, in time to make near-instant decisions. Not having a large ETL project improves speed to insight and also means you don’t need expensive data engineers from your IT group every time you introduce a new use case.
When the back-end datastore is one of the super exciting new cloud databases, such as Google BigQuery, Amazon Redshift, Microsoft Azure SQL Data Warehouse, or Snowflake, you pay a consumption-based cost that ties back in some way to your query activity. For example, BigQuery has slots, while Redshift and Snowflake have clusters of different sizes. Ultimately, if you can reduce resource usage on these databases while getting the same answers, you can both save money and increase concurrency. In practice, we have observed cost decreases of almost an order of magnitude, mostly from caching common queries, rewriting sub-optimal queries generated by BI tools to use database features such as partitioning, and adaptively summarizing data and transparently using those summary structures.
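The simplest of these savings, caching common queries, can be illustrated with a small Python sketch. It assumes a hypothetical `execute` callable that runs SQL on a pay-per-use warehouse, a naive whitespace-only normalization of the SQL text, and a fixed time-to-live; it shows the general idea, not how AtScale’s cache actually works.

```python
import hashlib
import re
import time
from typing import Callable, Dict, List, Tuple

Rows = List[tuple]


class QueryResultCache:
    """Hypothetical result cache in front of a pay-per-use warehouse."""

    def __init__(self, execute: Callable[[str], Rows], ttl_seconds: float = 300.0):
        self._execute = execute   # the billable call to the back-end database
        self._ttl = ttl_seconds   # how long a cached answer is considered fresh
        self._entries: Dict[str, Tuple[float, Rows]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql: str) -> str:
        # Naive normalization: collapse whitespace so the same query formatted
        # differently by two BI tools maps to one key. (This also rewrites
        # whitespace inside string literals; a real system would normalize a
        # parsed query instead.)
        canonical = re.sub(r"\s+", " ", sql.strip())
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def query(self, sql: str) -> Rows:
        key = self._key(sql)
        now = time.monotonic()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            self.hits += 1
            return cached[1]
        self.misses += 1
        rows = self._execute(sql)  # only misses reach (and cost money on) the warehouse
        self._entries[key] = (now, rows)
        return rows


if __name__ == "__main__":
    calls = []

    def fake_warehouse(sql: str) -> Rows:
        calls.append(sql)
        return [("US", 42)]

    cache = QueryResultCache(fake_warehouse)
    cache.query("SELECT country, SUM(sales) FROM f GROUP BY country")
    cache.query("SELECT country,  SUM(sales)\nFROM f GROUP BY country")
    print(len(calls), cache.hits, cache.misses)  # 1 1 1
```

Repeated dashboard refreshes that emit the same query, even with different formatting, are answered from memory, so only the first one is billed. The TTL is the knob that trades data freshness against cost.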
4. AtScale improves performance & concurrency
The Adaptive Cache and our SQL optimizations reduce system resource usage on both Hadoop and non-Hadoop data sources. By transparently generating summary aggregate tables and using them in BI queries, we have improved concurrency by 10x and reduced query times by anywhere from 10x to over 100x. See our benchmark for the data.
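Aggregate awareness, answering a query from a pre-summarized table instead of the raw fact table, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the table names are hypothetical, every aggregate is assumed to store all additive measures, and hierarchies are ignored (so a day-grain table is not considered for a month query). It also shows why non-additive measures such as COUNT DISTINCT have to bypass the aggregates.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Aggregate functions whose results can be rolled up exactly from a finer grain
# (COUNT rolls up as a SUM of stored counts). COUNT DISTINCT is not in this set:
# summing per-group distinct counts double-counts members that span groups.
ROLLUP_SAFE = frozenset({"SUM", "COUNT", "MIN", "MAX"})


@dataclass(frozen=True)
class Aggregate:
    table: str               # hypothetical physical table name
    grain: FrozenSet[str]    # dimensions the table is grouped by
    row_count: int           # used as a rough cost estimate


def pick_source(query_dims: FrozenSet[str], measure_funcs: FrozenSet[str],
                aggregates: List[Aggregate], fact_table: str) -> str:
    """Return the smallest table that can answer the query exactly."""
    if not measure_funcs <= ROLLUP_SAFE:
        return fact_table  # e.g. COUNT DISTINCT must be computed from base rows
    # An aggregate works only if it is grouped by at least the requested dimensions.
    candidates = [a for a in aggregates if query_dims <= a.grain]
    if not candidates:
        return fact_table
    return min(candidates, key=lambda a: a.row_count).table


if __name__ == "__main__":
    aggs = [
        Aggregate("agg_country_month", frozenset({"country", "month"}), 5_000),
        Aggregate("agg_country_day_product", frozenset({"country", "day", "product"}), 2_000_000),
    ]
    print(pick_source(frozenset({"country"}), frozenset({"SUM"}), aggs, "sales_fact"))
    print(pick_source(frozenset({"product"}), frozenset({"SUM"}), aggs, "sales_fact"))
    print(pick_source(frozenset({"country"}), frozenset({"COUNT DISTINCT"}), aggs, "sales_fact"))
```

The three calls route to `agg_country_month`, `agg_country_day_product` and `sales_fact` respectively: the BI tool asks the same logical question each time, and only the physical source changes.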
5. AtScale lets you use ALL your BI tools
This isn’t your first rodeo, and apparently some of the bulls from previous rodeos have stuck around. On average, IT professionals interact with about 3.5 BI/analytics tools. It’s natural for different tools to serve different consumers better in some departments than in others. And Excel is a tool every person in the universe uses.
It’s clear that businesses are supplementing their existing larger deployments (think enterprise-wide legacy deployments, or homegrown solutions) with smaller, more agile and tactical solutions for specific pain points. For instance, the majority of users also interact with dedicated data visualization tools, or with solutions designed for a specific function (such as reporting on Salesforce.com data).
It makes sense that end users will find some analytics tools more productive than others for particular use cases, and because of this, chances are you will continue to have plenty of tools to support.
6. AtScale enables agile, collaborative analytics
Agility and Collaboration are essential to driving ROI on BI projects. Imagine a data project with a team of eight analysts. With eight people working on the same data, you would expect the data cleaning and ETL process, which usually takes up to 80% of the time, to be done just once. You would then expect them to collaborate on the data, reuse code previously written by others, and stay productive on the focus of the analysis. But in reality, they work as individuals. Each of them has a very high-end computer filled with downloaded data, cleans the data locally, and feeds it to Tableau dashboards online, creating an endless stream of separate, independent ETL processes, each with the potential for (different) bugs. Usually no one knows what has been done to the original data, or why. This reduces trust, and without trust, adoption of analytics projects is stunted.
7. AtScale makes data secure & audited
Extracts are scary and should scare you. Raw queries against Hadoop-based systems should scare you. New data privacy standards such as GDPR should scare you. Honestly, you should probably not sleep at night. Security is tough: Kerberos is not trivial. Sentry and Ranger are new and have learning curves. SAML is sophisticated. End-to-end security has no fewer than eight major elements, including transport security (TLS end to end), authentication, two-factor authentication, authorization, row-level security, column-level security, protection against XSS and SQL injection, change control, and the list goes on.
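Three of the elements listed above (row-level security, column-level security, and SQL injection protection) can be illustrated with a small Python sketch using the standard library `sqlite3` module. The users, policy dictionaries, and `sales` table are hypothetical; the point is that filter values are bound as parameters, column names are checked against an allow-list, and unknown users are denied by default.

```python
import sqlite3
from typing import Dict, List, Sequence, Set

# Hypothetical policy tables. Row-level security: regions each user may see.
ALLOWED_REGIONS: Dict[str, Set[str]] = {"ana": {"EMEA"}, "raj": {"EMEA", "APAC"}}
# Column-level security: columns hidden from each user.
HIDDEN_COLUMNS: Dict[str, Set[str]] = {"ana": {"customer_email"}, "raj": set()}
KNOWN_COLUMNS = {"region", "customer_email", "revenue"}


def secure_select(conn: sqlite3.Connection, user: str,
                  columns: Sequence[str]) -> List[tuple]:
    """Run SELECT <columns> FROM sales with the caller's policy applied."""
    if not columns:
        raise ValueError("no columns requested")
    # Identifiers cannot be bound as parameters, so only allow-listed column
    # names ever reach the SQL text.
    unknown = set(columns) - KNOWN_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Unknown users get every column hidden (deny by default).
    denied = set(columns) & HIDDEN_COLUMNS.get(user, KNOWN_COLUMNS)
    if denied:
        raise PermissionError(f"{user} may not read {sorted(denied)}")
    regions = sorted(ALLOWED_REGIONS.get(user, set()))
    if not regions:
        return []
    placeholders = ", ".join("?" for _ in regions)
    sql = f"SELECT {', '.join(columns)} FROM sales WHERE region IN ({placeholders})"
    # Filter values are bound parameters, never concatenated into the SQL.
    return conn.execute(sql, regions).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, customer_email TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("EMEA", "a@example.com", 10.0),
        ("APAC", "b@example.com", 20.0),
        ("AMER", "c@example.com", 30.0),
    ])
    print(secure_select(conn, "ana", ["region", "revenue"]))
```

Enforcing the policy in one layer that every BI tool goes through, rather than in each extract or dashboard, is what keeps the rules consistent.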
8. AtScale helps you monetize your data
Maybe you are trying to make decisions with data. The fresher the data, the faster you can make the decision and move your company in the right direction. Maybe it’s analysis of where to spend advertising dollars, where every day of misdirected spend could cost you thousands or even millions of dollars. Maybe you are a service-bureau-type organization that collects data that, in aggregate, has immense value. Organizations depend on shared data to create new opportunities and streamline business operations. However, traditional data sharing methods (e-mail, FTP, EDI, APIs) are cumbersome, time-consuming, and static, preventing organizations from achieving the fastest time-to-value from their data. Snowflake is tackling the data sharing problem head-on at the database level because they understand the value of sharing without moving. AtScale goes a few steps further by allowing you to define both the schema and very fine-grained ACL security for accessing shared data. Additionally, because AtScale was built entirely on documented and tested APIs, it is possible to drive the platform entirely headlessly and thus integrate it into your own data offering. We have numerous customers that follow this pattern:
- Collect Data
- Process/Clean/Enhance Data
- Package Data for consumption by customers.
By using AtScale, these embedded customers are able to quickly build new use cases without having to build ETL or write super complex queries. For instance, in the search marketing space, showing data sliced and diced by mobile/desktop, by country, by keyword, or by any other dimensional attribute is quick and easy. Multi-tenancy, end-to-end security, and instant, scalable support for Excel and Tableau amplify the value of your offering.
9. AtScale adapts as your workloads change
A fundamental problem in database systems is choosing the best physical design, i.e., a small set of auxiliary structures that enable the fastest execution of future queries. Modern databases come with designer tools that create a number of indices or materialized views, but the designs they find tend to be sub-optimal and remarkably brittle. This is because the future workload is often not known a priori, so these tools optimize for past workloads in the hope that future queries and data will be similar. In practice, even those past inputs are often noisy or missing. AtScale addresses this by evolving the physical structures as the workload changes, without altering the virtual structures that BI tools query.
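One simple way to picture a physical design that evolves with the workload is a frequency-based advisor over a sliding window of recent queries, sketched below in Python. This is an assumption-laden illustration, not AtScale’s algorithm: it only counts GROUP BY column sets, ignores the cost and storage size of each candidate aggregate, and the class and parameter names are hypothetical. Queries keep targeting the same virtual model; only the set of physical aggregates changes.

```python
from collections import Counter, deque
from typing import Deque, FrozenSet, List, Set, Tuple

Grain = FrozenSet[str]  # the set of GROUP BY columns a query uses


class AggregateAdvisor:
    """Hypothetical advisor that tracks which grains recent queries use."""

    def __init__(self, window: int = 1000, budget: int = 3):
        # Only the most recent `window` queries count, so old habits age out.
        self._recent: Deque[Grain] = deque(maxlen=window)
        self._budget = budget  # maximum number of aggregate tables to keep

    def observe(self, group_by: Grain) -> None:
        self._recent.append(group_by)

    def recommended(self) -> List[Grain]:
        # Most frequent grains first; ties keep first-seen order in the window.
        return [g for g, _ in Counter(self._recent).most_common(self._budget)]

    def plan(self, existing: Set[Grain]) -> Tuple[Set[Grain], Set[Grain]]:
        """Return (aggregates to build, aggregates to drop)."""
        target = set(self.recommended())
        return target - existing, existing - target


if __name__ == "__main__":
    advisor = AggregateAdvisor(window=4, budget=1)
    by_country, by_product = frozenset({"country"}), frozenset({"product"})
    for _ in range(3):
        advisor.observe(by_country)
    print(advisor.plan(set()))          # build by_country
    for _ in range(3):
        advisor.observe(by_product)     # the workload shifts
    print(advisor.plan({by_country}))   # build by_product, drop by_country
```

Because the window forgets old queries, the recommendation follows the workload as it drifts instead of being fixed by a one-time design step.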
10. AtScale lets you address semi-structured (NoSQL-type) data from BI tools that have no idea what JSON is
Real (modern) data is curvy. Ragged hierarchies, degenerate dimensions, key-value pairs, JSON: the list goes on. Querying this data in current-generation Business Intelligence tools is mostly impossible, as they are designed specifically to deal with relational data. AtScale allows you to model semi-structured data via our virtualized schema-on-read technology and make it appear to BI tools as simple columns and rows.
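A minimal schema-on-read sketch in Python shows the basic idea: nested JSON objects are flattened into dotted column names, arrays are kept as JSON text, and missing keys become NULLs so every row has the same shape. The `flatten` and `to_rows` helpers are illustrative only and do not reflect AtScale’s implementation, which also has to handle cases such as ragged hierarchies that this sketch ignores.

```python
import json
from typing import Any, Dict, List, Tuple


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON objects into dotted column names such as device.os."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Arrays have no single-column equivalent here; keep them as JSON text.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def to_rows(json_lines: List[str]) -> Tuple[List[str], List[tuple]]:
    """Turn JSON Lines input into (column names, rows) a BI tool could consume."""
    records = [flatten(json.loads(line)) for line in json_lines]
    # The schema is discovered at read time from whatever keys are present.
    columns = sorted({col for rec in records for col in rec})
    # Missing keys become None (SQL NULL), so every row has the same shape.
    rows = [tuple(rec.get(col) for col in columns) for rec in records]
    return columns, rows


if __name__ == "__main__":
    events = [
        '{"id": 1, "device": {"os": "ios", "type": "mobile"}, "tags": ["a", "b"]}',
        '{"id": 2, "device": {"type": "desktop"}}',
    ]
    cols, rows = to_rows(events)
    print(cols)
    for row in rows:
        print(row)
```

Here the columns come out as `device.os`, `device.type`, `id` and `tags`, and the second event gets NULLs for the keys it lacks, which is exactly the rectangular shape a relational BI tool expects.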
Thank you for indulging my tongue-firmly-in-cheek pseudoscience. I’m very aware this was done without the requisite academic rigor and would not hold up to peer review. Joking aside, I am very comfortable standing behind these claims, as we have observed these benefits in the wild and the outcomes are quantifiable.
Please stay tuned for my next scientific endeavors: “An Investigation into code quality – Rockstars, Ninjas or Pirates, who writes the best code?”, “Low Hanging Fruit: How low is too low?” and “Micro-Managing: Are you thinking too big?”
We invite you to learn more about AtScale today!