AtScale partner DataRobot today announced DataRobot Core, a program including platform enhancements, resources, and community support targeting code-first data scientists. Code-first data scientists and enterprise AI developers have specific needs from technology providers: they require the flexibility to interact with data and AI services programmatically, and to draw on the full range of code-first resources available to modern developers.
AtScale brings its semantic layer technology to this partnership, enabling DataRobot to connect and push down computation to elastic cloud data platforms and providing a powerful modeling canvas for feature engineering. AtScale is critical to operationalizing AI / ML workloads with support for a wide range of data platforms, protocols and visualization tools.
With a widespread and well-documented shortage of data science and enterprise AI skill sets in the hiring market, organizations need to maximize the productivity of their teams. Often, data scientists struggle with long deployment timelines and far too much time spent configuring and deploying AI. It is estimated that data scientists spend more than 50% of their time on noncritical tasks including data wrangling, data prep, and data pipeline management.
AtScale helps data scientists minimize time spent on data management while giving greater flexibility to engineer sophisticated features based on a wider range of data sources. AtScale AI-Link enables data scientists to access fully governed business KPIs and metrics within the semantic layer via Python. This helps them accelerate feature engineering and more easily manage the data pipelines supporting DataRobot. Ultimately, this means they can spend their time on more value-added activities.
Simplified Data Engineering
AtScale’s data modeling canvas lets both data scientists and business subject matter experts design features using a no-code visual modeling tool or custom SQL. Users leverage a composable analytics design to build on existing models and features and reach their goals more efficiently. AtScale’s query virtualization engine makes it possible to access a broad range of data, regardless of where it is stored. With disparate data sources logically modeled with conformed dimensions (e.g., time, geography, product), users can more confidently engineer sophisticated feature sets.
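To make the idea of conformed dimensions concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for two separate data sources. It does not use AtScale itself; every table and column name is hypothetical. Both sources are rolled up to a shared month dimension before they are combined into one feature set.

```python
import sqlite3

# In-memory database standing in for two independent sources (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key TEXT PRIMARY KEY, month TEXT);
    CREATE TABLE store_sales (date_key TEXT, revenue REAL);
    CREATE TABLE web_visits (date_key TEXT, visits INTEGER);
    INSERT INTO dim_date VALUES
        ('2021-01-15', '2021-01'), ('2021-01-20', '2021-01'), ('2021-02-03', '2021-02');
    INSERT INTO store_sales VALUES
        ('2021-01-15', 100.0), ('2021-01-20', 50.0), ('2021-02-03', 80.0);
    INSERT INTO web_visits VALUES ('2021-01-15', 10), ('2021-02-03', 40);
""")

# Aggregate each source to the shared month grain first, then join on it.
# Joining after aggregation avoids double counting rows from either source.
monthly_features = conn.execute("""
    WITH sales AS (
        SELECT d.month, SUM(s.revenue) AS revenue
        FROM store_sales s JOIN dim_date d ON d.date_key = s.date_key
        GROUP BY d.month
    ),
    visits AS (
        SELECT d.month, SUM(v.visits) AS visits
        FROM web_visits v JOIN dim_date d ON d.date_key = v.date_key
        GROUP BY d.month
    )
    SELECT sales.month, sales.revenue, visits.visits,
           sales.revenue / visits.visits AS revenue_per_visit
    FROM sales JOIN visits ON visits.month = sales.month
    ORDER BY sales.month
""").fetchall()

for row in monthly_features:
    print(row)
```

Because both sources resolve to the same month key, a derived feature such as revenue per visit can be computed without hand-aligning the two calendars.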
Features defined in AtScale are queried at run time from source data, so there is no need to manage a separate data pipeline. This eliminates data wrangling and data preparation, and it establishes a foundation for more robust data pipelines for serving features at run time.
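The general pattern of defining a feature once and evaluating it against source data at query time can be sketched with a plain SQL view. This is only an illustration of the concept using sqlite3, not AtScale's implementation; the `orders` table and `customer_features` view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 20.0), (1, 30.0), (2, 5.0);

    -- The feature is only a definition; no rows are copied or materialized.
    CREATE VIEW customer_features AS
        SELECT customer_id,
               COUNT(*)    AS order_count,
               AVG(amount) AS avg_order_value
        FROM orders
        GROUP BY customer_id;
""")


def read_features(connection):
    """Evaluate the feature definition against the current source rows."""
    return connection.execute(
        "SELECT customer_id, order_count, avg_order_value "
        "FROM customer_features ORDER BY customer_id"
    ).fetchall()


before = read_features(conn)

# New source data arrives; there is no separate pipeline to re-run.
conn.execute("INSERT INTO orders VALUES (2, 15.0)")
after = read_features(conn)

print(before)
print(after)
```

The second read reflects the new order immediately, since the view is recomputed from the source table each time it is queried.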
As DataRobot pulls features from AtScale at run time, they are dynamically queried from source data, avoiding the data movement and manipulation traditionally required in feature engineering. Pushing query execution down to cloud data platforms yields a significant performance benefit and greater overall efficiency.
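A small sketch of what query pushdown means in practice, again using sqlite3 as a stand-in for a cloud data platform (the `events` table is hypothetical). Both approaches produce the same totals, but the pushed-down version returns two rows to the client instead of a thousand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("east" if i % 2 == 0 else "west", float(i)) for i in range(1000)],
)

# Without pushdown: pull every raw row to the client, then aggregate in Python.
raw_rows = conn.execute("SELECT region, amount FROM events").fetchall()
client_side = {}
for region, amount in raw_rows:
    client_side[region] = client_side.get(region, 0.0) + amount

# With pushdown: the database aggregates and returns only the result rows.
pushed_rows = conn.execute(
    "SELECT region, SUM(amount) FROM events GROUP BY region"
).fetchall()
pushed_down = dict(pushed_rows)

print(len(raw_rows), "rows moved without pushdown")
print(len(pushed_rows), "rows moved with pushdown")
```

On an elastic cloud platform the gap would typically grow with data volume, since the aggregation runs where the data already lives.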
Data scientists are no longer required to transform and materialize the data in yet another platform prior to analysis. This approach leverages the power of SQL-based data warehouses (e.g., Snowflake) and emerging Lakehouse technologies (e.g., Databricks), where computation is elastic and data gravity continues to expand. Data science workloads tend to require greater scalability and performance and a wider range of ecosystem integrations (Scala, Java, Python, etc.), along with a variety of deployment engines including Spark, Livy, and SQL, all of which AtScale supports natively.
The semantic layer provides a context-rich repository of governed business metrics. This capability is foundational to any “metric store” or feature store strategy. AtScale supports programmatic feature discovery by DataRobot users, as well as user-driven discovery within the modeling utility. AtScale can serve as a feature library or can integrate with other cataloging services. Standardizing on AtScale as a single source of metrics in no way limits access to the broadest range of raw data sources. On the contrary, AtScale simplifies incorporation of a broader range of data sources while ensuring consistency of key dimensions (e.g. time).
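As a rough illustration of programmatic feature discovery, the sketch below models a metric catalog as a plain Python list and searches it by keyword and dimension. The `Metric` class, the `find_metrics` helper, and the catalog entries are all hypothetical; they are not part of any AtScale or DataRobot API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A governed metric and the conformed dimensions it can be sliced by."""
    name: str
    description: str
    dimensions: tuple = ()


# Hypothetical catalog contents, for illustration only.
CATALOG = [
    Metric("total_revenue", "Sum of order revenue", ("time", "geography", "product")),
    Metric("avg_order_value", "Average revenue per order", ("time", "product")),
    Metric("active_customers", "Distinct customers with an order", ("time", "geography")),
]


def find_metrics(catalog, keyword="", dimension=None):
    """Return names of metrics matching a keyword and, optionally, a dimension."""
    keyword = keyword.lower()
    return [
        m.name
        for m in catalog
        if keyword in f"{m.name} {m.description}".lower()
        and (dimension is None or dimension in m.dimensions)
    ]


print(find_metrics(CATALOG, keyword="revenue"))
print(find_metrics(CATALOG, dimension="geography"))
```

Keeping the dimensions alongside each metric is what lets a caller ask for, say, every metric that can be sliced by geography.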
Bridging AI and Business Analytics
One of AtScale’s most powerful capabilities for the data science community is the ability to publish model-generated insights to business users in existing dashboards and reporting processes. DataRobot developers can programmatically write model results back to data stores managed by the AtScale semantic layer. Business users can then access model-created data with the same governance, discoverability, and flexibility they would get for analytics based on historical data.
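The write-back pattern can be sketched as follows, with sqlite3 standing in for the managed data store. The table names are hypothetical and `naive_forecast` is a placeholder for a real model, not a DataRobot call. Predictions land in their own table, and a single report query then serves actuals and forecasts side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (month TEXT PRIMARY KEY, revenue REAL);
    CREATE TABLE sales_forecast (month TEXT PRIMARY KEY, predicted_revenue REAL);
    INSERT INTO monthly_sales VALUES ('2021-07', 100.0), ('2021-08', 110.0);
""")


def naive_forecast(history):
    """Placeholder model: repeat the last observed value."""
    return history[-1]


history = [
    row[0]
    for row in conn.execute("SELECT revenue FROM monthly_sales ORDER BY month")
]
predictions = [("2021-09", naive_forecast(history))]

# Write model output back next to the historical data.
conn.executemany("INSERT OR REPLACE INTO sales_forecast VALUES (?, ?)", predictions)
conn.commit()

# A dashboard-style query reads actuals and forecasts through one interface.
report = conn.execute("""
    SELECT month, revenue, 'actual' AS source FROM monthly_sales
    UNION ALL
    SELECT month, predicted_revenue, 'forecast' FROM sales_forecast
    ORDER BY month
""").fetchall()

for row in report:
    print(row)
```

Labeling each row with its source keeps predicted values clearly distinguishable from historical ones in downstream reports.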
The AtScale semantic layer brings a common vocabulary to all data consumers, whether they are interacting with data assets with no-code visual tools or programmatically. DataRobot Core brings a powerful set of resources to the code-first data science community with the potential to make this sophisticated community highly efficient and even more productive in driving business outcomes.