What is a Cloud Data Warehouse?


Definition

A Cloud Data Warehouse is a database of highly structured, ready-to-query data managed in a public cloud. Typically, cloud data warehouses offer the following features:

  • Massively parallel processing (MPP): Cloud-based data warehouses typically support big data use cases and apply MPP architectures to provide high-performance queries on large data volumes. MPP architectures consist of many servers running in parallel to distribute processing and manage multiple input/output (I/O) loads.
  • Columnar data stores: MPP data warehouses are typically columnar stores, which are generally the most flexible and economical option for insights and analytics. Columnar databases store and process data by column instead of by row, which makes aggregate queries, the type often used for reporting, run dramatically faster.
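The two features above can be illustrated together in a minimal sketch. The code below is not how any particular warehouse is implemented; it simply shows, with illustrative data, why a columnar layout lets an aggregate query touch only one column, and how an MPP-style engine splits that column into partitions and aggregates them in parallel before combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales records. A row store keeps whole records together;
# a column store keeps each attribute in its own contiguous array.
rows = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 200.0},
    {"region": "west", "amount": 50.0},
]

# Columnar layout: an aggregate like SUM(amount) reads only the one
# column it needs instead of scanning every full row.
columns = {
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

def partial_sum(partition):
    """One MPP worker aggregates its own partition of the column."""
    return sum(partition)

# MPP-style execution: split the column into partitions, aggregate each
# in parallel, then combine the partial results into the final answer.
amounts = columns["amount"]
mid = len(amounts) // 2
partitions = [amounts[:mid], amounts[mid:]]
with ThreadPoolExecutor(max_workers=2) as pool:
    total = sum(pool.map(partial_sum, partitions))

print(total)  # 450.0
```

Real MPP engines distribute partitions across many servers with their own I/O paths; threads on one machine stand in for those servers here.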

Purpose

The purpose of cloud data warehousing is to realize the benefits of modern cloud architectures, which combine the scale and availability of data warehousing, the flexibility of big data platforms and the elasticity of the cloud at a fraction of the cost of traditional solutions.

Typical benefits of cloud data warehouses are as follows:

Faster Speed-to-Insights – A cloud data warehouse provides more powerful computing capabilities and can deliver real-time cloud analytics, using data from diverse sources, much faster than an on-premises data warehouse, giving business users access to better insights, faster.

Scalability – A cloud-based data warehouse offers immediate and nearly unlimited storage, and it’s easy to scale as needs expand. Increasing cloud storage doesn’t require purchasing new hardware the way an on-premises data warehouse does, so cloud data warehouses typically cost a fraction as much.

Overhead – Maintaining a data warehouse on-premises requires a dedicated server room full of expensive hardware, plus experienced employees to operate it, apply upgrades and troubleshoot issues. A cloud data warehouse requires no physical hardware or allocated office space, making operational costs significantly lower.

Cloud data warehouses also offer additional benefits such as automation to improve speed, scale and flexibility. Further, applying a model-driven approach, including tools such as a semantic layer and data catalogs, makes finding data and creating data products more effective and efficient, supporting federated insights creation backed by centralized infrastructure, tools and governance. These additional capabilities and benefits are highlighted below:

  • Real-time data ingestion and updates – a simple, universal solution for continuously ingesting enterprise data into the cloud data warehouse in real time.
  • Automated workflow – a model-driven approach to continually creating, sharing and refining data products and managing data warehouse operations.
  • Trusted, analysis-ready data – fully structured, integrated, dimensionalized, analysis-ready data with reliable data management and governance tools, including a semantic layer and data catalogs to improve data access, sharing and refinement.

Primary Uses of a Cloud Data Warehouse

Cloud data warehouses are used to provide highly structured, integrated, dimensionalized, analysis-ready data to insights creators and consumers. They provide all of the benefits of an on-premises data warehouse with the added benefits of being managed in a public cloud: faster speed-to-insights, scalability, flexibility and automation, at a lower cost for both infrastructure and operational management, including compute, storage and network costs. Cloud data warehouse software (databases) provides the power, scale and flexibility to handle the variety, velocity and volume of data sources, ideally with minimal manual coding and maintenance.

A complete cloud data warehouse is supported by a set of services that includes a data storage layer (databases), a compute or query processing layer, and a cloud services layer, as depicted below:

[Figure: Services that support a cloud data warehouse]

Benefits of a Well-Executed Cloud Data Warehouse

The benefits of using a cloud data warehouse to deliver highly structured, integrated, dimensionalized data that is available for querying, analysis and analytics are listed below:

  • Speed – Insights created from cloud data warehouses enable faster action, because the insights are structured to address business questions in a more timely and effective way.
  • Scale – Cloud data warehouses support an ever-increasing number of data sources, users and uses that are also increasingly diverse across functions, geographies and organizations.
  • Cost Effectiveness – With more data comes more cost for storage, compute and management resources. Cloud data warehouses minimize these costs by providing consistent, effective shared infrastructure with tools and processes that are automated, low-code / no-code, self-service oriented and reusable, with minimal resources and hand-offs.
  • Flexibility – Cloud data warehouses should be capable of handling the full range of data source types and their volume, variety and velocity.

Common Roles and Responsibilities – Cloud Data Warehouse Users 

Delivering a cloud data warehouse, including the creation of highly structured, dimensionalized data products and the actionable insights ultimately delivered to business users, involves the following key roles:

  • Data Engineers – Data engineers create and manage data pipelines that transport data from source to target, including creating and managing data transformations to ensure data arrives ready for analysis. 
  • Analytics Engineers – Analytics engineers support data scientists and other predictive and prescriptive analytics use cases, focusing on managing the entire data-to-model-ops process, including data access, transformation, integration, DBMS management, and BI and AI data ops and model ops.
  • Data Modelers – Data Modelers are responsible for each type of data model: conceptual, logical and physical. Data Modelers may also be involved with defining specifications for data transformation and loading.
  • Technical Architect – The technical architect is responsible for the logical and physical technical infrastructure and tools. The technical architect ensures that the data model and databases, including source and target data, can physically be accessed, queried and analyzed by the various OLAP tools.
  • Data Analyst / Business Analyst – A business analyst or, more recently, a data analyst is responsible for defining the uses and use cases of the data, as well as providing design input to data structure, particularly metrics, topical and semantic definitions, business questions / queries, and the outputs (reports and analyses) intended to be performed and improved. Responsibilities also include owning the roadmap for how the data will be enhanced to address additional business questions and existing insights gaps.

Key Business Processes Associated with Cloud Data Warehouse

The processes for delivering data and insights from a cloud data warehouse include the following:

  • Access – Data, often in structured, ready-to-analyze form, is made available securely to approved users, including insights creators and enablers.
  • Profiling – Data are reviewed for relevance, completeness and accuracy by data creators and enablers. Profiling can and should occur for individual and integrated datasets, both in raw form and in ready-to-analyze structured form.
  • Modeling – Creating logical and physical data models to integrate, structure, dimensionalize and semantically organize data for ready-to-analyze use.
  • Extraction / Aggregation – The integrated dataset is made available for querying, including aggregation to optimize query performance.
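The aggregation step above can be sketched with illustrative data. The table and column names below are hypothetical; the point is that pre-aggregating a fact table along reporting dimensions turns a full-table scan into a small-table lookup, which is exactly why warehouses maintain aggregates for common queries.

```python
from collections import defaultdict

# Hypothetical fact table: one record per sale (month, region, amount).
fact_sales = [
    ("2024-01", "east", 120.0),
    ("2024-01", "west", 80.0),
    ("2024-02", "east", 200.0),
    ("2024-02", "east", 50.0),
]

# Pre-aggregate by (month, region) so reporting queries read a small
# summary instead of scanning every fact row at query time.
aggregate = defaultdict(float)
for month, region, amount in fact_sales:
    aggregate[(month, region)] += amount

# A standard report query becomes a direct lookup on the aggregate.
feb_east_total = aggregate[("2024-02", "east")]
print(feb_east_total)  # 250.0
```

In a real warehouse this summary would be a materialized aggregate table or view, refreshed as new facts arrive.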

Common Technology Categories Associated with Cloud Data Warehouses

Technologies involved with a cloud data warehouse are as follows:

  • Data Engineering – Data engineering is the process and technology required to move data securely from source to target in a way that it is easily available and accessible.
  • Database – Databases store data for easy access, profiling, structuring and querying. Databases come in many forms to store many types of data.
  • Data Warehouse – Data warehouses store data that are used frequently and extensively by the business for reporting and analysis.  Data warehouses are constructed to store the data in a way that is integrated, secure and easily accessible for standard and ad-hoc queries for many users.
  • Data Lake – Data lakes are centralized data storage facilities that automate and standardize the process for acquiring data, storing it and making it available for profiling, preparation, data modeling, analysis and reporting / publishing. Data lakes are often created using cloud technology, which makes data storage very inexpensive, flexible and elastic.

Trends / Outlook for Cloud Data Warehouses

Key trends to watch in the Cloud Data Warehouse arena are as follows:

Semantic Layer – The semantic layer is a common, consistent representation of the data used for business intelligence, reporting, analysis and analytics. It is important because it creates one common, consistent way to define data in multidimensional form, ensuring that queries made from and across multiple applications, including multiple business intelligence tools, use one shared definition rather than re-creating the data models and definitions within each tool. This ensures consistency and efficiency, including cost savings and the opportunity to improve query speed and performance.
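A minimal sketch of the idea: metrics are defined once, centrally, and every consuming tool resolves the same definition instead of re-implementing it. All metric, table and column names below are illustrative, and real semantic layers are far richer (dimensions, hierarchies, access control), but the one-definition-many-consumers pattern is the core.

```python
# Central metric registry: each metric is defined exactly once.
# Names here are hypothetical examples, not any product's schema.
METRICS = {
    "total_revenue": {"expr": "SUM(amount)", "table": "fact_sales"},
    "order_count": {"expr": "COUNT(DISTINCT order_id)", "table": "fact_sales"},
}

def query_for(metric_name, group_by):
    """Render one consistent SQL query for any consuming BI tool,
    so every tool aggregates the metric the same way."""
    m = METRICS[metric_name]
    return (
        f"SELECT {group_by}, {m['expr']} AS {metric_name} "
        f"FROM {m['table']} GROUP BY {group_by}"
    )

# Two different tools asking for revenue by region get identical SQL.
sql = query_for("total_revenue", "region")
print(sql)
# SELECT region, SUM(amount) AS total_revenue FROM fact_sales GROUP BY region
```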

Real-Time Data Ingestion and Streaming (CDC) – Create and manage data as live streams, extending enterprise data into real time to enable and support modern cloud-based analytics and microservices with a simple, universal solution.
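The apply side of change data capture can be sketched as follows. The event shapes are illustrative (real CDC tools emit richer change records with offsets and schemas); the sketch shows the essential replay logic that keeps a warehouse target table continuously in sync with a source system's inserts, updates and deletes.

```python
# Target table in the warehouse, keyed by primary key.
target = {}

# A hypothetical stream of change events captured from a source system.
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Grace", "plan": "pro"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "key": 2},
]

def apply_event(table, event):
    """Replay one change event so the target mirrors the source."""
    if event["op"] == "delete":
        table.pop(event["key"], None)
    else:
        # Inserts and updates both upsert the latest row image.
        table[event["key"]] = event["row"]

# Applying events in order keeps the target continuously up to date.
for e in events:
    apply_event(target, e)

print(target)  # {1: {'name': 'Ada', 'plan': 'pro'}}
```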

Data Governance – Implementing effective, centrally managed data governance, including data usage policies, procedures, risk assessment, compliance and tools for managing secure access and usage such as semantic layer, metric stores, feature stores and data catalogs.

Agile Data Warehouse Automation – Quickly design, build, deploy and manage cloud data warehouses without manual coding.

Managed Data Lake Creation – Automate complex ingestion and transformation processes to provide continuously updated and analytics-ready data lakes.

Automation – Increasing emphasis is being placed by vendors on ease of use and automation to improve speed-to-insights. This includes offering “drag and drop” interfaces that execute data preparation activities and insights creation / queries without writing code, as well as reusing activities and processes, both for repeated use and for sharing.

Self-service – As data grows, qualified data technologists and analysts are in short supply. To address this gap and increase productivity without leaning entirely on IT resources to make data and analysis available, self-service is increasingly available for data profiling, mining, preparation, reporting and analysis. In addition, tools like the Semantic Layer offered by AtScale, Inc. are focused on enabling business users and data analysts to model data for business intelligence and analytics uses.

Transferable – Increased effort is also underway to make data easier to consume, including making data easier to publish, both through APIs and via objects that store elements of the insights.

Observable – Recently, a host of new vendors have begun offering services referred to as “data observability”. Data observability is the practice of monitoring data to understand how it is changing and being consumed. This trend, often called “DataOps”, closely mirrors the software development trend of “DevOps”: tracking how applications are performing and being used in order to understand, anticipate and address performance gaps proactively rather than reactively.

AtScale and Cloud Data Warehousing

AtScale is the leading provider of the Semantic Layer – to enable actionable insights and analytics to be delivered with increased speed, scale and cost effectiveness. Research confirms that companies that use a semantic layer improve their speed to insights by 4x – meaning that a typical project to launch a new data source with analysis and reporting capabilities taking 4 months can now be done in just one month using a semantic layer. 

AtScale’s semantic layer is uniquely positioned to support rapid, effective use of cloud data warehouses. AtScale ensures that data used for AI and BI is consistently defined and structured using common attributes, metrics and features in dimensional form, including automating the process of data inspection, cleansing, editing and refining, rapidly adding additional attributes, hierarchies and metrics / features, and extracting and delivering ready-to-analyze data automatically for multiple BI tools, including Tableau, Power BI and Excel. Moreover, this work requires only one resource who understands the data and how it is to be analyzed, reducing complexity and resource intensity. This approach to data operations automation eliminates multiple data hand-offs, manual coding, the risk of duplicate extracts and suboptimal query performance.

Additional Resources: