March 2, 2020Data Lake Intelligence With Amazon S3 and Redshift Spectrum
Using AtScale to establish a semantic layer on Amazon Redshift delivers several important benefits to modern data and analytics teams. As a single source of governed metrics, and dimensions, AtScale extends the value of Redshift for business intelligence and data science use cases. By ensuring high performance query with no ETL/ELT or extracts, AtScale on Redshift simplifies data pipelines and improves end user experience.
Summary of benefits of using AtScale on Amazon Redshift using the TPC-DS 10TB benchmark:
|Query Performance12x faster||User Concurrency61x faster|
|Compute Cost2.6x cheaper||SQL Queries Complexity76% less complex|
For a detailed overview of results and methodology of this benchmark study, please follow this link.
In this blog, we cover how to connect Amazon Redshift to AtScale, and by doing so, accelerate query times and reduce compute cost. To connect AtScale to a live Redshift database, there are just a few steps to follow:
- Add a new data warehouse connection in AtScale Design Center.
- Import a sample XML data model or create a new one from scratch.
- View the imported data model in AtScale Design Center.
How to Connect Amazon Redshift to AtScale
Create a NewData Warehouse
First, to connect Amazon Redshift to AtScale, in AtScale Design Center, navigate to the “Settings” tab and click on the “Data Warehouses” menu item. Then press the “Create Data Warehouse” button, which will trigger the following pop-up:
From here, select the desired data warehouse (eg. Redshift) from the dropdown list. Next, add a name for the data warehouse to make it easy to identify within AtScale’s data warehouse library. For this example, let’s name our new warehouse “Redshift Data Warehouse”.
Next, select the Redshift “Database” where we want AtScale to work and choose an “Aggregate Schema” in that database where AtScale will store its aggregate tables. Additionally, you can optionally select an S3 bucket location for storing temporary files for loading sample data.
From there, you can click on the “Test S3 Connection” button to verify that AtScale can read and write to the S3 bucket. Then, click on the “Save” button below to save these settings.
Create a Connection
Now that we have created a data warehouse set up for Redshift, we can define one or more connections. AtScale data warehouse connections allow users to specify different resources for querying and creating aggregates.
In this example, we’ll create a single data warehouse connection for both queries and aggregates. To create a connection, click on the arrow (“<”) on the right side of the screen to open the connections panel.
Then, click on the “Create Connection” button to define a new connection.
Then select a “Name” for the connection (let’s just use “Redshift”), fill in the the Redshift server “Host” (you can find this in your AWS Redshift console), the “Port” (default is “5439”) and the “User” and “Password” (or Access Key). By clicking on the ‘Test Connection” button, you can make sure that your credentials are good to go. Then click “Save” to save this connection.
Import the Sales Insights Sample Project & Data:
Now that we’ve connected AtScale to the Redshift data warehouse, we’re ready to start data modeling. A good way to learn firsthand is to load the AtScale “Sales Insights” sample project and model. To load this data and model, click on the AtScale icon in the upper left hand corner of AtScale Design Center to land on our home screen.
Then, click on the “Use Sample to Explore” tile to launch the sample import wizard.
If you haven’t already downloaded the “Sales Insight” sample bundle, click on the “our sample data repository” link in the wizard title bar to download the bundle.
Then click on the “Download” button for the “Sales Insights Application (AtScale versions >= 2019.2.x)” bundle and download it to your desktop and navigate back to the Design Center wizard screen.
From there, select the “Data Warehouse” (“Redshift Data Warehouse”) name you just created, select the “Schema” to store your sample data, the “Destination Directory” for storing the temporary files (same as the “Bucket Name” you defined in the “S3 Bucket Details” for the Data Warehouse creation dialog. Then, select the “Sales Insights” sample bundle you just downloaded and click the “Next” button. Finally, choose the defaults for the remaining wizard pages and click the “Finish” button to import the data and the model definition.
View Imported Model From Amazon Redshift
After you have imported the XML model and sample data into your Amazon Redshift data warehouse, you will see your new model in the AtScale Design Center canvas.
You can click on the “Arrange Items” button on the canvas to auto arrange your model visually.
If you doubleclick on the “Factinternetsales” Dataset title (the panels with the blue title ribbons), you should see the sample data in the Dataset composer: this will let you preview the raw data in this Redshift table and create calculations and transformations.
Connecting AtScale to a cloud data warehouse like Amazon Redshift is an intuitive process with just a few steps. Following the steps above, we’re now ready to publish our model and consume it in a variety of BI tools (i.e. Excel, Power BI, Tableau, Looker) and data science platforms (i.e. DataRobot, SageMaker, Jupyter notebooks).
As you can see, the AtScale semantic layer establishes a single source of governed metrics with no need to extract or transform data from your Redshift data warehouse.