Developer Edition

How to Connect to Google BigQuery

This Demo is part of a How To video series to help users get started with the AtScale Developer Edition.

Video play button

How to connect to Google BigQuery in the AtScale Developer Edition

Overview:

When you download AtScale Developer Edition, you get a pre-installed connection to Postgres. However, you can also connect it to other data platforms that you may want to use.

This article will give you a step-by-step guide to connect the AtScale Design Center to BigQuery.

Prerequisite:

Install:

For this guide, we will use the tutorial models that come pre-installed with AtScale Community Edition as a sample.

Open the GitHub repo sml-models. There, you will find the three tutorial models in the Model Library. These are:

  • Internet Sales – a simple, single-fact model derived from the fictitious AdventureWorks retail dataset.
  • World Wide Importers – a more complex, multi-fact model that represents a fictional wholesale and distribution company
  • TPC-DS – a complex, multi-fact model that encodes the TPC-DS benchmark model in SML.

Copy the Model Data

    1. Open the GitHub public tutorials repository called sml-models.
      GitHub public tutorials repository - SML Models
    2. Click on the directory data/loaders/bigquery. The three tutorial data models will be displayed.
      GitHub Public Repository - Three SML Models
    3. Select the load-as-adventure.sql model. The load scripts for this model will be displayed.
    4. Select, copy, and save the load script.
      Note: This data is all in the Google Cloud Platform and is publicly accessible.
      GitHub Repository - BigQuery - load script
    5. Repeat this process for the load scripts of the other two models.

Set Up BigQuery Project and Create a JSON File

  1. Switch to BigQuery.
  2. Create a project by opening the menu in the top navigation bar and clicking New project.
  3. Enter “atscale-tutorial-data” in the Name field.
    Note: Be sure to use dashes (-) as BigQuery does not support underscores (_). This is important to remember when creating the connection in AtScale later.
  4. In the left sidebar, go into IAM & Admin.
    Google BigQuery - create a new project
  5. Select Service accounts.
  6. At the top, click on + Create service account.
    Google BigQuery - Create Service Account
  7. In the Service account details page, enter “atscale-tutorial-data” as the service account name.
    Note: A service account ID will be generated based on this name. You can edit the ID at this point if necessary, but you cannot change the ID later.
  8. Click on Create and continue.
  9. Under the option Grant users access to this service account, select BigQuery→BigQuery Admin.
  10. Click Continue.
  11. Under the option Create key, click on + Create key.
  12. In the menu on the right, select JSON. The .json file will begin downloading.
  13. Save the file. You will use this file to create your AtScale BigQuery data warehouse connection.

Load the Data

  1. Go to the BigQuery console.
  2. In the Explorer, expand the atscale-tutorial-data project.
  3. Create a data set called “aggregates”. This is where AtScale will automatically create and send the aggregate tables in BigQuery.
    Google BigQuery - Create a data set
  4. Click + Create SQL query
  5. Right-click and paste it into the as-adventure load script.
    Google Big query - paste into script
  6. Once the data has loaded, click the Run button.
    Google BigQuery - Run scripts
  7. Repeat this process for the load scripts of the other two schemas — TPC-DS and ww_importers.

The data is now loaded into BigQuery.

Set Up a Data Warehouse Connection with AtScale

There are two stages to creating a connection to a cloud data warehouse.

  1. Create a Data Warehouse group.
  2. Establish the connection.

Data warehouse groups allow different resources for different activities, such as serving queries or building aggregates, which might require warehouses of various sizes. In this example, we will simply create a single connection.

Create a Data Warehouse group

To repoint your tutorial SML models to this data in BigQuery, go to AtScale Design Center. In the Repo Browser, you will see your three tutorial models.
AtScale Design Center - Repo Browser

    1. Click the ellipsis next to the AtScale icon at the top left.
    2. Choose Settings from the drop-down menu and select Data Warehouses.
      Note: AtScale Developer Community Edition has a pre-installed connection to Postgres. The tutorial models are set up to point to this Postgres connection by default. Instead, we will set up a BigQuery connection for the models to point to.
    3. Click on the BigQuery icon to open the Add Data Warehouse panel.
    4. In the Name field, enter “BigQuery”.
    5. Type “BigQuery” in the External connection ID field also.
      Note: Ensure you use the correct capitalization, as this will be the key for the SML connection object.
      AtScale Design Center - Set External Connection ID
    6. In the Aggregate Project ID, enter “atscale-tutorial-data”. This is where the aggregates will be sent to.
    7. In the Aggregate Schema field, enter “aggregates”.
      Note: The aggregate catalog and schema will define where AtScale will write and store the aggregate tables it generates based on query activity. Ensure you have read/write access to your database and schema.
    8. For the authentication method, use the default: “Use the service account key JSON file”.
    9. Toggle the Google Cloud Bucket switch to disable.
    10. Click Apply.
    11. When the screen refreshes, the new BigQuery warehouse will appear. However, the status will read “No connections defined.”

You have now created the connection group. The next stage is to establish a connection.

Establish the Connection

  1. Click the ellipsis on the BigQuery warehouse.
  2. Click + to add a new connection.
    AtScale Design Center - Establish new connection
  3. Enter “BigQuery” as the connection name.
  4. Click on Choose file.
  5. Select the JSON file you downloaded when you generated the key for a service account.
    AtScale Design Center- Add BigQuery Connection
  6. Click Test connection. A “Connection successful” notification should appear.
  7. Click Apply to complete the process.
  8. When the screen updates, the new BigQuery connection should display Pool and Connection as “UP”.
    AtScale Design Center - Big Query Connection
    Note: if you need to edit the connection at any time, click the ellipsis, then hover over the connection to reveal the pen icon.
    AtScale Design Center - How to edit a connection

BigQuery is now connected correctly.

Change the Connection Objects in SML

You now need to refer to the new SML data warehouse connection for each tutorial data model.

  1. Open the Repo Browser from the AtScale icon.
  2. In the Workspace, go to models→tutorials→internet-sales→connections.
    AtScale Design Center - Repo Browser
  3. Click on Connection – Internet Sales.yml
  4. In the Text Editor, change the AtScale connection object (as_connection) from “Postgres” to “BigQuery”.
    Note: This must match the external connection ID you entered when creating the data warehouse group earlier. Ensure that it is typed in the same format, with the same capitalization as the connection ID.
    AtScale Design Center - Text Editor, change connection object
  5. In the database field, replace the underscores with dashes, so that it reads atscale-tutorial-data, as Google BigQuery doesn’t support underscores.
  6. Click Save file at the bottom of the page.
  7. Click Deploy in the Repo browser.
  8. In the Deploy menu, click Deploy.
    AtScale Design Center - Deploy
  9. Once it has deployed, click the Catalogs icon on the left sidebar.
    AtScale Design Center - Catalog
  10. Go to Repo: AtScale Tutorials→sml-models_main to pull down your connection strings.
  11. Repeat this process for the other tutorial models: tpc_ds and ww_importers.
  12. With the connection deployed, you can run queries on your models against BigQuery instead of Postgres.

Watch all of the Developer Community Edition demo videos on demand.

AtScale Developer Edition