I recently hosted a live customer webinar with a Forbes Top 50 FinTech company that uses AtScale to support their data science initiatives.
Trumid’s senior data engineer Colin Reid joined me to share insights into the problems his team is addressing and the best-in-class data stack they built.
Here are some of the most memorable points from our conversation:
Tell Us About Your Organization and Your Role.
Colin Reid is a senior data engineer who joined Trumid almost a year ago. A veteran of Uber and Telstra Ventures, he focuses on building ETL pipelines and the systems and tooling that support them. He also works on data visualization, reporting analytics, and dashboards.
Trumid itself is a credit trading network involved in trading primarily United States corporate bonds. They have about 130 employees and trade about $2 billion a day in corporate bonds over about 600 onboarded accounts. These include large and small hedge funds, big asset managers like insurance managers and pensions, as well as sell-side institutions at the major Wall Street banks like JP Morgan and Goldman Sachs.
The bond market is currently somewhat analog compared to other parts of the securities market: if you trade a stock electronically, there won’t be a human being involved, whereas most institutional bond trading still takes place by phone and instant messaging. The two parties may negotiate bilaterally before agreeing on the price and quantity at which to execute the trade.
Trumid is trying to change that by moving as much of the bond market onto electronic platforms as possible, and building the best customer experience for that type of trading.
What Use Cases Are You Focused on in Your Data Program?
Trumid’s use cases are familiar to data people: reporting, analytics, and machine learning.
Reporting activities include periodic reports to investors, content for board meetings, or particular performance-related numbers for clients. The goal is to automate this part so users can answer questions about trade volumes and performance directly.
A separate use case is analytics: for example, when Trumid releases a new feature, the team wants to know how many people have used it and whether it has changed customer behavior on the platform. The analytics use case typically has a smaller audience than reporting.
The final use case is machine learning. At this point in its history, Trumid does not have a lot of machine learning in production, but it is something the team is actively working on.
“We have a number of people performing research from the technical side, training models, trying to determine places we can leverage our data to make smarter decisions,” Reid said. “Also from a product angle: where does it make sense to include this in the platform, and where can our users’ experience be improved if we add some intelligence or some automation?”
The Trumid Data Stack
To support these three use cases, Trumid built a data stack that should also look familiar. It draws on a number of data sources:
- The majority sits in Postgres databases that support microservices in production. They use Fivetran to bring this into their warehouse on Google BigQuery.
- A lot of data also sits in Publish/Subscribe systems. They built a homegrown service that loads that data from Kafka in microbatches into Google BigQuery.
- Finally, they get a lot of data from external sources like data vendors. These come in a variety of formats: sometimes a CSV file, sometimes a response from a vendor’s API that another service polls periodically. They have to deserialize those and get them into BigQuery. This is the least standardized part of the system.
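The microbatch loading mentioned in the list above can be sketched in a few lines. This is only an illustration of the batching idea, not Trumid's service: the `micro_batches` helper and the sample trade events are hypothetical, and the Kafka consumer and BigQuery load steps are deliberately left out.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(records: Iterable[T], max_size: int) -> Iterator[List[T]]:
    """Group a stream of records into lists of at most max_size items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == max_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        yield batch


# Hypothetical trade events standing in for messages read from a topic.
events = [{"trade_id": i, "quantity": 100 * i} for i in range(1, 8)]

# Each batch would be handed to a single warehouse load call
# instead of writing one row at a time.
batches = list(micro_batches(events, max_size=3))
batch_sizes = [len(b) for b in batches]
```

Loading in small batches trades a little latency for far fewer write operations. A production service would typically also flush on a timer so a slow stream does not hold records back indefinitely.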
Once data is in BigQuery, they have several ETL layers to process that data into something that makes sense for the reporting and analytics use cases. That is all managed with dbt. Once the data is in a place where they want to use it, it goes through AtScale into Looker.
“We have a couple of AtScale models to present this data to Looker, and at this point almost all of our users’ interaction with data comes through Looker,” Reid said. “This is both curated content that folks have saved in our Looker instance, and also through Looker Exploration.”
Almost all users have access to go into a data model, poke around, and create their own visualizations or even dashboards if they want to tie multiple visualizations together. The AtScale layer means they can get this data from BigQuery to Looker very efficiently.
Their Previous Stack
According to Reid, the Trumid team started building out this stack last year to make reporting available across the company. The stack they had been using had a couple of deficiencies:
1) Modeling: Each time someone wanted to dig into data for the business, they had to come to someone on the data team and ask them to produce a model. That friction meant it took a lot longer to get questions answered.
2) Latency: In previous systems, data would have to be pre-calculated so it would be available for the visualization layer. This meant visualizations were fast, but they were not pulling from the most recent data; they were pulling from the most recent extracts. This introduced a layer of latency beyond the ETL layer which meant users couldn’t look at data until the next day. Any decisions made during the trading day weren’t using the most recent data.
3) Consistency: Trumid has internal users, clients, and investors, as well as their marketing team. They needed to be sure they were reporting consistent numbers to stakeholders.
How Trumid’s Data Stack Facilitates Self Service
Trumid’s data stack facilitates self service from several angles:
- Because Looker is easy to use and AtScale makes data available, anyone can create dashboards, reports, or ad hoc visualizations. They can then share these with a URL.
- A single model can expose data from 40-50 tables in BigQuery. Understanding the joins, filters, and aggregations involved would mean a steep learning curve for a non-technical user. AtScale instead separates the tables from the visualizations, which reduces the friction of onboarding data users.
- Making analytics easier: Trumid operates in a highly regulated industry. Being able to implement permissions dynamically using Looker’s permission model allows them to feel comfortable granting people access to Looker, while ensuring they’re seeing the same aggregates and relationships as anyone else.
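To make the idea of separating tables from visualizations concrete, here is a minimal sketch of a semantic model that maps business names to SQL and applies an optional row filter for permissions. It is not AtScale's or Looker's API: the `SemanticModel` class, the table and column names, and the filter expression are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SemanticModel:
    """Hypothetical mapping from business names to SQL expressions."""

    base_table: str
    joins: List[str] = field(default_factory=list)
    dimensions: Dict[str, str] = field(default_factory=dict)
    measures: Dict[str, str] = field(default_factory=dict)

    def query(self, dimension: str, measure: str,
              row_filter: Optional[str] = None) -> str:
        """Build SQL for one measure grouped by one dimension."""
        dim_sql = self.dimensions[dimension]
        measure_sql = self.measures[measure]
        lines = [
            f"SELECT {dim_sql} AS {dimension}, {measure_sql} AS {measure}",
            f"FROM {self.base_table}",
        ]
        lines.extend(self.joins)
        if row_filter:
            # A real system would derive this from the user's permissions
            # and bind values as parameters rather than formatting strings.
            lines.append(f"WHERE {row_filter}")
        lines.append(f"GROUP BY {dim_sql}")
        return "\n".join(lines)


# Hypothetical tables and columns; the joins are defined once in the model.
model = SemanticModel(
    base_table="trades t",
    joins=["JOIN accounts a ON a.account_id = t.account_id"],
    dimensions={"account_type": "a.account_type"},
    measures={"total_volume": "SUM(t.notional)"},
)

# The user only picks business names; the joins and aggregation come from the model.
sql = model.query("account_type", "total_volume", row_filter="a.region = 'US'")
```

Because every user goes through the same measure definition, two people asking for `total_volume` get the same aggregate, and the row filter only narrows which rows each of them is allowed to see.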
Ultimately, giving users data upfront and allowing them to use it to their heart’s content (while knowing permissions will be applied appropriately) allows a lot more people to make data-driven decisions. Looker’s capabilities mean any user can create dashboards and visualization content, and present the data without needing help to create models, request permissions, or share results.
According to Reid, when business users are able to access and use data without involving his team, the data team has more time for research. “AtScale and Looker run themselves these days,” Reid concluded. “You guys made it so seamless we don’t have to spend much time on maintenance, it’s just adding features we have to spend time on.”
You can watch the full discussion I had with Colin Reid about How to Power Better Decisions with a Modern Data & Analytics Stack, or you can download our new customer story featuring Trumid. In addition, if you like what Trumid is doing (or their cool data and analytics stack), they’re hiring, and Colin says to reach out to him directly if you are interested because they have several jobs that are not listed.