In recent years, DBT (data build tool) has become an increasingly popular open source technology in many data stacks.
On Feb. 24, DBT Labs, the leading vendor behind the open source DBT tool, said it had raised $222 million in a Series D round of funding that the vendor will use to grow its platform and go-to-market efforts.
At the foundation of DBT Labs is the DBT Core open source project, whose technology provides a programmatic approach to data transformation and is used to build data pipelines as part of an ETL (extract, transform and load) process.
DBT Labs' aspiration is to be the standard middle layer through which data can flow from various sources into data warehouse, analytics and business intelligence tools.
In this Q&A, Tristan Handy, CEO and co-founder of DBT Labs, talks about his company and how its technology fits into the modern data stack.
Why are you raising money now for DBT Labs and data transformation?
Tristan Handy: The decision to raise money on some level is a market decision. We have actually never in our history gone out to the market and said, 'Hey, you know what, we really need capital to fund the next 18 months of our operations' or anything like that. It's very consistently been the market coming to us and saying, 'I think that it's a good time for you to raise money,' and then we look at our current situation and see if it makes sense for us.
The reason the timing made sense for us now is that we are going through this big shift, with a pretty significant run-up in our community size and the footprint that we have in the ecosystem.
What does DBT enable for data transformation that was missing from traditional ETL tools?
Handy: The addition of window functions to the SQL specification back in 2008 made it possible to express essentially all the different types of logic needed to reshape data. When we started thinking about DBT in 2015, I don't think most data practitioners realized you could do that with SQL.
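To illustrate the kind of reshaping Handy is describing, here is a minimal sketch of a SQL window function, run through Python's standard sqlite3 module. The orders table and column names are hypothetical, and the example assumes the bundled SQLite is version 3.25 or newer (required for window-function support):

```python
import sqlite3

# Hypothetical orders table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("a", 10), ("a", 20), ("b", 5)])

# A running total per customer via SUM() OVER -- logic that, before
# window functions, typically required procedural code or self-joins.
rows = conn.execute("""
    SELECT customer,
           amount,
           SUM(amount) OVER (PARTITION BY customer
                             ORDER BY amount
                             ROWS UNBOUNDED PRECEDING) AS running_total
    FROM orders
    ORDER BY customer, amount
""").fetchall()
print(rows)  # [('a', 10, 10), ('a', 20, 30), ('b', 5, 5)]
```

The transformation stays entirely in SQL: the table is reshaped into an analysis-ready form without a separate programming layer.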
Companies have access to cloud data warehouses like Amazon Redshift, but they're still using antiquated processes to get data into their systems. They're writing custom code to load it and they're using heavyweight tools and technologies to reshape it.
I'm a data analyst and not a data engineer. I knew the pain of being asked by my boss to go figure out the answer to some question and then getting stuck, waiting for a data set that I don't have, and then waiting in a queue for days while a data engineer does their job. So I wanted to find a way to give data analysts the tools to do data transformation.
What do you see as challenges of data transformation?
Handy: On some level, the act of data transformation is kind of mundane. It is about taking a table that looked like one thing and making it look like another. It's not like you're creating new information, either; you're just taking the existing information and shaping it in a way that lends itself to analysis.
But the fun thing is that once you take this process far enough and write clean code, you have a team of people all collaborating to curate a shared data asset.
What you realize you're doing over time with data transformation is you're curating the knowledge of the organization that you work for. The DBT graph is in a sense the cleanest encapsulation of how information flows through a business.
What is the difference between DBT Core and the DBT Labs commercial platform?
Handy: DBT Core is Apache-licensed, and you can literally do anything with it you want, and that includes competing directly with us and our commercial offerings.
DBT Core is the engine. When someone is writing DBT code, they're writing SQL plus this layer we call Jinja that provides users with more of the constructs that a data analyst would want to use to build pipelines. DBT Core knows how to compile the code that we call DBT SQL down into raw SQL that can then be run against a data source.
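The compile step Handy describes can be sketched roughly as follows. This is not DBT Core's actual implementation (which uses full Jinja templating, a dependency graph, and adapter-specific SQL generation); it is a toy stand-in using a regex substitution, with a hypothetical model name and schema, to show the idea of turning templated DBT SQL into raw SQL:

```python
import re

def compile_model(dbt_sql: str, schema: str = "analytics") -> str:
    """Toy stand-in for a DBT-style compile step: resolve
    {{ ref('model') }} placeholders into fully qualified table names.
    Illustrative only -- real DBT Core does far more than this."""
    return re.sub(r"\{\{\s*ref\('(\w+)'\)\s*\}\}",
                  lambda m: f"{schema}.{m.group(1)}",
                  dbt_sql)

# Hypothetical model file contents ('stg_orders' is an assumed name).
model = "SELECT customer, SUM(amount) FROM {{ ref('stg_orders') }} GROUP BY customer"
print(compile_model(model))
# SELECT customer, SUM(amount) FROM analytics.stg_orders GROUP BY customer
```

The ref() indirection is what lets DBT infer dependencies between models and build the pipeline graph; the analyst writes only SQL plus these small templating constructs.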
The capabilities of DBT Core can be enough for some users in our ecosystem. Our commercial product, DBT Cloud, extends DBT in two important ways. One is that it provides a persistent way to put DBT code into production. The other is that it provides a user experience that makes it easier to write DBT code for folks who may not love getting into all of the nitty-gritty of the developer experience on a local machine.
What's next? And what's your hope for the future of DBT?
Handy: We really want to be a semantic layer for data. We want to be able to push the knowledge that DBT has into the hands of people who want to build reports, dashboards and data products.
The age-old problem in data is you come to a meeting and two people have two reports from two different systems and they both show different numbers for revenue. It's just such a boring thing to talk about, but it's still a problem that few have managed to solve.
Everyone who's tried to make progress on this problem has done it under the auspices of a commercial software product that they've then tried to get everyone to use, and it doesn't work like that. So what you need is an open layer that all of these tools can interact with to get their data definitions of revenue and of all the different business metrics that you need.
Check back with me in two years. It may well be a windmill we're charging at, but I really think that we have an opportunity to do something that the whole industry has failed to do for a very long time.
Editor's note: This interview has been edited for clarity and conciseness.