Prefect raises $32M for dataflow automation technology
Data startup Prefect is looking to grow its open source project and commercial business to help organizations manage automated data pipeline requirements.
Open source dataflow automation startup Prefect on Thursday said it raised $32 million in Series B funding, bringing the vendor's total funding to $57.6 million.
Prefect, based in Washington, D.C., develops a technology platform that enables organizations to build automation for moving data from a source to a target location so the data can be used for data science, analytics or business intelligence.
Prefect's founder and CEO, Jeremiah Lowin, spent four years working on the open source Apache Airflow project, which provides some data pipeline automation capabilities. Lowin created the open source Prefect project to respond to what he saw as a need for more automation and data engineering capabilities than Airflow provides. Prefect also provides the commercial Prefect Cloud service, a managed service for dataflow automation.
In this Q&A, Lowin explains why the vendor is raising money and what the dataflow automation field is all about.
The funding round was led by Tiger Global with participation from new investor Bessemer Venture Partners.
Why are you now raising a Series B?
Jeremiah Lowin: Our community and the use of our product are growing much faster than we can keep up with right now.
We have tried to answer every question in our Slack community within 15 minutes from day one. Today, people are joining that Slack community at a rate of nearly 1,000 a month, and we have a team of only 20 people. So we are really straining to continue to deliver that same level of commitment. We need to deploy more resources and more people to help deliver value, not only in our commercial products, but in open source as well.
We see the data stack as a whole just exploding in terms of attention and the need for Prefect is growing exponentially. It was just an extremely opportune moment to scale the company.
So on the expansion front, we're hiring 50 people. We're really taking a holistic look at our products and wondering where we can really move the needle and tackle use cases that tools in our space have been historically unable to tackle. On the partnership front, we've just announced a partnership with Microsoft and we have a lot of ecosystem partnerships that we'll be developing and announcing over the next few months.
What is Prefect's conception of dataflow engineering?
Lowin: My background is as a data scientist and machine learning researcher. I spent most of my career doing risk management for financial services firms and investment companies.
I was on the PMC [Project Management Committee] of Apache Airflow, which was a wonderful tool when I was dealing with daily latencies and slow-moving, regularly scheduled batch processes. But the analytics world is not like that.
We define dataflow automation as extending state-based pipelines, which is to say pipelines that depend on simple criteria, like 'did this step succeed, or did it fail?' In the data science world, by contrast, when a pipeline fails, you just run it again. There's no concept of resuming or responding to failure in a first-class, fully automated way.
Dataflow automation is about merging the semantics from the data engineering world, where state-based transitions and responding to failure are critical, with the data transmission capabilities of the data science world, and imbuing those pipelines with both.
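The state-based semantics Lowin describes can be sketched in plain Python. This is an illustrative toy, not Prefect's actual API; every name here is hypothetical. The point is that each step reports a state, and the runner responds to failure by retrying that step rather than rerunning the whole pipeline:

```python
import time

class State:
    """Outcome of one pipeline step: 'Success' or 'Failed'. (Hypothetical.)"""
    def __init__(self, name, result=None, error=None):
        self.name = name
        self.result = result
        self.error = error

def run_step(fn, *args, max_retries=2, delay=0.0):
    """Run one step, retrying on failure, and return its final State."""
    for attempt in range(max_retries + 1):
        try:
            return State("Success", result=fn(*args))
        except Exception as exc:
            if attempt < max_retries:
                time.sleep(delay)  # back off, then retry only this step
            else:
                return State("Failed", error=exc)

# Example: a flaky extract step that succeeds on its second attempt.
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient failure")
    return [1, 2, 3]

state = run_step(flaky_extract)
print(state.name, state.result)  # Success [1, 2, 3]
```

A "simple criteria" pipeline in the sense above would only check the final state; the state-based version knows *which* step failed and can resume from it instead of starting over.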
Prefect builds software that makes automating data pipelines straightforward. We provide scheduling, monitoring, logging, retries and other features that take a significant burden off of data scientists and engineers, who are much more focused on business objectives.
Anywhere that there's a need for governing the motion or transmission of data or an analytic, there's an opportunity for a disruptive failure, and consequently, for an orchestrator like Prefect to add value by guarding against those failures.
What is the intersection between dataflow automation and data quality?
Lowin: The dataflow orchestrator is often the place where it's natural to attach the data quality and data governance checks.
We work closely with the team at Superconductive that makes the open source Great Expectations data quality project, which is a really cool tool.
As the data workflow orchestrator, it's very important to us that a tool like Great Expectations has a really natural integration with our product.
What do you see as the key challenges for dataflow automation and Prefect?
Lowin: One of the biggest challenges we see is actually that homegrown workflow systems are built by users. They're built around a very specific instance of a problem that emerges in a very specific part of the business, and then that person almost becomes the owner of the orchestrator within that organization.
It's no coincidence that if you look out at the world, you'll see hundreds of data pipelining tools that have been open sourced by engineers at large companies who thought: 'Maybe other people have this exact same problem and use case.' Inevitably, they don't, because these tools are insufficiently generalized to handle the use cases of most companies.
I think a majority of our users, our customers, in fact, are coming off of homegrown systems that have just become way too much of a maintenance burden. So I think one of the greatest challenges here is actually that instinct not to reach for a third-party tool. As a stereotypical engineer, you'd solve the problem yourself. That's what you do.
Editor's note: This interview has been edited for clarity and conciseness.