Superconductive raises $21M for open source data quality

The open source Great Expectations project is becoming increasingly popular, as the commercial vendor seeks to build out a cloud service to expand the project's reach.

Superconductive said it has raised $21 million in a Series A round of funding led by Index Ventures.

The 2017 startup, based in Redwood City, Calif., is the lead commercial sponsor behind the open source Great Expectations data quality technology, which can be embedded into data pipelines as well as on top of different data warehouses and databases.

The Great Expectations project is designed to help ensure that organizations are getting data in a format, structure and quality level that is usable for analytics, data science and business intelligence use cases. Superconductive is now aiming to provide commercial services on top of Great Expectations to help organizations better use their data.

In this Q&A, Abe Gong, CEO and co-founder of Superconductive, provides insight into the challenges of data quality and where the vendor is headed.

Why are you now raising a Series A for Superconductive's data quality efforts?

Abe Gong: The single biggest thing is community momentum. Until the end of last year we had a small team with just over a dozen people.

We're approaching a million downloads per month of the open source project. So one of the things that we want to invest in is just being able to support open source better. We'll also be working on building a SaaS platform on top of that.

You know, now that there's more money involved, there could be a concern about the open source project. We are emphatically open source. Great Expectations will always be an open source community.

Over time, the long-term business model is to build tools on top of that shared open data quality standard and existing open source foundation. The goal is to make it easier to collaborate, to make it so you can just deploy more effectively and integrate with other workflows within an enterprise setting.

Without taking anything away from what the open source project already does, there's a lot of additional useful stuff you can build on top of it.

How do you define data quality?

Gong: In Great Expectations, we think of data quality as something that you expect about data. That is, some kind of factual expectation about the data.

For example, which columns exist, how many rows are there, what are the ranges for those columns, what are the distributions and what are the correlations? What you can do is you can express exactly how the data should look and what properties it should have at any given point throughout its lineage.

Data quality with Great Expectations allows you to automatically test and verify that the data does in fact have those properties, and then you can build a lot of other things on top of it. There is also a distributional expectation where you can say, 'Here's a histogram of what the data looked like in the past, and future data must continue to have that same shape.' So it's not just a schema thing; it's really getting into the properties and kind of the contours and texture of data.
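To make the idea concrete, here is a minimal sketch in plain Python of the kinds of checks Gong describes: which columns exist, how many rows there are, what range a column falls in, and whether new data keeps the same histogram shape as past data. It does not use the Great Expectations API. The function names, the sample rows, the bin edges and the tolerance are all hypothetical, and the distribution check uses total variation distance between two histograms as one simple stand-in for "same shape."

```python
def expect_columns(rows, required):
    """Every row must contain each required column."""
    return all(set(required) <= row.keys() for row in rows)


def expect_row_count_between(rows, low, high):
    """The number of rows must fall within [low, high]."""
    return low <= len(rows) <= high


def expect_values_between(rows, column, low, high):
    """Every value in the column must fall within [low, high]."""
    return all(low <= row[column] <= high for row in rows)


def histogram(values, edges):
    """Fraction of values in each bin [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [count / total for count in counts] if total else counts


def expect_similar_distribution(past, current, edges, tolerance):
    """Total variation distance between the histograms must be <= tolerance."""
    p = histogram(past, edges)
    q = histogram(current, edges)
    distance = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return distance <= tolerance


# Hypothetical sample data, for illustration only.
rows = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": 29},
    {"user_id": 3, "age": 51},
]

checks = {
    "columns": expect_columns(rows, ["user_id", "age"]),
    "row_count": expect_row_count_between(rows, 1, 10),
    "age_range": expect_values_between(rows, "age", 0, 120),
    "age_shape": expect_similar_distribution(
        past=[25, 31, 38, 47, 52, 29],
        current=[row["age"] for row in rows],
        edges=[0, 30, 60, 120],
        tolerance=0.25,
    ),
}
print(checks)
```

Each check returns a plain boolean, so a pipeline step could refuse to pass data downstream when any of them fails. That is the sense in which such expectations can be tested automatically at any point in the data's lineage.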

The interesting thing about data quality is that it's clear that it's important everywhere. There's nobody who's going to say, 'Well, I'd like the system, but I want it to be low quality.' There's a role for data quality to play in almost every kind of infrastructure.

What types of data sources do you commonly see being used with Great Expectations for data quality?

Gong: We take a very expansive view on data sources. Great Expectations is set up so that it can work against structured data in data warehouses or relational databases. We also execute and work within a lot of things that aren't as tightly structured as relational databases. We also do Spark.

We can't build the whole universe, but we very much want to collaborate with other tools, and we see ourselves as a shared open standard for data quality that people will be using in lots of places.

Editor's note: This interview has been edited for clarity and conciseness.
