Data observability startup Monte Carlo on Tuesday marked the next phase of its evolution, raising $25 million in a Series B round of funding.
Redpoint Ventures and GGV Capital led the round, with participation from Accel, bringing the company's total funding to $40 million.
Based in San Francisco, the vendor was founded in 2019 with the mission of helping enterprises work with data that is accurate and trustworthy enough to support sound business decisions.
Data observability is a class of technology that addresses the challenge of data reliability, providing capabilities that help improve the quality and usability of data for analytics and business intelligence applications.
Monte Carlo is one of several vendors developing data observability technology. Its flagship offering, the Monte Carlo Data Observability platform, helps users ensure that data arriving through different pipelines, including from data warehouses, cloud data lakes and event streams, is reliable.
In this Q&A, Barr Moses, co-founder and CEO of Monte Carlo, discusses the challenges and opportunities of data observability.
What is Monte Carlo Data all about?
Moses: What we realized is that, most often, when organizations actually want to use their data, it's not sufficient to just be able to store, aggregate and collect it. You actually need to be able to trust it. You need to know that it's accurate and that it's reliable, and that's really what Monte Carlo does.
Our mission is to help organizations become data-driven by reducing what we call data downtime. Data downtime is a term that we've coined to help describe times when your data is wrong or inaccurate or otherwise erroneous, which is something that anyone in the data space has likely experienced.
How do you define data observability?
Moses: The technology that we call data observability is a corollary to application observability in software engineering and DevOps. In the software engineering space, the concept of observability is very well understood. Every engineering team has a technology like New Relic or AppDynamics to manage the reliability of their applications and infrastructure. It's a no-brainer, right?
However, in the data space, for some reason, we are flying blind; we don't have the same set of solutions. So what we've done is taken those concepts of application observability and applied them to data.
Data observability is sort of a best practice to manage the reliability of your data. It gives you an understanding of the health of your data, and helps you minimize data downtime.
What are the key challenges for data observability?
Moses: Data can be unreliable for many different reasons. It could be the data hasn't arrived intact, or that perhaps it has duplicate values.
There are actually five pillars of data observability to help organizations gain the confidence that is needed to make sure that data is reliable.
The first pillar is freshness. Freshness is everything about the timeliness of your data. The second is volume and whether you are getting more data or less data than you expect.
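The first two pillars lend themselves to simple checks. The sketch below is a minimal illustration, not Monte Carlo's implementation: it assumes you already know when a table was last loaded and how many rows arrived, and the function names, the six-hour freshness window and the 20% volume tolerance are all hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_loaded_at: datetime, max_age: timedelta, now: datetime) -> bool:
    """Return True if the table was updated within max_age of now (hypothetical helper)."""
    return now - last_loaded_at <= max_age


def check_volume(row_count: int, expected: int, tolerance: float = 0.2) -> bool:
    """Return True if row_count is within +/- tolerance (a fraction) of the expected count."""
    lower = expected * (1 - tolerance)
    upper = expected * (1 + tolerance)
    return lower <= row_count <= upper


if __name__ == "__main__":
    # Illustrative timestamps only.
    now = datetime(2021, 2, 16, 12, 0, tzinfo=timezone.utc)
    last_load = datetime(2021, 2, 16, 3, 0, tzinfo=timezone.utc)
    print(check_freshness(last_load, timedelta(hours=6), now))  # False: data is 9 hours old
    print(check_volume(7_500, expected=10_000))  # False: 25% fewer rows than expected
```

In practice, the expected row count and freshness window would come from each table's history rather than being hard-coded.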
The third is distribution at the data field level. Perhaps in the field where you're tracking credit card numbers, you have letters instead of numbers. There's a whole slew of metrics you can look at for distribution at the field level, whether it's uniqueness, null values or negative values.
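The field-level metrics Moses lists (uniqueness, null values, negative values, unexpected characters) can be sketched as small functions over a column's values. This is an illustrative assumption, not how the platform computes them; the sample card numbers and amounts are made up.

```python
from typing import Optional, Sequence


def null_rate(values: Sequence[object]) -> float:
    """Fraction of values that are None."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def unique_rate(values: Sequence[object]) -> float:
    """Distinct non-null values divided by non-null values (1.0 means no duplicates)."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return len(set(present)) / len(present)


def non_numeric_count(values: Sequence[Optional[str]]) -> int:
    """Count non-null strings that contain anything other than digits."""
    return sum(1 for v in values if v is not None and not v.isdigit())


def negative_count(values: Sequence[Optional[float]]) -> int:
    """Count non-null numeric values below zero."""
    return sum(1 for v in values if v is not None and v < 0)


if __name__ == "__main__":
    # Hypothetical sample data: one duplicate, one value with letters, one null.
    card_numbers = ["4111111111111111", "4111111111111111", "41111111abcd1111", None]
    amounts = [19.99, -5.00, 42.50, None]
    print(null_rate(card_numbers))          # 0.25
    print(unique_rate(card_numbers))        # 2 distinct out of 3 present
    print(non_numeric_count(card_numbers))  # 1
    print(negative_count(amounts))          # 1
```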
The fourth pillar is schema. So this is around the structure of the data, whether tables are added, removed or changed.
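One way to picture schema monitoring is to diff two snapshots of a catalog and report which tables were added, removed or changed. The snapshot format (table name mapped to column types) and the function name below are hypothetical; a real system would read this metadata from the warehouse's own catalog.

```python
from typing import Dict, List

# Hypothetical catalog snapshot: table name -> {column name: column type}
Catalog = Dict[str, Dict[str, str]]


def diff_catalog(old: Catalog, new: Catalog) -> Dict[str, List[str]]:
    """Report tables added, removed, or changed between two snapshots."""
    old_tables, new_tables = set(old), set(new)
    return {
        "added": sorted(new_tables - old_tables),
        "removed": sorted(old_tables - new_tables),
        # A table counts as changed if any column was added, dropped, or retyped.
        "changed": sorted(t for t in old_tables & new_tables if old[t] != new[t]),
    }


if __name__ == "__main__":
    yesterday = {
        "orders": {"order_id": "INTEGER", "amount": "NUMERIC"},
        "customers": {"customer_id": "INTEGER"},
    }
    today = {
        "orders": {"order_id": "INTEGER", "amount": "VARCHAR"},
        "payments": {"payment_id": "INTEGER"},
    }
    print(diff_catalog(yesterday, today))
    # {'added': ['payments'], 'removed': ['customers'], 'changed': ['orders']}
```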
The fifth pillar is lineage. We automatically reconstruct the lineage both within a particular system, say a data warehouse, and across systems. We can tell you if a schema change upstream in a data lake caused a table in your data warehouse to have a freshness problem, which in turn causes your business intelligence reports to be inaccurate.
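Once lineage is represented as a directed graph, the two questions in this example become graph traversals: what is downstream of a changed asset (impact), and what is upstream of a broken report (root cause). The sketch below assumes lineage edges are already known; the asset names and the graph itself are hypothetical, and Monte Carlo's own reconstruction of lineage is not shown here.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: each asset maps to the assets that read from it.
LINEAGE: Dict[str, List[str]] = {
    "lake.raw_events": ["warehouse.orders"],
    "warehouse.orders": ["warehouse.daily_revenue"],
    "warehouse.daily_revenue": ["bi.revenue_report"],
    "warehouse.customers": ["bi.revenue_report"],
}


def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search for every asset that depends on start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(graph: Dict[str, List[str]], target: str) -> Set[str]:
    """Reverse the edges, then search for every asset that target depends on."""
    reverse: Dict[str, List[str]] = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(reverse, target)


if __name__ == "__main__":
    print(sorted(downstream(LINEAGE, "lake.raw_events")))
    # ['bi.revenue_report', 'warehouse.daily_revenue', 'warehouse.orders']
    print(sorted(upstream(LINEAGE, "bi.revenue_report")))
    # ['lake.raw_events', 'warehouse.customers', 'warehouse.daily_revenue', 'warehouse.orders']
```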
How do you apply data governance to observability?
Moses: I think most data governance initiatives in organizations today are driven by a lot of manual work. That work requires people to actually manually map out their lineage or define custom manual thresholds for their data.
We think that with our approach to data observability, you can actually start with a baseline of machine-learning-driven, automated insights for your data governance initiatives.
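To contrast a manual threshold with one derived from history, the sketch below learns bounds from past daily row counts using mean plus or minus three standard deviations. This is a deliberately simple statistical baseline swapped in for illustration; it is not the machine learning Monte Carlo describes, and the sample history and the multiplier `k` are assumptions.

```python
import statistics
from typing import Sequence, Tuple


def learned_bounds(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Derive (low, high) bounds from history as mean +/- k sample standard deviations."""
    mean = statistics.mean(history)
    std = statistics.stdev(history)  # requires at least two data points
    return mean - k * std, mean + k * std


def is_anomalous(value: float, history: Sequence[float], k: float = 3.0) -> bool:
    """Flag value if it falls outside the bounds learned from history."""
    low, high = learned_bounds(history, k)
    return not (low <= value <= high)


if __name__ == "__main__":
    # Hypothetical daily row counts for one table over a week.
    daily_rows = [10_120, 9_980, 10_050, 10_200, 9_900, 10_010, 10_090]
    print(is_anomalous(10_060, daily_rows))  # False
    print(is_anomalous(6_500, daily_rows))   # True
```

The bounds move as history accumulates, so no one has to hand-tune a threshold for each table; the trade-off is that a simple mean and standard deviation ignore seasonality such as weekday and weekend patterns.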
With strong data observability technology you can answer questions like: Where's this data coming from? Who's relying on this data? Who's using this data? When was this data last used? Those are all data governance questions.
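Questions such as who is using a table and when it was last used can be answered from access metadata. The sketch below assumes a hypothetical access log of (user, table, timestamp) records; real warehouses expose query history in their own formats, which are not modeled here.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple


class AccessEvent(NamedTuple):
    """One hypothetical access-log record."""
    user: str
    table: str
    at: datetime


def usage_summary(events: List[AccessEvent], table: str) -> Dict[str, object]:
    """Return the distinct users of a table and the time it was last accessed."""
    hits = [e for e in events if e.table == table]
    return {
        "users": sorted({e.user for e in hits}),
        "last_used": max((e.at for e in hits), default=None),
    }


if __name__ == "__main__":
    log = [
        AccessEvent("analyst_a", "warehouse.orders", datetime(2021, 2, 1, 9, 30)),
        AccessEvent("bi_service", "warehouse.orders", datetime(2021, 2, 3, 6, 0)),
        AccessEvent("analyst_b", "warehouse.customers", datetime(2021, 2, 2, 14, 15)),
    ]
    print(usage_summary(log, "warehouse.orders"))
```

Combined with lineage, the same records help answer who relies on a table indirectly, through the reports built on it.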