Gorodenkoff - stock.adobe.com
Fivetran, Monte Carlo target data observability at ingestion
An integration between the vendors enables joint users to get visibility into their data as it's ingested into the data pipeline and before bad data can cause downstream problems.
Fivetran and Monte Carlo unveiled a new integration that will enable joint customers to get observability into their data at the beginning of the data pipeline.
Based in Oakland, Calif., Fivetran is a data integration specialist whose array of integrations and connectors enable users to extract, transform and load data from a variety of data sources into their data storage repositories.
Monte Carlo, meanwhile, is a data observability vendor based in San Francisco whose platform helps customers monitor their data throughout its lifecycle to make sure it's reliable when needed to inform an organization's analytics and data science operations.
Now the two vendors have developed a formal partnership that was revealed during Fivetran's Modern Data Stack conference on Tuesday in San Francisco. The integration is the result of that partnership and is free to joint customers as part of their existing subscriptions to Fivetran and Monte Carlo.
Data observability used to be a relatively straightforward process. Many organizations generated most of their own data and housed it in on-premises databases, which made monitoring that data somewhat simple.
Now, however, organizations generate and collect data from a vast network of sources and store some of that data in data warehouses, some in data lakes and some in databases. Each type of data repository can be in the cloud, be on premises or exist as a hybrid of the two.
The combination of the upturn in data sources and places where organizations store their data makes observability far more complex than it once was. It also makes it more critical.
Meanwhile, data analysis itself is becoming more widely accepted as a necessary part of the decision-making process, noted Lior Gavish, co-founder and CTO of Monte Carlo.
Therefore, in recent years, specialists such as Monte Carlo and Acceldata have emerged with entire platforms that address organizations' data observability needs.
However, data observability tools are generally applied after data has been ingested and once it's begun being transformed for analytics and data science use.
But that can lead to complex problems that require significant time and effort to address. By the time data has moved beyond ingestion and into the data pipeline, it has often been combined with other data. And bad data points or anomalies can be more difficult to find and fix.
So Fivetran and Monte Carlo teamed up to apply data observability capabilities at the point data gets imported from its source and injected into the data pipeline.
That is a significant strategic move, according to Kevin Petrie, an analyst at Eckerson Group.
"To [quickly] spot and fix issues with data quality, observability tools need to check quality at each point of a data pipeline," he said. "They also need to integrate with the modern enterprise's ecosystem of data management tools. By integrating with Fivetran at the point of data ingestion, Monte Carlo aims to help users find errors or other quality issues and start the remediation process that much faster."
Gavish said applying data observability at the point of ingestion is key to the integration between Fivetran and Monte Carlo.
The cost of bad data can be significant, he noted. Videogame maker Unity Technologies and Twitter are two high-profile enterprises that have been hurt by bad data. Gartner estimates that poor data quality costs organizations an average of almost $13 million each year.
"If data coming in is wrong, everything downstream is going to be wrong and broken," Gavish said. "So, that's a particularly impactful part of the stack, and it's extra important to monitor reliability there."
While both Fivetran and Monte Carlo have integrations with other vendors, their integration with one another is the first for each that enables data observability at the earliest stage of the data pipeline.
Each vendor could have selected other vendors to partner with to apply data observability at point of ingestion, but the two have numerous joint customers, which made this alliance more intuitive.
Kevin PetrieAnalyst, Eckerson Group
And though not formally established through a partnership until now, the two have been working together for some time, according to Michael Bull, director of technology alliances at Fivetran.
"We've been working with Monte Carlo for a while now, but before this, more in the context of the modern data stack," he said. "Our users have independently used our tools together for quite some time. So this is our opportunity to expose the metadata that we have access to and provide that value back down to our shared customers."
But while the integration between Fivetran and Monte Carlo is the first for each that aims to provide data observability at the start of the data pipeline, it is not exclusive, according to both Bull and Gavish.
Bull noted that Fivetran also maintains integrations with data catalog providers and could potentially integrate with a Monte Carlo competitor, such as Acceldata.
Monte Carlo, meanwhile, has forged integrations with data transformation vendors including DBT Labs and could potentially form integrations with Fivetran competitors, such as Qlik and Talend.
That said, the integration between Fivetran and Monte Carlo is part of a larger partnership that includes a joint go-to-market strategy.
With Fivetran and Monte Carlo are now partners, Petrie said he expects Monte Carlo to continue adding integrations with data management vendors.
He noted that Monte Carlo differentiates itself from other data observability vendors by integrating with a variety of different capabilities along the data pipeline. In fact Fivetran makes sense as in integration partner because Monte Carlo didn't yet integrate with any data ingestion specialists.
"Fivetran is a natural integration partner because so many hybrid and cloud environments use them for data ingestion," he said.
So as Monte Carlo continues to work with tools that specialize in the different stages of the data pipeline, Petrie said he expects integrations with more vendors to come.
"Ultimately data observability is part of larger data management processes," he said.
For example, it aids data governance processes such as data cataloging, Petrie continued. He noted that Monte Carlo's existing integration with Databricks' Unity Catalog enables organizations to better govern their data.
"I would expect more such partnerships in the future to assist data governance," he said.
Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.