With data spread across multiple sources and locations, getting visibility into what data an organization possesses can often be challenging.
Among the vendors that provide data observability tools is Soda, based in Brussels. The startup, founded in 2018, has been busy in 2021 expanding its product portfolio.
On April 1, Soda made its Soda Cloud platform generally available. The managed service gives organizations data quality and data collaboration capabilities. Soda is unrelated to the SODA Foundation, an open source data effort operated by the Linux Foundation.
The launch of the Soda Cloud follows a series of notable events from the company in 2021.
On Feb. 9, the vendor released its open source Soda SQL tool, which enables users to test data sets to ensure that data is properly configured and structured. A week earlier, on Feb. 2, Soda said it had raised €11.5 million (approximately $17.7 million) in a Series A round of funding led by Singular, a European venture investor based in Paris.
Soda enables data observability for Cloud Academy
Among the early users of Soda's technology is Alessandro Lollo, a senior data engineer at technology training platform Cloud Academy, based in San Francisco.
Lollo said Cloud Academy uses Soda SQL, Soda's open testing and monitoring tool, to apply tests and metrics directly to operational data sources. With the launch of Soda Cloud, Cloud Academy is now working on integrating the platform into its environment to help provide insight into the overall quality of the data.
Cloud Academy has many microservices that communicate with each other and that exchange and transform data, Lollo explained.
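To make the idea of applying tests and metrics directly to a data source more concrete, the following is a minimal Python sketch. It does not use Soda SQL or reproduce its API. The `users` table, the `email` column and the helper names `compute_metrics` and `run_tests` are hypothetical, and an in-memory SQLite database stands in for an operational data source.

```python
import sqlite3


def compute_metrics(conn, table, column):
    """Compute two simple metrics for one column: row count and missing-value count."""
    # Table and column names cannot be bound as SQL parameters; in this sketch
    # they are trusted, hard-coded, hypothetical identifiers.
    row_count, missing_count = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) "
        f"FROM {table}"
    ).fetchone()
    # SUM returns NULL (None) on an empty table, so fall back to 0.
    return {"row_count": row_count, "missing_count": missing_count or 0}


def run_tests(metrics, tests):
    """Evaluate each named test (a predicate over the metrics) and return pass/fail."""
    return {name: bool(predicate(metrics)) for name, predicate in tests.items()}


def demo():
    # An in-memory database stands in for an operational data source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "a@example.com"), (2, None), (3, "c@example.com")],
    )
    metrics = compute_metrics(conn, "users", "email")
    tests = {
        "table is not empty": lambda m: m["row_count"] > 0,
        "no missing emails": lambda m: m["missing_count"] == 0,
    }
    results = run_tests(metrics, tests)
    conn.close()
    return metrics, results


if __name__ == "__main__":
    print(demo())
```

In this sketch, metrics are computed inside the data source with a single query and tests are plain predicates over those metrics, so a failing test points to a specific, named expectation about the data.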
"Each microservice is responsible for a specific domain of the Cloud Academy platform or product, so we have many different data sources to mix together when doing analytics," Lollo said. "Combining many different data sources to create meaningful analytics is a challenging task, especially when it comes to assessing the quality of the analytics."
How Soda enables data observability and monitoring
Maarten Masschelein, co-founder and CEO of Soda, said the vendor's platform helps data teams discover data problems and then guides the teams to prioritize and resolve those problems efficiently.
Soda aims to be useful from the time that data is ingested into any data platform, Masschelein said. To that end, Soda is designed so that it can be embedded into streaming data and Spark workloads to help enable automated data monitoring.
Soda can monitor and track data updates to help identify whether a given data set is complete. For example, Soda could alert a business intelligence tool user if only a certain percentage of the average volume of data has been processed when they are conducting an analysis.
"We try to communicate as far as possible into tools that the analysts are using, so that we can bring the analysts closer to the owners and the producers of data," Masschelein said.
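The volume check described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not Soda's implementation. The function names, the 80% default threshold and the use of a simple mean of recent row counts as the baseline are all assumptions.

```python
from statistics import mean


def volume_ratio(recent_row_counts, current_row_count):
    """Return the current load's row count as a fraction of the recent average."""
    if not recent_row_counts:
        raise ValueError("need at least one historical row count")
    baseline = mean(recent_row_counts)
    if baseline == 0:
        raise ValueError("historical average volume is zero")
    return current_row_count / baseline


def check_volume(recent_row_counts, current_row_count, min_ratio=0.8):
    """Return a warning message if the data set looks incomplete, else None."""
    # min_ratio is a hypothetical threshold; a real deployment would tune it.
    ratio = volume_ratio(recent_row_counts, current_row_count)
    if ratio < min_ratio:
        return f"WARNING: only {ratio:.0%} of the average volume has been processed"
    return None


if __name__ == "__main__":
    # Three earlier loads averaged 1,000 rows; today's load has 500 so far.
    print(check_volume([1000, 1000, 1000], 500))
```

A message like this could be surfaced in the analyst's tool so that an analysis is not run on a partially loaded data set without the analyst knowing it.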
The intersection of data observability and data fitness
A core element of Soda's data observability platform is a concept that Masschelein referred to as data fitness.
"Many people consume data, but it's not always clear to the data producer who is consuming the data and for what purposes, so they get what they actually need," Masschelein said. "For us, fitness comes from fit for purpose and that's really about bringing context to the use cases of data."
Enabling data fitness also overlaps with concepts often associated with data governance. Masschelein noted that, from Soda's perspective, data governance means connecting people to data in complex organizational settings.
"For us, it's really about bringing people together so they can do the right thing when it comes to data," he said.