
Monte Carlo adds observability for unstructured data
With AI tools requiring large amounts of high-quality input to be accurate, the vendor is making it easier to monitor all types of information as users build modern applications.
Monte Carlo launched new unstructured data observability capabilities, enabling users to natively monitor assets such as documents and chat logs so they can be trusted when used to inform data and AI applications.
Data products like dashboards and reports, as well as AI-fueled tools such as chatbots and machine learning models, have historically been fed with structured data such as financial and transaction records. Meanwhile, unstructured data such as text, audio files and images, which is more difficult to operationalize, has often been dumped in data warehouses and lakes and gone unused.
Recently, however, unstructured data has taken on greater importance as interest in developing generative AI (GenAI) applications has surged following OpenAI's November 2022 launch of ChatGPT, which represented a significant advancement in GenAI technology.
Unstructured data now makes up the vast majority of all data, with a 2023 study by IDC placing it at 90%. Therefore, for AI tools -- including GenAI assistants and agents -- to be as accurate as possible, they need unstructured data in addition to structured data.
Monte Carlo customers could previously use the vendor's platform to monitor unstructured data. However, doing so required complex configuration by the user, while the new capabilities can be applied without writing a single line of SQL.
As a result, Monte Carlo's addition of native observability capabilities for unstructured data on May 28 is significant, according to Stephen Catanzano, an analyst at Enterprise Strategy Group, now part of Omdia.
"Monte Carlo's addition of unstructured data monitoring capabilities represents a significant advancement for its users," he said. "What makes this particularly valuable is the no-code approach that democratizes access to these capabilities. Data observability across structured and unstructured data provides a comprehensive view of data health that wasn't previously possible."
Based in San Francisco, Monte Carlo is a 2019 startup whose platform enables customers to observe data over its lifetime, monitoring characteristics such as freshness and lineage to ensure it can be trusted to inform decisions and actions. In April, the vendor unveiled Observability Agents, a set of autonomous agents that take on data observability tasks.
Observing unstructured data
As data volume exploded over the past 15 years, making it impossible for even teams of engineers to monitor data quality, vendors such as Monte Carlo, Acceldata, Metaplane and Soda Data emerged with platforms that automate monitoring massive amounts of data for quality and alerting users to problems.
However, observability was limited to structured data unless the user configured it to also include unstructured data. Now, Monte Carlo is making a concerted effort to meet the growing need for high-quality unstructured data required by GenAI development.
Customer demand, in fact, was the impetus for Monte Carlo to add the capabilities, according to Shane Murray, the vendor's head of AI.
"The advent of generative AI has turned unstructured data into a critical input powering analytics, data products and AI applications," he said. "If the quality of that data is compromised, the AI outcomes are too. Our customers were clear that high-quality unstructured data isn't just important; it's foundational to building powerful, reliable AI."
Regarding the timing of Monte Carlo's addition of native unstructured data observability, the maturity of enterprises' AI development projects was a driver, Murray continued. Following more than two years of experimentation and pilot projects, a growing number of enterprises are ready to use GenAI more extensively, and that requires high-quality data.
"As organizations move AI from pilots to production, they need to be able to easily trust that their unstructured data is reliable," Murray said.
Therefore, the new capabilities target a real need, according to Donald Farmer, founder and principal of TreeHive Strategy.
"Adding observability for this data addresses a critical blind spot," he said. "Because unstructured data now powers the majority of generative AI and advanced analytics applications, without observability, organizations risk feeding unreliable data into their AI models."
Monte Carlo's unstructured data monitoring is now integrated into the vendor's monitoring engine, enabling users to apply AI-powered observability capabilities to unstructured data fields.
Supported data management platforms include Databricks, Snowflake and Google's BigQuery, with Monte Carlo's unstructured data observability tools natively integrating with each platform's large language model or AI function libraries.
Monte Carlo is not the first vendor to add unstructured data observability capabilities. Anomalo unveiled such capabilities in beta testing in July 2024. However, Monte Carlo is among the first to provide a valuable feature that is now needed as technology has made unstructured data more accessible, according to Farmer.
"Unstructured data has been growing exponentially, but … the computational resources required to process and analyze large volumes of unstructured data were prohibitive until recent advances in cloud computing and distributed processing frameworks," he said.
Catanzano noted that other data observability vendors provide strong platforms that can be configured to monitor unstructured data. However, Monte Carlo's targeted unstructured data observability capabilities and no-code approach are now differentiators.
Looking ahead
"Monte Carlo appears to be positioning itself as a pioneer in this space [by closing the gap] between structured and unstructured data monitoring," Catanzano said.
Following the launch of unstructured data observability capabilities, Monte Carlo will continue to focus on helping customers prepare data for AI by ensuring their data can be trusted, according to Murray.
He said the vendor plans to develop new AI features to assist AI teams, add integrations with AI-native platforms and provide observability capabilities not only for data pipelines but also for AI products themselves.
Focusing on monitoring AI models and applications would be a logical way for Monte Carlo to broaden its platform, according to Catanzano. So would adding AI-generated explainability features and data and AI governance capabilities.
Farmer, meanwhile, suggested that possible ways Monte Carlo might expand include adding proactive anomaly prevention and self-healing data pipelines, automated data quality improvements and AI-driven prioritization of incidents based on business impact.
Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.