
Monte Carlo boosts data observability with generative AI

The vendor unveiled tools aimed at improving engineering efficiency, one that helps data engineers fix code more easily and another that generates code from natural language.

Monte Carlo unveiled two new tools that use generative AI to make the vendor's data observability platform more efficient.

Fix With AI and Generate With AI were both developed by integrating generative AI and large language model (LLM) technology from OpenAI -- the developer of ChatGPT -- and were launched within the past two weeks. Both are now generally available.

Based in San Francisco, Monte Carlo offers a data observability platform designed to help users monitor their data throughout the data management lifecycle to ensure it's of high quality when needed to inform analytics and data science projects.

Recently, the vendor integrated with Fivetran to enable organizations to begin observing data at the point of ingestion rather than later in the data pipeline.

Observability and AI

Data observability was a relatively simple process when organizations collected data from a small number of sources and stored all their data in an on-premises database.

But as data grows in both volume and complexity, and the systems needed to handle it become more elaborate, monitoring data quality is becoming more difficult.

As a result, vendors like Monte Carlo and Acceldata now specialize in data observability.

But even their specialized tools require lots of manual labor. While data observability platforms like Monte Carlo and Acceldata can automatically monitor data workflows for quality issues and alert users to any problems, it often falls on data engineers to fix those problems.

That requires engineers to spend copious amounts of time writing and rewriting code and prevents them from building out their organizations' data and analytics operations.

Generative AI, however, has the potential to change that and make data experts more productive, according to Kevin Petrie, an analyst at Eckerson Group.

Petrie recently surveyed data practitioners and found nearly half already use ChatGPT to help with data engineering applications such as documenting their environments, building starter pipelines and debugging pipeline code.

"Language models can help data engineers get a much-needed productivity boost," he said. Similarly, Lior Gavish, co-founder and CTO of Monte Carlo, said generative AI can make data observability and other data management tasks more efficient.

He noted that the main applications of data observability are detecting data problems, helping data teams fix problems when they arise and preventing future problems with data.


"Generative AI can impact all of them," Gavish said.

For example, a lot of data observability platforms' detecting and alerting capabilities are available upon deployment. But organizations often do considerable customization on top of those prebuilt capabilities as their data pipelines become more complex. Generative AI can automate some of that customization by generating code, according to Gavish.

Another way generative AI can factor into data observability is by helping users collaborate and explain in simple terms how to fix problems and optimize systems, he continued.

Meanwhile, just as generative AI can improve data observability, the reverse is also true, according to Petrie: Data observability can improve generative AI.

LLMs need to be trained with accurate and reliable data to reduce the risk of inaccuracy. Data observability is designed to increase the reliability and accuracy of data.

"Data quality observability and generative AI technologies such as language models have a symbiotic relationship," Petrie said. "Language models desperately need accurate, complete, consistent and reliable training inputs so they can reduce the risk of their outputs. Data observability can improve training inputs in these ways."

New capabilities

Fix With AI and Generate With AI mark Monte Carlo's first foray into generative AI and LLMs.

Generate With AI enables users to create SQL code using natural language: Users type commands and queries in plain language, and the tool converts them into the SQL code that Monte Carlo's platform understands.
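The article doesn't detail how Generate With AI works internally, but natural-language-to-SQL features typically work by grounding an LLM prompt in the schema the platform already knows about. The following is a minimal, hypothetical sketch of that prompt-assembly step; the function name, schema format and wording are illustrative assumptions, not Monte Carlo's actual implementation.

```python
# Hypothetical sketch: assembling an LLM prompt that translates a
# natural-language question into SQL, grounded in known table schemas.
# All names here are illustrative, not Monte Carlo's actual API.

def build_nl_to_sql_prompt(question: str, table_schema: dict) -> str:
    """Compose a prompt asking an LLM to turn a plain-language question
    into SQL, constrained to the tables the platform knows about."""
    schema_lines = "\n".join(
        f"- {table}({', '.join(columns)})"
        for table, columns in table_schema.items()
    )
    return (
        "Translate the following question into a SQL query.\n"
        f"Available tables:\n{schema_lines}\n"
        f"Question: {question}\n"
        "Return only the SQL."
    )

prompt = build_nl_to_sql_prompt(
    "How many orders shipped late last week?",
    {"orders": ["order_id", "ship_date", "promised_date"]},
)
print(prompt)
```

In practice the returned prompt would be sent to an LLM such as OpenAI's models; including the schema in the prompt is what lets the model emit SQL that references real tables and columns rather than guessing.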

Fix With AI, meanwhile, is related to detecting problems in data pipelines.


As customers customize their Monte Carlo deployments, they often write code in SQL to expand the platform's capabilities. Sometimes, however, they find problems with code they need to fix.

Fix With AI not only alerts users to problems with the code but also uses generative AI to suggest a fix, rather than forcing the user to manually figure out and write the code needed to solve the problem.

In addition, like Generate With AI, Fix With AI responds to natural language. It similarly converts natural language into SQL, which enables data engineers to use natural language rather than code to address problems in their data pipelines.
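A fix-suggestion flow like the one described above generally bundles the failing SQL and its error message into an LLM prompt so the model can propose corrected code. The sketch below is a hypothetical illustration of that step under those assumptions; the function name and prompt wording are not Monte Carlo's actual implementation.

```python
# Hypothetical sketch: when a user-written SQL monitor fails, pair the
# broken query with its error message in an LLM prompt so the model can
# suggest a corrected version. Names are illustrative assumptions.

def build_fix_prompt(broken_sql: str, error_message: str) -> str:
    """Ask an LLM to repair a failing SQL statement, given the error."""
    return (
        "The following SQL failed with an error.\n"
        f"SQL:\n{broken_sql}\n"
        f"Error: {error_message}\n"
        "Suggest a corrected version of the SQL and briefly "
        "explain the change."
    )

fix_prompt = build_fix_prompt(
    "SELECT order_id FROM orders WHERE ship_date > promised_dat",
    'column "promised_dat" does not exist',
)
print(fix_prompt)
```

Supplying the exact error alongside the query is what makes this tractable for an LLM: the model can localize the problem (here, a likely typo in a column name) instead of re-deriving the user's intent from scratch.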

"Even data people are sometimes scared of SQL," Gavish said. "They have a love-hate relationship with SQL. We saw an opportunity because generative AI is good at translating natural language to SQL and at debugging SQL."

Ultimately, the result is improved productivity, according to Petrie.

"Fix With AI will help Monte Carlo users become incrementally more productive because they can put effective data quality controls into place faster," he said.

Meanwhile, Fix With AI and Generate With AI improve on what customers can do on their own with Monte Carlo and generative AI, according to Gavish.

When users develop their own applications combining the capabilities from generative AI and LLM platforms with a platform like Monte Carlo, they must configure their own security and governance capabilities. There are security concerns with ChatGPT and other generative AI tools. An individual organization may not be able to create the same walled garden a vendor like Monte Carlo can create with a formal integration.

"They're ready for business," Gavish said. "They meet the security and compliance requirements that our customers have so they can be used for work that some companies would be careful to do on their own with ChatGPT."

In addition, given that Monte Carlo has metadata about its customers' data, it can securely feed that information back to the generative AI tools. That, in turn, enables the chatbot to return more accurate responses to customer queries, Gavish noted.

"ChatGPT knows nothing about an organization's tables or lineage. But if it uses [generative AI] within Monte Carlo, it suddenly knows what tables exist and what fields they have and other important information that makes the process faster and more accurate," he said.

Looking ahead

Before ultimately deciding to make Fix With AI and Generate With AI its first generative AI features, Monte Carlo brainstormed about 70 capabilities that combine generative AI and data observability, according to Gavish.

The vendor's roadmap, therefore, includes some of those features.

"Our first set of features was focused on where generative AI is good out of the box. It's good at SQL because OpenAI trained it to be," Gavish said. "The next wave of features will be taking a closer look at how we can fine-tune and train models to accomplish data engineering tasks better than generic tools."

Petrie, meanwhile, said he'd like to see Monte Carlo and other data management vendors build their own governed language models that they train on curated data inputs.

The goal would be greater accuracy than that of LLMs trained on uncurated data gathered from the internet.

"These new language models will be smaller and distinct from large language models such as ChatGPT," he said. "Because they train on curated data inputs, these new governed language models will provide more accurate and less risky assistance to data teams as they address specific use cases. Monte Carlo seems to be moving this direction."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
