New consortium to aid AI by standardizing semantic modeling
An open standard for defining data would simplify data integration and discovery, reducing one of the barriers preventing organizations from successfully developing AI tools.
A consortium of prominent data management and analytics vendors, including Snowflake and Salesforce, on Tuesday unveiled plans to develop an open source standard for semantic data modeling.
While enterprises face numerous barriers when developing AI and analytics tools to inform decisions, one of the biggest is inconsistent data. Semantic modeling makes data consistent so it can be searched, discovered and used to inform models and applications.
A semantic model is a set of common definitions of data so that the data's characteristics -- its metadata -- are classified consistently whenever data is ingested or transformed. Many enterprises, however, do not have semantic data modeling frameworks, nor do many data management and analytics vendors provide semantic layers within their platforms.
Specialists such as dbt Labs and Cube focus on semantic modeling, while ThoughtSpot and Google's Looker are among the platforms providing semantic layers to underpin their broader set of tools. However, each vendor's semantic modeling capabilities are proprietary and differ from one another. Therefore, if an organization uses more than one platform for its data management needs, its data becomes fragmented.
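The fragmentation is easy to picture: the same business metric can carry different names, logic and time windows in each tool's semantic layer. The definitions below are invented for illustration and do not reflect any vendor's actual format; they are a minimal sketch of the kind of drift a shared standard would eliminate:

```python
# Hypothetical sketch of semantic drift: two tools define the "same"
# metric with different names, logic and time windows. All field names
# are invented; no vendor's real format is shown.

# Definition held in a BI platform's proprietary semantic layer
bi_tool_metric = {
    "name": "active_customers",
    "logic": "COUNT(DISTINCT customer_id)",
    "filter": "last_order_date >= CURRENT_DATE - 30",  # 30-day window
}

# Definition held in a separate data transformation tool
transform_tool_metric = {
    "metric": "customers_active",
    "sql": "COUNT(DISTINCT cust_id)",
    "where": "orders_90d > 0",  # 90-day window: silently different answer
}

# An application -- or an AI agent -- reading both sources has no way
# to know which definition is authoritative.
```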
In addition to Snowflake and Salesforce, Alation, Atlan, Cube, dbt Labs, Mistral AI, Sigma and ThoughtSpot are among the vendors participating in the Open Semantic Interchange (OSI).
The OSI aims to change that by standardizing semantic modeling. As a result, its establishment is significant, according to Stephen Catanzano, an analyst at Enterprise Strategy Group, now part of Omdia.
"The formation of a consortium to develop a standard for defining and sharing metadata is highly important," he said. "It addresses the challenge of fragmented data semantics across tools and platforms that create major roadblocks for both human and AI-enabled analysis. This standardization should [enable] organizations to scale AI and BI with greater confidence, speed, and trust."
Beyond inconsistent data, poor data quality, outdated systems, talent shortages and organizational culture are barriers to successfully developing AI and analytics tools.
A new standard
While poor semantic modeling -- or a complete lack of it -- has long hindered data integration and prevented organizations from using all available data to inform analytics and AI applications, fragmented data has taken on greater significance over the past few years.
OpenAI's November 2022 launch of ChatGPT marked a significant improvement in generative AI (GenAI) technology. Since then, because GenAI can fuel applications that make workers better informed and more efficient, many enterprises have increased their investments in AI development.
GenAI, however, requires large amounts of data to be accurate.
GenAI outputs are based on aggregations of data rather than individual data points. As a result, when there is a large quantity of high-quality data to train a GenAI model or application, it's more likely to deliver an accurate output.
Now, agents are the latest evolution in AI. Unlike GenAI tools that require inputs before delivering an output, agents can act autonomously. However, similar to GenAI tools, agents require large amounts of relevant training data to properly perform their prescribed tasks.
Semantic modeling makes it easier for developers to discover the requisite volume of relevant data to train agents. An open standard for semantic modeling applied to all data would further simplify data discovery and development by enabling organizations to more easily integrate data stored in different systems.
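As a hypothetical illustration of that discovery step, consistent metadata lets a developer filter a catalog programmatically instead of inspecting each system by hand. The catalog structure and field names below are invented for this sketch:

```python
# Hypothetical sketch: discovering training data across systems once
# metadata is classified consistently. The catalog and its fields are
# invented for illustration.
catalog = [
    {"dataset": "crm.orders", "domain": "sales", "pii": False, "rows": 12_000_000},
    {"dataset": "warehouse.returns", "domain": "sales", "pii": False, "rows": 800_000},
    {"dataset": "support.tickets", "domain": "service", "pii": True, "rows": 3_500_000},
]

# With shared semantics, one query finds relevant, usable training data
# regardless of which platform holds it.
training_candidates = [d for d in catalog if d["domain"] == "sales" and not d["pii"]]
print(training_candidates)
```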
Kevin Petrie, an analyst at BARC U.S., noted that poor data quality is the top obstacle to analytics and AI success. As a result, the plan to develop a standard for semantic modeling is significant.
"This is a good step forward for the industry," Petrie said. "A unified semantic layer can help overcome [data quality problems], enabling AI applications to consume diverse inputs to generate rich outputs."
Surging interest in AI development, meanwhile, provided the impetus for forming the consortium to develop an open standard for semantic modeling, according to Josh Klahr, director of analytics product management at Snowflake.
"Every company has struggled with fragmented, inconsistent semantic definitions for years," he said. "But until now, the pain was largely hidden inside BI tools and analytics teams. What's changed is the explosive demand for AI and agentic analytics. Suddenly, those inconsistencies aren't just slowing down dashboards. They're undermining the accuracy and trustworthiness of AI systems."
Snowflake's role as one of the co-leaders of the OSI stemmed from its June launch of Semantic Views, Klahr continued. In conversations with customers, Snowflake repeatedly heard that semantics were fragmented, and organizations wanted an interoperable way to integrate them across their entire data estate.
That feedback led Snowflake into discussions with partners, who turned out to be hearing the same thing from their own customers, according to Klahr.
"Those conversations quickly coalesced into the idea of the Open Semantic Interchange initiative, bringing together leaders from across data, AI, BI, analytics and industry verticals to create an open, vendor-neutral semantic model specification," he said. "We formed OSI as an industry initiative to address this shared problem collectively, and we're continuing to bring more partners into the initiative."
While the OSI has not yet released a standardized semantic modeling framework, a working group has been formed to develop the open standard "quickly," according to Klahr.
Meanwhile, Catanzano noted that the sooner the OSI can develop a universal semantic modeling framework, the more beneficial it will be for organizations struggling to successfully develop AI models and applications.
"As AI increasingly becomes the primary way businesses leverage data, inconsistent interpretations of business metrics and metadata across different tools are causing confusion, slowing adoption and eroding trust in AI-driven insights, making standardization critical for successful AI implementation at scale," he said.
The OSI's specific goals include the following:
Improve interoperability across tools and platforms through a shared semantic standard to make integrating and preparing data easier, as illustrated in the sketch after this list.
Accelerate the development and deployment of AI and analytics applications by standardizing how semantics are defined and exchanged.
Streamline operations by reducing the need to reconcile conflicting semantic definitions and duplicate work across platforms.
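The OSI has not yet published its specification, so any concrete format is speculation. Still, a minimal sketch can show the intended workflow: a metric defined once in a vendor-neutral form, then rendered for each consuming tool. Every name below is an illustrative assumption:

```python
# Hypothetical sketch of a vendor-neutral metric definition, written
# once and rendered for different consumers. OSI has not published its
# spec; all field names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SharedMetric:
    name: str
    expression: str   # how the metric is computed
    grain: str        # time grain for aggregation
    description: str  # business meaning, usable as AI context

net_revenue = SharedMetric(
    name="net_revenue",
    expression="SUM(order_amount) - SUM(refund_amount)",
    grain="day",
    description="Net revenue after refunds, in USD.",
)

def to_bi_tool(m: SharedMetric) -> dict:
    """Render the shared definition for a hypothetical BI platform."""
    return {"measure": m.name, "sql": m.expression, "time_grain": m.grain}

def to_agent_context(m: SharedMetric) -> str:
    """Render the same definition as plain-text context for an AI agent."""
    return f"{m.name}: {m.description} Computed as {m.expression}, per {m.grain}."
```

Because both renderings derive from one definition, the BI dashboard and the AI agent cannot drift apart, which is the reconciliation work the consortium says the standard should eliminate.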
The OSI's intent to develop an open standard for semantic modeling follows the recent launches of open standards to simplify how agents are trained and interoperate.
Model Context Protocol (MCP), created by AI vendor Anthropic and launched in November 2024, is an open standard that governs how AI applications and agents connect to external systems such as databases, file stores and business tools. Agent2Agent Protocol, developed by Google and released in April, addresses how agents autonomously interact with one another once they are deployed.
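As a reference point for what adoption of such a standard looks like in practice, the sketch below uses MCP's official Python SDK (the `mcp` package) to expose a single tool an agent could call. The server name, tool and stubbed figures are hypothetical:

```python
# Minimal MCP server sketch using the official Python SDK ("mcp" package).
# The server name, tool and stubbed revenue figures are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sales-data")  # hypothetical server name

@mcp.tool()
def revenue_by_region(region: str) -> float:
    """Return total revenue for a region (stubbed for illustration)."""
    # A real server would query a governed data source here.
    return {"emea": 1_250_000.0, "amer": 2_400_000.0}.get(region.lower(), 0.0)

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```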
As agents become more ubiquitous, other processes, such as integrating data from external sources and AI model evaluation, could benefit from open standards, according to Catanzano.
"We need ways to audit AI," he said.
Looking ahead
Once the OSI launches its standardized semantic modeling framework, whether it serves its intended purpose will depend on how it's received. To truly reduce fragmented data, it will need buy-in from more than the current group of vendors that make up the consortium, according to Petrie.
The current group represents a good start, with participants from different data management segments such as data catalogs, data transformation and semantic modeling. But without participation from AWS, Databricks, Google, Microsoft and other leading data management vendors, the OSI's framework could become just another tool that fragments data.
"Snowflake has assembled a solid group of supporting vendors, [but] to be successful, rather than creating another incompatible silo, OSI will also need the support of the hyperscalers such as AWS and other cloud data platforms such as Databricks," Petrie said. "It also will need to address on-premises platforms, because our research shows that one-third of AI workloads reside on-premises."
Catanzano noted that for the consortium to succeed, it needs widespread support.
"What makes this initiative powerful is the collaboration among industry leaders," he said. "Rather than addressing semantic standardization in silos, these companies are creating a vendor-neutral specification that marks a decisive shift away from closed, single-vendor approaches. ... But we will see if others join in or introduce something competitive to this one."
Ultimately, however, the OSI is attempting to provide a helpful framework. If it succeeds as MCP has, it will be a major step toward simplifying the development of analytics and AI applications, according to Catanzano.
"A standard for defining and sharing metadata will … streamline operations by eliminating the weeks currently spent reconciling conflicting definitions or duplicating work across platforms," he said. "This allows data and AI teams to focus on innovation rather than troubleshooting semantic inconsistencies."
Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.