Why enterprise AI depends on the semantic layer

Semantic layers are moving from BI tools into the core analytics stack as AI agents query enterprise data, requiring governed definitions for consistent interpretation.

The semantic layer, historically embedded in business intelligence tools, is becoming central to enterprise analytics as AI agents begin querying enterprise data directly. Without governed definitions, agents can produce answers that are consistent in execution but inconsistent in meaning.

Keeping definitions inside BI tools worked as long as a person sat at the end of every query. A human analyst carried the institutional knowledge to pick the right definition and catch a number that looked wrong. Agents carry none of that. They act, at machine speed, on whatever definition they are handed or infer, across systems that might never have an analyst in the loop.

AI has turned the semantic layer from something taken for granted into something enterprises have to design deliberately. An enterprise putting agents into production could, in theory, take a shortcut and let the model approximate what the data means rather than defining it. That shortcut has significant downstream consequences: an approximation no one can trace to a governed definition is an answer no one can defend when questioned.

Agents changed the layer that was always there

For most of its history, the semantic layer did its work quietly and in one place. It fixes what a business term means before anyone queries it, so that "revenue" or "active customer" resolves to one agreed definition rather than to whatever the query writer assumes. Those definitions lived inside a company's BI tool, and for the time, that was enough. The people querying the data worked closely with it, and the definitions never had to travel.
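To make "fixing a term before anyone queries it" concrete, here is a minimal Python sketch. The `Metric` and `SemanticLayer` classes, the `orders` table and its columns are all hypothetical and do not reflect any vendor's product; the sketch only shows a term resolving to one registered definition, and an unregistered term failing instead of being guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One governed definition: a business term bound to exact SQL (hypothetical)."""
    name: str
    sql_expression: str        # aggregate expression, e.g. SUM(amount)
    source_table: str
    filter_sql: str = "1 = 1"  # row filter that is part of the definition itself
    owner: str = "unassigned"  # who is accountable for the definition


class SemanticLayer:
    """Hypothetical registry that resolves business terms before any query runs."""

    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        key = metric.name.lower()
        if key in self._metrics:
            # One term, one definition: a second definition is rejected.
            raise ValueError(f"'{metric.name}' already has a governed definition")
        self._metrics[key] = metric

    def resolve(self, term: str) -> Metric:
        # Unknown terms fail loudly instead of letting the caller improvise.
        try:
            return self._metrics[term.lower()]
        except KeyError:
            raise LookupError(f"no governed definition for '{term}'") from None

    def to_sql(self, term: str) -> str:
        m = self.resolve(term)
        alias = m.name.replace(" ", "_")
        return (f"SELECT {m.sql_expression} AS {alias} "
                f"FROM {m.source_table} WHERE {m.filter_sql}")


layer = SemanticLayer()
layer.register(Metric("revenue", "SUM(amount)", "orders",
                      "status = 'completed'", owner="finance"))
print(layer.to_sql("Revenue"))
# SELECT SUM(amount) AS revenue FROM orders WHERE status = 'completed'
```

Keeping the filter inside the definition, rather than leaving it to the query writer, is the design choice that matters here: two callers asking for "revenue" cannot silently apply different filters.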

Now the definitions can no longer stay where they have always lived. "Having those [definitions] tied up in your BI tools doesn't work anymore, because you need AI agents to access them," said Chris Child, vice president of product for data engineering at Snowflake. The semantics themselves have not changed; what has changed fundamentally is who, or what, needs to access them.

An enterprise agent works across an estate no one person holds in their head at once: object storage in one cloud, a warehouse built a decade ago, operational systems such as ServiceNow and Salesforce, and data lakes federated across acquisitions. Sergio Gago, chief technology officer at Cloudera, put the limit plainly. "The human, with all the domain knowledge and expertise inside the company, is able to navigate this complexity," he said. "But an agent cannot do that." The semantic layer fills the gap between the data an agent can reach and the data it can correctly interpret. That gap is why a concept that held steady for years is being pulled into a part of the stack it has not occupied before.

A query that runs is not a query that's right

The shortcut around building a semantic layer is to let the model write its own definitions on the fly. LLMs can produce SQL that executes, and an executing query feels like a successful answer, which is why the shortcut is easy to ship but hard to trust. The distance between a query that runs and one that returns what the business actually asked for is where the semantic layer earns its place.

Snowflake learned this on its own data. It trained its Arctic models on years of customer SQL, only to find they "were still not great at actually answering your real business questions," said Child. The models had the syntax but still missed the meaning. The improvement came not from more training data but from supplying definitions. "We gave them access to semantic models, and they got dramatically better at answering the real question."

The ambiguity runs deeper than edge cases. A question as basic as a company's customer count has several correct answers at once: the publicly reported figure, the number tracked internally, or the count of active accounts. No model resolves this ambiguity on its own. The number it returns will be stated confidently and will be defensible under whichever definition it chose, but no one can say which definition produced it. This is the failure mode that should worry a governance lead, because it does not announce itself as a failure. An agent given broad access will produce what looks like an answer, even when it draws on tables never meant to be authoritative, and return it with the same confidence it brings to everything else.
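The customer-count problem is easy to reproduce. The Python sketch below uses an in-memory SQLite table with invented rows and three invented definitions of "customer count"; none of them is any real company's metric. Each definition returns a different, equally defensible number, so the sketch refuses to answer until a definition is named, and it returns the definition alongside the value so every number carries its provenance.

```python
import sqlite3

# Hypothetical toy table: one row per account (all values invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER, is_test INTEGER, last_active_days INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 0, 3), (2, 0, 45), (3, 1, 1), (4, 0, 200), (5, 0, 10)],
)

# Three hypothetical, individually reasonable definitions of "customer count".
CUSTOMER_COUNT_DEFINITIONS = {
    "all_accounts": "SELECT COUNT(*) FROM accounts",
    "excluding_test": "SELECT COUNT(*) FROM accounts WHERE is_test = 0",
    "active_30d": (
        "SELECT COUNT(*) FROM accounts "
        "WHERE is_test = 0 AND last_active_days <= 30"
    ),
}


def customer_count(definition=None):
    """Return (value, definition name); refuse to guess when none is named."""
    if definition is None:
        raise ValueError(
            "ambiguous term 'customer count'; choose one of: "
            + ", ".join(sorted(CUSTOMER_COUNT_DEFINITIONS))
        )
    sql = CUSTOMER_COUNT_DEFINITIONS[definition]
    (value,) = conn.execute(sql).fetchone()
    return value, definition


for name in sorted(CUSTOMER_COUNT_DEFINITIONS):
    print(customer_count(name))
# (2, 'active_30d')
# (5, 'all_accounts')
# (4, 'excluding_test')
```

The same question yields 2, 4 or 5 depending on the definition, and every query runs successfully. Only the returned definition name tells a reviewer which answer they are looking at.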

None of this is new work, which is the part enterprises keep missing. Defining what data means, documenting it and governing its use has always been standard practice, but human analysts absorbed the cost of skipping it, holding definitions in their heads as institutional knowledge and catching each other's errors. Agents have no such backstop. They take the definition they are handed or invent one, and they act on it faster than any dashboard could surface the same data.

Which format are businesses using?

If the semantic layer is now the layer that matters, the contest becomes who controls its shape. A definition locked in one vendor's proprietary format is one the customer cannot take anywhere else, and a semantic layer's value is that an agent can read it wherever the data sits. That is the logic behind the Open Semantic Interchange, the standard many vendors are backing to make semantic definitions portable across platforms.

The problem it solves is duplication that compounds with each tool. An enterprise running two BI tools already maintains two semantic models, and pointing an agent at the same data demands a third. A standardized format lets a company define a metric once and use it everywhere instead of redefining it for each system that touches the data. Customers are already pushing in that direction, and not toward any single vendor. The typical enterprise now spreads its data across several platforms, Gago said, with "some data in Snowflake, some data in Databricks, some data in Cloudera," and that sprawl is producing demand for open systems and open standards "that don't lock them in." A portable semantic layer is that same demand aimed at the layer agents actually consume. Many BI vendors that once guarded their semantic definitions have signed on to the open standard, said Child, having concluded that the definitions must be usable across many tools rather than locked inside one.
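A rough Python sketch of "define once, use everywhere": a single metric definition is serialized to JSON, and two hypothetical consumers, a BI-style query builder and an agent-context builder, read the same copy. The field names and functions are invented for illustration and are not the Open Semantic Interchange schema; the sketch shows only why one shared, portable definition removes the need for a second and third copy.

```python
import json

# One metric, defined once in a neutral structure.
# Field names are hypothetical, NOT the Open Semantic Interchange schema.
REVENUE = {
    "name": "revenue",
    "expression": "SUM(amount)",
    "table": "orders",
    "filter": "status = 'completed'",
}


def export_definition(metric: dict) -> str:
    """Serialize the definition so every tool reads the same copy."""
    return json.dumps(metric, sort_keys=True)


def bi_tool_query(payload: str) -> str:
    """Hypothetical BI consumer: renders the shared definition as SQL."""
    m = json.loads(payload)
    return f"SELECT {m['expression']} AS {m['name']} FROM {m['table']} WHERE {m['filter']}"


def agent_context(payload: str) -> str:
    """Hypothetical agent consumer: renders the same definition as instructions."""
    m = json.loads(payload)
    return (f"Metric '{m['name']}' means {m['expression']} over {m['table']} "
            f"where {m['filter']}. Use this definition; do not infer another.")


shared = export_definition(REVENUE)
print(bi_tool_query(shared))
print(agent_context(shared))
```

In a real deployment, validation, versioning and ownership of the shared definition would carry most of the weight; the sketch leaves them out to keep the single-source idea visible.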

The contest over which format prevails is not settled, but the direction is. Vendors that align with open standards become the obvious choice, and holdouts betting that their proprietary definitions are better are betting against a shift their own peers have already joined. In the end, a definition is worth only as much as the number of places an agent can carry it.

Scott Thompson is the Site Editor for TechTarget's Data Technologies group, covering data management and business analytics topics for senior enterprise data leaders. He has edited data and analytics content for TechTarget since 2021.