AI-ready data needs its own set of rules, experts say
At Gartner's 2026 Data & Analytics Summit, AI-ready data was top of mind. Explore these best practices from experts on achieving data readiness for AI systems.
ORLANDO, Fla. -- As businesses increasingly prioritize AI initiatives, the data needed for AI success is more critical than ever.
"To get started with AI, your data needs to be ready, your people need to be ready and your technology needs to be ready," said Melody Chien, senior research director in data management at Gartner, in an interview with TechTarget Editorial. "There's a lot that needs to come together, but the most important is data. With no data, there's no AI."
To address AI's pressing data needs, many chief data and analytics officers and data teams are expanding their responsibilities to include data management specifically for AI processes. But it's not as simple as applying traditional data-readiness practices to the data needed for AI systems. Having AI-ready data requires its own set of rules, tools and best practices -- many of which were on display at Gartner's Data & Analytics Summit this week.
Why data readiness for AI is unique
Due to AI's complexity and autonomous nature, AI systems need data that depends on context and use cases in a way that other technology applications don't.
"AI-ready data means that data is ready to support certain AI cases," Chien explained. "'Certain' AI cases means that readiness can be very contextual. Data could be ready for one AI case but not for another AI case. You can't just put a label that says, 'data is AI-ready.' You need to look at the bigger context, including what exactly you're trying to implement."
Businesses and AI vendors alike often don't carefully consider the data needs of AI systems, said Roxane Edjlali, senior director of data management strategies at Gartner, in an interview with TechTarget Editorial. In fact, a recent Gartner AI-ready data survey found that only 32% of organizations with AI initiatives had an AI data-readiness process.
"This means that 68% aren't doing it systematically, which is quite concerning," Edjlali added. "If you apply the same ratio [to AI agent use cases specifically], that would be even further concerning."
AI's need for context becomes more prominent in the age of agentic AI. AI agents often operate autonomously or with limited human-in-the-loop capabilities, so understanding context is essential for them to function properly.
Without AI-ready data, an agent can struggle to identify whether it's being prompted in the right context for the right use case, Edjlali said in her Gartner session, "AI-ready data: Lessons learned become practices to follow."
"Context needs to be able to provide [an agent] with sufficient information to verify whether it is or is not operating as planned," Edjlali said in an interview. "If it's not operating as planned, you cannot expect to get the same level of accuracy or precision that the AI use case was designed for."
How to get your data AI-ready
In an interview with TechTarget Editorial, Arun Chandrasekaran, VP analyst at Gartner, identified three main pillars for getting data AI-ready: data quality, data integration, and data lineage and classification.
Ensuring data quality means transcending conventional techniques because there's so much data fed into AI systems and much of it is unstructured, Chandrasekaran explained. Creativity is essential here, such as using tooling to support data labeling and synthetic data to fill data gaps.
Data integration means getting data into your pipelines, he said. This can involve many new techniques, such as using AI models to chunk, retrieve and add metadata as new data comes in. It can also involve engaging with tools such as the Model Context Protocol.
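As a rough illustration of chunking incoming data and attaching metadata at ingestion, here is a minimal Python sketch. It swaps in fixed-size word windows for the AI-model-driven chunking Chandrasekaran describes, and the `Chunk` class, the `chunk_document` function and the metadata field names are all assumptions made for this example, not part of any Gartner guidance or specific tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib


@dataclass
class Chunk:
    """A piece of a source document plus the metadata recorded at ingestion."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text: str, source: str, max_words: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split text into overlapping word windows and tag each with metadata.

    Hypothetical helper: fixed-size windows stand in for model-based chunking.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    ingested_at = datetime.now(timezone.utc).isoformat()
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_words]
        body = " ".join(window)
        chunks.append(Chunk(
            text=body,
            metadata={
                "source": source,            # where the data came from
                "chunk_index": index,        # position within the source
                "word_count": len(window),
                "ingested_at": ingested_at,  # when it entered the pipeline
                "checksum": hashlib.sha256(body.encode("utf-8")).hexdigest(),
            },
        ))
        # Stop once a window has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks
```

Recording the source and a checksum with every chunk is one simple way to keep lineage information attached to the data as it moves through a retrieval pipeline.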
Data lineage and data classification specify the data's origin, lifecycle and characteristics, which involves citations, verified sources and often a context layer. "The context layer includes everything from the active metadata management, the semantic layer, which is the business definition of the data, the ontology, which is understanding and representing the relationship that exists between data, and perhaps also memory, which is particularly needed as AI agents become more pervasive," Chandrasekaran said.
With a variety of processes needed to ensure AI-ready data, businesses can verify their data readiness with AI readiness assessments, such as the 26-point checklist Edjlali presented in her session.
Not every business needs to move through its readiness assessment in the same way, Edjlali said in an interview. The key is that whenever embarking on a new AI project, experts and stakeholders should collaborate and define the data needed to execute the use case. From there, teams can proceed to a proof of concept and complete a readiness assessment that works for their use case.
Consider the value of unstructured data
Because many generative and agentic AI applications today are multimodal, unstructured data accounts for a large share of the data needed to train and maintain AI systems.
Unstructured data lacks the predefined structure necessary for easy storage and analysis in traditional databases. It can be in the form of images, audio, PDFs, social media, emails and more. While unstructured data can be more challenging for organizations to store and analyze, its use in AI applications can create business value.
"You can see unstructured data as a liability, or you can see it as an asset," Chien said in an interview.
In her session, "How to unlock the value of unstructured data for AI: Start with governing it first," Chien noted that 70%-90% of enterprise data is unstructured. As a result, Chien spends more time today educating clients about unstructured data and how to analyze it, she said in an interview. Increased literacy and access to tools also mean organizations can more often engage with the unstructured data vital to many generative and agentic AI applications.
While understanding the importance of unstructured data for AI use cases is a first step, leaders must also properly govern it if they want that data to be ready for AI, Chien said. Governance involves multiple steps, including tagging and classifying unstructured data, along with extensive metadata management.
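To make the tagging and classifying steps concrete, the sketch below applies rule-based tags to a text document and maps them to a sensitivity class. It is only an illustration: the tag names, the regular expressions and the sensitivity policy are hypothetical, and a real program would likely rely on a governed taxonomy and dedicated tooling rather than a handful of patterns.

```python
import re

# Hypothetical tag rules; a real taxonomy would come from the governance team.
TAG_RULES = {
    "contains_email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "financial": re.compile(r"\b(invoice|payment|refund)\b", re.IGNORECASE),
    "legal": re.compile(r"\b(contract|nda|liability)\b", re.IGNORECASE),
}


def tag_document(text: str) -> list[str]:
    """Return the sorted tags whose pattern matches the text."""
    return sorted(tag for tag, pattern in TAG_RULES.items() if pattern.search(text))


def classify_sensitivity(tags: list[str]) -> str:
    """Map tags to a coarse sensitivity class (illustrative policy only)."""
    if "contains_email" in tags or "legal" in tags:
        return "restricted"
    if "financial" in tags:
        return "internal"
    return "unclassified"


tags = tag_document("Please send the invoice to jane@example.com")
print(tags, classify_sensitivity(tags))
```

Storing the resulting tags and class alongside each document gives downstream AI use cases something consistent to filter on.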
"The type of processing and metadata that you need for unstructured data is very different," Edjlali said in an interview. "If you don't have enough labels that are going to distinguish everything, it is likely going to be much more difficult to get accuracy for the use case."
Prioritize metadata
The metadata used to describe unstructured data can be inconsistent, Chien said in her session. There are often no predefined standards or clear ownership over data, which can create a major roadblock to AI data readiness.
"Metadata is everything that helps you answer the question, 'Is my data AI-ready?'" Edjlali said in an interview. "Questions regarding what the model is designed to do, who should use it, what type of agent it is, what are the use cases outside its scope -- metadata tells you where the data is, where the data came from and the statistical distribution of the data."
Importantly, metadata, typically managed through AI model cards, also helps teams identify data drift, the point at which changes in the data mean the AI system can no longer perform its use case adequately, Edjlali added. That's the big difference between metadata for AI and traditional metadata, which is often considered static. "'Did anything change?' is the core question that you should be trying to answer," she said in an interview. "Answering that question is metadata."
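One simple way to ask "Did anything change?" is to store summary statistics as metadata when a use case is validated and compare later batches against them. The Python sketch below does this with a mean-shift check. The `profile` and `has_drifted` functions and the 0.5 standard-deviation threshold are assumptions chosen for illustration, not a method described at the summit, and production systems would likely use richer statistical tests.

```python
from statistics import mean, stdev


def profile(values: list[float]) -> dict:
    """Summarize a numeric column as metadata to store with the dataset.

    Needs at least two values, because stdev is undefined for fewer.
    """
    return {"count": len(values), "mean": mean(values), "stdev": stdev(values)}


def has_drifted(baseline: dict, current: list[float], threshold: float = 0.5) -> bool:
    """Flag drift when the current mean moves more than `threshold`
    baseline standard deviations away from the baseline mean."""
    if baseline["stdev"] == 0:
        # A constant baseline drifts as soon as the mean differs at all.
        return mean(current) != baseline["mean"]
    shift = abs(mean(current) - baseline["mean"]) / baseline["stdev"]
    return shift > threshold


# Baseline captured when the use case was validated (made-up numbers).
baseline = profile([10, 12, 11, 9, 13])
print(has_drifted(baseline, [10, 11, 12, 11]))  # similar batch
print(has_drifted(baseline, [15, 16, 14, 17]))  # shifted batch
```

The stored profile is itself metadata, and unlike a static description it has to be re-checked whenever new data arrives.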
Automate where appropriate
From metadata collection to operationalization, AI-ready data practices are too time-consuming to be entirely manual. Organizations can use tools to automate certain business processes.
There are three main categories of tools to consider, Edjlali said in an interview. Businesses should focus on metadata management, data observability and data governance tools to improve data readiness. When selecting the right AI tools, it's important to understand the project's use case as well as data and metadata needs.
"Clients want a simple solution," Edjlali said. "They are looking at technology to solve the problem for them, and so they would much rather have a single vendor do it for them, but it might not deliver on its promise. This is where your knowledge and context come into play."
Olivia Wisbey is a site editor for Informa TechTarget's AI & Emerging Tech group. She has experience covering AI, machine learning and software quality topics.