Modernized big data architecture a must for AI to deliver

Many enterprises put AI into production in 2025 but found their legacy data stacks stalled progress. See what it takes to modernize big data systems for better AI results.

AI made its move from pilot projects to production use for most enterprises in 2025, but many deployments will lag -- or outright flop -- until legacy big data architectures get a refresh.

While enterprises are eager to put AI to work, most still run data environments built for another era. Traditional data warehouse architectures support basic reporting and business intelligence. Twenty years ago, organizations began implementing big data architectures built on data lakes containing diverse data sets to support advanced analytics applications, as well as robotic process automation. But such architectures often lack real-time data access and other modern data technologies. As a result, the promise of AI remains out of reach.

More than just new technology is needed. To make the required data modernization a reality, data management leaders must tackle the fundamentals: breaking down data silos that formed either from acquisitions or out of business necessity, streamlining processes that slow data delivery and tightening security and privacy controls so organizations can scale data access with proper protections in place.

These moves should be pragmatic rather than wholesale. A phased approach to an upgrade protects systems of record while moving the most important business areas to a more flexible foundation. During this process, organizations must also weigh when to keep data on-premises and when it belongs in the cloud, embed data governance into everyday work and understand what signs of progress look like.

Why AI needs a big data upgrade

Organizations worldwide are bullish on the transformative power of AI, believing the technology is essential to stay competitive today and in the future. That belief is driving numerous AI implementations across various enterprise functions.

A survey commissioned by data platform software company Cloudera found that in 2024, 88% of enterprises were using AI. A 2025 follow-up reported 96% had integrated AI into their core processes to at least some extent.

But a lot of organizations are struggling. "[E]vidence suggests that transformative value [from AI] remains elusive for many companies primarily due to the limitations of the outdated data infrastructures that are powering AI tools," Cloudera wrote in a report on its 2024 survey.

To support AI at scale, enterprises are finding they must revamp their big data architecture and operations. Leaders increasingly see a modern data foundation as essential to achieve that goal, said Niranjan Ramsunder, CTO and head of data services at UST, a technology consulting firm.

"Data is critical these days. It's central to an organization's success, and a good data architecture and a good data strategy are both critical for an organization to succeed," Ramsunder said.

Ramsunder said many organizations did not allocate money for big data infrastructure modernization efforts over the past decade because they did not anticipate seeing a positive return on these investments. However, AI has changed those ROI perspectives.

"Expectations have changed," Ramsunder said, "and modernization has to be done."

Others give similar advice.

"To stay competitive, enterprises must evolve their data architectures to be more agile, scalable and intelligent, " Noel Yuhanna, vice president and principal analyst at Forrester Research, wrote in a 2025 report. "As these pressures mount, organizations encounter critical challenges that hinder their ability to deliver insights at speed and scale."

What are the limitations of legacy big data architecture?

Making data ready for AI matters. Studies consistently link poor data quality to lost revenue and missed opportunities. Gartner has estimated this issue costs organizations at least $12.9 million annually on average.

There are other problems with legacy big data architectures. Another Forrester report published in 2025 said most organizations keep data in separate systems. Much of it is unstructured and lacks essential basics, such as metadata, lineage and governance, all of which AI depends on to function.

"Without a unified foundation of clean, connected, well-managed data, AI initiatives often remain in the pilot phase and fail to deliver business value at scale," Forrester wrote.

These dated big data pipelines were often designed for batch processing, not real-time AI workloads that pull from many data types and sources. The environments are costly and difficult to manage, weakening data governance and hindering efforts to scale AI projects.

In 2025, Gartner projected that through 2026, 60% of AI projects will be abandoned if they aren't supplied with AI-ready data. Legacy big data architecture often struggles to support broad AI deployments or agentic, cross-application workflow automation without substantial upgrades to data quality, governance and real-time integration.

Other research pointed to similar challenges. In a 2025 IBM survey of 1,700 CDOs and other senior data and analytics leaders, only 26% said they were confident their organization's data capabilities could support new AI-enabled revenue streams. The researchers found that adopting AI uncovered limits in legacy systems: Data was scattered across tools, common definitions were missing and governance relied on outdated policies.

The study also showed that AI efforts required additional spending on the underlying data architecture to make it fit for purpose.

Why modernization is back on the agenda

The pressure is on enterprise leaders to use AI to launch new services and products, pushing organizations to adapt their big data infrastructure to support those initiatives.

"They're looking into what types of data they have and how to put it in a better state to serve the customer better. That's the driving force for modernization for most companies," said Geeta Sandeep Nadella, a senior member of IEEE, an organization of technology professionals that also defines global technical standards.

The push to upgrade is also driven by uneven adoption across the enterprise, Nadella said. It's common to see a modern data environment in one department or business unit while others -- especially those picked up through mergers or acquisitions -- remain on older systems that now need to be integrated into a unified big data architecture.

Many data teams modernize to reduce costs, as legacy environments tend to be more expensive to run and maintain. They also seek to simplify the environment and lower exposure to security risks, which are typically higher in aging platforms.

Many enterprises also plan to modernize to improve strategic responsiveness, even if they're laggards on AI. Organizations that modernize their big data architecture report stronger returns. In 2020, for example, management consultancy McKinsey found that "high-performing data organizations" were three times more likely to say their data and analytics initiatives had contributed at least 20% to the company's earnings before interest and taxes.

"Business agility is something every organization needs to look into these days," Nadella said.

What ultimately drives an infrastructure refresh, experts say, is preparing the data layer for AI to ensure that information is available when and where it's needed and that it can be used safely.

"AI needs consistent, trusted data to be driving more results. And legacy architectures and platforms haven't kept up with the demands for consistent, trusted real-time data for AI," Yuhanna said in an interview with TechTarget.

All the layers that form a modern data stack

Enterprise big data architecture -- whether modern or legacy -- consists of multiple layers to move and refine data from the source to a usable state. While the labels for these layers vary, modern environments include the following:

  • Data sources. Applications and systems that produce data, such as CRM and ERP systems, logs, sensors, files and connected devices.
  • Data ingestion. Batch or streaming processes that move data from sources into the big data platform.
  • Storage. On-premises, cloud or hybrid repositories that can store a mix of structured, unstructured and semistructured data -- one of the key elements of big data environments.
  • Data processing, integration and transformation. Cleans, validates, standardizes and enriches data to convert it into usable formats, through either ETL processes or ELT ones, which load raw data first and transform it on the platform -- an approach often used in big data systems (see the pipeline sketch below).
  • Data delivery and consumption. Delivers the data for advanced analytics and AI applications, as well as BI, reporting and data visualization uses.

Data pipelines automate the movement of data between layers.
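
To make the ELT pattern concrete, here is a minimal sketch in Python with pandas: raw data is landed in storage first, then transformed on the platform. The file paths and column names are hypothetical, and a production pipeline would run on a distributed engine with scheduling, monitoring and governance around it.

    import pandas as pd

    # Extract and load: land the raw export in the storage layer untouched.
    # (File paths and column names here are illustrative, not a real schema.)
    raw_orders = pd.read_csv("raw/orders.csv")
    raw_orders.to_parquet("lake/raw/orders.parquet")

    # Transform on the platform: clean, validate, standardize and enrich.
    orders = pd.read_parquet("lake/raw/orders.parquet")
    orders["order_date"] = pd.to_datetime(orders["order_date"], errors="coerce")
    orders = orders.dropna(subset=["order_id", "order_date"])  # basic validation
    orders["revenue"] = orders["quantity"] * orders["unit_price"]  # enrichment

    # Deliver: publish a curated table for BI, analytics and AI consumers.
    orders.to_parquet("lake/curated/orders.parquet")

The key design point of ELT is that raw data is persisted before any transformation, so downstream teams can reprocess it later as requirements change.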

Leading organizations design, build and manage data environments with integrated governance, security, privacy and metadata practices, Yuhanna added.

Experts say several common technology elements combine to form a modern big data architecture.

  • Data mesh. An approach that groups data by business function, such as sales and finance, and gives each team ownership of its data, subject to a layer of company-wide governance policies.
  • Data fabric. A framework that makes it easy to find and use data across all systems, usually through metadata to automate data discovery, lineage and policy management.
  • Data lakes and data lakehouses. Scalable storage for both raw and refined data, with data lakehouses -- which combine elements of data lakes and data warehouses -- becoming an increasingly popular platform.
  • Open table formats. Open specifications for storing tables in a data lake or lakehouse so different tools can organize, manage and query these large data sets.
  • Vector databases. Systems used to store, manage and search vector embeddings -- the numeric versions of text, images and other unstructured data -- to quickly find close matches in generative AI, machine learning and other applications.
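
To illustrate the core operation a vector database performs, the following Python sketch runs a brute-force nearest-neighbor search over embeddings with NumPy. The random vectors stand in for real model-generated embeddings; an actual vector database would answer the same query with approximate-nearest-neighbor indexes that scale to billions of vectors.

    import numpy as np

    rng = np.random.default_rng(42)

    # Stand-ins for embeddings a model would produce from text or images.
    doc_vectors = rng.normal(size=(1000, 384))  # 1,000 documents, 384 dimensions
    query_vector = rng.normal(size=384)

    # Cosine similarity: normalize the vectors, then take dot products.
    doc_norms = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    query_norm = query_vector / np.linalg.norm(query_vector)
    scores = doc_norms @ query_norm

    # The five closest matches -- the result a vector database returns,
    # only via indexes rather than this full scan.
    top_5 = np.argsort(scores)[::-1][:5]
    print(top_5, scores[top_5])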

How medallion architecture clarifies data quality

Ramsunder said a common way to organize a modern data lakehouse is to use the medallion architecture, which gives teams a simple layout and a shared understanding of the state of individual data sets. As data moves from one layer to another, its quality and usefulness improve at each step.

Medallion architecture collects data into three logical layers:

  • Bronze. This layer holds raw data.
  • Silver. This layer holds cleaned and validated data, with fixes and business rules applied, that can already support some analytics tasks.
  • Gold. This layer publishes curated data that is ready for a wide range of business needs, such as reporting, analytics and machine learning.
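
As a rough illustration of how data might move through these layers, here is a pandas sketch of a hypothetical event data set promoted from bronze to gold. The table and column names are invented for the example; production lakehouses typically implement these steps with a distributed engine such as Spark on top of an open table format.

    import pandas as pd

    # Bronze: raw data landed as-is from the source system.
    bronze = pd.read_csv("landing/customer_events.csv")
    bronze.to_parquet("lake/bronze/customer_events.parquet")

    # Silver: cleaned and validated, with basic business rules applied.
    silver = pd.read_parquet("lake/bronze/customer_events.parquet")
    silver = silver.drop_duplicates(subset=["event_id"])
    silver["event_time"] = pd.to_datetime(silver["event_time"], errors="coerce")
    silver = silver.dropna(subset=["event_time", "customer_id"])
    silver.to_parquet("lake/silver/customer_events.parquet")

    # Gold: curated aggregates, ready for reporting, analytics and ML features.
    gold = (
        silver.groupby("customer_id")
        .agg(events=("event_id", "count"), last_seen=("event_time", "max"))
        .reset_index()
    )
    gold.to_parquet("lake/gold/customer_activity.parquet")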

How to future-proof your AI ambitions

As with any IT decision, the starting point in creating an effective big data architecture is business need: design the environment to fit the organization's specific requirements, then choose the data technologies to match.

"The real key components depend on what you want to do," Yuhanna said.

In practice, many teams prefer cloud services for scale and speed, but some big data workloads still benefit from on-premises deployments for tighter control or lower latency in specific AI scenarios, Nadella said.

To avoid vendor lock-in and extend the life of big data architecture investments, Yuhanna said organizations should use open standards, modular designs and highly automated offerings. He also recommended a phased approach to modernization so organizations gain incremental benefits as they progress.

Nadella said it's also important to recognize the work is never done.

"It's something that is constantly getting updated," he said. "You have to continually look at yourself and look at the new requirements."

Mary K. Pratt is an award-winning freelance journalist with a focus on covering enterprise IT and cybersecurity management.
