NetApp aims to transcend storage roots with vision to make data AI-ready

NetApp is redefining its role beyond storage with new AI-focused data infrastructure unveiled at its Insight conference.

The AI revolution is forging new strategies, capabilities and relationships across the entire AI technology stack. Nowhere are these dynamics playing out more visibly than in the data infrastructure arena, where fast-evolving requirements for making enterprise data usable by AI are creating a whole new category of capabilities.

These dynamics took center stage at NetApp Insight, the company’s annual customer conference, held recently in Las Vegas. As part of its broader mandate to enable ‘intelligent data infrastructure,’ NetApp detailed an evolving vision and unveiled a range of new capabilities designed to improve enterprises’ chances of AI success. For NetApp, the push also represents a significant effort to transcend its storage roots and build more strategic relationships with its customers.

NetApp CEO George Kurian laid out the challenge of driving fruitful AI initiatives in regulated enterprises in simple terms. That ‘data is the fuel of AI’ is not in dispute. The challenge, he said, is that enterprise data is not AI-ready -- and while that framing is simple, the root causes that make ordinary enterprise data hard to turn into AI-optimized data are anything but.

Challenges of creating an effective AI data pipeline

An effective AI data pipeline requires the careful setup and calibration of a range of data-centric management workflows that must come together in a cohesive, efficient and scalable process. These include the following (a minimal code sketch follows the list):

  • Identifying, classifying and organizing the data in the first place.
  • Applying governance and privacy policies.
  • Transforming the data through vectorization.
  • Applying semantic data search.
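
To make these steps concrete, the following toy Python sketch walks through all four: classification (here, pre-attached labels), a governance filter, vectorization and a semantic search. It is illustrative only -- not NetApp code -- and the sample documents, labels and the sentence-transformers embedding model are assumptions chosen for the example.

    # Illustrative only -- not NetApp code. A toy pipeline covering the four
    # steps above: classify, govern, vectorize and semantically search data.
    import numpy as np
    from sentence_transformers import SentenceTransformer  # assumed embedding model

    documents = [
        {"text": "Q3 revenue grew 12% year over year.", "label": "finance"},
        {"text": "Patient intake form, DOB and SSN fields.", "label": "pii"},
        {"text": "The support runbook for cluster failover.", "label": "ops"},
    ]

    # 1. Identify/classify/organize: labels are pre-attached here; in practice
    #    a classifier or metadata engine would assign them.
    # 2. Apply governance: exclude restricted classes before anything is embedded.
    allowed = [d for d in documents if d["label"] != "pii"]

    # 3. Transform through vectorization: embed the permitted documents.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode([d["text"] for d in allowed], normalize_embeddings=True)

    # 4. Semantic search: cosine similarity (dot product of normalized vectors).
    def search(query: str, k: int = 2) -> list[dict]:
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = vectors @ q
        top = np.argsort(scores)[::-1][:k]
        return [{"doc": allowed[i]["text"], "score": float(scores[i])} for i in top]

    print(search("how do I fail over the cluster?"))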

This process is significantly complicated by two additional factors. First, AI is an inherently hybrid workload. The data that organizations want to leverage in AI workloads exists in many different forms -- structured, semi-structured and unstructured -- and lives in many different places, both on-premises and often across multiple public clouds. Second, it’s not sufficient to build an AI pipeline once. To be effective, there must be a rinse-and-repeat process capable of rapid updates so that models reflect changes to the underlying data; otherwise, hallucinations and other inaccuracies will persist at the inference stage.
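
The rinse-and-repeat requirement amounts to change detection for the pipeline: spot what changed at the source and re-vectorize only that. The hypothetical sketch below illustrates the idea with simple content hashing; it is an assumption for illustration, not how NetApp’s data sync actually works.

    # Illustrative only: hash-based change detection so only new or modified
    # files are re-embedded on each pipeline pass, keeping the index current.
    import hashlib
    import json
    from pathlib import Path

    STATE_FILE = Path("pipeline_state.json")  # hypothetical state location

    def file_digest(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def changed_files(data_dir: Path) -> list[Path]:
        """Return files that are new or modified since the last pass."""
        previous = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
        current, changed = {}, []
        for path in sorted(data_dir.rglob("*.txt")):
            digest = file_digest(path)
            current[str(path)] = digest
            if previous.get(str(path)) != digest:
                changed.append(path)
        STATE_FILE.write_text(json.dumps(current))
        return changed

    for path in changed_files(Path("corpus")):
        # In a real pipeline this would re-chunk, re-embed and upsert the
        # file's vectors so inference stops serving stale content.
        print(f"re-vectorize: {path}")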

Many organizations face profound challenges at some -- or even every -- stage of this process, to the extent that most of their time is spent ‘data wrangling’ across expensive, inefficient, brittle and insecure data pipelines. NetApp points to issues around exploding data capacities, noting that an average of six copies of data are made for each pass through the data pipeline. Omdia research corroborates the broader growth trend: in a recent study, 87% of respondents said AI is driving substantial data growth for their organization.

Data awareness at the storage layer addresses pipeline challenges

NetApp’s view is that many of these challenges can be addressed by introducing data awareness at the storage layer. Again, this perspective is validated by our research, which found that 71% of enterprises struggle with a lack of integration between storage and AI data pipelines. At Insight, NetApp unveiled more details about its strategy to help make customer data AI-ready, spanning core storage capabilities as well as new data management features, falling under the broader umbrella of the NetApp Data Platform. These new capabilities include:

  • ONTAP AFX. A new parallel variant of the company’s venerable ONTAP storage operating system, running on a disaggregated hardware architecture that physically separates storage compute from back-end data storage. With the two connected by a high-speed backplane, the architecture lets storage and compute resources scale independently, so customers can run high-performance AI workloads with optimal resource efficiency.
  • AI Data Engine (AIDE). A separate but complementary data engine providing AI-centric data services that optimize and secure the AI data pipeline. AIDE comprises several components: a metadata engine to find data in the first place, a data sync capability to keep data current, a data guardrails feature to apply security and governance, and a data curator to transform data for AI consumption. Importantly, AIDE integrates with NVIDIA NIM services, and the overall AFX offering with AIDE has been certified by NVIDIA for DGX SuperPOD.

Though the initial versions of AIDE run with AFX in on-premises locations, NetApp’s ultimate vision is to help customers apply this approach across their entire environment: on-premises, in multiple public clouds and in other locations, such as GPU-centric ‘neoclouds’.
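
NIM microservices expose OpenAI-compatible endpoints, so an integration of this kind can be pictured as calls of roughly the following shape. This is a hedged sketch only: the endpoint URL and model name are placeholders, and it is not NetApp’s actual integration code.

    # Illustrative only: calling a NIM embedding microservice through its
    # OpenAI-compatible API. The URL and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # hypothetical local NIM deployment
        api_key="not-used-locally",
    )

    response = client.embeddings.create(
        model="nvidia/nv-embedqa-e5-v5",  # an example NVIDIA retriever model
        input=["Quarterly report for the storage business unit."],
        extra_body={"input_type": "passage"},  # some NVIDIA models need this hint
    )
    print(len(response.data[0].embedding))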

“We aim to do for enterprise data what Google has done for internet data,” said Kurian. The NetApp Data Platform can be used to create a ‘metadata fabric’ across a global namespace, turning all enterprise data into a ‘knowledge graph’ that can be leveraged by a growing raft of AI agents and other emerging capabilities -- e.g., Model Context Protocol (MCP) servers and AI-centric APIs -- all under a federated data control model that keeps sensitive data secure and private.
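
As a rough illustration of how such a metadata fabric might be surfaced to agents, the sketch below exposes a stubbed catalog as an MCP tool using the official Python SDK’s FastMCP helper. The catalog contents, tags and server name are placeholder assumptions; only the MCP plumbing reflects the real SDK.

    # Illustrative only: exposing a stubbed metadata catalog to AI agents as
    # an MCP tool, via the official Python SDK (pip install "mcp[cli]").
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("metadata-fabric")  # hypothetical server name

    # Stand-in for a global-namespace metadata catalog.
    CATALOG = [
        {"path": "/vol1/contracts/2024/acme.pdf", "tags": ["legal", "contract"]},
        {"path": "/vol2/logs/app-2024-10.log", "tags": ["ops", "telemetry"]},
    ]

    @mcp.tool()
    def find_data(tag: str) -> list[dict]:
        """Return catalog entries carrying the given classification tag."""
        return [entry for entry in CATALOG if tag in entry["tags"]]

    if __name__ == "__main__":
        mcp.run()  # serves over stdio so an agent can call find_data()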

In this sense, the development of AIDE is particularly notable for NetApp, a company that has spent most of the last 30 years creating decidedly storage-centric products. By and large, storage vendors have stayed in their storage swim lanes, focusing on software that helps customers store, manage and protect their data, without paying particular regard to the nature of the data itself.

Storage providers look to develop data-centric services

However, AI is changing the dynamics here, encouraging many storage providers to develop data-centric services that historically have lived higher up the data management stack. This isn’t to say that NetApp is seeking to create an entire data management platform a la Snowflake or Databricks; rather, its proximity to the data affords it insights that can be used to optimize the broader AI pipeline. Indeed, NetApp is building integrations with an ecosystem of data management ISVs -- such as Informatica -- designed to simplify the overall data pipeline.

Still, NetApp believes its approach with the Data Platform will enable it to transcend its storage roots, providing customers not only with unified storage across hybrid and multi-cloud environments, but also with a unified data model. Clearly, NetApp also hopes this will drive more strategic conversations -- and relationships -- with its customers.

While the scope of NetApp’s ambitions is bold, they are part of a broader trend playing out across storage vendors, though the specific implementations vary widely. VAST Data, for example, is taking a much more comprehensive approach: It is building a fully integrated AI ‘operating system’ that extends from the foundational storage layer all the way through the data layer, including higher-order features that orchestrate databases and application runtimes.

Dell, on the other hand, is taking a more partner-centric approach with its AI Data Platform, which combines its own storage capabilities with software from partners such as Starburst (data lakehouse) and Elastic (search).

Where customers will ultimately find value is still an open question. Most enterprises are very early in their AI journeys and are still working out what their requirements will be, and the ultimate decision will often come down to multiple factors. Even so, NetApp is developing an intriguing set of capabilities that will attract the interest of customers looking to put more of their data to work for AI in a more strategic manner -- a fascinating set of additions to a rapidly developing AI landscape.

Simon Robinson is principal analyst covering infrastructure at Omdia.

Omdia is a division of Informa TechTarget. Its analysts have business relationships with technology vendors.
