Published: 24 Mar 2026
Nvidia's GTC (GPU Technology Conference) is now a centrepiece event for AI infrastructure, and this year's show didn't disappoint. But aside from the more outré announcements -- data centers in space, robot Olaf and more -- it was developments in the storage arena that caught my eye. The storage announcements are consequential both for Nvidia itself and for the broader storage ecosystem. In aggregate, they underscore that the storage industry is undergoing a period of innovation, and in some cases reinvention, that we haven't seen in many years.
The shift to inference, and its impact on storage
What's driving this? Nvidia's headline mission is to power the mass commercialization of AI inference; to do this, it needs to address mounting data and storage limitations head-on. Though inference at scale is a multi-faceted challenge, it's also true that 'there is no AI strategy without a data strategy'; as such, the storage environment remains a major roadblock that organizations must address as they begin to scale. This is supported by our own research, in which 70% of IT leaders said storage-related challenges are a significant barrier to AI success.
Moreover, the shift to agentic AI is only serving to supercharge data and storage challenges. The demands of agentic AI are changing the data conversation in three ways.
Access to data has to be instant; autonomous agents have no tolerance for idle time.
The volume of tasks that agents can perform will dwarf that of any set of humans, so the ability to massively scale on demand becomes a prerequisite.
For the entire agentic proposition to work, it must be able to access 'real time' data; there's no trust in a system that is making consequential business decisions based on old, stale and quite possibly inaccurate information.
Given these demands, it's no surprise Nvidia is leaning in and calling for a 'reinvented' data and storage stack that can meet this new reality.
At extreme scale -- i.e., organizations building frontier models, LLMs and the like -- the storage issues focus on managing rapidly expanding context windows. It's a subject I've discussed previously, but one that continues to evolve rapidly.
Essentially, it's about how to keep GPUs fully and efficiently utilized as inference prompts become more sophisticated. These challenges compound as the GPU estate continues to explode and agentic use increases. Maintaining and coordinating a coherent view of data across the entire environment becomes exponentially more difficult.
At GTC, Nvidia made two notable announcements to address this challenge. The first is the Context Memory Storage (CMX, previously known as ICMS) initiative, which manages KV cache across HBM and connected fast (NVMe-based SSD) storage. Powered by BlueField-4 DPUs and running Nvidia DOCA (Data Center Infrastructure-on-a-Chip Architecture) software, Nvidia says it will deliver up to 5x more tokens per second by keeping context data accessible without forcing round trips through slower storage. CMX also utilizes other Nvidia elements, including its Dynamo software for inference orchestration across the GPU cluster and Spectrum-X low-latency (RDMA) networking.
It's storage, Jim, but not as we know it
It's worth emphasizing that Context Memory Storage is an entirely new type of storage tier, optimized for the unique requirements of KV cache. Think of KV cache as the model's working memory of the current context: it cares deeply about performance but is inherently ephemeral; if context is lost, it can be recomputed. Hence, data durability -- a defining aspect of traditional storage systems -- is less important.
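This durability trade-off is easy to show in a toy sketch (Python; all names here are hypothetical illustrations, not Nvidia's CMX API): a cache miss is not a failure, merely a cue to recompute, which is exactly what distinguishes this tier from conventional durable storage.

```python
# Toy illustration of the 'ephemeral' property of a KV cache tier:
# a miss is not an error -- lost context can always be recomputed.
# Class and function names are hypothetical; this is not Nvidia's CMX API.

class EphemeralKVCache:
    def __init__(self):
        self._store = {}  # prompt prefix -> computed KV state

    def get_or_recompute(self, prefix, recompute_fn):
        """Return cached KV state, or rebuild it on a miss."""
        if prefix in self._store:
            return self._store[prefix], "hit"
        state = recompute_fn(prefix)  # stands in for an expensive prefill pass
        self._store[prefix] = state
        return state, "recomputed"

    def evict(self, prefix):
        # Safe to drop with no durability guarantees whatsoever.
        self._store.pop(prefix, None)

def fake_prefill(prefix):
    # Stand-in for the GPU prefill that builds KV state from the prompt.
    return f"kv-state-for:{prefix}"

cache = EphemeralKVCache()
state, how = cache.get_or_recompute("hello", fake_prefill)  # -> "recomputed"
state, how = cache.get_or_recompute("hello", fake_prefill)  # -> "hit"
cache.evict("hello")
state, how = cache.get_or_recompute("hello", fake_prefill)  # -> "recomputed"
```

The point of the sketch: because any entry can be regenerated, the tier can shed replication, snapshots and other durability machinery, trading resilience for raw speed and power efficiency.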
Additionally, AI factories prioritise one ratio above everything else: tokens per watt. Ideally, every watt that enters the system should be used to generate tokens; idle GPUs, redundant computations and unnecessary operations anywhere in the system amount to waste that drives up power usage and drives down efficiency.
These considerations have profound architectural implications when building an entire AI system designed for optimal inference performance. KV cache started out living in GPU-level memory (utilizing HBM) and spills over into DRAM. As data volumes increase, it becomes prohibitively expensive to scale memory. Spilling KV cache data into local/rack-level SSDs partially addresses this, but that data is not shareable across the broader GPU cluster or pod.
By contrast, traditional external storage is both shareable and lower cost, but was not designed for KV cache. Although KV cache needs large amounts of super-fast, shareable, lower-cost storage, its ephemeral nature means it doesn't care primarily about data resiliency, durability and all the 'heavy' data management services that traditional storage arrays offer. Such services also incur compute and power overhead that negatively impacts tokens per watt.
As 'tier 3.5' (or 'G3.5' in Nvidia-speak) storage that sits between HBM/DRAM and shared external storage, CMX is designed to provide petabytes of super-fast, shared capacity across an entire GPU pod -- initially for the new Nvidia Vera Rubin system. This enables long-context workloads to retain history after eviction from HBM and DRAM. And, as a storage tier that doesn't need data durability, it's also power-efficient.
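To make the tiering concrete, here is a minimal sketch (assumed tier names and toy capacities, not Nvidia's implementation) of the cascade described above: KV state is demoted from HBM through DRAM into a shared 'tier 3.5', with a full recompute needed only once it has fallen out of every tier.

```python
# Minimal sketch of an HBM -> DRAM -> shared tier-3.5 eviction cascade.
# Tier names and capacities are illustrative assumptions only.
from collections import OrderedDict

class Tier:
    def __init__(self, name, capacity):
        self.name, self.capacity = name, capacity
        self.entries = OrderedDict()  # LRU order: oldest entry first

    def put(self, key, value):
        """Insert; return the evicted (key, value) pair if full, else None."""
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            return self.entries.popitem(last=False)  # demote the LRU entry
        return None

class TieredKVStore:
    def __init__(self):
        # Tiny capacities so the demotion path is easy to trace.
        self.tiers = [Tier("HBM", 1), Tier("DRAM", 2), Tier("tier-3.5", 4)]

    def put(self, key, value):
        demoted = self.tiers[0].put(key, value)
        for lower in self.tiers[1:]:
            if demoted is None:
                return
            demoted = lower.put(*demoted)  # cascade the eviction downward

    def locate(self, key):
        """Return the tier holding `key`, or None (i.e., recompute needed)."""
        for tier in self.tiers:
            if key in tier.entries:
                return tier.name
        return None

store = TieredKVStore()
for session in ["s1", "s2", "s3"]:
    store.put(session, f"kv:{session}")
print(store.locate("s3"))  # most recent context stays in HBM
print(store.locate("s1"))  # older context survives, demoted to DRAM
```

The design point the sketch captures is that eviction from a fast tier demotes rather than destroys context, so long-running agentic sessions only pay the recompute cost once state has aged out of the entire hierarchy.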
So where does the storage ecosystem fit in?
Given the above, it may come as a surprise to learn that, far from de-emphasizing its third-party storage partnerships, Nvidia is doubling down on them. Though the precise needs of KV cache differ from those solved by traditional arrays, when building an AI factory, the total system really matters. It's no use having a super-fast and scalable KV Cache if there's no effective mechanism to get the latest enterprise data into the system in a timely fashion. Modern organizations are extremely distributed, and so is their data; what's more, this data is fragmented into a patchwork of storage silos.
Accordingly, Nvidia also used GTC to announce a new, modular reference architecture for storage -- STX -- designed to enable a range of enterprises, cloud providers and AI specialists to deploy accelerated storage infrastructure for agentic AI. The program has received near-universal support from the storage and manufacturing vendor ecosystem, with no fewer than 15 partners signing up at launch.
A Cambrian explosion of storage innovation
So how precisely will the storage vendors work with STX, and to what effect? This is still very much an embryonic space and work in progress, and it will be down to each individual storage partner to demonstrate their ability to add value to an STX environment. Nvidia itself says that 'G4' external storage, which includes object and file storage, can be reserved for data that truly needs durability and to persist over time. This includes inactive multiturn KV state, query history, logs and other data that may need to be recalled in the future.
While all of this is certainly valuable, it's also true that many storage vendors see a deeper role for themselves in the inference data stack. After all, storage technology companies understand IO and are experts at optimizing it across multiple dimensions.
Hence, we're going to see a whole raft of innovation around KV cache optimization for CMX, and potentially other implementations, over the coming months. Some storage players have already started down this path. For example, VAST Data has integrated elements of its distributed data platform software with BlueField-4 DPUs and Dynamo across tier 3 and tier 4 storage, intending to support KV cache in tier 3.5 in future releases. This would enable data services such as data reduction, security and lifecycle management. WEKA's Augmented Memory Grid offers an alternative approach to CMX (and one that is available today), though the company has joined STX and plans to support Nvidia's approach as well.
Meanwhile, there are some fascinating developments in the object storage world that promise to elevate S3-compatible storage as a first-class storage performance tier as part of STX. MinIO has integrated its AIStor software with BlueField-4 and DOCA, and is working on advanced capabilities such as S3-compatible object storage for Nvidia GPUDirect RDMA, wire speed erasure coding and hardware-accelerated encryption. Cloudian is another object specialist that is working on standardizing RDMA for S3-compatible storage, and has multiple efforts under way to integrate with CMX using the STX reference architecture.
These are just a few examples of the work that is happening here. All of the major storage vendors -- Dell, HPE, IBM, Hitachi Vantara, Nutanix, NetApp, Everpure and more -- are supporting STX, and without exception they are investing significantly in their capabilities across the AI data lifecycle.
Conducting the data orchestra
One final area of development worth mentioning here is data orchestration: capabilities that help ensure the most relevant, recent data (with appropriate guardrails) is made available to AI workflows. This is a hot area that blurs the lines between 'classic' storage and 'classic' data management, and it is another example of how the market is evolving at pace. This transition is a big part of Pure Storage's recent rebranding to Everpure (along with its pending acquisition of 1touch.io), and it is driving product innovations such as NetApp's AI Data Engine (AIDE) and Dell's recent Data Orchestration Engine (a product stemming from its Dataloop acquisition).
It's also worth noting that while CMX is targeted chiefly at very large AI environments (model builders, hyperscalers, GPU clouds and so on), the data management challenge is a much more widespread issue, faced by almost every large organization. I'll be exploring the role and implications of these efforts in a future blog post.
For now, though, the AI revolution continues to redefine customer expectations around data infrastructure, which in turn is driving a new golden age of storage and data infrastructure innovation. AI is no longer simply about deploying models, but is about transforming data into insight and intelligence. The storage innovations unveiled at GTC offer a foundation for enabling this transformation.
Simon Robinson is principal analyst covering infrastructure at Enterprise Strategy Group, now part of Omdia.
Enterprise Strategy Group is part of Omdia. Its analysts have business relationships with technology vendors.