
Nvidia's new KV cache makes waves in enterprise storage

Nvidia's new KV cache system, unveiled alongside its Vera Rubin and BlueField-4 chips, raises competitive concerns for storage partners and adds to worries about the global memory shortage.

Whether they're running AI infrastructure or not, enterprise storage buyers could be affected by Nvidia's Vera Rubin platform this year, particularly its new key-value cache for AI inference.

Nvidia previewed its BlueField-4 data processing unit (DPU) and Inference Context Memory Storage (ICMS) platform last week during CES and expects to ship them in the second half of 2026. Long-running AI inference processes among multiple AI agents require that data be kept in memory in the form of a key-value cache (KV cache). Nvidia touts its new platform as a better design for cloud providers and frontier model companies running agentic AI inference at large scale.
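In transformer-based models, the KV cache holds the key and value projections already computed for each token of the context, so each new decoding step reuses them rather than recomputing attention inputs from scratch. The NumPy sketch below is a minimal illustration of that mechanic for a single hypothetical attention head; the dimensions and projection weights are invented for the example, not drawn from Nvidia's designs.

```python
import numpy as np

D = 64  # hypothetical head dimension for the sketch
rng = np.random.default_rng(0)
W_k = rng.standard_normal((D, D)) / np.sqrt(D)  # illustrative key projection weights
W_v = rng.standard_normal((D, D)) / np.sqrt(D)  # illustrative value projection weights

def decode_step(query, new_token, kv_cache):
    # Only the newest token's key/value are computed; every earlier
    # entry is reused from the cache instead of being recomputed.
    kv_cache.append((W_k @ new_token, W_v @ new_token))
    keys = np.stack([k for k, _ in kv_cache])
    values = np.stack([v for _, v in kv_cache])
    scores = keys @ query / np.sqrt(D)       # attention scores over cached keys
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                  # attention output for this step

kv_cache = []  # grows by one (key, value) pair per generated token
for _ in range(5):
    decode_step(rng.standard_normal(D), rng.standard_normal(D), kv_cache)
print(f"KV cache holds {len(kv_cache)} entries after 5 decode steps")
```

The cache grows with context length and with every concurrent session, which is why long-running, multi-agent workloads push it beyond what GPU memory can hold.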

"There's a whole new category of storage systems … because this is a pain point for just about everybody who does a lot of token generation today," said Nvidia President and CEO Jensen Huang during a CES keynote presentation Jan. 5. "They're really suffering from the amount of network traffic that's being caused by KV cache [data] moving around."

Instead of juggling KV cache data among GPU host memory buffers, ICMS pools and extends the KV cache for an entire Rubin GPU cluster using BlueField DPUs and NVMe SSDs connected by Nvidia Spectrum-X Ethernet.
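Nvidia hasn't published ICMS internals, but the general pattern -- keep hot KV entries in fast memory and spill cold ones to a shared flash tier rather than recomputing them -- can be sketched in a few lines. In the hypothetical Python below, GPU_BUDGET_BYTES, SpillStore and TieredKVCache are illustrative stand-ins, not Nvidia APIs:

```python
import collections
import pathlib
import tempfile

GPU_BUDGET_BYTES = 1 << 20  # pretend "GPU memory" budget of 1 MiB for the demo

class SpillStore:
    """Stand-in for a shared, flash-backed KV cache tier."""
    def __init__(self):
        self.root = pathlib.Path(tempfile.mkdtemp(prefix="kv_spill_"))
    def put(self, key, value):
        (self.root / key).write_bytes(value)
    def get(self, key):
        return (self.root / key).read_bytes()

class TieredKVCache:
    def __init__(self, spill):
        self.hot = collections.OrderedDict()  # LRU-ordered in-memory tier
        self.hot_bytes = 0
        self.spill = spill
    def put(self, key, value):
        self.hot[key] = value
        self.hot_bytes += len(value)
        # Evict least-recently-used entries to flash instead of dropping
        # them, so a returning session never has to regenerate its KV state.
        while self.hot_bytes > GPU_BUDGET_BYTES:
            old_key, old_value = self.hot.popitem(last=False)
            self.hot_bytes -= len(old_value)
            self.spill.put(old_key, old_value)
    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # refresh LRU position
            return self.hot[key]
        value = self.spill.get(key)    # cache miss: read back from flash
        self.put(key, value)           # re-warm the in-memory tier
        return value

cache = TieredKVCache(SpillStore())
for session in range(8):  # eight agent sessions, 256 KiB of KV state each
    cache.put(f"session-{session}", bytes(256 * 1024))
print(f"hot entries: {len(cache.hot)}, spilled to flash: {8 - len(cache.hot)}")
```

In a production system the tiers would be GPU HBM, host memory and networked NVMe rather than a temp directory, but the trade-off is the same one the article describes: rereading cached state from flash is cheaper than regenerating it on the GPU.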

"This is very different to traditional storage architectures -- the performance and latency assumptions when running AI inference at scale are very different because the data requirements can be very different, such as the need to have very large data sets in memory, the need to frequently recompute data, the need for memory-class performance, etc.," Simon Robinson, an analyst at Omdia, a division of Informa TechTarget, wrote in an email. "So I think the traditional enterprise storage buyer is not the primary focus here -- it’s more for those looking to build very large inference systems."

Jensen Huang presents Nvidia's new KV cache and Inference Context Memory Storage platform as part of its Vera Rubin system during a CES keynote.

The NetApp question

However, just as the broader Vera Rubin platform has potential indirect implications for enterprise IT organizations, Nvidia's new KV cache could increase friction between Nvidia and its storage partners, some of which have already developed similar products, according to industry experts.

"While storage partners talk about how this is aligned with their own approaches, they now have to figure out how they complement this architecture, how they explain their differentiation and value-add to the market, and how they work with Nvidia in the field," Robinson said. "Some providers, such as Weka and Vast, have already developed comparable capabilities, such as Weka's Augmented Memory Grid. ... Vast announced an interesting integration with ICMS, with its software running directly on BlueField4 DPUs."

While BlueField-4 and ICMS won't ship until the second half of the year, those partner products are available now, potentially giving Nvidia's storage partners a brief window of opportunity. But while most of Nvidia's typical partners lined up to declare support for ICMS last week, including AIC, Cloudian, DDN, Dell Technologies, HPE, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, Vast and Weka, one name was conspicuous in its absence to industry observers -- NetApp.

Industry analysts suspect this is due to overlap with similar products, such as AI Data Engine (AIDE), that NetApp already has waiting in the wings.

"If you think about the changes NetApp has made to [its] OnTap [data management operating system] over the past couple years, essentially, you have the file and object platform within OnTap, then you have metadata layers and a layer of data services on top of that, as part of AIDE," said Brent Ellis, an analyst at

NetApp, I think, had [its own] plans and this may have thrown a wrench into them.
Rob Strechay,Analyst, TheCube Research

Forrester Research. "Those data services at the AIDE layer are essentially what Nvidia is talking about with ICMS. The difference would be that for NetApp, they have enabled that as a cross-platform deployment."

Another analyst speculated that ONTAP might be more difficult to adapt to support Nvidia's KV cache than other vendors' data management software.

"The way the file system works, disaggregating the cache will be difficult, and making it another type of node is not NetApp's style," said Rob Strechay, an analyst at TheCube Research. "NetApp, I think, had [its own] plans and this may have thrown a wrench into them."

Reached for comment, a NetApp spokesperson said in an emailed statement to Informa TechTarget, "The omission of NetApp in Nvidia’s press release was due to an abundance of caution by NetApp to keep our product plans confidential, but we have every intention of supporting these new architectures as per customer needs and timelines."

Ellis said he was skeptical about NetApp's "abundance of caution" but added, "I am sure they will be working to include the technologies in their AIPod- and SuperPOD-certified systems."

Nvidia and the global memory shortage

Nvidia's new KV cache will require massive amounts of NAND flash memory -- 16 TB per Rubin GPU in the rack-scale Vera Rubin platform, which will contain 144 Rubin chips, according to Huang's keynote. Meanwhile, over the past two months, AI infrastructure demands for memory have created a global supply shortage, resulting in higher prices across consumer-level and enterprise-grade products.
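By those figures, a single fully populated rack works out to 144 x 16 TB, or roughly 2.3 PB of NAND flash dedicated to KV cache storage alone.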

Dell officials specifically cited a shortage of NAND flash memory as a key factor in "the cost basis … going up across all products" after its fiscal fourth quarter, COO and vice chairman Jeffrey Clarke told financial analysts during the company's Nov. 25 earnings call. "Everything that uses a CPU has DRAM, has storage in it. … I don't see how this will … not make its way into the customer base."

Financial analysts also cut their stock price targets for HPE in November, citing the memory shortage.

Forrester's Ellis said the shortage will affect more than just prices for storage capacity.

"In 2026 you're going to see a lot of repercussions to flash and storage showing up, in who is able to actually develop technologies and who is not," he said.

Recent enterprise storage product designs have been based on a longstanding assumption that storage capacity is cheap and widely available, according to Omdia's Robinson.

"That no longer applies, at least for the foreseeable future," he said. "In turn, this will force all storage buyers, including mainstream enterprises, to focus more on how to optimize in a capacity-constrained environment."

Beth Pariseau, a senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.
