Nvidia, Intel team up on energy-efficient AI server

Nvidia and Intel go from competitors to partners by jointly developing a system designed to push heavy AI workloads and offer users significant energy savings.

Competitors Nvidia and Intel have partnered to deliver a server that will house both Nvidia's H100 Tensor Core GPUs and Intel's latest Xeon Scalable processors.

The new generation of systems is specifically designed for "energy-efficient AI," according to Nvidia officials. The combination of Intel's CPUs working in tandem with Nvidia's Tensor Core GPUs delivers higher performance, as well as more computation and problem solving per watt, than its predecessors.

The Nvidia and Intel chips will be housed in Nvidia's DGX H100 servers as well as in 60 other servers that contain H100 GPUs sold by Nvidia's business partners.

What inspired the collaboration is the widespread adoption of AI across major markets over the past few years, along with the growing complexity of neural networks, Nvidia said. Enterprises have had to acquire more compute power to drive AI-powered workloads more efficiently. These workloads have also placed significantly higher demands on electricity.

Some were surprised by two tooth-and-nail competitors entering into a co-opetition agreement, although one analyst believes the deal will benefit both.

"This is an interesting dynamic here," said Jack Gold, an analyst with J. Gold Associates. "On one hand, Intel and Nvidia are fierce competitors. But on the other hand, Nvidia really needs Intel's high-end CPU that can really drive [Nvidia's] GPUs."

Nvidia is working on its own Arm-based CPU. But that chip does not compete with Intel's Xeon Scalable processor, he said.

In the perpetual game of performance leapfrog, the Nvidia-Intel collaboration figures to put some distance between the two companies and rival AMD's current offerings, Gold added.

Another analyst echoed that sentiment, saying working cooperatively to improve the performance of systems driving a key technology like AI is the preferred way to go.

"It's encouraging to see competitors like Intel and NVIDIA collaborating to improve such vital technology, especially when it drives powerful and efficient outcomes that help lower carbon footprints created by datacenters," said Dan Newman, an analyst with Futurum Research. "This is going to remain a central challenge for corporate users for the foreseeable future."

Another reason the cooperative effort is good for each party and, more importantly, for their users is the resulting energy cost savings.

"Probably the single largest cost for data centers these days is energy," Gold said. "Even if you save a data center 10%, you are talking real money."

The upcoming systems will run typical enterprise workloads about 25 times more efficiently, on average, than traditional CPU-only data center-class servers, according to Nvidia. They use the power available to data centers more efficiently while processing workloads faster.
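
To put the 25-times figure in rough perspective, the arithmetic can be sketched as follows. This is a back-of-the-envelope illustration, not Nvidia's methodology: the baseline energy use and electricity price are hypothetical placeholders, and the function names are invented for this example. It assumes "25 times more efficient" means 25 times as much work per unit of energy.

```python
def energy_for_same_work(baseline_kwh: float, efficiency_gain: float) -> float:
    """Energy needed to complete the same work when each kWh does
    `efficiency_gain` times as much work as before."""
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return baseline_kwh / efficiency_gain


def savings_fraction(efficiency_gain: float) -> float:
    """Share of the baseline energy that is no longer needed."""
    return 1.0 - 1.0 / efficiency_gain


# Hypothetical inputs, chosen only to make the arithmetic concrete.
BASELINE_KWH_PER_YEAR = 1_000_000   # assumed CPU-only fleet consumption
PRICE_PER_KWH_USD = 0.10            # assumed electricity price
CLAIMED_GAIN = 25                   # the figure Nvidia cites

new_kwh = energy_for_same_work(BASELINE_KWH_PER_YEAR, CLAIMED_GAIN)
saved_usd = (BASELINE_KWH_PER_YEAR - new_kwh) * PRICE_PER_KWH_USD

print(f"Energy for the same work: {new_kwh:,.0f} kWh")
print(f"Share of energy saved:    {savings_fraction(CLAIMED_GAIN):.0%}")
print(f"Estimated annual savings: ${saved_usd:,.0f}")
```

Under those assumed inputs, the same work would need one twenty-fifth of the baseline energy. Real-world savings would depend on the workload mix, utilization and cooling overhead, none of which this sketch models.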

Explaining the need for added compute power and energy savings, Jensen Huang, Nvidia's CEO, said data center energy consumption is growing at a rate that is no longer sustainable.

"To change this trajectory, we must accelerate every application possible," Huang said in a video blog. "Just one [of the upcoming servers] can reduce the processing time, energy and cost by X factors."

Compared with the previous generation of Nvidia servers, the new systems can speed up training and inference to raise energy efficiency. The added performance could lower the total cost of ownership by a factor of three, the company claims.

The new 4th Gen Intel Xeon chips include support for PCIe Gen 5, which can double data transfer rates from the CPU to Nvidia's GPUs. The increased number of PCIe lanes in the chip provides greater density of GPUs, along with high-speed networking, within each server.
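
The doubling follows from the per-lane signaling rates in the PCIe specifications: 16 gigatransfers per second for Gen 4 and 32 for Gen 5, both using 128b/130b encoding. The short sketch below works out the theoretical one-direction bandwidth of a link. The x16 lane count is an assumption (a common width for GPU slots, not a detail from Nvidia or Intel), the function name is invented for this example, and the result ignores protocol overhead, so real throughput would be lower.

```python
# Per-lane raw signaling rate in gigatransfers per second (GT/s).
TRANSFER_RATE_GT_S = {4: 16.0, 5: 32.0}

# Gen 3 and later send 128 payload bits in every 130-bit block.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(generation: int, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in gigabytes per second.

    Ignores packet headers and flow-control overhead, so it is an
    upper bound rather than a measured figure.
    """
    if generation not in TRANSFER_RATE_GT_S:
        raise ValueError(f"unsupported PCIe generation: {generation}")
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    gigabits_per_s = TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY * lanes
    return gigabits_per_s / 8  # 8 bits per byte


gen4 = link_bandwidth_gb_s(4)
gen5 = link_bandwidth_gb_s(5)
print(f"PCIe Gen 4 x16: {gen4:.1f} GB/s per direction")
print(f"PCIe Gen 5 x16: {gen5:.1f} GB/s per direction")
print(f"Ratio: {gen5 / gen4:.1f}x")
```

For an x16 link, that works out to roughly 31.5 GB/s for Gen 4 and roughly 63 GB/s for Gen 5 in each direction.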

The Intel chip also offers higher bandwidth, improving the performance of data-intensive workloads, including AI-based workloads, along with network speeds of up to 400 gigabits per second.

As Editor At Large with TechTarget's News Group, Ed Scannell is responsible for writing and reporting breaking news, news analysis and features focused on technology issues and trends affecting corporate IT professionals.
