
Compare GPUs vs. CPUs for AI workloads

GPUs are often presented as the vehicle of choice to run AI workloads, but the push is on to expand the number and types of algorithms that can run efficiently on CPUs.

GPUs have attracted a lot of attention as the optimal vehicle to run AI workloads. Most cutting-edge research seems to rely on the ability of GPUs and newer AI chips to run many deep learning workloads in parallel. However, the well-established CPU still has an important role in enterprise AI.

"CPUs are cheap commodity hardware and are present everywhere," said Anshumali Shrivastava, associate professor of computer science at Rice University and CEO of ThirdAI, a platform vendor that trains AI models on commodity hardware. On-demand pricing for CPUs in the cloud is significantly less expensive than for GPUs, and IT shops are more familiar with setting up and optimizing CPU-based servers.

CPUs have long held the advantage for certain kinds of AI algorithms involving logic or intensive memory requirements. Shrivastava's team has also been developing a new category of algorithms, called SLIDE -- Sub-Linear Deep learning Engine -- optimized to run on CPUs.

"If we can design algorithms like SLIDE that can run AI directly on CPUs efficiently, that could be a game-changer," he said.
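The published descriptions of SLIDE center on one idea: use locality-sensitive hashing to pick out the handful of neurons likely to activate for a given input, and compute only those instead of the full dense layer. The following is a minimal sketch of that idea using a SimHash-style random-projection table; the sizes, names and single hash table are illustrative assumptions, not ThirdAI's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A dense layer: 10,000 neurons over 128-dimensional inputs.
n_neurons, dim, n_bits = 10_000, 128, 12
W = rng.standard_normal((n_neurons, dim))

# SimHash: the sign pattern of a few random projections gives
# each vector a bucket key; similar vectors tend to share keys.
planes = rng.standard_normal((n_bits, dim))

def bucket(v):
    return tuple(bool(b) for b in (planes @ v) > 0)

# Index every neuron's weight vector once, up front.
table = {}
for i, w in enumerate(W):
    table.setdefault(bucket(w), []).append(i)

def sparse_forward(x):
    # Compute only the neurons hashed to the input's bucket;
    # the rest are treated as inactive (output 0).
    active = table.get(bucket(x), [])
    return active, W[active] @ x

x = rng.standard_normal(dim)
active, out = sparse_forward(x)
print(f"computed {len(active)} of {n_neurons} neurons")
```

Because the inner loop touches only a tiny, irregular subset of the weights, the workload is dominated by hash lookups and small dot products, which favor a CPU's large caches over a GPU's wide parallelism.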

AI development is at an inflection point. The early work in AI started with small models and relatively small data sets. As researchers developed larger models and larger data sets, they had enough workload to use parallel computing in GPUs effectively. But now the size of the models and volume of the data sets have grown beyond the limits of GPUs to run efficiently.


"At this point, training the traditional AI algorithm itself is prohibitive [in terms of time and resources]," Shrivastava said. "I think, in the future, there will be many attempts to design cheaper alternatives for efficient AI at scale."

GPUs best for parallel processing

GPUs became the preferred vehicle for training AI models because of their ability to perform nearly identical operations on many data samples simultaneously. With the growth in size of training data sets, the massive parallelism available in GPUs proved indispensable, Shrivastava said. GPUs provide impressive speedups over CPUs when the workload is large enough and easy to run in parallel.

On the flip side, GPUs have smaller and more specialized memories. Currently, one of the most widely used GPUs on the market, the Nvidia A100, offers memory capacity options of 40 GB or 80 GB. If the computation does not fit in the main memory of GPUs, the computation will slow down significantly. The same specialized memory that reduces the latency for multiple threads on GPUs becomes a limitation.
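A rough back-of-the-envelope calculation shows how quickly training state outgrows a single accelerator's memory. The sketch below assumes a commonly cited rule of thumb for mixed-precision training with the Adam optimizer of roughly 16 bytes per parameter (FP16 weights and gradients plus FP32 master weights and two moment buffers); the exact figure varies by framework and configuration.

```python
# Rough training-memory estimate: mixed-precision Adam keeps
# FP16 weights (2 B) + FP16 gradients (2 B) + FP32 master
# weights and two moments (12 B) ~= 16 bytes per parameter.
BYTES_PER_PARAM = 16

def fits_on_gpu(n_params, gpu_gib=80):
    needed_gib = n_params * BYTES_PER_PARAM / 2**30
    return needed_gib, needed_gib <= gpu_gib

for n in (1e9, 7e9, 70e9):
    gib, ok = fits_on_gpu(n)
    print(f"{n/1e9:>4.0f}B params -> {gib:,.0f} GiB "
          f"({'fits' if ok else 'exceeds'} one 80 GiB A100)")
```

Even a 7-billion-parameter model overflows an 80 GB card under these assumptions, which is why training at scale requires sharding across many GPUs or moving parts of the workload to the far larger main memory available to CPUs.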

CPUs for sequential algorithms

Figuring out how to run more efficient AI algorithms on CPUs rather than GPUs "will drastically expand the market for the application of AI," said Bijan Tadayon, CEO of Z Advanced Computing, which develops AI for IoT applications. Having a more efficient algorithm also reduces power requirements, making it more practical for applications such as drones, remote equipment or mobile devices.

CPUs are also often a better choice for algorithms that perform complex statistical computations, such as natural language processing (NLP) and some deep learning algorithms, said Karen Panetta, an IEEE fellow and the dean of graduate education for the School of Engineering at Tufts University. For instance, robots and home devices that use simple NLP models work well using CPUs. Other tasks also run on CPUs, such as image recognition or simultaneous localization and mapping (SLAM) for drones or autonomous vehicles.

In addition, algorithms such as Markov models and support vector machines run efficiently on CPUs. "Moving these to GPUs requires parallelization of the sequential data, and this has been challenging," Panetta said.
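The sequential dependence Panetta describes is easy to see in a first-order Markov chain: each step's probability depends on the previous state, so the scan cannot be split into independent parallel pieces. A minimal illustration, with a made-up two-state transition matrix:

```python
import math

# Transition probabilities for a two-state Markov chain.
# P[prev][cur] is the probability of moving from prev to cur.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def log_likelihood(seq):
    # Each term needs the previous state before it can be
    # computed -- the inherently sequential pattern that is
    # hard to map onto a GPU's parallel lanes.
    ll = 0.0
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(P[prev][cur])
    return ll

print(round(log_likelihood([0, 0, 1, 1, 0]), 4))
```

On a CPU, this loop runs at full single-thread speed with good cache behavior; parallelizing it requires restructuring the algorithm (e.g., parallel prefix scans), which is exactly the challenge Panetta notes.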

CPUs strive to be the AI workhorse

Intel is seeing interest from enterprises in running many types of AI workloads on its Xeon CPUs, said Eric Gardner, director of software product management for AI at Intel. Companies in the retail, telecom, industrial and healthcare industries are adopting AI in their CPU pipelines to perform large-scale deployments.

Other vendors are also exploring ways to optimize CPUs for many AI workloads. GPU leader Nvidia's Grace CPU accelerates certain workflows and can integrate with GPUs at high speed where required. AMD's EPYC processors are also being tuned for many AI inference workloads.

Use cases include AI in telemetry and network routing, object recognition in CCTV cameras, fault detection in industrial pipelines, and object detection in CT and MRI scans. CPUs work better for algorithms that are hard to run in parallel or for applications that require more data than can fit on a typical GPU accelerator.

The types of algorithms that can perform better on CPUs include the following:

  • Recommender system training and inference, which require large memory for embedding layers.
  • Classical machine learning algorithms that are difficult to parallelize for GPUs.
  • Recurrent neural networks that use sequential data.
  • Models using large data samples, such as 3D data, for training and inference.
  • Real-time inference for algorithms that are difficult to parallelize.
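The memory pressure behind the first bullet is concrete: a recommender model's embedding tables alone can dwarf a GPU's memory. The sketch below uses hypothetical but plausible table sizes to show the arithmetic; the user and item counts are illustrative assumptions.

```python
# Size of an FP32 embedding table in GiB:
# rows x embedding dimension x 4 bytes per value.
def table_gib(rows, emb_dim, bytes_per=4):
    return rows * emb_dim * bytes_per / 2**30

users = table_gib(500_000_000, 128)  # e.g., 500M user IDs
items = table_gib(100_000_000, 128)  # e.g., 100M item IDs
total = users + items
print(f"embedding tables: {total:,.0f} GiB (vs. 80 GiB on one A100)")
```

At these sizes the tables need hundreds of gibibytes, far beyond a single 80 GB accelerator but well within reach of a CPU server, which can be configured with terabytes of DRAM.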

Developments in AI hardware architectures

Generative AI adoption has overwhelmed the supply chains for high-powered GPUs used to train and run large language models and image generators. The resulting backlog of months or even years for some of the leading GPUs is encouraging many enterprises to explore alternatives, including AI-specific chip architectures such as tensor processing units (TPUs), intelligent processing units, and compute-in-memory and neuromorphic chips.

Many newer AI chips are designed to stage memory closer to AI processes, promising to improve performance and reduce power consumption. Google released the TPU v5e for use in Google Cloud. AWS' next generation of AI chips includes Trainium2 and Graviton4. In addition, promising alternatives come from startups including Cerebras, Graphcore, Groq, Hailo Technologies, Kinara, Luminous, SambaNova and Mythic.

Cerebras is building wafer-scale chips -- a wafer is normally cut into many individual chips -- that let 850,000 cores connect to one another and to memory at high speed. Other companies, such as Hailo and Kinara, are optimizing chips to run privacy-preserving AI models at the edge.

These newer approaches face significant competition from GPU and CPU leaders. Nvidia, Intel and AMD continue to innovate in both CPU and GPU hardware. These giants are addressing some of the infrastructure and software bottlenecks that plagued earlier chip hardware.

All three companies have also developed a substantial base of tooling to speed AI development and deployment efforts. Innovative new architectures cannot succeed merely by being faster or more energy-efficient than the established chips; they must also fit seamlessly into existing enterprise workflows.

Rethinking AI models

Traditional AI methods rely heavily on statistics and math. As a result, they tend to work most effectively on GPUs designed to process many calculations in parallel.

"Statistical models are not only processor-intensive, they are also rigid and do not handle dynamics well," said Rix Ryskamp, a former technology executive who is transitioning into law.

Ryskamp believes machine learning architects need to hone their skills so that they rely less on statistical models that require heavy GPU workloads.

"To get new results and use different types of hardware, including IoT and other edge hardware, we need to rethink our models, not just repackage them," he said. "We need more models that use the processors that are already widely available, whether those are CPUs, IoT boards or any other hardware already in place with customers."

Editor's note: This article was originally reported in April 2020 and was updated by the author with new technologies and market advances in December 2023.

George Lawton is a journalist based in London. Over the last 30 years he has written more than 3,000 stories about computers, communications, knowledge management, business, health and other areas that interest him.
