Nvidia has invested heavily in building an ecosystem around its GPUs, but rival AMD also offers powerful GPUs that present solid competition. As Nvidia releases its new Ampere architecture and AMD puts forth its Instinct MI100 GPU, the two vendors once again vie for dominance in the data center GPU market.
GPUs originated as fixed-function accelerators to speed demanding graphics calculations but have since evolved into fully programmable compute engines. Organizations now frequently use GPUs for tasks such as training machine learning models or high-performance computing (HPC).
Organizations hoping to accelerate workloads in their data centers should compare Nvidia's and AMD's GPU offerings to determine which best suits their needs.
Nvidia's GPU offerings
Organizations use Nvidia's GPUs for a range of data center workloads, including training machine learning models and running trained models in production (inference). Nvidia GPUs can also accelerate the calculations in supercomputing simulations, such as financial modeling or extreme weather prediction. In addition, Nvidia's partner OmniSci has developed a platform with a GPU-accelerated database, rendering engine and visualization system that can deliver analytics results much faster than conventional CPU-based alternatives.
Nvidia's most recent data center GPU, the A100, is based on the Ampere architecture, which succeeded Nvidia's Volta and Turing architectures. The A100 accelerator has 108 streaming multiprocessors (SMs), each containing four third-generation Tensor Cores and 64 FP32 CUDA cores. A Tensor Core is a specialized unit for matrix multiply-accumulate operations on small matrix tiles (4x4 in Nvidia's original Volta design), which can significantly speed machine learning calculations. The third-generation Tensor Cores also add enhancements that exploit the fine-grained sparsity found in many AI models.
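The Tensor Core's basic operation is a fused matrix multiply-accumulate, D = A x B + C. The sketch below shows that operation for 4x4 matrices in plain Python, together with the 2:4 structured-sparsity pattern the A100 uses to exploit fine-grained sparsity (in every group of four consecutive weights, at most two are nonzero). The function names `mma_4x4` and `prune_2_of_4` are hypothetical; this is a CPU illustration of the math, not how Tensor Cores are actually programmed.

```python
# CPU-only illustration of two Tensor Core ideas; no Nvidia API is used.


def mma_4x4(a, b, c):
    """Return D = A x B + C for 4x4 matrices given as lists of lists."""
    n = 4
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]


def prune_2_of_4(row):
    """Apply 2:4 structured sparsity: in each group of four values,
    keep the two with the largest magnitude and zero the other two."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Python's sort is stable, so on ties the earlier index is kept.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
    a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(mma_4x4(a, identity, identity))  # A x I + I: A with 1 added on the diagonal
    print(prune_2_of_4([0.1, -0.9, 0.5, 0.2, 3, 1, -4, 2]))
    # prints [0, -0.9, 0.5, 0, 3, 0, -4, 0]
```

Because half the values in every group of four are guaranteed to be zero, hardware that understands the pattern can skip those multiplications without tracking arbitrary zero positions.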
The GA100 chip at the heart of this accelerator has twelve 512-bit HBM2 memory controllers feeding six stacks of HBM2 memory; the A100 product enables ten of those controllers and five of the stacks. At launch, the A100 shipped with 40 GB of memory and 1,555 GBps of memory bandwidth, but in November 2020 Nvidia unveiled a version that doubles the memory to 80 GB and raises the memory bandwidth to about 2 TBps.
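The bandwidth figures follow from the width of the memory interface and the per-pin data rate. The sketch below assumes, based on Nvidia's published A100 specifications rather than this article, that the shipping A100 enables ten of the GA100's twelve 512-bit controllers (a 5,120-bit interface) and that its HBM runs at about 2.43 Gbit/s per pin on the 40 GB model and about 3.19 Gbit/s per pin on the 80 GB model. The function name `peak_bandwidth_gbs` is hypothetical.

```python
# Sketch: peak HBM bandwidth = bus width (bits) x per-pin rate (Gbit/s) / 8.
# Assumptions (not from the article): 10 of 12 controllers enabled, and the
# per-pin data rates below.

CONTROLLER_WIDTH_BITS = 512  # each GA100 memory controller is 512 bits wide


def peak_bandwidth_gbs(controllers: int, gbit_per_pin: float) -> float:
    """Return peak bandwidth in GB/s for the given number of controllers."""
    bus_width_bits = controllers * CONTROLLER_WIDTH_BITS
    return bus_width_bits * gbit_per_pin / 8


a100_40gb = peak_bandwidth_gbs(controllers=10, gbit_per_pin=2.43)   # assumed rate
a100_80gb = peak_bandwidth_gbs(controllers=10, gbit_per_pin=3.186)  # assumed rate

print(f"A100 40 GB: {a100_40gb:,.0f} GBps")  # A100 40 GB: 1,555 GBps
print(f"A100 80 GB: {a100_80gb:,.0f} GBps")  # A100 80 GB: 2,039 GBps
```

Holding the interface width fixed, the 80 GB model's extra bandwidth comes entirely from faster memory per pin.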
For developers, Nvidia offers the CUDA platform, including the CUDA Toolkit, which bundles GPU-accelerated libraries, a compiler, development tools and the CUDA runtime. Machine learning frameworks build on CUDA to gain GPU acceleration.
AMD's GPU offerings
AMD's Instinct MI100 GPU came out in 2020, targeting scientific computing workloads. AMD has effectively split its GPU portfolio into gaming-oriented models, built on its Radeon DNA (RDNA) architecture, and data center compute models, such as the Instinct MI100, built on its Compute DNA (CDNA) architecture.
The Instinct MI100 implements 120 compute units (CUs), split into eight blocks and linked by an on-die fabric, meaning they're connected at the chip level. Just as Nvidia's SMs contain CUDA cores, each CU is made up of smaller functional units called stream processors, 64 per CU. Like Nvidia, AMD uses HBM2 memory: the MI100 has four HBM2 stacks, providing a total of 32 GB of memory and 1.23 TBps of aggregate memory bandwidth.
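The same kind of arithmetic applies to the MI100. This sketch multiplies out the counts given above and estimates peak bandwidth, assuming (these two figures are not from the article) a standard 1,024-bit interface per HBM2 stack and a data rate of 2.4 Gbit/s per pin.

```python
# Sketch: MI100 shader count and peak memory bandwidth.
# From the article: 120 CUs, 64 stream processors per CU, four HBM2 stacks.

COMPUTE_UNITS = 120
STREAM_PROCESSORS_PER_CU = 64
HBM2_STACKS = 4
BITS_PER_STACK = 1024  # assumed: standard HBM2 interface width per stack
GBIT_PER_PIN = 2.4     # assumed: MI100 per-pin memory data rate

stream_processors = COMPUTE_UNITS * STREAM_PROCESSORS_PER_CU
bus_width_bits = HBM2_STACKS * BITS_PER_STACK
bandwidth_gbs = bus_width_bits * GBIT_PER_PIN / 8

print(f"{stream_processors:,} stream processors")         # 7,680 stream processors
print(f"{bus_width_bits:,}-bit memory interface")         # 4,096-bit memory interface
print(f"{bandwidth_gbs / 1000:.2f} TBps peak bandwidth")  # 1.23 TBps peak bandwidth
```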
The CUs in the Instinct MI100 feature Matrix Core Engines optimized for the matrix operations common in machine learning. The MI100 also supports new numerical formats for machine learning while remaining backward compatible with software written for AMD's earlier GPU architecture.
AMD offers an open software development platform called ROCm, which enables developers to write and compile code for multiple environments, including Nvidia GPUs. ROCm supports common machine learning frameworks, such as the open source TensorFlow and PyTorch, and provides pathways for porting Nvidia CUDA code to AMD hardware.
Nvidia vs. AMD GPUs: How do they measure up?
A straight comparison of peak performance figures gives AMD an apparent edge: the MI100 delivers up to 11.5 teraflops in 64-bit floating point (FP64) and up to 23.1 teraflops in FP32, compared with the A100's 9.7 teraflops in FP64 and 19.5 teraflops in FP32. However, the A100 boasts key enhancements to accelerate AI functions and offers more memory (40 GB or 80 GB, versus 32 GB on the MI100). According to Moor Insights & Strategy, AMD presents a serious rival for Nvidia in HPC, but Nvidia still holds the edge in AI acceleration.
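These peak figures can be reproduced from core counts and clock speeds. The sketch assumes clock speeds from the vendors' spec sheets rather than this article (about 1.41 GHz boost for the A100 and about 1.502 GHz peak for the MI100), counts a fused multiply-add as two floating-point operations, and assumes both chips run FP64 at half their FP32 vector rate. The function name `peak_tflops` is hypothetical.

```python
# Sketch: peak vector throughput = cores x 2 FLOPs per FMA x clock (GHz).
# Core counts come from the architecture descriptions; clocks are assumptions.


def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Peak TFLOPS, counting one fused multiply-add as 2 floating-point ops."""
    return cores * 2 * clock_ghz / 1000


a100_fp32_cores = 108 * 64   # 108 SMs x 64 FP32 CUDA cores = 6,912
mi100_fp32_lanes = 120 * 64  # 120 CUs x 64 stream processors = 7,680

A100_BOOST_GHZ = 1.41   # assumed A100 boost clock
MI100_PEAK_GHZ = 1.502  # assumed MI100 peak engine clock

results = {}
for name, cores, clock in [
    ("A100", a100_fp32_cores, A100_BOOST_GHZ),
    ("MI100", mi100_fp32_lanes, MI100_PEAK_GHZ),
]:
    fp32 = peak_tflops(cores, clock)
    fp64 = fp32 / 2  # assumed: FP64 vector rate is half the FP32 rate
    results[name] = (fp32, fp64)
    print(f"{name}: FP32 {fp32:.1f} TFLOPS, FP64 {fp64:.1f} TFLOPS")
# A100: FP32 19.5 TFLOPS, FP64 9.7 TFLOPS
# MI100: FP32 23.1 TFLOPS, FP64 11.5 TFLOPS
```

That the results match the quoted numbers suggests they describe the general-purpose vector units; they leave out the Tensor Core and Matrix Core paths, where the A100's AI enhancements apply.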
Nvidia has a more mature programming framework in CUDA, but AMD's ROCm is an open platform whose code can target both AMD and Nvidia GPUs. Potential customers should benchmark performance themselves, using the applications and tools they plan to run.
IT shops might soon have an additional choice in GPU accelerators. Intel intends to enter the data center GPU market with its Intel Xe family, whose high-end device is codenamed Ponte Vecchio. Intel plans to deploy Ponte Vecchio in the Aurora supercomputer at Argonne National Laboratory in Lemont, Ill., in 2021.