Cloud providers are investing in GPU-based capabilities from multiple chipmakers, so organizations should understand the differences between AMD's and Nvidia's offerings.
At Microsoft Ignite 2019, Microsoft revealed that it was working with semiconductor vendor AMD to provide a new set of virtual machines on Azure powered by AMD-based GPUs.
In Azure alone, Microsoft now offers seven different virtual machine instance types with GPU cards from AMD and Nvidia. Amazon's and Google's cloud services offer roughly the same number of options.
IT departments should educate themselves on the technical differences between AMD's and Nvidia's GPUs and the kinds of workloads each is best suited for.
How do different GPU offerings work?
From a virtualization perspective, GPU-based offerings are primarily aimed at remote visualization and encoding. These offerings provide a GPU-based desktop or application to remote end users.
AMD and Nvidia have also been developing GPU cards specifically suited for AI and deep learning workloads, such as those built on the popular machine learning framework TensorFlow. These GPUs are also the preferred hardware for accelerating computational workloads in modern high-performance computing offerings.
For remote visualization workloads on traditional hypervisors, there are three options available to provide GPU capabilities to a virtual machine:
Pass-through. Mapping a physical GPU card directly to a virtual machine through the hypervisor. Technologies that use this method include VMware DirectPath I/O, XenServer GPU Passthrough and Hyper-V Discrete Device Assignment.
Virtual shared graphics. Hypervisor-based sharing of a GPU among virtual machines. Technologies that use this method include VMware vSGA and the legacy RemoteFX vGPU feature in Hyper-V.
Virtual GPU (vGPU). GPU-based virtualization, having virtual GPU profiles attached to each virtual machine. Technologies that use this method include Nvidia vGPU and AMD MxGPU.
The main differences between these three delivery models are how well they scale and how much of the GPU's native functionality they expose to the virtual machine.
Pass-through mode provides full graphics compatibility, which means the end user can access the full functionality of the GPU attached to the machine. However, this approach does not provide scale because the GPU cards are locked to one virtual machine.
This can also mean that resources are not being used most efficiently. Organizations typically use pass-through mode for specific workloads that require more dedicated capacity.
With the second option, virtual shared graphics, the hypervisor's own driver mediates access to the GPU, so many virtual machines can share one card; the trade-off is that only a limited set of graphics APIs is typically supported.

The third option, vGPU, splits the GPU's capacity into multiple virtualized instances, each attached to a virtual machine through a fixed profile. This method provides near-full GPU functionality while guaranteeing each virtual machine a defined share of the underlying GPU resources, and it has been the most common deployment model for visualization workloads. For example, IT can run Citrix Virtual Apps and Desktops or VMware Horizon to provide GPU capabilities to multiple end users.
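To make the vGPU model concrete, here is a minimal sketch of how a single card's frame buffer can be carved into fixed per-VM profiles. The 16 GB card size and the profile sizes are hypothetical, not an actual vendor catalog:

```python
# Hypothetical sketch: carving one physical GPU's frame buffer into fixed
# vGPU profiles, one profile per virtual machine. The 16 GB card size and
# the profile sizes below are illustrative, not a real vendor catalog.

CARD_MEMORY_GB = 16

def max_vms_per_card(profile_gb, card_gb=CARD_MEMORY_GB):
    """How many VMs with a given frame-buffer profile fit on one card."""
    return card_gb // profile_gb

for profile_gb in (2, 4, 8):
    print(f"{profile_gb} GB profile -> {max_vms_per_card(profile_gb)} VMs per card")
# With pass-through, the same card would serve exactly one VM.
```

The smaller the profile, the more desktops a single card can serve, which is why vGPU scales so much better than pass-through for visualization workloads.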
AMD vs. Nvidia comparison of vGPU products
Both AMD and Nvidia provide vGPU-based products, but there is a difference in their delivery models that organizations must understand before they choose a vendor.
Nvidia's vGPU offerings rely on a host driver installed in the hypervisor, which allocates the virtual graphics cards to the guest VMs. AMD's MxGPU offering, by contrast, takes a fully hardware-based approach built on a hardware feature called single root input/output virtualization (SR-IOV).
The two vendors also differ on the hardware side. Nvidia implements time-sliced scheduling in its GPUs, which means each user accessing the GPU gets access to all of the physical cores for a slice of time. AMD, on the other hand, allocates a fixed portion of the GPU's cores to each machine directly.
Nvidia's approach works well when not all users need full access to the GPU at all times, because it lets users share the resources with less friction. Nvidia's architecture also allows live migration of virtual machines running with vGPU, which is not possible with AMD's MxGPU feature, but these capabilities come at a cost.
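A simplified model of the two scheduling approaches (illustrative only, not vendor code) helps show the trade-off when utilization is uneven:

```python
# Illustrative model only (not vendor code): compare how much GPU capacity
# a busy VM can tap under time-sliced vs. partitioned scheduling.

def time_sliced_throughput(active_users, total_cores):
    """Time-slicing (Nvidia's approach): each ACTIVE user gets all of the
    physical cores for an equal share of the time, so idle VMs cost nothing."""
    if active_users == 0:
        return 0.0
    return total_cores / active_users

def partitioned_throughput(total_users, total_cores):
    """Spatial partitioning (AMD MxGPU's approach): cores are divided among
    ALL provisioned VMs up front, whether they are busy or idle."""
    return total_cores / total_users

# Four VMs provisioned on a hypothetical 2,048-core GPU, but only one is busy:
print(time_sliced_throughput(1, 2048))   # 2048.0: the busy VM can use the whole GPU
print(partitioned_throughput(4, 2048))   # 512.0: the busy VM is capped at its quarter
```

When every VM is busy, the two models converge on the same per-VM share; the difference shows up when utilization is uneven, which is why time-slicing suits bursty desktop workloads.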
Years after Nvidia released its vGPU offering, it altered its sales model to require a software license on top of its GPU cards. In addition to the Nvidia hardware, customers must also buy a license to receive software upgrades and activate the vGPU features.
AMD, on the other hand, does not require customers to buy any additional licenses to activate its MxGPU offering. Additionally, because it is hardware-based, the MxGPU offering enables customers to get virtualized graphics across different cloud providers.
AMD MxGPU instances are now available in Microsoft Azure, and they are one of the default options when choosing GPU-based VDI on Amazon AppStream.
Nvidia still has a larger footprint within major cloud providers, such as Amazon, Google Cloud, Azure and even Oracle Cloud. However, much of that footprint is on virtual machines with dedicated GPU cards, which can come at a much higher cost than AMD's offerings, depending on the use case.
Many of the most popular libraries for machine learning, deep learning and even statistical workloads have built-in support for Nvidia's Compute Unified Device Architecture (CUDA) programming model, which is only available on Nvidia GPU cards.
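As an example, a workload can probe for framework-visible GPUs before dispatching work to one. This sketch assumes TensorFlow is installed and falls back to reporting zero devices otherwise:

```python
# Minimal sketch: count GPUs visible to the ML framework before dispatching
# work to one. Assumes TensorFlow is installed; falls back to zero otherwise.

def cuda_gpus_available():
    """Return the number of GPUs TensorFlow can see, or 0 if it is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0
    return len(tf.config.list_physical_devices("GPU"))

print(cuda_gpus_available())
```

On a machine without an Nvidia card (or without a CUDA-enabled build of the framework), this reports zero devices, which is exactly the compatibility gap the article describes.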
AMD vs. Nvidia comparison: The verdict
Both Nvidia and AMD have evolved over the past couple of years, and each has its strengths and weaknesses. While AMD has been less visible in the public cloud market, it is well positioned to gain momentum now that more cloud vendors are adopting its MxGPU offering.
MxGPU-powered desktops will generally be cheaper than Nvidia's GPU-backed desktops, whether on premises or cloud-hosted. However, organizations that want the best performance, or that need to support machine learning and other high-performance workloads, should go with Nvidia.