What's behind cloud providers' push for custom hardware?

The nature of the cloud market demands innovation, and custom hardware might be the next move for the major vendors in this space. Experts weigh in on the key drivers of this shift.

Initially, public clouds were built on generic hardware to cut costs and operate at massive scale -- but that's changing.

A shift toward highly scalable AI and machine learning workloads, as well as IoT and analytics applications, is driving cloud vendors to consider new architectures. Legacy chip and hardware manufacturers are attempting to bring these capabilities to market. But the major cloud vendors have increasingly taken matters into their own hands because those manufacturers can't keep up with their needs.

On the hardware front, Amazon already owns a small chip manufacturer, while Google has its tensor processing units (TPUs). This competition among cloud giants promises to change how infrastructure is built and how developers consume it.

"End users [will] benefit from these purpose-built compute environments and could become more productive with the comfort that the platform will be there to support what they need to achieve," said Fahim Khan, vice president of cloud transformation services at Brillio, a digital transformation consultancy.

Amazon has been the most aggressive with its cloud hardware customization. It started by building more efficient routers. Then it rethought its server architecture with the Nitro System, which brought efficiency to all aspects of virtual machine provisioning. Most recently, Amazon developed AWS Inferentia, a custom inference engine for AI, and the Graviton line of CPUs based on the Arm architecture.

Cloud vendors are exploring cloud hardware architectures from third parties designed to accelerate AI workloads, reduce costs or both. Dozens of AI chip and quantum computing startups are also emerging with offerings that could be made available in the cloud. These startups are also developing novel chips for optimizing cloud infrastructure.

Key drivers for custom cloud hardware

Cloud providers have capitalized on the demand for innovative software models and platforms that can support large data volumes. This has been the main driver behind the move toward custom cloud hardware and hardware-based features.

"There's also a much higher demand for increased computing power at lower costs, which fuels hardware innovation from public cloud providers as much as new software services," said Jeff Valentine, CTO at CloudCheckr, a cloud management platform.

As cloud consumption grows, public cloud providers can only operate efficiently in one of two ways. Either they shoehorn commodity hardware into their data centers to try to accommodate their unique needs, or they design and develop something internally instead. Public cloud vendors are using custom hardware to improve availability, performance, security and cost, Valentine said. And a more secure and reliable infrastructure could ultimately attract and retain more customers.

In the early days of cloud, one of the first issues providers ran into was density and cooling. Data center space was expensive, and cooling was a big concern. Providers mounted motherboards onto racks and ran specialty fans across them to cool everything appropriately.

"We've advanced a lot since then, but public cloud providers haven't stopped trying to squeeze out every efficiency they could," said Valentine.

The focus today is mostly on how to operationalize the infrastructure. If Microsoft, Amazon or any other cloud provider can make its infrastructure super-efficient, it can theoretically pass those savings on to customers through lower prices.

But cloud data centers operate very differently from the typical enterprise facility, which presents unique challenges for vendors. For example, commodity hardware can update firmware through software, but shared-use servers must be specifically configured to disallow that. Instead, these vendors must roll out updates when they can safely be provisioned to the hardware BIOS. It's a pain for the public cloud staff, Valentine said.

As a result, AWS developed a Nitro security chip so firmware can be updated by AWS -- and only AWS. This saved AWS time and effort, but these types of behind-the-scenes efforts will largely go unnoticed by customers, at least directly.

"The reality is that most customers will only notice the cost," said Valentine.

The future benefits of custom hardware

In most cloud computing models, the end user is not directly exposed to the hardware. The reduced overhead for end users is one of the main reasons the cloud has become so popular.

"This [abstraction] keeps pace with demand, offers better quality of service at no higher cost, provides the ability to target the right type of hardware for workloads while not introducing any additional complexity for the end user," said Jeff Wittich, senior vice president of products at Ampere, a semiconductor company that creates CPUs for cloud and edge infrastructure.

Cloud providers continue to look for data center efficiencies that translate to improvements in their products. One area that's getting more attention is latency reduction. This is particularly important in the wake of COVID-19 and the increase in work from home, online gaming, remote learning and video conferencing, said Vipin Jain, CTO at Pensando Systems, which uses custom chips for its software-defined service platform. Custom hardware promises to ease the pressure on infrastructure that was never sized for such rapid increases in scale, Jain said.

Bare metal is an emerging category of digital infrastructure that enables businesses to deploy workloads on secure, single-tenant hardware, distributed geographically for proximity and performance. Traditionally, organizations that wanted single-tenant hardware had to purchase colocation and power, order and ship their own hardware, and then employ technicians to set up, test and activate the servers. Cloud-based bare metal enables customers to skip these steps and create compute instances on demand when their needs meet a set of standard server configuration requirements.

A side effect of the AWS Nitro System was that it made it easier to provision bare metal instances, which enabled organizations to customize their infrastructure running in AWS.

Down the road, cloud providers might use custom hardware to reimagine the traditional computing architecture for things like AI. For example, IBM researchers have been working on a new class of neuromorphic chips that perform computations in the memory itself.

This approach practically eliminates the memory-processor bottleneck when performing many types of AI calculations, said Manuel Le Gallo, researcher in the Neuromorphic and In-memory Computing group at IBM Research.

However, this type of technology is still a few years away and will require developers to learn new programming techniques. In the meantime, ideas like Amazon's Nitro System will inspire other approaches to rethink traditional cloud architectures.

There could also be downsides to this push for greater efficiency. The next wave of cloud hardware innovations could create a new kind of lock-in, which is ironic, since the cloud started out relying almost exclusively on commodity infrastructure components.

In theory, custom hardware like Amazon's Graviton CPUs or Google TPUs should run the same software as other hardware. But enterprises might be tempted to adopt ancillary, cloud-specific services to improve performance or reduce maintenance for their apps. This could make it harder to migrate to other cloud platforms down the road.
