
How Chiplets will Accelerate Storage

Chiplets are a newer approach to building processors, in which a collection of smaller chips is packaged together to emulate a larger die size.

The world of advanced processors is moving to a new approach, where discrete chips that have grown to their maximum size are being succeeded by collections of "chiplets," smaller chips that reside together within a single package to emulate a larger die size than the industry can produce. Both the sheer die area (a die is an individual piece of silicon that contains a functional circuit) and the added interconnections raise quality and reliability concerns, but there's a way to address these concerns that also improves time to market and overall quality and reliability. This post explains chiplets, the concerns they raise and the remedy for those concerns.

Why are Chiplets Catching On?

Complex chips like processors have gotten into a bind. They have grown large enough that Moore's Law no longer applies.

As a refresher: Moore's Law projects that the number of transistors on a chip should double every year or two. Gordon Moore explained that there seem to be three drivers for the increasing complexity of an integrated circuit (IC):

  • The shrinking dimensions of a single transistor.
  • An increase in the size of the IC's die.
  • "Circuit cleverness" -- the little tricks that chip designers develop to pack things more tightly than before.

The contribution of each of these factors is estimated graphically in Moore's 1975 review of his original 1965 paper, the paper that spawned the name "Moore's Law."

Figure 1. Gordon Moore's estimation of contributions from transistor dimensions, die size and circuit cleverness

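As a rough numerical illustration of the doubling cadence described above, here is a minimal sketch. The starting count, starting year and exact doubling period are illustrative assumptions, not figures taken from Moore's papers:

```python
# Minimal sketch of Moore's Law-style growth: a transistor count that
# doubles every ~2 years. The starting point and period are illustrative.

def projected_transistors(start_count: int, start_year: int, year: int,
                          doubling_period_years: float = 2.0) -> float:
    """Project a transistor count assuming a fixed doubling period."""
    elapsed_years = year - start_year
    return start_count * 2 ** (elapsed_years / doubling_period_years)

# Example: a hypothetical chip with 1 million transistors in 1990.
for y in (1990, 2000, 2010, 2020):
    print(y, f"{projected_transistors(1_000_000, 1990, y):,.0f}")
# A 2-year doubling period gives ~32x per decade: roughly 1 million in
# 1990, 32 million in 2000, 1 billion in 2010 and 33 billion in 2020.
```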

The industry continues to advance along the first and third paths, transistor dimensions and cleverness, but modern processors have reached a size that fills the total area of the reticle on a photolithographic scanner, so their die size can no longer increase. This threatens to limit the rate of growth for a chip's complexity.
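To put a rough number on that ceiling, here is a quick back-of-the-envelope check, assuming the commonly cited full reticle field of about 26mm x 33mm (actual scanner limits vary by tool and process):

```python
# Back-of-the-envelope check on the reticle limit. The 26 mm x 33 mm
# field is the commonly cited full-field maximum for photolithographic
# scanners; treat it as an approximation, not a spec for any one tool.

reticle_width_mm, reticle_height_mm = 26, 33
max_die_area_mm2 = reticle_width_mm * reticle_height_mm

print(f"Approximate maximum single-die area: {max_die_area_mm2} mm^2")  # 858
# Once a monolithic die approaches this area, the only way to keep adding
# silicon is to split the design across several chiplets in one package.
```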

The chart in Figure 2 below, created using actual device die sizes, helps to illustrate the fact that processor die sizes are no longer growing:

Figure 2. Die Sizes Over Time


The large black squares are Intel processors. Note that after the 1995 release of Intel's Pentium processor (labelled "Pent"), die sizes have leveled off at between 100mm² and 300mm².

To continue the industry's trajectory and keep growing die area along the same path, designers have adopted the "chiplet" approach. The processor is split into two or more smaller chips that can be manufactured on current tooling and behave like a single die. An example of this approach appears in Figure 3, which shows an AMD RYZEN 9980 with the package lid removed. Eight "Core Complex" (8-core processor) chiplets surround a central I/O chiplet to provide a 64-core product.

Figure 3. AMD RYZEN 9980 with I/O Chiplet Surrounded by Eight Core Complex Chiplets

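As a minimal sketch of that kind of partitioning, the model below mirrors the eight-plus-one layout described above; the code is purely illustrative and not tied to any vendor's design:

```python
# Purely illustrative model of a chiplet-based processor package: several
# core-complex chiplets plus one I/O chiplet presented as one logical CPU.

from dataclasses import dataclass

@dataclass
class Chiplet:
    name: str
    cores: int  # 0 for chiplets that carry no CPU cores, such as an I/O die

package = [Chiplet("core-complex", 8) for _ in range(8)] + [Chiplet("io-die", 0)]

total_cores = sum(c.cores for c in package)
print(f"{len(package)} chiplets, {total_cores} cores")  # 9 chiplets, 64 cores
```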

These chiplets communicate with each other using a new kind of interface that differs significantly from standard digital I/O pins. Conventional chip-to-chip signaling assumes that chips are connected to each other at some distance on a high-capacitance printed circuit board using well-established signal levels through relatively large metal package pins or solder bumps, which also contribute capacitance. The chips also have ESD (electrostatic discharge) protection to keep them from being destroyed by the static charge built up on a human body, since humans are assumed to be touching them at some point. ESD protection adds even more to the signal line's capacitance.

All of this capacitance slows the communication between chips.

Chiplets can bypass many of these sources of capacitance. Today's logic chiplets are proprietary, so they can use I/O signaling levels defined by the chip's manufacturer. They communicate through tiny TSVs (through-silicon vias -- signal lines that run vertically through the chip itself) or through small, short connections on an interposer rather than through metal pins, so they drive much smaller capacitive loads. Finally, they can omit ESD protection as long as the manufacturer can ensure that no human will touch the chiplet before it is assembled into a module; this type of assembly is performed entirely by automated equipment.
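To see why the lower capacitance matters, consider the dynamic switching energy of a signal line, roughly C x V^2 for one full charge/discharge cycle. The capacitance and voltage values in the sketch below are illustrative assumptions, not measurements of any particular interface:

```python
# Illustrative comparison of switching energy (~C * V^2 per full
# charge/discharge cycle) for a board-level link versus a short
# in-package chiplet link. All values are assumptions for illustration.

def switching_energy_pj(capacitance_pf: float, swing_v: float) -> float:
    """Energy in picojoules for one full charge/discharge of the line."""
    return capacitance_pf * swing_v ** 2  # pF * V^2 yields picojoules

# Assumed: package pins, ESD structures and a PCB trace add several pF,
# while a micro-bump or TSV connection contributes a small fraction of a pF.
board_link = switching_energy_pj(capacitance_pf=5.0, swing_v=1.2)
chiplet_link = switching_energy_pj(capacitance_pf=0.2, swing_v=0.75)

print(f"Board-level link: ~{board_link:.2f} pJ per cycle")
print(f"In-package link:  ~{chiplet_link:.2f} pJ per cycle")
print(f"Ratio: ~{board_link / chiplet_link:.0f}x")
# Lower capacitance also shortens the RC charging time, which is what lets
# chiplet interfaces run faster while burning less power.
```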

Proprietary chiplets work well, but there are strong economic benefits to defining a standard chiplet interface that enables third parties to produce certain functions using function-specific processes. For example, if the processor uses a large cache, it might make sense for the cache to be built on its own chiplet using a memory process while the logic portion of the processor is built on a chiplet that uses a high-performance logic process. A memory company might be better than a processor company at building a cost-effective cache memory chip, but a memory company would be much more interested in building a cache chiplet that can be used by multiple processor vendors than in building a number of customer-specific products.

In order for this to happen, an interface standard must be developed for inter-chiplet communications.

HBM: Memory Chiplets with Standardized I/O

One kind of chiplet that is already widely used has a standardized interface: high-bandwidth memory, or HBM. This device is not a chiplet in the usual sense, since it consists of a stack of DRAM chips atop an interface chip built using a logic production process, whereas most other chiplets will be simple single-die chips.

The HBM used in most high-end GPUs and AI-oriented xPUs -- like Google's TPU or AWS' Trainium -- uses a JEDEC-standardized interface that has been mutually agreed upon by DRAM makers and HBM's customers, including Samsung, SK hynix, Micron, Nvidia and Broadcom. This is the same JEDEC that defines pinouts for DRAM and flash chips. HBM has shown the industry that there is a benefit, from both a sourcing and a pricing standpoint, to using a standardized interface to communicate from chiplet to chiplet.

HBM stacks must be intimately attached to the SoC (system on a chip), so they are mounted along with the SoC within the SoC's package, making them inaccessible for test or repair. Figure 4 shows an Nvidia AMPERE GPU with the package's lid removed. The three silver rectangles above the gold GPU die and the three below it are HBMs.

Figure 4. Nvidia AMPERE. Three HBMs are above the GPU, and Three are Below


SanDisk has revealed a device that brings storage to the HBM format. In collaboration with hyperscalers and GPU designers, the company says it is developing high-bandwidth flash, or HBF, which puts nonvolatile flash storage onto a JEDEC-standard HBM interface. HBF is based on a new type of NAND flash chip that prioritizes bandwidth over low cost, allowing significantly more memory to be packaged within a GPU. In one example, SanDisk executives explained that 192GB of HBM DRAM could be completely replaced with 4,096GB of HBF NAND flash. This is intended to better support inference applications, where the read/write balance is heavily tilted in favor of reads. Since HBF is nonvolatile, its management will probably fall to the data center's storage experts. Expect to see big changes in the way that data is handled in inference applications.
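Taking the figures in that example at face value, the capacity difference works out to roughly 21 times; a quick check:

```python
# Quick check of the capacity ratio in the HBF example cited above.
hbm_dram_gb = 192
hbf_nand_gb = 4_096

print(f"HBF capacity is ~{hbf_nand_gb / hbm_dram_gb:.0f}x the HBM capacity")  # ~21x
# The trade-off is NAND's slower, more limited writes, which is why
# read-heavy inference workloads are the stated target.
```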

What is Being Done to Produce a Standard Logic Chiplet Interface?

The semiconductor industry has come together to form another organization to create a standard interface for chiplets other than HBM. This interface has been named UCIe, the Universal Chiplet Interconnect Express. The standards organization itself is called the UCIe Consortium.

This specification does much more than address signal levels and pin loading. It establishes a complete Die-to-Die interconnect with physical layer, protocol stack, software model and even a compliance testing methodology. The UCIe Consortium's goal for this specification is to enable end users to easily mix and match chiplet components from multiple vendors for either off-the-shelf or custom SoCs. 

While storage is not likely to be connected to a UCIe interface directly, it is quite likely that the high-complexity storage controllers and network management "chips" of tomorrow's storage systems will be constructed from multiple chiplets. Expect this to create newer, higher-complexity storage systems over the next few years.

Who's Driving This?

Current members of the consortium come largely from four groups: SoC/processor vendors, memory manufacturers, foundries and OSATs (outsourced semiconductor assembly and test providers), and OEMs and end users. Some of the leading companies appear in Table 1 below.

Other companies that don't appear in the table above have also joined the push to promote the UCIe standard. One such company is Synopsys, a leading EDA (electronic design automation) firm. Engineers use Synopsys' design software to design everything from chips to systems, and the company now supports the UCIe interface with an off-the-shelf UCIe interface package that communicates at a speed of 40Gbps, which Synopsys states is 25% faster than the upper limit of the current UCIe specification.
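Working backward from that claim is a useful sanity check: a 40Gbps rate that is 25% faster than the specification's ceiling implies a per-lane ceiling of about 32Gbps, which lines up with the 32GT/s maximum data rate commonly cited for the UCIe specification. A minimal check:

```python
# Sanity check of the Synopsys claim quoted above: a 40Gbps rate that is
# 25% faster than the specification's ceiling implies a ceiling of
# 40 / 1.25 = 32Gbps per lane.
claimed_gbps = 40
headroom = 0.25

implied_spec_limit_gbps = claimed_gbps / (1 + headroom)
print(f"Implied UCIe per-lane limit: {implied_spec_limit_gbps:.0f} Gbps")  # 32
```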

There's a good reason for the strong interest in this standard. According to market research firm Omdia, a division of Informa TechTarget, the chiplet market should reach $57 billion by 2035.

Longer-Term Thinking

One of the big advantages of packaging chiplets together, instead of placing multiple packages on a printed circuit board, is that the chiplet approach provides enormous bandwidth between chiplets at lower power than signals driven through a package's pins and across a printed circuit board. For this reason, chiplets are being used to provide the highest possible chip-to-chip communication rates.

But this isn't enough for the fastest designs, and so some companies are focusing efforts on a future version of UCIe that will be based on optical interconnect rather than electrical. Optical interfaces aren't burdened as much as electrical interfaces with delays stemming from wire and pin capacitance.

Optical interconnect is, however, already available in the form of a dedicated optical interface chiplet based on the UCIe standard. Ayar Labs' new optical interconnect chiplet communicates at speeds as high as 8Tbps through a multi-band optical interface. The company believes that this product will assist the development of high-speed mesh networks, which could be used to connect storage arrays.

Wrap-Up: A Worthwhile Approach

As the world of large SoCs increases its adoption of the chiplet approach, an increasing number of chips will be absorbed into the SoC's package. This will lead to more elaborate storage controllers, and it also appears poised to bring persistent storage into the package of GPUs, xPUs and other high-complexity SoCs. Storage admins will be faced with two new areas to manage: these high-complexity storage networks, and the storage that is tightly coupled to the xPU. The next few years will be very interesting for anyone involved in high-end system storage.

Jim Handy is a semiconductor and SSD analyst at Objective Analysis in Los Gatos, Calif.
