Compute Express Link technology should boost an organization's price/performance ratio, but data center managers must understand which systems can best take advantage of it.
Compute Express Link, or CXL, enables processors to coherently share large memories. It's a good alternative to the way GPUs currently work, where data moves between the host's memory and the GPU's memory over a slower PCIe I/O channel. CXL 3.0 also enables shared memory to be allocated to different hosts in a system, allowing memory to become disaggregated, similar to the way storage is today.
Until now, systems that included multiple kinds of processors (such as CPU, GPU and AI processors) broke memory down into different spaces -- one for the server, one for the GPU and perhaps one for the AI processor -- and admins treated any data sharing between those devices as an I/O transfer. Compute Express Link is a way to make all memory accesses look like memory to all processors.
Why CXL is important
Today, all communication between a GPU or similar device and a server's memory is performed as I/O. I/O is a much slower way to update memory than the memory channel. Likewise, all data management between the server's CPU and an accelerator's memory has also been handled as I/O, slowing data movement in both directions.
Before CXL technology, the host managed these systems' memories through PCIe. The PCIe physical interface is fast, but its protocol is not -- particularly for large data transfers. CXL enables much faster data transfers, though its protocol is leaner and more specialized than PCIe's.
Some of the delay also stems from the fact that software manages PCIe through interrupt-driven context switches. On the other hand, processors immediately handle memory accesses without using an interrupt-driven I/O protocol.
CXL enables accelerators to access a server's memory as memory and allows the server to access an accelerator's memory as memory. There are no interrupts and no context switches.
This technology will speed up certain tasks. Prior to CXL, admins loaded a GPU card the same way they would write to a PCIe SSD: through a PCIe I/O channel. There's a lot of software overhead in an I/O transfer, so these transfers are very slow. By converting such accesses to standard memory load-store semantics, this overhead disappears, and admins can rapidly copy the data from the server memory to the GPU's memory. Data also travels in the other direction, so admins can copy the AI accelerator's memory contents into the host server's memory as a memory-to-memory copy.
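The difference can be illustrated in Python with `mmap` as a loose analogy. Both buffers below are ordinary anonymous mappings standing in for host DRAM and an accelerator's CXL-attached memory; the point is that once the device's memory is mapped into the host's address space, the "transfer" is an ordinary memory-to-memory copy with no I/O calls involved:

```python
import mmap

# Stand-ins for host DRAM and a device's memory window. In a real system,
# the second buffer would be accelerator memory mapped into the host's
# address space over CXL; here both are plain anonymous mappings.
host_mem = mmap.mmap(-1, 4096)
device_mem = mmap.mmap(-1, 4096)

host_mem[:16] = b"weights-to-load!"

# Load-store semantics: the copy is just a memory assignment.
# No file descriptor, no read()/write() call, no interrupt-driven I/O path.
device_mem[:16] = host_mem[:16]

assert device_mem[:16] == b"weights-to-load!"
```

The same slice assignment works in the other direction, which is the article's memory-to-memory copy from accelerator back to host.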
It's not simple to implement, though. When two independent devices access each other's memory as memory, the process can lose coherency. For example, if a server processor has cached a certain memory location and an accelerator revises that memory location, then the cached copy in the server's CPU must be updated. CXL includes built-in coherency mechanisms that handle such updates automatically, without admin intervention.
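That snoop-and-invalidate flow can be sketched as a toy model in Python. The class names and the single shared location are invented for illustration; this is not CXL's actual wire protocol, just the shape of the problem it solves in hardware:

```python
class HostCache:
    """Toy model of a CPU cache holding copies of far-memory locations."""
    def __init__(self, memory):
        self.memory = memory
        self.cached = {}                  # address -> cached value

    def read(self, addr):
        if addr not in self.cached:       # cache miss: fetch from memory
            self.cached[addr] = self.memory.cells.get(addr, 0)
        return self.cached[addr]

    def invalidate(self, addr):
        self.cached.pop(addr, None)       # drop the stale copy


class SharedMemory:
    """Memory visible to both the host and an accelerator."""
    def __init__(self):
        self.cells = {}
        self.caches = []                  # caches to snoop on every write

    def accelerator_write(self, addr, value):
        self.cells[addr] = value
        for cache in self.caches:         # hardware-style snoop/invalidate
            cache.invalidate(addr)


mem = SharedMemory()
host = HostCache(mem)
mem.caches.append(host)

mem.cells[0x100] = 1
assert host.read(0x100) == 1      # host now holds a cached copy
mem.accelerator_write(0x100, 2)   # accelerator revises it; snoop invalidates
assert host.read(0x100) == 2      # host re-fetches the fresh value
```

Without the invalidation step, the final read would return the stale value 1 -- the coherency loss the paragraph above describes.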
As with most architectural changes, CXL focuses on increasing compute throughput and lowering costs. By reducing memory-to-memory transfer delays, CXL optimizes the use of existing hardware, with changes needed only in the underlying software.
In fact, the physical interfaces of PCIe and CXL are the same. During initialization, the two ends of a link negotiate whether a given channel will run the PCIe protocol or CXL.
Who's using CXL?
Most potential early adopters are in hyperscale data centers. Admins can use CXL as a mechanism to share pooled memory between many servers, making it most interesting for large installations; it might help improve data center throughput, in addition to cutting costs and server count.
Data center managers will also use CXL in other applications, particularly in subsystems that have their own memory, such as GPU cards and AI accelerators. Expect to see CXL gain widespread use over the long term in systems with multiple processors.
Wasn't CXL an Optane thing?
Many folks have pigeonholed CXL as a place to put Optane DIMMs. Intel's decision to scale Optane back does not affect the use of CXL.
A significant portion of CXL's support is in the hyperscale community. Hyperscale data centers are interested in CXL technology for its support of shared pools of DRAM. For most of those firms, DRAM is a big concern. For example, consider a server farm with 20,000 servers that are all similarly configured with a certain amount of DRAM. If there's too little DRAM for certain applications, those applications will perform slowly. Pages get swapped onto and off of storage more frequently than desired.
So, why not make all the servers use a large complement of DRAM? This costs more, but there are other issues, too. DRAM is an energy hog, so there's motivation to keep its size down. Much of that energy becomes dissipated heat, so cooling costs increase with more DRAM. It's a best practice to equip each server with the minimum practical DRAM size.
Some admins might be tempted to dedicate certain servers to large memory applications. This works against the concept of virtualization and composability, where tasks can be freely allocated to any server with the assumption that all of the data center's servers are equivalent. Additionally, any server with a large amount of DRAM leaves much of that memory idle when it performs smaller-memory tasks.
Other parts of the system dynamically allocate shared resources, such as storage and even servers in the case of virtualization. Why can't the same thing be done with memory? This is the key motivation for hyperscale data centers to adopt shared memory pools. Applications that require large memories can borrow a portion of memory from the pool and relinquish it when the task is complete.
In this way, individual servers can contain relatively little internal DRAM, called near memory, and borrow what they need from the pool on the other side of the CXL channel, or far memory.
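The borrow-and-relinquish cycle can be sketched as simple bookkeeping in Python. The class and method names here are invented for illustration; in practice the CXL fabric and system software manage the pool:

```python
class FarMemoryPool:
    """Toy bookkeeper for a shared pool of far memory (sizes in GB)."""
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.loans = {}                        # server name -> GB borrowed

    def available(self):
        return self.capacity - sum(self.loans.values())

    def borrow(self, server, gb):
        if gb > self.available():
            return False                       # pool exhausted; caller waits
        self.loans[server] = self.loans.get(server, 0) + gb
        return True

    def release(self, server):
        self.loans.pop(server, None)           # task done: memory returns


pool = FarMemoryPool(capacity_gb=1024)
assert pool.borrow("server-17", 256)           # big job borrows far memory
assert pool.available() == 768
pool.release("server-17")                      # job completes
assert pool.available() == 1024                # pool is whole again
```

Each server keeps only its small near-memory complement; the pool's capacity is shared across all of them on demand.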
How CXL improves performance
Nearly any program performs better as admins add memory, but there is a point of diminishing returns for all applications. CXL technology helps by enabling the system to dynamically determine which programs should get a performance boost and which should not.
In one example, Application A experiences a big boost every time its memory size is doubled, while Application B sees a more modest boost and Application C is only lightly affected by memory size. Any system that runs any instances of Application A should allocate as much shared memory to that application as it can and should allocate anything left over to Application B.
If there are no instances of Application A, then Application B should get most of the memory pool, with any leftovers going to Application C. All the applications, then, will benefit from this improved performance.
But this example is too simple. Suppose a block of memory frees up, and that block would accelerate an instance of Application B by 12% but an instance of Application A by only 11% -- simply because Application A has already been allocated a lot of memory. In that case, the dynamic allocation algorithm should favor Application B.
A relatively sophisticated supervisor must manage this process. This software will be developed over time.
In the end, though, it's all about how a large system can provide more memory to the most memory-hungry applications in their time of need, while simultaneously reducing the data center's DRAM population, with its high cost, power consumption and cooling requirements.
CXL technology provides large memory at a low cost for memory-hungry applications. It also enables accelerators and servers to write into each other's memories while maintaining coherency. That's quite a lot for a new protocol operating over existing PCIe hardware.