Annual Update on Persistent Memory
Explore some of the new developments happening within the persistent memory market, the best fit applications, use cases for persistent memory and more.
00:10 Dave Eggleston: Hi, welcome to the persistent memory track, sponsored by SNIA. I'm Dave Eggleston. I'm an independent consultant, and I'm going to be talking about some of the new things that are happening in persistent memory this year. We're going to have a full slate of speakers in the track, so please stay tuned for their presentations as well.
00:31 DE: Let's start by talking about what problem it is that persistent memory solves, and let's focus in on working memory. As this graphic from Intel shows, as compute has accelerated and changed, DRAM scaling has actually slowed down. And we can see that in the density curve. That's one of the key questions: "Can persistent memory help solve some of those problems around working memory? And what are those problems?"
In this graphic from Micron, we can see that the DRAM bandwidth per core has actually slowed down over time. That may be a surprising result, but as we've added cores and as we've added channels, the need of that processor has grown, and we haven't been able to keep up with feeding that memory into that processor.
01:29 DE: Here's another example of a problem with DRAM. If we look at a current HPC system, which might have 56 cores and about 112 gigabytes of memory, that takes about 50 watts. But as we scale going forward -- again, adding cores, adding memory -- the power we require is much, much more. This calculation shows it could be as much as 700 watts, which is really unsustainable. That's another problem with DRAM: the power problem.
As we transition into DDR5, one of the things that's been increasing is the complexity. We don't think about this often, but there's a lot of complexity in the board in routing all these different DIMMs for all these different channels. And as we can see in the DDR5 graphic on the right, this even gets to a point where we might have a number of channels per DIMM, so that board complexity is also something that needs to be considered.
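To get a feel for the power numbers quoted above, here is a minimal Python sketch. The 56 cores, 112 gigabytes, 50 watts and 700 watts come from the talk; the assumption that DRAM power scales linearly with capacity is ours and is not necessarily the model behind the slide. The function names are hypothetical.

```python
# Figures quoted in the talk for a current HPC system.
BASE_CORES = 56
BASE_GB = 112
BASE_WATTS = 50.0


def dram_power_watts(capacity_gb: float) -> float:
    """Estimate DRAM power, ASSUMING power scales linearly with capacity."""
    return capacity_gb * BASE_WATTS / BASE_GB


def capacity_for_power(watts: float) -> float:
    """Invert the linear model: the capacity that would draw the given power."""
    return watts * BASE_GB / BASE_WATTS


# Memory per core in the baseline system.
gb_per_core = BASE_GB / BASE_CORES

# Capacity at which the linear model reaches the 700 W figure from the talk.
capacity_at_700w = capacity_for_power(700)

print(f"{gb_per_core:.1f} GB per core")
print(f"{capacity_at_700w:.0f} GB would draw about 700 W under this assumption")
```

Under this simple model, 700 watts corresponds to roughly 14 times the baseline capacity. Real systems will differ, since power also depends on speed, DIMM count and access patterns.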
02:37 DE: The key question here is, can persistent memory deliver where DRAM may be falling a bit short both in the bandwidth, the power, the complexity and also the cost per gigabyte? Also, what applications does persistent memory work well for? We're going to explore some of these topics.
03:00 DE: Let's switch gears for a minute and talk about the media itself for persistent memory. There have been three leading candidates here in the past with a lot of R&D that's gone into them: MRAM, RRAM and phase-change. What we're really starting to see for persistent memory is consolidation around phase-change as being the correct media option to fill this space. As we can see in this chart, as I put up the logos of who's doing what, we now see that not only are Intel and Micron pursuing phase-change, but I do fully expect that we'll see SK Hynix and Samsung as well, so all the big memory players getting behind phase-change memory.
MRAM does have a role, but it does seem to be as an embedded NVM solution at the foundries. RRAM, it's been quite disappointing. I don't see the same kind of development around RRAM that we do in these other two technologies. So, if we're going to look at one for true standalone, high-density ability to address the DRAM market, that's going to be phase-change memory.
04:15 DE: Now, let's look at what Intel themselves have done as the one supplier right now of Optane and Optane DIMMs. We have already witnessed them moving from their first generation, called the 100 Series, to the 200 Series. And if we look at the advances in the 200 Series, something interesting stands out. They focused on increasing the endurance and also increasing the bandwidth. This must be what their customers are demanding. And we don't see the capacity increase yet; there is some discussion in the industry about when that will come, but it just shows that getting better performance and getting that higher endurance is really critical for persistent memory.
05:02 DE: Intel themselves is promoting persistent memory to solve some real problems for their customers, and this graphic gives us a pretty good example of what some of those improvements can be. They range from the TCO savings, just less-expensive memory, to increasing the throughput. One of the key ones here being increasing the VM density that occurs in the system and by having that greater amount of memory. And then the last one is faster time to insight, so really solving some of the key problems for those customers in less time. That's kind of a high-level view of how Intel themselves positions it.
05:45 DE: Let's look at a couple of other examples from companies that have been working with Intel but have their own take on it. One of the speakers you'll hear later is Charles Fan of MemVerge, and MemVerge has done a really good job at the software level to make applications run on persistent memory without requiring the application to change, so they do virtualize that access to the persistent memory.
06:13 DE: This one shows a surprising result. This is for a database application, and in the orange bar, they show what kind of performance you can get if you have a full DRAM system, and then the blue bars to the right show a mix of persistent memory with DRAM. And we do see a surprising result all the way on the right-hand side, which is that a certain combination of persistent memory and DRAM actually has higher performance than the full DRAM system. In addition to that, one of the things that Charles emphasizes is there's a cost savings, and we'll get into that a little bit later: what is the relative cost of persistent memory to DRAM? So, more performance at a lower cost, a good value proposition.
07:08 DE: Here's another example from HPC specialist, Penguin Computing, which is taking the Facebook recommendation system, putting persistent memory in, that's shown on the blue bar, for doing the inferencing operation. And what they're doing is comparing it versus when they have a system with DRAM plus NVMe SSDs. And what we can see is a tremendous improvement in the inferencing because all that data is now held directly in memory. So, it would get a 10x inferencing acceleration, and as I mentioned before, in this case, they are using the MemVerge software to virtualize -- they did not have to rewrite their application in order to run this job.
08:00 DE: Now, where do you naturally attach persistent memory? This is a little bit of a complicated diagram, thanks to Smart Modular, and what we can see is, traditionally, we've attached that on the DDR bus, which is off to the side. And certainly today, that is a place where persistent memory can be attached.
08:20 DE: Other interfaces have been proposed, which include OMI, CCIX, Gen-Z and CXL, so there's been quite a lot of confusion about where things attach outside the DDR bus. And a later speaker, Chris Petersen from Facebook, is going to talk about which interface is the right one and why it's the right one, and we see that consolidation really occurring around CXL now.
08:52 DE: And then finally, the form factor. When you have this attachment, when you move off that DDR bus, what kind of form factor is going to be utilized? Now, an effort by SNIA is to standardize these EDSFF form factors, and the form factor which most closely mirrors a DDR DIMM is the E1.S, so do look for that to be a potential successor to the DIMM form factor for persistent memory, for these memory-driven applications. Want to learn more? Of course, get involved at SNIA.
09:31 DE: And then, finally, I mentioned a little while ago, what does the price look like, which reflects the cost of persistent memory versus DRAM? This is a slide quite recently from both MemVerge and The Next Platform, which does that comparison. Now, of course, persistent memory goes to the higher capacity levels, but if we do a rough comparison, we can see that persistent memory right now is priced roughly about 30% to 50% of the DRAM price on a dollar-per-gigabyte basis. That's pretty good and pretty attractive for persistent memory moving into that space. It's numbers that I've heard from potential customers as a tipping point where they might consider that, so we do see that kind of pricing coming now.
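The pricing comparison above can be turned into a quick back-of-the-envelope calculation. This is a minimal Python sketch: the 30% to 50% price ratio comes from the talk, while the function name and the 128 GB plus 512 GB configuration are hypothetical values chosen only for illustration.

```python
def relative_memory_cost(dram_gb: float, pmem_gb: float, pmem_price_ratio: float) -> float:
    """Cost of a mixed DRAM + persistent memory configuration, expressed as a
    fraction of what the same total capacity would cost in DRAM alone.

    pmem_price_ratio is persistent memory $/GB divided by DRAM $/GB
    (the talk quotes roughly 0.30 to 0.50).
    """
    total_gb = dram_gb + pmem_gb
    if total_gb <= 0:
        raise ValueError("total capacity must be positive")
    return (dram_gb + pmem_gb * pmem_price_ratio) / total_gb


# HYPOTHETICAL configuration: 128 GB of DRAM plus 512 GB of persistent memory.
low = relative_memory_cost(128, 512, 0.30)
high = relative_memory_cost(128, 512, 0.50)

print(f"Mixed system costs {low:.0%} to {high:.0%} of an all-DRAM system")
```

For this assumed configuration the memory bill lands at roughly 44% to 60% of the all-DRAM equivalent. The sketch ignores platform costs and any performance difference between the two configurations.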
10:19 DE: These are the points we've covered: some of the challenges for DRAM, and what are the things that persistent memory can perhaps overcome, focusing on that bandwidth, power and complexity as well as the cost. We did talk about the various media, and we see that consolidation around PCRAM. We do see improvements coming already in going to the second generation, and we talked about some of the applications and the form factor of E1.S. Please stay tuned for other speakers in this track.
I hope you can learn a lot more about persistent memory and have your questions answered. Also coming up is the SNIA Persistent Memory + Computational Storage Summit, which will now take place in April. It has traditionally been earlier in the year, but this time it will be a two-day virtual conference, so please join SNIA for that to learn in depth about persistent memory. And thank you for your attention.
11:23 Chris Petersen: Hello. My name is Chris Petersen, and I'm a hardware systems technologist at Facebook, and also on the Board of Directors for Compute Express Link, or CXL. Before we jump into persistent memory, let's spend a few minutes talking about CXL -- what it is and how it applies to persistent memory.
In our increasingly connected world, there are many layers of interconnects involved in delivering all of the experiences we have come to enjoy and depend upon. As we move down the stack here from mobile networks and into the data centers that house and support the content, we traverse a variety of different interconnects. CXL is a new class of interconnect that is focused below the data center rack level and is directly connected to CPUs. CXL is a coherent interface built on top of the PCIe physical layer. It provides a high-bandwidth and very low-latency solution for heterogeneous workloads, like AI and high-performance compute, while also providing a path for memory expansion and pooling.
12:44 CP: Now, let's dig a bit deeper. As I mentioned, CXL runs on top of the PCIe physical layer. This provides us with an excellent opportunity to develop flexible server designs, as we can now design a single common slot that can accept either PCIe or CXL devices. As the system boots up, it will auto-detect and auto-negotiate the connection, depending on the type of device that is installed. Since it's built on top of PCIe, it also benefits from the same lane-count scalability that we enjoy today with PCIe. CXL is currently aligned with PCIe Gen 5, as the CXL use cases require the higher bandwidth and therefore link speeds provided by Gen 5. The expectation is that CXL is one of the primary drivers for further link speed increases with PCIe Gen 6.
13:49 CP: There's a pretty broad span of potentially supported use cases with CXL. It spans everything from processors, memory, accelerators and so on. However, today I'm going to focus on one particular area. I'm going to focus on memory buffers and memory expansion use cases. This basically allows us to attach larger swaths of memory and make them available to the host processor. This can, of course, be DRAM, but it can also be other types of new memory, including persistent memory.
14:28 CP: CXL provides us with the opportunity of a common memory interface. It's a standardized interface, and we can put many different types of memory behind it. Now, we have the opportunity to put a media-specific controller behind the CXL interface. This provides us with the opportunity to implement only media-specific timings, including asymmetric or non-deterministic timings, error handling or other media-specific needs without having to put these into the CPU itself. This provides us with a media-independent solution so that we can connect all types of media, including different DDR versions, LPDDR, persistent memory and so on.
15:20 CP: The commonality of this interface is incredibly valuable from both a server architecture and design perspective, and as a customer, it greatly simplifies the solution space, while at the same time maintaining flexibility to support a large variety of applications and allows us to adapt to the rapid evolution of software. We can now more easily make just-in-time server configuration decisions. Dave nicely highlighted a number of challenges that we are facing with DRAM, including bandwidth, power, complexity and cost. I will touch on these a bit further in the coming slides.
16:06 CP: One of the core challenges is that memory options are becoming more heterogeneous; it's no longer just DRAM. DIMMs have certainly served their purpose very well, but are not ideal for all types of memory, nor can we easily support heterogeneous memory types. Traditionally, we end up having the same generation of DDR on a single platform because we have common memory controllers, and we're limited by the controllers and devices themselves. Generally speaking, we also have the same speeds and timings across the entire platform. This ensures that we have consistent bus efficiencies and controller implementations.
16:47 CP: And, finally, we typically even end up with the same device geometries implemented on the same platform, which allows us to implement consistent interleaving across different channels. In other words, it's all very homogenous today. CXL solves this problem. I already mentioned that CXL allows us to separate the media and its memory controller from the CPU, giving us the flexibility we need. This then allows us to separate out tiers of memory, for example, slower or faster tiers, with minimal interference. We can now mix and match different memory types to add bandwidth or capacity or both. We can have DDR4 and DDR5 coexist on the same platform, for example.
Finally, as memory bus speeds continue to increase to meet bandwidth demands, we'll end up with fewer DIMMs per channel. CXL allows us to expand capacity and bandwidth on top of the traditional DIMM channels to compensate for this.
18:00 CP: CXL also helps us address a number of other system challenges. First, let's talk a little bit about power density. DIMM slots tend to be limited to around 15 to 18 watts, and perhaps less in some platforms that are even more dense. This is challenging for some media types because that's a relatively constrained power envelope. CXL enables us to separate the DIMM slots and CXL memory slots out further. Now, we can work on form factors that are perhaps more optimal for different memory types that could handle higher power consumptions or could be cooled more efficiently. Now, we can build 25 watt or higher power devices, and we can keep them in separate portions of the platform so that we can continue to cool them efficiently.
18:55 CP: Another challenge that's becoming increasingly difficult is scaling memory channels. Unfortunately, the memory speeds themselves have not been scaling as quickly as we ultimately need them, so they're not able to keep up with a lot of the CPU core count growth that we've seen over the past few years. This means that to continue to scale capacity and bandwidth, we need to continue to add memory channels to our platforms. This is becoming increasingly difficult. Ultimately, it also comes at a higher cost. As we add each memory channel, we're adding several hundred pins of additional signals and power. This ultimately leads to more expensive CPU sockets and impacts the reliability of those sockets as well, as they need to make good contact over all of those thousands of pins and over many, many years of deployment.
19:53 CP: As an example, we also need to add additional PCB layers to the motherboard to be able to route these additional channels, and this starts increasing the PCB costs. As we've added memory channels, we've gone from 12-layer motherboards to 16-layer, and if we get to 10 or 12 memory channels, we'll easily exceed 20-layer motherboards. This becomes increasingly expensive. Ultimately, we'd like to keep the layer count lower, work with smaller sockets and eliminate the need for other complex PCB design options, including things like back drilling. CXL allows us to reduce a lot of that impact because we have far fewer pins, basically one quarter of the pins for a single memory channel. And, therefore, we can either add more channels to the CPU sockets or we can simply reduce the socket sizes by reducing the pin counts, which will overall reduce the PCB layer counts as well.
21:00 CP: Finally, CXL offers us the opportunity to try out different form factors, especially for different media types. Now that we have an opportunity to build memory solutions beyond what we typically do in a DIMM form factor, it opens up an opportunity to move beyond the thermal and power improvements that I've already mentioned and add other flexibility that perhaps new use cases demand. For example, we can enable hot plug, system disaggregation and cabled solutions. Dave alluded earlier to the E1.S form factor, which is an excellent example of this. This provides us with hot pluggability, for example, should a use case require it.
21:52 CP: The CXL Consortium is pleased to announce that we have completed the 2.0 specification approximately one year after incorporation. The 2.0 update brings with it a number of important updates, but for today I will only focus on three of them as they relate to memory specifically. I'll cover each of these in more detail in the coming slides, but we now have the ability to support pooled memory solutions. We also now have a standardized management interface, in addition to the standardized data plane. Finally, we can now fully support persistent memory with the addition of a flush mechanism.
22:38 CP: Memory pooling provides us with an excellent opportunity to improve the utilization of memory. As Dave mentioned earlier, memory is a very valuable resource, and it's important that we're able to utilize it effectively. Memory pooling also allows us to right-size the memory for the specific applications so that we no longer have to overprovision. With CXL 2.0, we have implemented CXL switching, which allows us to map memory on different devices to different hosts, and to do so dynamically. At the top of this diagram, we can see several color-coded hosts, labeled H1, H2 and so on. Below this, we have a CXL 2.0 switch, and below that we have a series of memory devices labeled D1, D2 and so on. With memory pooling in CXL, we can now allocate memory to each host as needed across these devices. For example, host 1, or the yellow gold color here, has memory allocated from devices 1, 2 and 4. Similarly, host 2 has memory allocated from devices 3 and 4. If more memory is needed on host 1 in the future, we can either add additional memory devices to the pool or move an existing memory allocation from a different host to host 1.
24:16 CP: In addition to memory pooling, CXL also enables additional capabilities to help us support persistent memory. As I mentioned in past slides, CXL allows us to move the memory controller behind CXL and out of the CPU, enabling greater flexibility. We've also standardized the management interface. This allows us to manage all CXL memory devices in the same way, regardless of the type of memory that is behind the media controller itself. This could be volatile or persistent, or any combination thereof. For example, we can now update firmware or check on error counts with the same interface using a common driver and even common tools. This eliminates the need for vendor-specific software, simplifies adoption, and streamlines monitoring and management of devices in the field. Finally, with the global persistent flush mechanism we have introduced, we can now ensure data integrity with CXL memory devices across all failure scenarios. We now have a solution using CXL and persistent memory that fills the gap between DRAM and high-performance SSDs.
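The pin-count argument above can be sketched in a few lines of Python. The "one quarter" ratio is from the talk; the figure of 300 pins per DDR channel is a hypothetical placeholder, since the talk only says "several hundred pins" per channel, and the function names are illustrative.

```python
# HYPOTHETICAL pin count: the talk only says "several hundred pins" per channel.
DDR_PINS_PER_CHANNEL = 300
# From the talk: a CXL connection needs roughly one quarter of those pins.
CXL_PIN_FRACTION = 0.25


def cxl_link_pins(ddr_pins: int = DDR_PINS_PER_CHANNEL) -> float:
    """Pins for one CXL connection, as a fraction of a DDR channel's pins."""
    return ddr_pins * CXL_PIN_FRACTION


def pins_saved(channels_replaced: int, ddr_pins: int = DDR_PINS_PER_CHANNEL) -> float:
    """Socket pins freed by replacing DDR channels with CXL connections."""
    return channels_replaced * (ddr_pins - cxl_link_pins(ddr_pins))


def links_per_channel_budget(ddr_pins: int = DDR_PINS_PER_CHANNEL) -> int:
    """How many CXL connections fit in the pin budget of one DDR channel."""
    return int(ddr_pins // cxl_link_pins(ddr_pins))


saved = pins_saved(4)
print(f"Replacing 4 DDR channels frees about {saved:.0f} pins")
print(f"{links_per_channel_budget()} CXL connections fit in one channel's pin budget")
```

Either reading matches the two options described in the talk: spend the saved pins on more channels, or shrink the socket.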
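The pooling behavior described above can be modeled with a small bookkeeping structure. This is a toy Python sketch, not the CXL switch or fabric manager interface: the `MemoryPool` class, its method names, the 64 GB device sizes and the allocation amounts are all hypothetical. Only the idea that a host draws memory from several devices, and that an allocation can be moved between hosts, comes from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class MemoryPool:
    """Toy model of pooled memory devices shared by several hosts."""
    capacity_gb: Dict[str, int]  # device name -> total capacity
    # (host, device) -> GB currently allocated to that host on that device
    allocations: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def free_gb(self, device: str) -> int:
        used = sum(gb for (_, d), gb in self.allocations.items() if d == device)
        return self.capacity_gb[device] - used

    def allocate(self, host: str, device: str, gb: int) -> None:
        if gb <= 0:
            raise ValueError("allocation must be positive")
        if gb > self.free_gb(device):
            raise ValueError(f"{device} has only {self.free_gb(device)} GB free")
        key = (host, device)
        self.allocations[key] = self.allocations.get(key, 0) + gb

    def release(self, host: str, device: str, gb: int) -> None:
        key = (host, device)
        if gb <= 0 or gb > self.allocations.get(key, 0):
            raise ValueError("cannot release more than is allocated")
        self.allocations[key] -= gb
        if self.allocations[key] == 0:
            del self.allocations[key]

    def move(self, src_host: str, dst_host: str, device: str, gb: int) -> None:
        """Reassign part of one host's allocation on a device to another host."""
        self.release(src_host, device, gb)
        self.allocate(dst_host, device, gb)

    def host_total(self, host: str) -> int:
        return sum(gb for (h, _), gb in self.allocations.items() if h == host)


# HYPOTHETICAL sizes: four 64 GB devices behind one switch.
pool = MemoryPool({"D1": 64, "D2": 64, "D3": 64, "D4": 64})
for host, device, gb in [("H1", "D1", 32), ("H1", "D2", 32), ("H1", "D4", 16),
                         ("H2", "D3", 32), ("H2", "D4", 32)]:
    pool.allocate(host, device, gb)

# H1 needs more memory: move 16 GB on D4 from H2 to H1.
pool.move("H2", "H1", "D4", 16)
print(pool.host_total("H1"), pool.host_total("H2"))
```

Here H1 draws from D1, D2 and D4 as in the diagram, and a second host H2 shares D4 with it. The `move` call corresponds to the option of shifting an existing allocation to H1 rather than adding devices to the pool.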
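The idea of one management interface covering both volatile and persistent devices can be illustrated with a small class hierarchy. This is a conceptual Python sketch only: the class names, fields and methods are hypothetical and do not correspond to the CXL specification's command set or to any real driver API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MemoryDevice(ABC):
    """Hypothetical media-independent management view of a memory device."""

    def __init__(self, name: str, firmware: str) -> None:
        self.name = name
        self.firmware = firmware
        self.error_count = 0

    @property
    @abstractmethod
    def persistent(self) -> bool:
        """True if the media behind the controller retains data without power."""

    def update_firmware(self, version: str) -> None:
        self.firmware = version


class DramDevice(MemoryDevice):
    @property
    def persistent(self) -> bool:
        return False


class PersistentDevice(MemoryDevice):
    def __init__(self, name: str, firmware: str) -> None:
        super().__init__(name, firmware)
        self.pending_writes = 0  # writes not yet made durable (toy model)

    @property
    def persistent(self) -> bool:
        return True

    def flush(self) -> int:
        """Make pending writes durable and return how many were flushed."""
        flushed, self.pending_writes = self.pending_writes, 0
        return flushed


def fleet_report(devices: List[MemoryDevice]) -> List[Dict[str, object]]:
    """The same report works for every device, regardless of media type."""
    return [{"name": d.name, "firmware": d.firmware,
             "errors": d.error_count, "persistent": d.persistent}
            for d in devices]


def flush_all(devices: List[MemoryDevice]) -> int:
    """Toy stand-in for a global flush across all persistent devices."""
    return sum(d.flush() for d in devices if isinstance(d, PersistentDevice))


devices = [DramDevice("D1", "1.0"), PersistentDevice("D2", "1.0")]
for device in devices:
    device.update_firmware("1.1")  # one code path for both media types
devices[1].pending_writes = 3
report = fleet_report(devices)
flushed = flush_all(devices)
print(report, flushed)
```

The point of the sketch is that the firmware update and the error report never branch on the media type; only the flush is specific to persistent devices.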
25:41 CP: In summary, CXL enables a standardized, scalable persistent memory solution. It provides us with the ability to expand on bandwidth and capacity. It enables form factor flexibility and it improves system design flexibility as well, so it allows us to make sure we optimize power, thermals, motherboard design complexity, and ultimately the cost of the entire solution. We can further improve the electrical, protocol and switching interfaces, and we now have a common management interface as well. Ultimately, CXL provides us with a holistic solution and should be an excellent fit for persistent memory solutions now and in the future.
26:32 CP: The CXL Consortium is now over 130 member companies, and it's still growing. I'd like to call attention specifically to the fact that we have all of the CPU manufacturers on the board represented, and we expect this to be an industry first. We have multiple membership levels, and we'd love to see you join and contribute to this continuously growing ecosystem. Help us make CXL successful and help us continue to improve the standard.
Since we've just released the 2.0 spec, work has already begun on the third generation, and we're trying to respond to the industry's needs and challenges while continuing to maintain backward compatibility. We have also instituted a compliance and interoperability program. Finally, please join the CXL Consortium. Follow us on Twitter and LinkedIn for additional updates. Thank you.