Guest Post

Using 3D XPoint Persistent Memory Effectively

There is a lot of excitement around 3D XPoint, but there are some hidden potholes that can interrupt the most promising efforts. Designing around these potholes is a must.


00:02 Marc Staimer: Welcome to the Flash Memory Summit 2020 presentation, Using 3D XPoint Persistent Memory Effectively. This is a presentation by me, Marc Staimer, the Dragon Slayer. Who, you might ask, is the Dragon Slayer? Well, I've been an analyst, consultant and industry expert for more than 22 years, with more than 40 years of industry experience. I help end users and vendors. For vendors, I train them, I coach them, and I provide content for them to make their sales and their marketing more effective. I shorten their sales cycle for them. For end users, I help them solve problems, and I give it away for free. If they want information, if I can help them, I will. I don't sell them anything unless they want me on-site, and as you can see, I publish frequently with TechTarget. I do not have a website, but you can contact me at my email, my phone numbers, or my Twitter handle. So let's talk about what I'm going to talk about. There's a lot. I'm going to cover a variety of topics that you see on your screen; I'm not going to read them to you, you can read them. And we're going to go into great detail on a lot of things, so let's get going.

01:17 MS: For those of you who've heard me speak before, you know I like to tell stories, it's a funny thing. So I'm going to tell you a story. This is about a magician on a cruise line. He was a good magician. Had crowds every night. They loved him. They cheered. The captain heard about it and he wanted to see it, so he started coming down to watch him every night, and he brought with him his parrot. And the parrot started getting wise to what was going on, and pretty soon he started yelling out answers like, "It's in his sleeve. Check under the table. It's in his hat. It's behind his back." This was kind of upsetting the magician, but what could he do? It was the captain's bird. And then one day, the boat sank, and the bird and the magician ended up on a deserted island. And days go by, neither one says anything to the other. Nothing happens. Finally, after a week, the parrot turns to the magician and sighs, "Okay, I give up. Where'd you hide the boat?" It's not magic. PMEM is not magic, but it's different, and that's what we're going to get to.

02:25 MS: I'm going to briefly explain what this non-volatile memory is. A lot of people think that it's ReRAM or a subset of ReRAM. That's because it uses different storage physics: it swaps the transistors in the memory cells for threshold switches, and it looks a lot like resistive RAM, right? ReRAM. Other people think it's a PCM variation, partly because Intel and Micron had worked with some PCM materials before they came out with 3D XPoint, and also because they were using a chalcogenide, as opposed to the GST used in PCM, for the selector and memory cell storage parts. But in the end, both Intel and Micron will say, "It's not PCM." They swear up and down it's not PCM. They will concede that it is a subset of ReRAM in many ways. So that's what we're going to go with for right now.

03:20 MS: There are two physical implementations. The first is as an NVMe SSD, also known as storage class memory, and you can get it from either Intel or Micron; they each have one. The other implementation, the Data Center Persistent Memory Module, or DCPMM (a name nobody uses; it's really referred to as PMEM), is a DDR4 DIMM, and it's only from Intel. And it only works with Intel's latest CPUs. Don't try to make this work with Arm, or with AMD, or with Power; it's not going to work with those chips. It only works with Intel CPUs, and only the latest Intel CPUs.

04:06 MS: So what's PMEM? Well, think of it as twice the capacity per DIMM slot as DRAM. A little slower than DRAM, but more capacity. You need to have one DRAM DIMM for each PMEM DIMM; it's a one-to-one relationship. The capacities can be way different: you can have even a 16 gigabyte DRAM DIMM and a 512 gigabyte PMEM DIMM. That doesn't matter, but you have to have one DRAM for every PMEM. And that's because the DRAM acts as a cache for PMEM in certain modes, and as a result you've got to have enough DRAM to make it work out. Now, realistically, you can have as much as four and a half terabytes per socket: one and a half terabytes of DRAM and three terabytes of PMEM per Intel socket. There are two generations of PMEM, the Series 100 and the Series 200. Kind of easy to understand. If you look at their specs, the 200 series adds roughly 25% more bandwidth per PMEM and roughly 33% more endurance, plus or minus, than the 100 series. Both come in 128 gigabyte, 256 gigabyte, and 512 gigabyte; there's no difference in capacities between the generations.
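The population rules above (one DRAM DIMM per PMEM DIMM, and the per-socket ceilings) are easy to get wrong when sizing a server, so here's a minimal sketch that checks them. The function name is hypothetical and the caps come from the figures in the talk, not from any Intel sizing tool.

```python
# Hypothetical helper illustrating the population rules described above:
# one DRAM DIMM for each PMEM DIMM, and per-socket maximums of 1.5 TB DRAM
# plus 3 TB PMEM (4.5 TB total). Figures are from the talk, not a spec sheet.

def validate_socket(dram_dimms_gb, pmem_dimms_gb):
    """Return (ok, total_tb) for one socket's DIMM population."""
    # 1:1 pairing rule: each PMEM DIMM needs a DRAM DIMM (capacities may differ)
    if len(dram_dimms_gb) < len(pmem_dimms_gb):
        return False, 0.0
    dram_tb = sum(dram_dimms_gb) / 1024
    pmem_tb = sum(pmem_dimms_gb) / 1024
    if dram_tb > 1.5 or pmem_tb > 3.0:   # per-socket caps from the talk
        return False, 0.0
    return True, dram_tb + pmem_tb

# Six 16 GB DRAM DIMMs paired with six 512 GB PMEM DIMMs:
ok, total = validate_socket([16] * 6, [512] * 6)
```

Note how a tiny DRAM population (six 16 GB DIMMs) legitimately fronts 3 TB of PMEM; the rule is about DIMM count, not matching capacity.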

05:37 MS: The PMEM enables 4.5 terabyte sockets, as I said before, and the key thing here is it's got 60 times the write endurance of NAND. Wow. Now note, the 200 series just came out. The 100 series came out a couple of years ago, and that's what most people are using right now. The 200 series will eventually supplant it. Now, one of the key advantages of persistent memory is the latency. Yes, it's slower than DRAM, we know that, but it's much faster than storage class memory. Yes, it's 25 times slower than DRAM, but it's 29 times faster than storage class memory when you look at it from a latency point of view. And when it comes to the latest NVMe 1.4 NAND flash SSDs, it's 86 to 343 times faster, depending on whether you're doing writes or reads. And when it comes to SATA SSDs, it's thousands of times faster.

06:44 MS: Now, there are two addressability modes, memory mode and App Direct mode. Let's talk about each of those. In memory mode, it looks and acts like DRAM, even to the point where it loses its non-volatility. Whoops, using non-volatile PMEM but it's volatile now? Yes. But there are no changes you have to make to applications or the file system. It's paired with DRAM; the DRAM acts as a cache, as I just mentioned, and that makes the PMEM volatile during a power fail. Oh well. But the way to think of it in this circumstance is that the PMEM is really just higher-capacity memory at a lower cost per gigabyte. Keep that in mind: higher capacity, lower cost per gigabyte. Performance-wise, you're not really going to notice it that much, unless you're paging a lot out of DRAM to PMEM.

07:48 MS: The other access method is App Direct mode, or DAX, which is direct access. And it's much faster than storage, although slower than memory. In this case, there are three methodologies: raw device access, file API, and memory access. And under the file API you have two subsets, the file system API and the NVM-aware file system API. They're different. Let's look at each one for a moment. In raw device access, the application reads and writes directly to the persistent memory driver that exists in the host operating system, at a block level. Very straightforward; you're still going to write through a driver, but it's straightforward. It's relatively fast, but there are faster ways. The file system method is the easiest method. The application reads and writes via the file system API, so you're just going to make API calls, and those go to the PMEM driver. But the thing to remember about this is, of all the direct access methods, this is the slowest.

09:00 MS: The other file system method is the NVM-aware file system. It modifies the file system, as you would expect, to be NVM-aware; it's designed to run faster. It's similar to the other with the API calls, it's relatively easy to use, and it makes Windows, Linux and vSphere access faster. You'll find it in all three of them. Then there's memory access, and you're going to find that most of the implementations out there are going for memory access, and that is with good reason. The applications use memory semantics, load and store instructions, to directly access the persistent memory. It's byte access, not block access. All the rest were block access; this is byte access. It's by far the fastest persistent memory performance. And there are some really interesting, clever app implementations; consequential use is being done with this by Oracle Exadata X8M, SAP HANA, Formulus Black, and MemVerge. Oracle's database, the latest version, the one that's supposed to come out this year, will also directly access it, even if it's not in Exadata.
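To make the "memory semantics" idea concrete, here's a minimal sketch of the load/store access pattern. In a real deployment this would be PMDK's libpmem over a file on a DAX-mounted file system; here a regular file, `mmap`, and `flush()` stand in, and the file path is just a placeholder.

```python
import mmap
import os

PATH = "pmem_demo.bin"   # hypothetical path; on real PMEM this sits on a DAX mount
SIZE = 4096

# Pre-size the backing file (a regular file stands in for a PMEM region here).
with open(PATH, "wb") as f:
    f.write(b"\x00" * SIZE)

f = open(PATH, "r+b")
m = mmap.mmap(f.fileno(), SIZE)

m[0:5] = b"hello"        # store: plain byte assignment, no write() call on the hot path
m.flush()                # persistence point (pmem_persist() when using PMDK's libpmem)

data = bytes(m[0:5])     # load: plain byte read, again no block I/O call

m.close()
f.close()
os.remove(PATH)
```

The point of the pattern is what's absent: once the mapping exists, the hot path contains no read/write system calls and no block-sized transfers, only byte-granular loads and stores plus an explicit persistence point.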

10:13 MS: So let's look at their implementation in Exadata, which is incredibly useful. All of the persistent memory exists in the storage servers, not the database servers. For those of you who aren't familiar with Exadata, you have two types of servers within the rack. One is the database server, where the Oracle database runs, and the other is the storage server. And you can have multiples of both, depending on your configuration. But they're putting all of the persistent memory, all the PMEM, inside the storage servers; none of it is in the database servers. And they're accessing that PMEM from all the database servers, through all the storage servers, via RoCE, RDMA over Converged Ethernet, and they're getting incredible performance. In fact, they're getting read performance of less than 19 microseconds, which is pretty amazing. And what they do is triple mirror it between the storage servers, so in case you lose a storage server, you still have access to all of the data. Very clever. They can have up to 18 racks of this stuff, up to 27 terabytes in a rack, and that's just generally; they can actually go beyond 18 racks.

11:22 MS: So in 18 racks, you get nearly half a petabyte. Half a petabyte of persistent memory, which can be very big on performance. Now, this other memory access hybrid, from Formulus Black and MemVerge, is also really interesting. It's software-defined memory. They sit between the application and the persistent memory, but unlike memory mode, they're using the persistent memory as persistent memory; it's non-volatile. But to the applications it looks and feels like one memory pool. They virtualized it and turned it into a pool. So you don't have to make any changes to the application; it's backwards compatible. That means no rewriting of the application, no scripts, no change to the code. Just run the software, run the application and go. And because they do it this way, they can do things like snapshots and backups from the persistent memory. Wow, that's cool, 'cause if you get corrupted memory or corrupted data for whatever reason, you can just roll back to any snapshot in time on the persistent memory. That's good.

12:38 MS: Just the same, if you have a power fail, it's instantly reloaded from the PMEM using their snapshots. Very clever. Here's a chart from my good friend Jim Handy, the Objective Analysis semiconductor market research guy, also known as Mr. Chip; he knows his stuff. He put this together; you can get it on his website, or if you get these slides, obviously, you'll get it then. It's very useful for pointing out the differences between the different access methods. The thing to keep in mind is that the fastest method is going to be memory access.

13:15 MS: So let's talk about the part that's near and dear to all our hearts: key tips for success. There are five of them. First, remember this is more than a storage or memory replacement. Second, programming has specific requirements. Third, libraries can save you a ton of time and money. Fourth, many use cases require key-value stores, and there's an embedded PMEM-optimized key-value store. Fifth, you can leverage PMEM-aware Java libraries. Let's go through each of these for a moment. It's more than a storage or memory replacement. Yeah, I know, it's flexible, it's fast storage, and gee, it has support for file and block I/O storage APIs. But if that's all you use it for, you're not going to get the most value out of it. In fact, you're only barely scratching the surface of its value.

14:11 MS: You need to think of this as a new tier between storage and memory for data that may be latency sensitive. Volatile data that you don't want to be volatile, like journals, write-ahead logs, redo logs; these are all key things, like in databases, that you can think through and take huge advantage of with PMEM. Programming has, as I said, specific requirements. Now, you have to remember, applications have some responsibilities now. You have to know whether the platform supports cache flushing. Well, that's a given, but more importantly, you have to know when you don't need to implement it.

15:00 MS: So the platform may automatically do a cache flush on a power fail, but let's say you're using MemVerge or Formulus Black. Do you really want to do that, or do you want to recover from the snapshot? See, there is a real difference here. Libraries save time and money. They do. The PMDK libraries were designed by Intel, and they've done a really good job on this, simplifying a lot of the complexities in programming, like power-fail resistance, with easy-to-use APIs; there's a lot of good stuff in their APIs. And the value of this is it reduces your time to market, and time to market is key. If you get there faster, you capture revenue you never would have captured, and you get market share you never would have had. You may be first to market in a given space.

15:49 MS: First to market means you're less likely to have to fend off other competitors, initially. That's big. Then there are many use cases for the key-value store, right? PMEMKV, the PMEM-optimized key-value data store. It's embedded. That's right, it's embedded. You don't need to read full storage blocks with this, because it accesses the keys and values directly from the PMEM. There's no allocation of volatile memory buffers, and you can modify the data in place without read-modify-write operations. That reduces something that's key to everything, which is write amplification. Yes, I know it's got 60 times the endurance of NAND. However, write amplification still reduces your endurance, and when you reduce write amplification, you increase endurance. And there are many PMEMKV storage engine options: you can do sorted and unsorted, concurrent engine implementations; you've got hash maps and red-black, B+ and radix trees. It's easily extendable, and you can add new engines easily to suit an application's needs.
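The in-place, byte-granular update idea is the heart of the pitch above, so here's a toy sketch of the access pattern. This is not the pmemkv API; the class, slot layout, and fixed value size are all invented for illustration, with a `bytearray` standing in for a byte-addressable PMEM mapping.

```python
# Toy illustration of the embedded key-value idea: values live at fixed
# offsets in one byte-addressable region, so an update touches only that
# value's bytes -- no block-sized read-modify-write and no separate
# volatile buffer. Not the pmemkv API, just the access pattern it enables.

SLOT = 16                                       # fixed value size, for simplicity

class ByteKV:
    def __init__(self, nslots):
        self.region = bytearray(nslots * SLOT)  # stands in for a PMEM mapping
        self.index = {}                         # key -> offset of its slot
        self.free = list(range(0, nslots * SLOT, SLOT))

    def put(self, key, value):
        assert len(value) <= SLOT
        off = self.index.get(key)
        if off is None:
            off = self.free.pop()
            self.index[key] = off
        # In-place store of just this value's bytes (nulls pad; toy values
        # therefore must not end in null bytes).
        self.region[off:off + SLOT] = value.ljust(SLOT, b"\x00")

    def get(self, key):
        off = self.index[key]
        return bytes(self.region[off:off + SLOT]).rstrip(b"\x00")

kv = ByteKV(4)
kv.put(b"k1", b"hello")
kv.put(b"k1", b"world")   # the update rewrites 16 bytes, not a 4 KB block
```

On NAND-backed storage, that same update would dirty at least a full page; writing only the value's bytes is exactly where the write-amplification savings come from.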

17:03 MS: This stuff has already been pre-done for you. In fact, speaking of pre-done, you can leverage PMEM-aware Java libraries. Java developers can use PMEM without code changes. For example, Java virtual machines can allocate Java objects and the heap on persistent memory, no code changes. And over time, more and more PMEM constructs are being added to the Java language. So if you're programming in Java, you've already got a head start. Now, let's talk about the gotchas. The stuff you learn about after you get nailed. For databases, it's unscheduled power fails. This can be crucial if you're using, let's say, memory access mode. Let's say you decided you're going to write for MySQL, or EDB PostgreSQL, a variety of different relational databases. Well, here's something that happens in a power fail: the database can end up with a torn write. And a torn write equals a corrupted database. Oops. Then you have to do a recovery and a restore. You don't necessarily want to do that; you want to be able to detect it and just recover whatever was torn.
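One common way to get the torn-write detection described above is a per-record checksum written after the payload: after a power fail, any record whose stored checksum doesn't match its payload is treated as torn and recovered rather than trusted. The record layout here (4-byte length, payload, 4-byte CRC32) is illustrative, not any database's actual on-media format.

```python
import struct
import zlib

# Sketch of torn-write detection: the CRC goes last, so a partial (torn)
# write of a record leaves a checksum that fails to verify on recovery.

def encode_record(payload: bytes) -> bytes:
    return (struct.pack("<I", len(payload))
            + payload
            + struct.pack("<I", zlib.crc32(payload)))

def decode_record(buf) -> tuple:
    """Return (payload, torn) for one encoded record."""
    (n,) = struct.unpack_from("<I", buf, 0)
    payload = bytes(buf[4:4 + n])
    (crc,) = struct.unpack_from("<I", buf, 4 + n)
    return payload, crc != zlib.crc32(payload)

rec = encode_record(b"redo log entry")
good_payload, torn = decode_record(rec)   # intact record verifies cleanly

damaged = bytearray(rec)
damaged[4] ^= 0xFF                        # simulate a torn/partial write
_, torn2 = decode_record(damaged)         # mismatched CRC flags it as torn
```

With byte-granular PMEM stores there's no device-level guarantee that a multi-byte record lands atomically, which is why this kind of check (or the PMDK transaction APIs) has to live in the database code itself.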

18:25 MS: So that means you have to have code in the database to do that. Now, I can tell you, SAP and Oracle did that in their databases. Oracle also did it in Exadata, and they only did it on their latest databases, so if you have older stuff, this doesn't quite work. But you need to be aware of it, 'cause that's a gotcha. Well, that's most of the information I was going to cover today; if you want more, we can talk about this offline. But I do want to finish with another story. One day, a sailor from the navy and a pirate meet up in a pub. And they're swigging back their ales and their beers and their chasers, and they're having a good time chatting, and the sailor's getting a little tipsy and he's curious, and he says to the pirate, "Alright, you gotta tell me, how did you lose your leg?" He says, "Ooh, do I have a story to tell you. We were in a typhoon, a category five typhoon, and the boat was going up and down and up and down, then this huge wave swept across the deck and knocked me overboard. Just as my mates were pulling me back onto the ship, a shark jumped up and snatched off my leg." "Oh my goodness, that must have hurt."

19:38 MS: He says, "You have no idea." "Then how did you lose your hand?" "Oh do I have a story to tell you, we were fighting my worst enemy, going sword to sword, back and forth, and just as I was about to run their captain through, his first mate chopped off my hand." He says, "That must have hurt." He says, "You have no idea." "How'd you lose the eye?" He mumbles a little bit, doesn't say anything. "No, no really tell me, how did you lose your eye?" He says, "Well, a bird crapped in my eye." "A bird crapped in your eye? How did you lose your eye over that?" Well, he said, "It was the first day with the hook." Be careful with your new tools. And with that, I thank you for attending this session and please enjoy the rest of Flash Memory Summit 2020.
