Guest Post

Accelerating Performance With NVMe-oF

Can you accelerate your performance with NVMe-oF? George Crump, former storage industry analyst and CMO at StorOne, explores the pros and cons of NVMe-oF to help you determine if your data center is ready for it.

Download this presentation: Accelerating Performance With NVMe-oF

00:04 George Crump: Hello, and welcome. I'm George Crump, chief marketing officer at StorOne. Thanks for joining us for the virtual Flash Memory Summit.

As part of the NVMe over Fabric track, I'm going to be talking about accelerating performance with NVMe over Fabric -- specifically, how does this look in the enterprise? So, let's get started.

First, a little background about me. Again, I'm George Crump, chief marketing officer at StorOne. I'm also responsible for a lot of product strategy and those sorts of decisions. Prior to joining StorOne, I was a storage analyst -- in fact, you may remember me from those days. I did that for 14 years. I was founder and lead analyst at a company called Storage Switzerland, where we did a lot of design testing and reporting on various storage systems of all different types. Prior to that, I was a CTO at a large storage integrator, again responsible for testing and validating different types of storage systems that are on the market today. So, before we get too far in, let's just talk about some basic NVMe over Fabric realities.

01:15 GC: First of all, there are really two sides to NVMe over Fabric. The first is what I would call server connectivity -- that is, how do you attach the server and get it to storage? There's a variety of different ways to do that today: clearly, there is Fibre Channel; TCP, which is a newer technology; and then RoCE, or RDMA. As I said, Fibre Channel and RoCE/RDMA are probably the two that most people associate with NVMe over Fabric today. It's a very low-latency, high-performance way to connect, and that's really the key advantage there.

The downside is you do need to change elements within your network to be able to get that to work. Now, in some cases, some of those changes have already happened for you as you've been upgrading infrastructure and things like that. In other cases, you may have to make some specific infrastructure changes, and what I've learned through the years is that infrastructure, in particular, tends to change at a much, much slower pace than storage does. So, you really need to look into that.

The advantage of TCP, and why we think it will be the wave of the future, if you will, is that it runs on existing TCP infrastructure, so there's virtually no need to change anything in the environment.
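One reason NVMe/TCP is so easy to adopt is that, on Linux, a host can reach a target with nothing more than the in-box nvme-tcp driver and the nvme-cli tool. Here is a minimal sketch; the address, port and subsystem NQN are placeholders for your own target, not values from the talk:

```shell
# Load the NVMe/TCP host driver (ships with modern Linux kernels)
modprobe nvme-tcp

# Ask the target what subsystems it exports
# (4420 is the conventional NVMe/TCP port)
nvme discover -t tcp -a 192.168.1.50 -s 4420

# Connect to one discovered subsystem by its NQN (placeholder below)
nvme connect -t tcp -a 192.168.1.50 -s 4420 \
    -n nqn.2024-01.com.example:subsystem1

# The remote namespace now shows up as an ordinary local block device
nvme list
```

No new adapters or switches are involved, which is exactly the point being made here; the same steps over RoCE would additionally require RDMA-capable NICs and, typically, switch configuration.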

02:43 GC: Now, a potential downside is that you may not get as low a latency as with some of the other protocols -- or methods, I guess I should say -- that we mentioned earlier. But we do think TCP will end up with the lion's share of the NVMe over Fabric market, and that's a trend that we think is just now getting started. If you stick with the other protocols, again, part of the problem is that they require a new adapter and a new or updated switch. The exception is TCP, but it adds latency.

03:30 GC: Now, what we've seen in most of our testing is that the latency that TCP adds is kind of more theoretical. There's so much other latency in the environment just at the application layer and all the things that have to happen. I don't know how many customers, especially enterprise customers, are really going to notice a difference. So, that's an important thing to keep in mind as we start down this path. Now, the other end of the equation is storage connectivity, and to be honest with you, storage connectivity is a little disappointing at this point in time.
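To see why the latency TCP adds can be "more theoretical" in the enterprise, it helps to put rough numbers on the whole I/O path. The figures below are illustrative assumptions, not measurements from the talk: even if NVMe/TCP adds a few tens of microseconds over RDMA, that is a small slice of an end-to-end request once application and OS time are counted.

```python
# Rough latency-budget sketch. All numbers are illustrative
# assumptions, not measurements from the presentation.
fabric_latency_us = {
    "local NVMe": 10,   # assumed: direct-attached flash access
    "NVMe/RDMA": 15,    # assumed: RoCE adds only a few microseconds
    "NVMe/TCP": 40,     # assumed: TCP stack adds tens of microseconds
}

app_overhead_us = 1000  # assumed application + OS work per request

for transport, fabric_us in fabric_latency_us.items():
    total_us = fabric_us + app_overhead_us
    share = 100 * fabric_us / total_us
    print(f"{transport:>10}: {fabric_us:>3} us on the fabric, "
          f"{total_us} us end to end ({share:.1f}% of the total)")
```

Under assumptions like these, the transport is only a few percent of the request time, which is why many enterprise applications would not notice the difference; latency-sensitive AI/ML pipelines that strip away most of that overhead are a different story.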

04:07 GC: There's clearly going to be Fibre Channel and RoCE or RDMA connectivity, but most of the . . . let's call them expansion shelves and things like that that are on the market today are proprietary to the vendor developing the solution. And the challenge with that is we're moving more and more toward an open, software-defined environment. Well, those environments need commodity hardware, and we have seen a very limited supply of commodity hardware in the NVMe space.

At StorOne, for example, we're actively testing several systems, and we have some that are coming to market and are available, but it's not the almost unlimited number of choices that we would see, say, in traditional SAS connectivity. So, something to really keep in mind is how accessible the other components are. Clearly, if you're going to buy a closed solution from a single vendor, that's going to limit your flexibility going forward. So, it's something to keep in mind as you start to build out and consider your NVMe over Fabric strategy. What we're basically seeing, as I said, is a lot of proprietary designs, or a lot of essentially scale-out or server/storage cluster approaches to the solution, where you're running software on each individual server and then leveraging the built-in NVMe that's in the servers -- which, by the way, is commonly available in those servers. So we see a lot of that occurring.

05:48 GC: Again, the challenge is that of all the environments, the one that probably doesn't need much scale-out is NVMe over Fabric, because everything else is so efficient. Typically, you scale out to get around other issues, like an inability to scale capacity and performance. Well, NVMe kind of fixes that. So, I think this is a relatively short-term problem. I think it's something we'll see addressed over the next three to six months, and then just get increasingly better as time goes on, but it is something to think about as we move forward.

06:27 GC: So, media connectivity obviously requires NVMe media and NVMe slots to put that media in. The challenge here is -- I know we're at the Flash Memory Summit -- but there are no hard drives on the market today that support NVMe. There's one vendor with a system that uses hard drives with proprietary NVMe interfaces, but you have to buy the system completely full, and it can only contain hard drives, so there are some challenges there. And we do think that, especially with the increasing density of hard disk drives, they will continue to be something that enterprises want to take advantage of. We think that hybrid and multi-tier technology is getting better and more intelligent, so you'll be able to move data between these different tiers without the significant impact that we used to see in the past. So, we think the support of both media types is important.

07:27 GC: Again, this leads to storage silos and storage clusters, and the really important thing -- and we have not seen much of this at all -- is that storage systems and storage software really need to support both NVMe and the more traditional connectivity options, most notably SAS, so that we can support hard drives and legacy environments. And we really think it's critical for storage software and storage systems to be able to provide that bridge to NVMe while supporting existing protocols and technologies.

And by the way, we see this drag on adoption in many places; it's not just NVMe. Take SAS technology, for example: 24G SAS has been available, in theory, for two to three years now, and there are almost no off-the-shelf 24G SAS expansion units. So, something to keep in mind as you move forward.

08:28 GC: The other thing that's important when I talk about this bridge concept is you've got to be able to go back and forth between the two technologies, right? Because you're not . . . A bridge sometimes implies I just have to get to the other side and then I can tear the bridge down. This bridge has to stay up. And in a perfect world, what you really want is a storage system that can support an NVMe fabric on one end, out to different disk devices, and on the other end still be able to support maybe legacy SAS connectivity for hard disk drives and things like that. So, that becomes very critical as well. So, the big question, I think, is: Is it worth it, right?

09:21 GC: Does NVMe over Fabric pay off? And, again, I'm very focused on the enterprise. I think in environments that would not be traditional enterprise -- so AI, machine learning, analytics -- the case for NVMe over Fabric is easier, much easier to make. I think it gets harder to make in the more traditional enterprise. The challenge is the more traditional enterprise is also starting to encroach on those traditional HPC workloads, and we see a huge upswing in interest around commercial HPC-type environments. So, the key advantage of NVMe over Fabric is that it provides shared storage that acts like locally attached storage. It has low latency, it doesn't have the network overhead, things like that.

10:14 GC: It can also enable a composable infrastructure. So, in theory -- going back to the picture that I drew before, where we had a storage system, I'll call it a controller, going to an NVMe fabric -- the concept of being able to connect a variety of different storage devices into that back-end NVMe switch is very fascinating, and I think it's something that enterprises will absolutely want to take advantage of. But that composability, as an open standard anyway, isn't quite there yet, right? And I think we're close. We're, like I said, maybe six months away from it being readily available.

So, I think that, again, the caution here is to go slowly and, again, make sure that you can also connect to SAS and other Fibre Channel-type connectivity out of the back end. And then the other option is sort of a scalable storage repository that's separate from the servers. As you can see from what I drew right here, this is different -- it's not part of the hyper-converged type of idea -- so I can still scale storage independently, but in much the same way as I do the traditional compute infrastructure.

11:39 GC: So, it gives us a lot of value in those areas. So, the big question is, do you need it? Let's look at the two sides of the equation that I was talking about. The first is, do you need it for servers? I think eventually, yes, right? The connectivity, the low latency, the ability to access storage as if it were right inside that server. I think there are clearly enough use cases and evidence of use cases where this makes sense today. I think the challenge that we're going to see here is how quickly that curve becomes practical, and so that's the "today" question mark part of this puzzle.

I would say that there are probably, again, some workloads -- AI, ML, analytics, things like that -- that could justify it today. More traditional enterprises, I think not. The good news is both sides of that fabric that I was talking about, both the compute connectivity and eventually the storage connectivity, should be able to support a mixing of the different protocols. And, frankly, they must support that -- and that's much easier to find on the server side today.

12:54 GC: The other thing to consider in this, though, is current networking technologies. They're all increasing in bandwidth; their latencies are already excellent, but they're not as good as NVMe over Fabric. So, there's not the pressure, I think, that there used to be to address and get after these latency requirements, except for, again, those machine learning, AI-type workloads. So, make sure you're looking at current technology. The other thing is current technology is always less expensive than the next generation, so make sure you're factoring that in as well, and of course, I've addressed the use cases there. So, I think in servers, you might move to NVMe over Fabric first just to get the connectivity down to storage, but again, I would proceed slowly and adopt it as it makes sense for your environment.

13:54 GC: So, the bigger question, at least for me because I'm a storage guy is, do you need it for storage? Well, I think it's really the only way to scale back-end NVMe storage infrastructure without having to go to a scale-out server or storage node-type of infrastructure. And again, the problem with NVMe and scale-out storage is you are going to -- quicker than ever -- end up with excess resources, and that's going to really be a challenge, I think.

We'll see much smaller scale-out environments because we just don't need it especially if the storage software is efficient and takes advantage of all the capabilities of the products that surround it. And again, I can't emphasize enough that we still need a bridge to SAS both to support the current infrastructure that's in place today, as well as the fact that I doubt that we'll see a lot of NVMe hard disk drives. So, the ability to support both, I think becomes very, very important as we move forward in this journey.

15:03 GC: So, what's the plan? My advice would be -- putting my former storage analyst hat back on -- don't be in a race to get to NVMe over Fabric. Make sure your storage vendor, your networking vendor and your network interface card vendors have a plan for NVMe. Especially on the infrastructure side, make sure you're buying NVMe-capable product today, and make sure your storage vendor has or will have the capability to intermix both SAS and NVMe in the same storage controller -- you don't want to have a separate island just to support legacy storage. And, by the way, I've been saying disk drives a lot; don't forget that object storage is in that group, too. So, even if you're going to object storage, those systems will mostly be SAS-based, we believe.

15:58 GC: So, the ability to support both and support that interconnectivity is going to be really, really critical and something to look at. So, understand those options, make sure that you have a bridge back to SAS technology, and also remember that SAS technology, as I mentioned earlier, isn't standing still. There will be 24G units available, and so the performance of SAS may be more than enough for many data centers. So again, NVMe over Fabric is going to be very beneficial for the environment; it's going to allow for great scale where we need low latency. One of those areas is going to be NVMe storage, so it's going to do a lot for scale-up and small scale-out infrastructures to be able to deliver very, very high performance without really much penalty.

I don't know if this moment, right now, is necessarily the time to jump on NVMe over Fabric; I think it takes some patience -- unless you have the workloads that can completely justify it. Again, for that group I mentioned before -- AI, ML, analytics, those sorts of things -- absolutely. But for the more traditional workloads, I would go slow. Just make sure that your current storage system either can, or has a plan to, intermix both NVMe over Fabric and SAS in the same controller infrastructure.

17:29 GC: So, with that, here's my contact information. Feel free to reach out if you have any questions or would like more detail on anything -- that's my actual email address, so feel free to reach out to me there. For now, though, I hope you enjoy the rest of Flash Memory Summit, and I look forward to seeing you virtually around the conference.
