Download this presentation: Keys to Making Computational Storage Work in Your Applications
00:02 Stephen Bates: Hello, everyone. My name is Stephen Bates. I'm the chief technology officer of a company called Eideticom, and I'd like to welcome you to Flash Memory Summit 2020 and the computational storage track. My talk today is around real-world deployments of computational storage, and although I'm the CTO of Eideticom, this talk will be a little more vendor neutral than that, and we'll be talking about real deployments from a number of different companies that are working in the computational storage space.
01:01 SB: So, the outline of my talk is as follows: I'm going to start with a short state of the nation just to outline where computational storage is today from a standardization and a deployment point of view, and then what I want to do is to jump into four real-world deployments, and we'll try to spend roughly an equal amount of time on each of these. These are from a range of thought leadership companies in the computational storage space and include NGD Systems, Samsung, ScaleFlux and my own company, Eideticom. So, hopefully, you'll enjoy the next 20 or so minutes.
01:39 SB: So, where are we in terms of state of the nation? And I think this year, 2020, has been a very exciting year for computational storage, and one of the reasons why is that we're seeing computational storage standardization really starting to catch on, and that's leading to an increased interest in deploying computational storage products and computational storage solutions in the real world, which is what we're going to illustrate in this talk.
02:10 SB: There are two main bodies that are working on the standardization of computational storage. And if you're not involved, I would highly recommend you take the time to get involved in one or both of these standardization activities because this is a very important driving force for the deployment and success of computational storage solutions.
So, the first body that's working on standardization has been doing so for a number of years, and it's SNIA. They have a computational storage technical working group. There's about 45 member companies in this technical working group now, and the last time I checked, there was over 200 members, so individuals who are involved in the standardization process and [02:57] ____ needs. The SNIA effort is focused mostly around high-level architecture and on the user space libraries which will be deployed or involved for computational storage, and they've been less focused on specific protocol implementations of computational storage.
03:17 SB: But what I'm very excited about, is in the last few months, one of the standards in the storage space, very successful standard, I'm sure most of you, if not all of you, have heard of this, NVM Express. So NVM Express has been around for about 10 years, it is now an incredibly successful way of communicating with flash-based storage devices. A number of people have been deploying large volumes of NVMe products for a number of years, and the NVMe market recently was projected to be growing at about 40% CAGR and leading to very, very large markets over the next four or five years. So, it's a very promising, very exciting standard, and what I'm really pleased to announce has happened, or pleased to talk about, is that the NVMe Technical Working Group, so the technical component of NVMe, has agreed to standardize NVMe computational storage. So, that is taking some of the work from SNIA, bringing a new NVMe slant to it, and actually updating the NVMe standard to comprehend computation, as well as storage.
04:30 SB: So, it's a very exciting development. This is a real protocol, a real standard and we're excited to see where this goes. Right now, that process is ongoing; I recommend you get involved. We currently stand at about 25 member companies and 78 individuals who are involved in the NVMe computational storage standardization, like I said, that will standardize NVMe commands for computational storage. So, a very exciting time for computational storage, and I think that's driving a lot of these real-world deployments that we're going to dig into over the next 15 or so minutes.
05:07 SB: So, the first demo I'd like to talk about is from one of my fellow computational storage startup companies, this is NGD Systems, and this is a demo that they've done with VMware at VMworld, so a very, very . . . More than a demo, this is a real-world deployment of their computational storage solution, and so I'm going to talk about that over the next couple of slides.
05:33 SB: Now, what NGD Systems are offering on the market is something called a computational storage drive, so this is a device that combines the amazing storage capabilities of an NVMe solid-state drive with computational resources on top. And NGD Systems have an excellent solution where they basically run a mini server within the solid-state drive which allows you to push a lot of your application down to these smart SSDs, and they are seeing quite a lot of success in different markets around how this can lead to much more efficient systems, much more volume-condensed systems. So, if we think about how much rack space we need to do a certain amount of work, companies like NGD Systems are showing that computational storage can lead to much more condensed deployments, and that's really important at places like the edge where the rack space can be incredibly expensive.
06:35 SB: So, traditionally, the segments for these kinds of applications are database applications that have been run at the server level. Now, what VMware can do with Dell and with NGD Systems is to push this down towards the drive, and they've shown some really good results around databases, while still providing the kind of data resiliency and fault loss protection that's really, really important for these kinds of applications. So, there is an online demo there that you can see from VMworld 2020, and what they were doing is they were integrating the NGD Systems computational storage drives into the VMware product vSphere. So, this is very important; the software enablement of computational storage is a key part of success. So, the more integration that computational storage products can do with operating systems and operating system stacks like vSphere, the better.
07:36 SB: And what we have here on the left-hand side is Ubuntu running with the NGD Systems' computational storage software stack, allowing both the host and the device to share data in a way where they can both access this data through file systems, either standard file systems like XFS or things like shared disk partitions. And what this allows is for both the host and the drive to access this data in a safe and . . . manner. A TCP connection allows the boxes to actually connect to each other, so not only do we have a device connection within the server, but we would also have a TCP connection so that we can connect to other devices that are in other servers across the network. And this leads to much, much improved performance, and a much more cost-effective solution, and like I said, integration into stacks like vSphere. So, a really interesting deployment of what I would call a computational storage drive, and quite an advanced computational storage drive. This is rather more complicated than some of the other computational storage devices we might see in the latter part of the talk.
09:03 SB: Next, I want to talk a little about a very interesting product that's being deployed by Samsung, who are one of the foremost SSD companies in the world, a very successful company, a really big participant in NVM Express, a really big participant in the world of non-volatile memory, and very successful in terms of shipping large volumes of solid-state drives. So, what Samsung have done is deployed a product for the computational storage space called the SmartSSD.
And the SmartSSD is a very interesting concept. It is another computational storage drive, so in some ways similar to what NGD Systems are doing, but a little different in the sense that the computation in the SmartSSD comes in the form of a programmable FPGA from Xilinx, whereas in NGD Systems there was a Linux-capable Arm complex that was there for the computation. So, in the SmartSSD case Xilinx and Samsung have partnered together to bring the best of both worlds. Samsung are bringing their best-in-class NAND and their best-in-class NVMe SSD controller, and Xilinx are bringing their expertise in reconfigurable logic in the form of a Xilinx FPGA.
10:25 SB: Now, what that means is you're combining the compute and the storage together in a single device. And you can see on the right-hand side that this device is a storage-centric form factor type of device, in this case, a U.2 NVMe-compliant SSD. This allows some interesting things to happen, because now the compute is in the same box as the SSD, is in the same U.2 form factor, they can share that internal bandwidth without necessarily impacting on the host. So, you get an awful lot of data movement capability inside the device; each device has a large amount of offload capability, accelerator capability in the form of that Xilinx FPGA, and you have a very large amount of NAND storage at 4 terabytes, and you also have FPGA DRAM which you can use for computation.
11:22 SB: And I think one of the interesting things about the SmartSSD is what's given in this center column on this slide. So, with the SmartSSD and clever software, you can actually scale your performance in a linear fashion as you add more SmartSSDs, and that's the green line in this diagram in the middle of this slide. Typically, when you have some kind of processor-based performance gain -- so something based on the software running on the CPU -- you end up hitting some bottleneck and adding more drives to the system doesn't help you, and that's the red line on this slide.
So, what Samsung and Samsung's customers are seeing is the ability to scale performance by adding more and more SmartSSDs into their systems, and then using clever software to tie that into the applications and seeing a large performance benefit. Some of the applications that the SmartSSD is particularly successful in are things like database filtering, data format conversion, video processing, and analytics and so on. And this is something that's being deployed on the edge and in the cloud.
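The green-line and red-line behavior described above can be illustrated with a toy model. This is a minimal sketch, not anything taken from Samsung's slide: the function names, the per-drive rate, and the host ceiling are all hypothetical values chosen only to show the shape of the two curves.

```python
def host_bound_throughput(num_drives, per_drive_gbps, host_cap_gbps):
    """Host-side processing: every drive feeds the same CPU pipeline,
    so aggregate throughput stops growing once the host saturates."""
    return min(num_drives * per_drive_gbps, host_cap_gbps)


def in_drive_throughput(num_drives, per_drive_gbps):
    """In-drive processing: each drive brings its own compute,
    so aggregate throughput grows with the number of drives."""
    return num_drives * per_drive_gbps


if __name__ == "__main__":
    PER_DRIVE_GBPS = 2.0  # hypothetical per-drive processing rate
    HOST_CAP_GBPS = 8.0   # hypothetical ceiling for the host CPU path
    for n in (1, 2, 4, 8, 16):
        print(n,
              host_bound_throughput(n, PER_DRIVE_GBPS, HOST_CAP_GBPS),
              in_drive_throughput(n, PER_DRIVE_GBPS))
```

In this model the host-bound curve flattens as soon as the number of drives times the per-drive rate exceeds the host ceiling, while the in-drive curve keeps rising. Real systems will have other limits (PCIe lanes, software overhead) that the sketch ignores.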
12:44 SB: Companies are working . . . Sorry, ISVs are working with Samsung and Xilinx to put their own software onto that FPGA and provide incremental or additional solutions in the space. And one of those companies is my company, Eideticom. We've actually worked with Samsung and Xilinx to enable our software stack on that Xilinx FPGA in the SmartSSD. And what that allows is a standards-based solution that's based on this NVM Express standardization effort that's used to access both the storage and the compute, and that makes our solutions much more consumable because they leverage a standard vendor-neutral inbox driver that's available in all modern operating systems.
13:34 SB: The Samsung SmartSSD is very interesting because, like we said, it combines the best-in-class SSD controller and NAND from Samsung, giving you a great storage solution with the excellent reconfigurable silicon from Xilinx, which gives you a great computation solution. And then we used frameworks like peer-to-peer communication to allow data to move directly between the SSD and the FPGA without necessarily having to go up to the host. So, what we can do is map the Xilinx device into the address space of the local processor and use clever software, which is being upstreamed again into major operating systems like Linux in order to enable the connection of the computation and the storage. And this is leading to some very, very interesting solutions, and quite a large number of customers are either evaluating the SmartSSD or deploying the SmartSSD, with a range of different software solutions on top of that.
14:44 SB: Next, I'd like to talk about another startup in this space, which is ScaleFlux. And ScaleFlux have been doing quite a bit of work around database acceleration using both computational storage processors and computational storage drives. And this work is based on a great paper that they presented at FAST, which is the USENIX Conference on File and Storage Technologies, earlier in 2020, and I highly recommend you look up that paper which is linked on the next slide. If you're looking for more details, then you'll get a lot more detail in that particular paper.
But what we're going to cover here at a high level are some results that ScaleFlux and Alibaba were achieving on a computational storage drive from ScaleFlux. The database in question was PolarDB, which is a scale-out database, and what ScaleFlux are showing, again really, is a similar benefit to what we saw for the SmartSSD -- moving computation into the drive leverages scalability and performance. So, the ability to scale performance by adding more drives tends not to work when you use some kind of centralized accelerator.
16:04 SB: So, on the right-hand side . . . or, sorry, the left-hand side of this particular slide, you can see we have a bunch of solid-state drives in the diagram, and then we have one orange box, this is our accelerator. And what happens is that that becomes a bit of a bottleneck, a bit of a hot spot, and the performance doesn't scale as well as we'd like. And what ScaleFlux have done is move some of the database computation, in this case the table scan, into the SSD itself in the form of a computational storage drive, so in some ways, very similar to the SmartSSD.
And what happens now is that the system is more distributed and there are fewer bottlenecks, and so the performance can scale as we add more devices. And what ScaleFlux and Alibaba did was they measured a bunch of benchmark data based on TPC, which is a very well-known database test environment, and they saw massive reductions in PCIe traffic volume, and they saw huge reductions in network traffic volume when this database was scaled out across many, many servers, as large cloud companies like Alibaba tend to deploy their databases. So, another real-world example of where computational storage can provide a massive benefit to the end customer.
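The reason an in-drive table scan cuts PCIe traffic can be shown with a small simulation. This is only a sketch under stated assumptions, not ScaleFlux's or Alibaba's implementation: the fixed row size, the table contents, and the function names are all hypothetical, and the drive is modeled as a plain Python function.

```python
import random

ROW_BYTES = 128  # assumed fixed row size, for illustration only


def make_table(num_rows, seed=0):
    """Build a synthetic table with a pseudo-random 'value' column."""
    rng = random.Random(seed)
    return [{"id": i, "value": rng.randint(0, 999)} for i in range(num_rows)]


def host_side_scan(table, predicate):
    """Every row crosses the bus to the host, which then filters."""
    transferred = len(table) * ROW_BYTES
    matches = [row for row in table if predicate(row)]
    return matches, transferred


def in_drive_scan(table, predicate):
    """The drive filters first; only matching rows cross the bus."""
    matches = [row for row in table if predicate(row)]
    transferred = len(matches) * ROW_BYTES
    return matches, transferred


if __name__ == "__main__":
    table = make_table(10_000)
    selective = lambda row: row["value"] < 10  # keeps roughly 1% of rows
    _, host_bytes = host_side_scan(table, selective)
    _, drive_bytes = in_drive_scan(table, selective)
    print(host_bytes, drive_bytes)
```

Both paths return the same rows; what changes is how many bytes have to leave the drive. The more selective the predicate, the larger the saving, which is why scan and filter operations are a natural first candidate for pushdown.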
17:28 SB: And the last real-world example, I'll talk about my own company, Eideticom. So, Eideticom is building computational storage processors and computational storage products that are NVMe-compliant. And, so, we're very aligned to this NVMe standards effort that I talked about earlier in the talk. We provide both software that runs on the devices -- whether they're computational storage processors, whether they're FPGAs, whether they're SoCs, whether they are some other type of device -- we have software that runs on the device and we have software that runs on the host to connect those devices and their accelerators to the applications that want to use them. And we can provide a range of services, like storage-centric services, like compression and encryption. And we can also provide more analytic-type services such as data analytics, regular expressions, video codecs, et cetera.
18:29 SB: We provide a couple of different software solutions, and what I want to do today is talk about the second of those. So, we do have a user-space library, which is a very classic way of connecting an accelerator to an application, but one of the ones that we developed recently is actually built into the kernel, it's a stacked file system called NoLoad Filesystem. And what it does is it sits between the virtual file system in Linux and the customer's file system, which could be ext4 or xfs or something else, and it connects to our accelerators in whatever form they happen to take. Really, what we like about this is the application no longer has to change; it can run exactly as it is. And the customers like it because they can continue to use their preferred file system. And convincing a customer to use a vendor-specific file system is a very hard thing because they don't want to lose their data.
19:24 SB: So, what we can now do is, customers can deploy our product underneath their favorite applications, whether it's a database like MariaDB or perhaps RocksDB, or whether it's a big data analytics stack like Hadoop or HDFS, and they don't have to make any changes to their application. The NoLoad Filesystem can perform the computational offload by intercepting standard system calls like reads and writes, and we can leverage the underlying hardware to provide some kind of offload.
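The idea of a stacked layer that intercepts reads and writes can be sketched in user space. This is an analogy only and not the NoLoad Filesystem, which the talk describes as an in-kernel stacked file system: the class names `LowerFS` and `StackedCompressFS` are hypothetical, and software `zlib` stands in for whatever a hardware accelerator would do.

```python
import zlib


class LowerFS:
    """Stand-in for the customer's file system (for example ext4):
    it simply stores whatever bytes it is given, keyed by path."""

    def __init__(self):
        self.blobs = {}

    def write(self, path, data):
        self.blobs[path] = data

    def read(self, path):
        return self.blobs[path]


class StackedCompressFS:
    """Stacked layer: exposes the same read/write interface as the
    lower file system and transforms the data on its way through."""

    def __init__(self, lower, compress=zlib.compress,
                 decompress=zlib.decompress):
        self.lower = lower
        # In a real deployment these two hooks would dispatch to an
        # accelerator; here they are plain software zlib calls.
        self.compress = compress
        self.decompress = decompress

    def write(self, path, data):
        self.lower.write(path, self.compress(data))

    def read(self, path):
        return self.decompress(self.lower.read(path))


if __name__ == "__main__":
    lower = LowerFS()
    fs = StackedCompressFS(lower)
    payload = b"computational storage " * 1000
    fs.write("/data/log", payload)
    assert fs.read("/data/log") == payload
    print(len(payload), len(lower.blobs["/data/log"]))
```

The application only ever calls `write` and `read` with its own data, so it needs no changes, while the lower layer only ever sees the transformed bytes. That separation is the property the talk highlights: the offload is inserted between the two without either side knowing.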
So, just to give you an example, we've done some testing using an application called FIO, which is a great benchmarking tool. We have NoLoad Filesystem sitting on top of, in this case, ext4, we have an NVMe driver which is the standard inbox driver, and we have a range of different hardware. In this case, you have the NoLoad, our computational storage processor, on an Alveo U50 card from Xilinx, and we have a bunch of NVMe SSDs, in this case, from Samsung.
20:28 SB: And if we look at offloading something like compression, what we can do is, with that single Alveo U50 card we can either achieve compression at the rate of about 11 gigabytes per second, or we can do decompression at an even faster rate of about 12.5 gigabytes per second, or we can do both simultaneously at over 7 gigabytes per second in each direction. And let me just say that if you wanted to achieve that much compression and decompression using an Intel processor, for example, you would need a lot of Intel CPUs in your server in order to do that, so our solution is very, very efficient. And like I said, it doesn't require any application changes, which is very, very important from a consumability point of view.
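To get a feel for the "a lot of CPUs" remark, the comparison can be written as simple arithmetic. This is a back-of-the-envelope sketch: the 11 and 12.5 gigabytes per second targets are the figures quoted above, but the per-core software rate is a placeholder assumption, not a measurement, and should be replaced with a number measured on the CPU and compression algorithm in question.

```python
import math


def cores_needed(target_gbps, per_core_gbps):
    """Number of CPU cores needed to match a target throughput,
    assuming throughput scales perfectly across cores."""
    if per_core_gbps <= 0:
        raise ValueError("per_core_gbps must be positive")
    return math.ceil(target_gbps / per_core_gbps)


if __name__ == "__main__":
    PER_CORE_GBPS = 0.25  # placeholder assumption, not a measured value
    for label, target in (("compression", 11.0), ("decompression", 12.5)):
        print(label, cores_needed(target, PER_CORE_GBPS))
```

Perfect scaling across cores is itself optimistic, since memory bandwidth and cache contention usually reduce per-core throughput as more cores are used, so the result is best read as a lower bound on the core count.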
21:15 SB: So, I want to wrap up here in the last minute or so. Computational storage product deployment is happening; all of the things I talked about today are real-world deployments of that. Standardization is incredibly important, both at the SNIA level and at the NVM Express level. Standardization will increase adoption. Customers, end users, will become more comfortable, they will see vendor-neutral interfaces, so they'll be like, "I don't have to buy a product from a particular vendor. I can multi-source. I can be safe in the knowledge that I have multiple vendors that are servicing my needs." Standardization will also enable an open source software ecosystem. Something that I feel is incredibly important for the success of computational storage is vendor-neutral, open source software that customers can use to connect their accelerators to their applications.
22:14 SB: And another thing I think is going to be huge is the standardization of NVM Express computational storage. The fact that we can leverage this incredibly successful protocol for computation will be a huge market enabler. Just to give you some idea of how big NVMe is, like I said, we expect the NVMe market to grow at about 40% CAGR between 2020 and 2025, and computational storage will be a huge part of that. So, I would highly recommend you get involved with that process as it develops.
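As a quick check on what a 40% compound annual growth rate implies over the five years from 2020 to 2025, the growth multiple can be computed directly. This is only arithmetic on the rate quoted above; the function name is hypothetical and no market-size figure is assumed.

```python
def growth_multiple(annual_rate, years):
    """Total growth factor after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # 40% per year compounded over five years (2020 to 2025).
    print(round(growth_multiple(0.40, 5), 2))
```

Compounding 40% for five years gives a factor of roughly 5.4, so a market growing at that rate would be a little over five times its 2020 size by 2025.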
So, thanks a lot for your time today. I'm hoping there'll be some questions. I'm very excited about this space as you can probably tell, and I look forward to working with you as we move forward. So, thank you very much, and take care.