Using Computational Storage to Handle Big Data
Computational storage examples already in use illustrate ways to overcome bandwidth and data processing issues. Find out about other promising directions to explore for the future as well.
00:01 Andy Walls: Hi, and welcome to this session on computational storage, specifically using computational storage to handle big data. My name is Andy Walls. I'm an IBM Fellow, and I'm the CTO and chief architect for IBM FlashSystem. I want to thank you for joining. These are very difficult, strange and extraordinary times, so thank you for taking the time to be a part of Flash Memory Summit this year, and specifically of my discussion here on computational storage.
So what is the situation today? When we talk about data lakes, we are talking about analyzing all the data in that huge repository, that thing we call a data lake, really a data ocean, if you will. There is so much data collected today. If you think about an individual organization, it is likely to have data that comes from sensors, and maybe data that comes from cameras. It could be security cameras, or cameras that are recording or filming things like this session right here. Of course, social media makes up so much of the data for many organizations. There are text files, there is multimedia, there is structured database data, there's financial data, there's maps, there are employee records. There is just all kinds of data.
01:32 AW: Now, the important thing is that, for a particular organization, that data is critical. It really is part of what differentiates the organization, and if used properly it can help it gain a competitive edge.
Now, to use it properly means it has to be analyzed. It has to be collected, prepared and sorted, models built from that data, and then value derived. Oftentimes this is machine learning or deep learning, sometimes just simple analytics to contrast and compare and find patterns. Doing that today is the problem. The data is sitting in storage. That could be just SSDs inside a server, or a hyper-converged infrastructure, or an external storage system like IBM produces.
Wherever that data is, it has to be brought over a network, brought over interfaces, into the CPU. And therein lies some of the difficulty, because the CPUs, and GPUs as well, are having to process all that data. They have to find out which data is important. They have to read it all in through memory buses that are not growing that fast, and over PCI Express interfaces, which are really not getting that much faster. And of course, the CPUs themselves have limitations because Moore's Law is ending.
03:18 AW: And so, the path to get from this huge ocean to a fairly small CPU is almost like a garden hose that you see in the picture. You want to bring all that data through a garden hose into relatively small amounts of compute. Now it's always better, if possible, to process data closer to where it resides.
Now, there are several advantages to that, some of which are not intuitive. Inside an SSD, for example, there is a pipe that goes to that SSD, NVMe or SAS, and inside there are a lot of flash lanes. There is normally more bandwidth available inside than the compute takes advantage of on average. So there's spare bandwidth that could be used to do additional processing or searching or statistics, alright?
In addition to that, the internal fabric of the all-flash array itself tends to be faster than the external fabric to the server. So there may be spare bandwidth that can be used by the overall application, by the data center, to derive more value and to help offload the CPUs. And when you think about all the analytics we want to do, all the AI we want to do, with as much data as possible to look at, even 100 Gigabit Ethernet, even DDR5, even Gen 5 PCI Express pose limits on how much data can be processed. That's the problem today.
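To make that bandwidth gap concrete, here is a minimal back-of-the-envelope sketch in Python. The channel counts and speeds are illustrative assumptions, not specifications of any particular drive or array.

```python
# Back-of-the-envelope comparison of internal vs. external bandwidth.
# All figures below are illustrative assumptions, not product specs.

host_link_gbps = 8.0      # roughly a PCIe Gen4 x4 NVMe link to one SSD, in GB/s
flash_channels = 16       # assumed number of flash channels inside the SSD
per_channel_gbps = 1.2    # assumed per-channel flash bandwidth, in GB/s

internal_gbps = flash_channels * per_channel_gbps
spare_gbps = internal_gbps - host_link_gbps

print(f"Internal flash bandwidth: {internal_gbps:.1f} GB/s")
print(f"Host link bandwidth:      {host_link_gbps:.1f} GB/s")
print(f"Spare internal bandwidth: {spare_gbps:.1f} GB/s")
# With these assumptions, roughly 11 GB/s of internal bandwidth could be
# used for in-device searching or statistics without ever crossing the
# PCIe link to the host.
```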
05:14 AW: Now, the beauty of the solution is that offloads are really not new. I mean, we've had offload almost since the beginning of compute, and especially in the storage industry, and with all-flash arrays in particular, there is already a lot of offload. Just think about data reduction: all-flash arrays today are ubiquitous, and almost all of them do data reduction.
There is the concept of data compression and data deduplication: looking for duplicates of data and not storing the duplicates, but rather pointers to them. At IBM, we have the FlashCore Module, which is a 2.5-inch NVMe SSD that has compression built in, and I'll talk about that a little more in a moment. So the data reduction there, reducing how much data is actually stored, is an important offload that could be done, and sometimes is done, by applications. But here, instead of taking precious CPU resources to do it on a server, you offload it to the array.
06:28 AW: Other offloads today include replication and HyperSwap. These are things that databases can do, VMware can do, Linux can do, IBM's AIX can do, but they take CPU resources. So all-flash arrays have a rich set of functionality and storage services, which include replication, disaster recovery and HyperSwap between two units, in order to offload the CPU.
The other thing is storage virtualization. Obviously, there's a rich set of virtualization available on servers today, but storage virtualization is something that IBM pioneered a couple of decades ago. It allows the host to access multiple storage devices through this virtualization layer, offloading, if you will, from the CPU. Of course, RAID is something that's been around for quite a while. That is a redundancy technique to improve fault tolerance that is a little more efficient than keeping two or three copies, and RAID is a function that all-flash arrays do today, so that, too, is offloading the processor.
07:48 AW: So, we already have a rich assortment of offloads that are available today. Now, what we're talking about with computational storage is taking offloads and accelerators really to a new level. It's being able to use them in ways that go beyond just storage services. So, what is the motivation really for extending offload, for having more acceleration than what we have today? It really gets back to what we said at the beginning. The data lake is just simply enormous and it's growing daily.
08:31 AW: I believe the statistic is still true that in the last two years we've created more data than previously existed. I'm sure those of you with kids can attest to the fact that your kids are providing a lot of the water in that lake. The social media texts and all of the pictures and the videos comprise a lot of that lake, but it's not just that. It's businesses that have cameras looking for security. There are cameras going into manufacturing lines to look for defects. And there are lots of different types of data, a lot of it unstructured, going into that lake. And the network speeds really cannot grow fast enough. And the server compute, as we said before, has some limitations.
Therefore, what we want to do is take some of that compute and sprinkle it in the lake itself, so that the lake now starts to do more of that analysis, more of that pre-processing, and more of the analytics, or at least some of it, to offload the compute, allowing it to do more complex analytics and come to more complex decisions by being able to sort through and look at even more data than it does today. So that is really the motivation.
10:14 AW: I think the best way of understanding the potential here is to look at an example of a computational storage device. We at IBM have been doing the IBM FlashCore Module for a couple of years, and the IBM FCM is a good example of a computational storage platform. It is not sold as a CSD; it is used inside a storage system.
Now, in order to keep the price of the overall system down, what we decided to do was, instead of just getting larger and larger Intel CPU servers, we would offload some of that work onto the SSDs themselves. And so, inside the FlashCore Module are several different assists. The one that is the most differentiating is the compressor that we have. This is a modified dynamic Huffman compressor that does a very good job of compressing the data, and it's done in-line. It's not like the storage system has two different PCI Express functions here, where you compress the data and then you move it. It's done in-line. As the data is written, it is compressed and put into the write cache of the SSD, and later stored onto flash. All of that is done without the storage stack knowing or having any involvement in the data path.
11:55 AW: Obviously, it has to do some work out-of-band on out-of-space conditions, but the data is flowing directly through the compressor, and that's an important model when we look at computational storage. Things that can be done in-line as the data moves reduce how much the server has to do and how much the application really has to know. There are always some APIs and things that have to be involved, but the more in-line it is, I believe, the more effective the computation can be, at least for data manipulation. And so that's what the FCM does.
We also have an AES-256 encryption engine that is in-line, and all of this encryption and all of this compressing is done without any storage stack involvement whatsoever in the data path. Now, what we've also recently come up with is something that I call a hinting architecture. So, the compressor and the encryptor are examples of data manipulators. Those actually take the data and change it, in one case reducing how much data you need to store, and in the other case providing security for data at rest.
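As a rough illustration of that in-line model, here is a minimal Python sketch of a write path that compresses and then encrypts data before it reaches the media, while the host simply issues reads and writes. It uses zlib and the third-party cryptography package as stand-ins; the real FCM uses a modified dynamic Huffman compressor and hardware AES-256, so this is a conceptual model only, not the actual implementation.

```python
import os
import zlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

class InlineWritePath:
    """Conceptual model of an in-line compress-then-encrypt data path.
    Hypothetical sketch; the real device does this in hardware, invisibly
    to the storage stack."""

    def __init__(self, key: bytes):
        assert len(key) == 32          # AES-256 key
        self.key = key
        self.media = {}                # stand-in for flash media: LBA -> bytes

    def write(self, lba: int, data: bytes) -> None:
        compressed = zlib.compress(data)             # stand-in compressor
        nonce = os.urandom(16)
        enc = Cipher(algorithms.AES(self.key), modes.CTR(nonce)).encryptor()
        ciphertext = enc.update(compressed) + enc.finalize()
        self.media[lba] = (nonce, ciphertext)        # data at rest: reduced and encrypted

    def read(self, lba: int) -> bytes:
        nonce, ciphertext = self.media[lba]
        dec = Cipher(algorithms.AES(self.key), modes.CTR(nonce)).decryptor()
        return zlib.decompress(dec.update(ciphertext) + dec.finalize())

# The host just issues write() and read(); compression and encryption
# happen transparently inside the device model.
path = InlineWritePath(os.urandom(32))
path.write(0, b"example block of data " * 100)
assert path.read(0) == b"example block of data " * 100
```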
13:20 AW: The hinting architecture is an example of categorization of data, allowing the storage stack to categorize data, and in this case it categorizes by heat. So, data that the operating system believes will be accessed more often is given a hot temperature: the SSD is told that the data is hot and will be accessed more often. That allows the SSD to put it either in SLC or, if some pages in the flash have lower access times, to put it there.
The concept is to take the data that is accessed the most and put it on the fastest media that the device has. By doing that, the overall access time can be lower. And what these kinds of hints allow is higher throughput, higher IOPS and lower latency, making the job of the operating system, or in this case the storage stack, more efficient. In its simplest form, this can be just saying, "This is metadata for a data reduction scheme, for example." That metadata is accessed more often, and therefore putting it on a faster page can help the overall efficiency of the system.
You could also think of this in the other direction. You can indicate data that is cold, or whose access times are not as important. For example, if you're doing a background operation, the system can tell the SSD, "This data is not that critical. Give it lower priority if you're doing other things, like garbage collection, or put it on a slower queue; it's not as important."
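To illustrate the idea, here is a minimal, hypothetical sketch of how a temperature hint might influence placement inside a drive. The hint values and media tiers are assumptions made up for illustration; the actual hinting interface between the storage stack and the FCM is not shown here.

```python
from enum import Enum

class Heat(Enum):
    HOT = "hot"        # e.g., data-reduction metadata, accessed frequently
    WARM = "warm"
    COLD = "cold"      # e.g., background or garbage-collection traffic

class HintedSSD:
    """Hypothetical model of heat-based placement inside a CSD."""

    def __init__(self):
        self.slc_tier = {}    # fastest media (pseudo-SLC pages)
        self.qlc_tier = {}    # denser, slower media

    def write(self, lba: int, data: bytes, heat: Heat = Heat.WARM) -> None:
        # Hot data goes to the fastest pages; everything else can be
        # placed on denser media and handled at lower priority.
        if heat is Heat.HOT:
            self.slc_tier[lba] = data
        else:
            self.qlc_tier[lba] = data

# Usage: the storage stack tags data-reduction metadata as hot so it
# lands on the lowest-latency pages, and background data as cold.
ssd = HintedSSD()
ssd.write(42, b"dedup hash table page", heat=Heat.HOT)
ssd.write(43, b"background scrub data", heat=Heat.COLD)
```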
15:17 AW: And so that helps to make the system more effective and efficient. So, after you look at that example, you might say, "Well, that's one way of doing a computational storage device, and it really helps the storage stack. What's really the future of this?" And some of this future is now, because there are companies here at Flash Memory Summit that will talk about the various things they have already done. I break it into several parts.
The first is in-line filtering or reducing. And again, the reason I say in-line is that it allows you, with a single PCI Express function, to take data and manipulate it in-line. So it could be a form of reducing how much data comes back: "Only read data from this region that fits these characteristics," for example. In-line, over that region, a simple search and filtering are done, and only data that meets the characteristics is given back.
So that would be an example of in-line filtering or reducing of data. It could also be out-of-band, where the SSD is given a set of background tasks: reading the data, looking for certain characteristics, building indexes into data that fits certain criteria. Later, those indexes could be read and coalesced with those from the other SSDs, the other CSDs, and then that data, a subset of the overall data, read into the CPUs to look for certain things.
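Here is a minimal sketch of what such a pushdown might look like from the host's point of view, assuming a hypothetical CSD with a filtered_read() call. No standard API is implied; SNIA's computational storage work is defining the real interfaces.

```python
from typing import Callable, Iterator

class FilteringCSD:
    """Hypothetical CSD that can apply a predicate in-line, returning
    only matching records instead of the whole region."""

    def __init__(self, records: list[bytes]):
        self.records = records                    # stand-in for data on flash

    def read_all(self) -> Iterator[bytes]:
        # Conventional path: every record crosses the host interface.
        yield from self.records

    def filtered_read(self, predicate: Callable[[bytes], bool]) -> Iterator[bytes]:
        # In-line path: the filter runs inside the device, so only
        # records that meet the characteristics are sent to the host.
        yield from (r for r in self.records if predicate(r))

csd = FilteringCSD([b"temp=21", b"temp=87", b"temp=19", b"temp=95"])

# The host only receives the records from this region that fit the criteria.
hot_readings = list(csd.filtered_read(lambda r: int(r.split(b"=")[1]) > 80))
print(hot_readings)   # [b'temp=87', b'temp=95']
```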
17:16 AW: Now, that's really what filtering and searching and sorting amount to: reducing the amount of data that needs to be read, reducing how much data I have to bring into the CPU. The other thing is to reduce the amount of data that needs to be stored, and compression is one example of that. You can also think of, in a data lake, being able to use the SSDs themselves to assist deduplication.
Today, all of the all-flash arrays do deduplication, but you could imagine offloading this to the CSDs, at least to accelerate it. The CSDs could do the hashing, for example, instead of the CPUs needing to build a hash table. Those hashes could be computed in the background on certain block sizes and then collected and processed, reducing how much data has to be stored by keeping pointers to it instead.
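Here is a rough sketch of that division of labor, assuming a hypothetical CSD call that returns per-block hashes so the host or array controller only has to coalesce them into a dedup table. The function names and block size are illustrative assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed dedup granularity

def csd_hash_blocks(device_blocks: list[bytes]) -> list[str]:
    """In this model, runs inside the CSD: hash every block in the
    background using spare internal bandwidth."""
    return [hashlib.sha256(block).hexdigest() for block in device_blocks]

def host_build_dedup_table(all_hashes: dict[int, list[str]]) -> dict[str, tuple[int, int]]:
    """Runs on the host or array controller: coalesce hashes from every
    CSD and keep one (device, block) location per unique hash."""
    table: dict[str, tuple[int, int]] = {}
    for device_id, hashes in all_hashes.items():
        for block_idx, digest in enumerate(hashes):
            table.setdefault(digest, (device_id, block_idx))
    return table

# Two CSDs hash their own blocks; the host only sees the digests.
dev0 = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE]
dev1 = [b"B" * BLOCK_SIZE, b"C" * BLOCK_SIZE]
table = host_build_dedup_table({0: csd_hash_blocks(dev0), 1: csd_hash_blocks(dev1)})
print(len(table))   # 3 unique blocks out of 4 written; duplicates become pointers
```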
To do any of this, especially the in-line filtering, there has to be a way of describing the data in-line, so that there are tags associated with it, or simple ways to do the filtering that is required. I know the SNIA computational storage team is looking at ways of doing this.
18:50 AW: So that is reducing the amount of data to read and reducing the amount of data to store. What's also, I think, very possible is to take tasks which today take quite a bit of processing and offload those onto cores inside the computational storage device. They can do background searches, filters and really any background jobs: software that runs on those cores, or functions in the FPGA. You can imagine reporting on access rates, age, deduplication, like we talked about. Many different things can be done in the CSD.
Now, the interesting thing is that the all-flash array itself is also a computational storage platform. It could be a disaggregated storage device, just a bunch of flash, external storage or hyper-converged infrastructure. That device itself . . . And here is an example, the FlashSystem 9200.
20:00 AW: The FlashSystem 9200 is really two full-blown servers with Cascade Lake CPUs and a lot of memory running a storage stack. Well, since it has an operating system and CPUs, you could imagine it being able to run a rich set of predefined storage services, like we mentioned earlier.
But you can also build a platform that would allow microservices, containers or VMs, where the application could download accelerators that it wants to run. And now, in this computational storage platform, you are not just storing the data. You are also offloading the CPU of the server and using part of the compute, part of the bandwidth of the FlashCore Modules or the SSDs, to process data, to analyze it, to look for trends, to look for statistics. All kinds of different things can be done as a way of offloading the main CPU.
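A minimal sketch of that idea follows, assuming a hypothetical storage platform that lets an application register an accelerator function and run it next to the data. The register_accelerator() and run() names are illustrative only, not a real product API; in practice the accelerator might be a container image or an FPGA function rather than a Python callable.

```python
from typing import Callable

class StoragePlatform:
    """Hypothetical computational storage platform: stores data and can
    run application-supplied accelerator functions next to it."""

    def __init__(self):
        self.objects: dict[str, bytes] = {}
        self.accelerators: dict[str, Callable[[bytes], object]] = {}

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def register_accelerator(self, name: str, fn: Callable[[bytes], object]) -> None:
        # Stand-in for downloading a containerized accelerator to the array.
        self.accelerators[name] = fn

    def run(self, accel_name: str, obj_name: str) -> object:
        # The function executes on the array's own CPUs, so only the small
        # result crosses the network, not the raw data.
        return self.accelerators[accel_name](self.objects[obj_name])

platform = StoragePlatform()
platform.put("sensor-log", b"12,85,7,91,33")
platform.register_accelerator(
    "count_over_80",
    lambda d: sum(1 for v in d.split(b",") if int(v) > 80))
print(platform.run("count_over_80", "sensor-log"))   # 2: only this result moves
```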
21:25 AW: So, just to summarize, I think computational storage is one of the most exciting trends we see today. We are just on the cusp of AI and just on the cusp of what we can do with analytics to really derive value from an organization's data. And to the extent that we can accelerate that processing, that analytics of that huge data lake, we can help organizations differentiate and compete in this market. So again, I thank you for joining. I hope you have a great day, and I hope you get a lot out of the Flash Memory Summit.