Optimizing SSD Performance With AI and Real-World Workloads
Here is a real-world use of AI methods on advanced XL-flash SSDs in which a recurrent neural network serves as the underlying model and produces substantial performance improvements.
00:05 Eden Kim: Hello. My name is Eden Kim, I'm CEO of Calypso Systems, Inc., and I'm here to present "Optimizing SSD Performance With Artificial Intelligence and Real-World Workloads." I want to welcome you to this virtual FMS.
So today, we're going to look at an example of how to capture a GPS navigation edge workload and use it to optimize artificial intelligence and machine learning LSTM/RNNs. LSTM stands for long short-term memory, and RNN stands for recurrent neural network. We will be capturing and looking at workloads on the input side of the LSTM/RNN, as well as on the output side, to see how the workloads are characterized. However, this talk will not delve into the mechanics inside the artificial intelligence analysis and optimization, as that's outside the scope of this work.
01:21 EK: So, what's the difference between a synthetic and a real-world workload? Well, synthetic workloads are typically corner-case workloads that stress storage outside the range of normal usage, whereas real-world edge workloads are a constantly changing combination of different I/O streams and queue depths. Synthetic workloads tend to be a fixed, single I/O stream, or at most a very few streams, applied over a defined period of time. Real-world edge workloads may have many I/O streams, hundreds to thousands, that change over the course of the workload. Corner-case workloads have traditional transfer sizes such as half-K, 4K, 8K, 16K, 32K and so forth, up to 1 megabyte, whereas real-world workloads have many non-typical transfer sizes such as 10 byte, 28 byte, 60 byte, 128 byte, 1K, 1.5K, 20K, 36K, 56K and so on. Corner-case workloads also have a fixed demand intensity, or queue depth. Real-world edge workloads, on the other hand, have a constantly changing demand intensity, or queue depth, that varies as time goes on.
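To make the contrast concrete, here is a minimal Python sketch of the two workload styles. It is illustrative only: the 94% write mix and the 7-368 queue-depth range come from the capture discussed below, while the generator structure, the particular transfer sizes and the field names are assumptions, not the output format of any particular tool.

```python
import random

def synthetic_workload(n_ios):
    """Corner-case style: one fixed transfer size at a fixed queue depth."""
    for _ in range(n_ios):
        yield {"size_bytes": 4096, "op": "write", "queue_depth": 32}

def real_world_workload(steps):
    """Edge style: many transfer sizes, shifting mix and queue depth."""
    sizes = [512, 1024, 1536, 4096, 28 * 1024, 64 * 1024]  # a few of many
    for t in range(steps):
        qd = random.randint(7, 368)      # demand intensity drifts over time
        for _ in range(qd):
            yield {
                "size_bytes": random.choice(sizes),
                "op": "write" if random.random() < 0.94 else "read",
                "queue_depth": qd,
                "step": t,
            }

print(next(synthetic_workload(1)), next(real_world_workload(1)), sep="\n")
```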
02:44 EK: So, what are some examples of real-world edge workloads? Well, you have artificial intelligence, you have genetics, genomics, 3D animations, cloud storage, big data, web portals, machine learning and 5G edge servers. Today's example will be a blend of AI, cloud storage, big data, web portal and machine learning.
So, let's take a closer look at a GPS navigation edge portal 24-hour workload. What you're looking at here is a representation of a 24-hour I/O capture, or what we call an I/O stream map. On the X-axis is time; on the Y-axis are I/Os, IOPS and I/O stream metrics. If you look at the colored bars going across the X-axis, you can see that there are different I/O stream sizes, queue depths and read/write mixes. Also note that there are four 12,000-I/O spikes, each of which is comprised primarily of sequential half-K writes. The blue dots are the I/Os, or the IOPS; note how they're clustered tightly around the 10,000-I/O level. Finally, you can see the orange line, which is the queue depth; it ranges from 7 to 368, depending on the activity that occurs on the I/O stream map.
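As a rough illustration of how such a stream map can be drawn, here is a small Python/matplotlib sketch using synthetic stand-in data shaped like the capture just described: I/Os clustered near 10,000, four 12,000-I/O spikes, and a queue depth between 7 and 368. A real map would be plotted from the capture tool's exported step statistics rather than generated numbers.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a 24-hour per-step capture export.
t = np.arange(1440)                                       # one step per minute
ios = np.random.normal(10_000, 800, t.size)               # clustered near 10,000
qd = np.clip(np.random.normal(60, 50, t.size), 7, 368)    # queue depth 7..368
for spike in (300, 600, 900, 1200):                       # four write spikes
    ios[spike] = 12_000

fig, ax1 = plt.subplots()
ax1.scatter(t, ios, s=4, color="tab:blue", label="I/Os per step")
ax1.set_xlabel("time (minutes)")
ax1.set_ylabel("I/Os")
ax2 = ax1.twinx()                                         # second axis for QD
ax2.plot(t, qd, color="tab:orange", alpha=0.7, label="queue depth")
ax2.set_ylabel("queue depth")
fig.legend(loc="upper right")
plt.title("I/O stream map (illustrative)")
plt.show()
```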
04:38 EK: If we look a little closer at the I/O streams, they're listed across the top as data series, and also in the cumulative workload. Here, the nine most common I/O streams by percentage of I/O occurrence are listed and selected for display on the I/O stream map. The other thing to notice is that there are over a thousand different I/O streams in total over the 24 hours, all of different sizes: 28K, half-K, 1.5K, 64K and so on. But the nine selected I/O streams that you see make up 68% to 70% of the total I/Os that occur. Finally, if you look at the nine I/O streams, they have a read/write mix of 94% writes to 6% reads, so this is a heavily write-dominant workload. If you look at the activities that occur, we can chart the process IDs, or PIDs. PIDs are assigned by the OS kernel to each I/O stream and can reflect the process, the event or the driver associated with the I/O stream. In this case, SQL Server and MySQL comprise over 80% of the I/Os, meaning that 80% of the I/Os are SQL Server-associated.
06:11 EK: In addition, while not listed under a SQL Server-related process ID, many SQL Server PIDs can be subsumed in the system I/Os. If you look below the bubble, the system I/Os are another 15%, so on top of the 80%, some number of those system process IDs would also be SQL Server IDs. So, where you capture a workload in the stack is very important, because I/O streams change as they traverse the software stack from user space to storage and back.
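A minimal sketch of the kind of ranking just described: group captured I/Os into streams by transfer size and direction, then report how much of the total the top streams cover and the overall write fraction. The record format and field names are hypothetical; a real capture carries far more metadata (PIDs, queue depth, offsets and so on).

```python
from collections import Counter

def top_streams(ios, n=9):
    """Rank I/O streams (keyed by transfer size + direction) by occurrence."""
    streams = Counter((io["size_bytes"], io["op"]) for io in ios)
    total = sum(streams.values())
    ranked = streams.most_common(n)
    covered = sum(count for _, count in ranked) / total
    writes = sum(c for (_, op), c in streams.items() if op == "write") / total
    print(f"top {n} streams cover {covered:.0%} of I/Os; {writes:.0%} writes")
    return ranked

# Hypothetical captured records, shaped like the 94%-write workload above.
ios = ([{"size_bytes": 512, "op": "write"}] * 94
       + [{"size_bytes": 4096, "op": "read"}] * 6)
top_streams(ios, n=2)
```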
06:57 EK: For this presentation, we are using IOProfiler, an I/O capture tool, to capture the workloads. In the first case, you can capture the I/O stream workloads at the enterprise storage file system level and see all of the I/Os that are applied to the logical storage, whether it's NAS, SAN, DAS, persistent memory, computational or object storage, or any other kind of storage. In addition to the file system, you can also capture I/Os at the block I/O level: to NVDIMMs, to data center persistent memory modules, to NVMe SSDs, logical units, storage clusters, hard drives and the like.
In addition to direct enterprise storage, we can also capture on the fabric. Depending on the fabric that you use, you can install the IOProfiler capture tool on the host initiator and capture all of the I/Os going from the host initiator to the target storage server. You can also put the capture tool on the target storage server itself and, just as with the enterprise server, capture logical storage at the target's file system or tunnel through to the block I/O layer for the individual storage.
08:18 EK: You can also look at virtual storage. Here we go to the VM level and put the I/O capture tool on the virtual storage server, and then you can proceed to capture your fabric or direct enterprise storage and follow the path as previously indicated. So, again, you can capture any direct, remote or fabric storage. You can capture the physical, virtual or logical storage layer. You can capture at the file system, block I/O or byte-addressable level, and you can capture on persistent memory, NVMe SSDs, logical units, storage clusters and more. If you're interested in seeing the SNIA Compute, Memory, and Storage Initiative reference workloads, or in these I/O capture tools and analytics, you can download them for free at www.testmyworkload.com.
09:24 EK: So, let's look at I/O workload capture, curation and optimization using IOProfiler. Again, IOProfiler captures real-world workloads. In the first case, we can capture time steps of I/O activity, at either very fine-grained or very broad-stroke resolution. I/O profiles differ from I/O trace captures in that we don't take any personal data or record the actual data content. Because we use steps, and because we average the statistics of the I/O streams over each step, we can transfer this table of I/O statistics to a database and rebuild a workload without dragging along the large data file sets associated with I/O trace captures. Using the I/O capture tool, we can also monitor workloads from the edge. This edge transmission allows us to portably send small-file-size captures from the edge to the data center server. We can use edge transmission for real-time telemetry, and we can also use the automated phone-home alert system, which we'll talk about in just a minute.
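To illustrate why a table of step-averaged statistics travels so much better than a raw trace, here is a sketch of what one row of such a table might look like. This is an assumed schema for illustration, not IOProfiler's actual format.

```python
from dataclasses import dataclass, asdict

@dataclass
class StepStats:
    """Averaged I/O statistics for one capture step. Storing step averages
    instead of every raw I/O keeps the table small enough to ship from the
    edge and to rebuild a replay workload from a database."""
    step: int                  # time-step index
    duration_s: float          # step length in seconds
    ios: int                   # total I/Os observed during the step
    avg_queue_depth: float
    write_pct: float           # fraction of I/Os that were writes
    top_streams: dict          # {(size_bytes, op): count} for the step

row = asdict(StepStats(step=0, duration_s=60.0, ios=10_250,
                       avg_queue_depth=42.0, write_pct=0.94,
                       top_streams={(512, "write"): 7_800}))
print(row)   # one compact row, ready for a database insert
```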
10:47 EK: With this edge transmission monitoring, we can continuously monitor and view the workloads, we can replay workloads as they happen from the edge, and we can also monitor multiple nodes, servers and drives. Once we capture the I/O stream activity, we can then look at the key performance indicators: the I/O streams and I/O stream metrics, such as process IDs and I/O bursts, and we can look at spikes from TRIM activity, queue depth changes, IOPS storms and other activity. Once we have these key performance indicators, we can curate the workload. We can parse, filter, rebuild or splice any workload to focus on the activities of interest, so that it reflects the target workload as we wish to see it. Once we have a curated workload, we can use it to train, or use it for replay testing, for artificial intelligence or for validation of other storage systems. In the AI case, we use the replay of these workloads to loop, validate and infer in order to optimize the performance activity. In addition to AI, we can use the captured workload and the replay test to optimize storage, for storage qualification, validation and evaluation.
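A minimal sketch of the parse/filter/splice idea over per-step rows like the hypothetical schema above: isolate a window of interest (say, one of the write spikes) and splice it into a replay script. The field names and thresholds are assumptions for illustration.

```python
def curate(steps, t_start=0, t_end=float("inf"), min_ios=0):
    """Parse and filter a capture: keep only steps in the window of
    interest with enough activity (e.g. isolate a write spike)."""
    return [s for s in steps
            if t_start <= s["step"] <= t_end and s["ios"] >= min_ios]

def splice(*segments):
    """Rebuild a target workload by concatenating curated segments,
    renumbering steps so the replay script plays back contiguously."""
    out = []
    for segment in segments:
        for s in segment:
            out.append({**s, "step": len(out)})
    return out

steps = [{"step": i, "ios": 12_000 if i == 5 else 10_000} for i in range(10)]
spike = curate(steps, t_start=4, t_end=6, min_ios=11_000)
script = splice(spike, spike)     # e.g. loop the spike twice for stress replay
print(len(script), "steps in replay script")
```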
12:15 EK: Okay. So, let's take a closer look at artificial intelligence and IOProfiler. There are basically three parts to AI in this context. The first step is gather: capture the workload I/O and curate it into a test script. The second step is training: take the replay test script, apply it to your storage repeatedly, and use your AI algorithms of interest to optimize the storage. The third step is inference: once you have the optimization algorithms, you can use them to infer the workload performance from the edge and optimize it for your storage. So, let's take a closer look at gather, training and inference.
13:15 EK: For gather, again, we're looking at a long short-term memory recurrent neural network workload. We are going to gather the workload on the input layer of the LSTM/RNN. You can see here that there are many servers with real-world storage workloads that are inputting to server nodes. We capture this input activity and create training test loops, which we then use in AI optimization to help the LSTM layer optimize the performance of storage for the workload. Hidden from us are the LSTM layer, the hidden dense layer, the sigmoid activation function and more, but what we do see is the output layer, with workloads going out to other nodes. So, we can capture both the input and the output of this LSTM/RNN and use those workload captures either to train using the replay test, or for validation and inference, looking at the output layer to make sure that the output of our AI matches our assumptions and our objectives. So again, at the gather stage, we capture, curate and create the replay test script. At the training stage, we take that replay test, apply it to the storage repetitively and keep tracking the key performance indicators to see how performance improves with each iterative step of the AI algorithm development.
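The internals of the network are hidden from us, but a generic model of the shape described here, an LSTM layer feeding a hidden dense layer and a sigmoid activation, might look like the following Keras sketch. The layer sizes, the input features and the dummy training data are assumptions for illustration, not the model actually used.

```python
import numpy as np
import tensorflow as tf

TIMESTEPS, FEATURES = 60, 4  # e.g. 60 capture steps of (IOPS, QD, write %, size)

# Shape matches the talk's description: LSTM layer -> hidden dense layer
# -> sigmoid activation. Sizes and the prediction target are assumed.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(TIMESTEPS, FEATURES)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),     # hidden dense layer
    tf.keras.layers.Dense(1, activation="sigmoid"),   # sigmoid activation
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Dummy data standing in for curated capture windows.
x = np.random.rand(128, TIMESTEPS, FEATURES).astype("float32")
y = np.random.randint(0, 2, size=(128, 1)).astype("float32")
model.fit(x, y, epochs=2, verbose=0)
```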
15:12 EK: So, this AI machine learning workload training is a replay test and loop. You can see here that you can take the input test, run a series of tests against the storage over time, analyze the output and then improve your AI algorithm to target the key performance indicator of interest. We're not going to peel back the layers and look under the hood to see which key performance indicators were used in this case, but it was looking primarily at the data transfer size, or block size, as the key determinant in optimizing this workload.
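Here is a toy sketch of such a replay-and-loop harness. The replay itself is simulated (a real run would drive the storage with the curated script and measure actual KPIs), and the idea that IOPS improves as the optimizer's block-size choice approaches the workload's dominant half-K size is an assumption made purely so the loop has something to optimize; the 112,000-IOPS baseline echoes the figure quoted later in the talk.

```python
import statistics

def replay_once(script, block_size):
    """Stand-in for a real replay run; returns simulated KPIs that improve
    as block_size approaches the workload's dominant half-K transfer size."""
    dominant = 512
    penalty = abs(block_size - dominant) / dominant
    return {"iops": 112_000 / (1 + 0.05 * penalty)}

def training_loop(script, candidate_block_sizes, runs=10):
    """Replay repeatedly, track the KPI per run, keep the best setting."""
    history, best = [], None
    for i in range(runs):
        size = candidate_block_sizes[i % len(candidate_block_sizes)]
        kpis = replay_once(script, size)
        history.append(kpis["iops"])
        if best is None or kpis["iops"] > best[1]:
            best = (size, kpis["iops"])
    print(f"best block size {best[0]} B -> {best[1]:.0f} IOPS; "
          f"mean over {runs} runs {statistics.mean(history):.0f}")
    return best

training_loop(script=None, candidate_block_sizes=[512, 4096, 8192])
```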
For the inference, or optimization, stage, what we do now is take the I/O capture tool and monitor the edge workload to see whether the output layer is getting us higher performance. So here, you apply the LSTM/RNN AI algorithms to realize your performance gains, and then you monitor the edge performance to see that the continuous delivery and integration is achieving the goals and objectives of your AI.
16:34 EK: So here, you can see that from the edge, my workload has been improving, with the orange line going up and the blue line going down: IOPS are going up and response time is going down. Then, by grabbing individual test workloads, we can see a progression where, in this case, our IOPS increase from 112,000 to 115,000 over the course of 10 runs.
The last aspect of this real-time monitoring of edge workloads is to use key performance indicator, or KPI, alerts from multiple edge servers and nodes. On the edge, there may be many different servers whose workloads we want to monitor. Because we're using I/O stream time steps, we can deliver, monitor and observe the workloads very efficiently in real time, without creating bottlenecks or having to move huge amounts of workload data from the edge to the data center.
Further, if we set up the key performance indicator monitoring, we can have phone-home alerts that notify you when a threshold is met, for example, a low IOPS level or a high response time level. The alert can tell you which node tripped the threshold; you can click on that node, look at the real-time performance, and then capture the workload for future performance optimization or for validation of your AI/ML LSTM/RNN. That was a lot of letters, I know.
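A minimal sketch of such a threshold-based phone-home check, assuming a simple per-node KPI dictionary; the threshold values, node names and field names are illustrative, not IOProfiler defaults.

```python
def check_kpis(node, kpis, min_iops=100_000, max_latency_ms=5.0):
    """Phone-home style alert: flag a node whose KPIs cross a threshold."""
    alerts = []
    if kpis["iops"] < min_iops:
        alerts.append(f"{node}: low IOPS ({kpis['iops']:.0f})")
    if kpis["latency_ms"] > max_latency_ms:
        alerts.append(f"{node}: high response time ({kpis['latency_ms']:.1f} ms)")
    return alerts

# Example: poll several edge nodes' latest step statistics.
for node, kpis in {
    "edge-01": {"iops": 112_000, "latency_ms": 1.2},
    "edge-02": {"iops": 88_000, "latency_ms": 6.4},
}.items():
    for alert in check_kpis(node, kpis):
        print("ALERT:", alert)
```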
18:19 EK: So, in conclusion, application and storage performance depends on the workload, and these workloads change as they traverse the I/O and software stack. Real-world edge workloads are very different from synthetic lab workloads, and you can capture your real-world workloads with the free I/O capture tools available at testmyworkload.com. Once you have your workload, you can curate and train AI workloads using the IOProfiler tool set, and you can use the replay results to optimize your applications and storage by monitoring and curating key performance indicators.
19:06 EK: So, that was just a quick overview of how to use IOProfiler real-world workload I/O capture tools, analysis and monitoring, applied in this case to an AI LSTM/RNN. If you have any questions, you can send them to [email protected]. Otherwise, I look forward to seeing you and hearing from you at the next, perhaps not virtual, FMS show. Thank you very much.