Computational storage fundamentals explained
Computational storage is an emerging technology, but it isn't too soon to develop an understanding of how it can be used to address storage latency and performance issues.
Computational storage is all about moving computation closer to storage and reducing the amount of data that must travel between storage and compute. If successful, this emerging technology would overhaul the traditional architecture where storage and processing are separate and distant.
This shift isn't happening in a vacuum, but rather in response to the massive amounts of data enterprises have to manage from IoT and other edge devices, along with AI applications. Moving that data around adds latency and performance issues few can afford in today's real-time business environment.
Vendors have responded to this data challenge with a variety of approaches to getting compute functions closer to or incorporated in storage systems. The Storage Networking Industry Association recently outlined some ways vendors are doing this in early products, including connecting field-programmable gate arrays (FPGAs) to SSDs, putting FPGAs in RAM and using systems on a chip.
It's too early to say if computational storage technology will take off and what configuration it will take. Nevertheless, Gartner sees potential for substantial growth. More than half of enterprise-generated data will be created and processed outside the data center and the cloud by 2024, according to the research firm. That compares with 10% in 2020.
It's time to get up to speed on computational storage fundamentals. What follows is a look at the problems it could address, some early use cases and early adopters.
3 problems computational storage addresses
Part of understanding the fundamentals of computational storage is knowing where it could be of use. The technology has the potential to solve three problems facing today's storage systems, according to Marc Staimer, president of Dragon Slayer Consulting.
- Speed-of-light latency. When you close the distance between compute and the data being processed, you reduce latency. IoT, remote monitoring and remote administration devices all put data at a considerable distance from the compute resources of traditional infrastructure. "The distance problem is one computational storage can potentially solve," Staimer said. "It enables the compute-, memory- and data-intensive processes to move closer to the stored data." The result is that only a "nominal" amount of processed data must be sent to the data center or cloud for further processing, he said.
- CPU performance bottleneck. The proliferation of CPU-intensive storage processes -- erasure coding, deduplication, compression, snapshots and encryption -- has turned processors into a performance bottleneck for storage systems. Computational storage can offload these CPU-intensive processes, freeing the primary CPU for better I/O and throughput performance.
- Application performance issues. Database and various AI applications can also take up a lot of CPU resources. Here, too, computational storage can offload processes to improve the performance of these applications.
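The data-movement argument behind these three problems can be made concrete with a toy model. The sketch below is not a real device interface: the record layout, the 64-byte record size and the function names are all assumptions made for illustration. It only counts how many bytes would cross the storage bus when the host filters the data versus when the drive filters it first.

```python
# Toy model only: RECORD_SIZE, the record layout and both function
# names are hypothetical, not part of any real drive API.
RECORD_SIZE = 64  # bytes per record (assumed)


def make_records(n):
    """Build n fake records; the first byte holds a sensor value 0-255."""
    return [bytes([i % 256]) + bytes(RECORD_SIZE - 1) for i in range(n)]


def host_side_filter(records, threshold):
    """Traditional path: every record crosses the bus, then the host filters."""
    bytes_moved = sum(len(r) for r in records)
    matches = [r for r in records if r[0] >= threshold]
    return matches, bytes_moved


def device_side_filter(records, threshold):
    """Computational storage path: the drive filters; only matches cross the bus."""
    matches = [r for r in records if r[0] >= threshold]
    bytes_moved = sum(len(r) for r in matches)
    return matches, bytes_moved


records = make_records(1024)
host_matches, host_bytes = host_side_filter(records, threshold=250)
dev_matches, dev_bytes = device_side_filter(records, threshold=250)

# Same answer either way, but far fewer bytes moved on the device-side path.
print(f"host-side filter moved {host_bytes} bytes")
print(f"device-side filter moved {dev_bytes} bytes")
```

Both paths return the same matching records. The difference is where the filtering work runs and how much data has to travel to get the answer, which is the "nominal" amount of data Staimer describes.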
2 key use cases for computational storage
Many of the IoT devices populating the network edge must capture, store and analyze data in real time. IoT is computational storage's "biggest use case," Staimer said in a separate article. He pointed to autonomous vehicles, drill bit vibration monitoring in oil and gas installations, and closed-circuit TV cameras as examples of devices that could benefit from computational storage's ability to analyze data in close-to-real time and in the field.
Another important use case is turning the server itself into an internal scale-out cluster. Here, you use computational storage cores for snapshots, replication, deduplication, compression and other CPU-intensive workloads, and dedicate the main CPU cores to application processing.
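As a rough analogy for this division of labor, the sketch below hands compression to a separate worker pool while the main thread carries on with application work. The pool is only a stand-in for the cores on a computational storage device. Real devices are reached through vendor- or standard-defined interfaces that this example does not attempt to model, and the names `storage_cores`, `compress_block` and `application_work` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_block(block: bytes) -> bytes:
    """Storage-side work: compress one block (zlib level 6)."""
    return zlib.compress(block, 6)


def application_work(n: int) -> int:
    """Placeholder for whatever the main CPU would rather be doing."""
    return sum(i * i for i in range(n))


# Eight 4 KiB blocks of repetitive data, standing in for writes to the drive.
blocks = [bytes([i]) * 4096 for i in range(8)]

# The worker pool plays the role of the storage device's own cores (assumption).
with ThreadPoolExecutor(max_workers=4) as storage_cores:
    pending = [storage_cores.submit(compress_block, b) for b in blocks]
    app_result = application_work(10_000)  # main thread is not blocked
    compressed = [f.result() for f in pending]

# Round-trip check: the offloaded work produced valid compressed blocks.
restored = [zlib.decompress(c) for c in compressed]
print(len(blocks[0]), "->", len(compressed[0]), "bytes per block")
```

The point of the sketch is the shape of the workflow, not the speedup: the application submits storage tasks and keeps going, then collects the results when it needs them.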
The case at the edge
Edge computing often happens in confined physical spaces, such as inside autonomous vehicles, surveillance cameras and other IoT devices, which have little room for storage and computing hardware. Such limitations constrain what can be done to maximize the performance of data-intensive workloads.
Space isn't the only issue at the edge, said consultant and writer Robert Shelton. Budgets and I/O port bandwidth also hamstring conventional storage and computing infrastructure.
Computational storage could solve these edge computing constraints. With it, the storage system can preprocess data so only the most useful data gets sent to memory, minimizing data movement and reducing the load on computing resources. Data gets processed faster and more efficiently thanks to less data movement and extensive use of parallel processing, Shelton said.
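A minimal sketch of that kind of preprocessing might look like the following. It assumes a stream of vibration readings, a fixed window size and an alert limit, all of which are invented for illustration. Each window is reduced to a small summary, and the raw samples are kept only when the window's peak exceeds the limit.

```python
from statistics import mean


def preprocess(readings, window, limit):
    """Summarize each window of readings near the data.

    Raw samples are attached only when the window's peak exceeds `limit`,
    so quiet windows are forwarded as a few numbers rather than in full.
    """
    summaries = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        summary = {"start": start, "mean": mean(chunk), "peak": max(chunk)}
        if summary["peak"] > limit:
            summary["raw"] = chunk  # keep detail only for unusual windows
        summaries.append(summary)
    return summaries


# Hypothetical sensor data: one spike in the second window.
readings = [1.0, 1.1, 0.9, 1.0, 1.2, 5.0, 1.1, 1.0]
summaries = preprocess(readings, window=4, limit=3.0)

for s in summaries:
    print(s["start"], s["peak"], "raw" in s)
```

In this example the first window is forwarded as a summary only, while the second carries its raw samples because it contains the spike. The window size and limit would be tuned per workload.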
AI and computational storage
AI applications to date have relied on in-memory databases and a lot of preparation to use all the data that's being generated. To run real-time analytics on large data sets, scale-out storage with low latency is needed. Here, too, computational storage is a likely player.
For AI, computational storage can do upfront analysis to help identify what data is needed or, at least, do preliminary sorting, reducing the amount of data that needs to be moved, said Andy Walls, IBM fellow, CTO and chief architect for IBM Flash Storage. "At a bare minimum, it could be improved software that tags the data and helps to reduce the time to prepare it," he said.
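The tagging idea Walls describes can be sketched as a simple index built where the data is stored. The record fields, tag names and rules below are hypothetical; they show only how tagging up front lets a later job request the relevant subset by ID instead of scanning everything.

```python
from collections import defaultdict


def tag_record(record):
    """Assign tags to one record using simple, made-up rules."""
    tags = set()
    if record["kind"] == "image":
        tags.add("vision")
    if record["size_mb"] > 100:
        tags.add("large")
    if record.get("labeled"):
        tags.add("training-ready")
    return tags


def build_tag_index(records):
    """Map each tag to the IDs of the records that carry it."""
    index = defaultdict(list)
    for rec in records:
        for tag in tag_record(rec):
            index[tag].append(rec["id"])
    return index


# Hypothetical catalog of stored objects.
records = [
    {"id": 1, "kind": "image", "size_mb": 250, "labeled": True},
    {"id": 2, "kind": "log", "size_mb": 5},
    {"id": 3, "kind": "image", "size_mb": 40, "labeled": False},
    {"id": 4, "kind": "image", "size_mb": 12, "labeled": True},
]

index = build_tag_index(records)
print(index["training-ready"])  # IDs a training job would need to fetch
```

A training job that needs only labeled images would then fetch the IDs listed under "training-ready" and leave the rest of the data where it is.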
Who are the early adopters?
IT pros are still in the early stages of learning these computational storage fundamentals, working out how to use the technology in their infrastructures and which use cases might benefit. It will take a broad ecosystem that includes "killer apps" and software-defined storage from several vendors for computational storage to become "relevant for mainstream adoption," said Julia Palmer, a research vice president at Gartner.
Tim Stammers, a senior analyst at 451 Research, said he expects only very technical organizations with big problems, such as hyperscalers or the next tier down from hyperscalers, will turn to computational storage.
Among the vendors in this market, NGD Systems Inc. recently rolled out a 16 TB NVMe-based SSD equipped with multicore processing technology that can be used with computational storage. Microsoft Research, an NGD partner, is working on proofs of concept to demonstrate the potential benefits of computational storage SSDs for search requests for large image files, among other applications.