While data reduction in storage used to have a backup focus, vendors now commonly include the technologies in their flash-based systems.
The superior performance that SSDs offer over HDDs has made it possible to use data reduction techniques even in primary storage systems that support mission-critical applications. Vendors have also improved their data reduction techniques for better efficiency and to help minimize the impact on performance.
Despite these advantages, SSD data reduction technologies can vary significantly from one product to the next, both in their effectiveness and in the performance penalty they incur. Before choosing a storage product, decision-makers should fully understand a system's data reduction capabilities, their potential effect on application performance and how much the organization stands to save in storage costs.
What data reduction encompasses: Then and now
Data reduction is a broad term and can incorporate a variety of features, but it generally focuses on two primary technologies: compression and deduplication. Compression removes redundant data at the bit or byte level within a block, and deduplication eliminates duplicate data at the block level.
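The distinction can be sketched in a few lines. In this illustrative example (the function and variable names are mine, not from any vendor's implementation), deduplication keeps one copy of each unique block and records references, while compression shrinks redundancy inside a single block:

```python
import hashlib
import zlib

def deduplicate(blocks):
    """Keep one stored copy per unique block; duplicates become references."""
    store, refs = {}, []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store the block only the first time
        refs.append(digest)              # every block resolves to a fingerprint
    return store, refs

# Four "blocks", two of them identical.
blocks = [b"AAAAAAAA", b"BBBBBBBB", b"AAAAAAAA", b"CCCCCCCC"]
store, refs = deduplicate(blocks)
print(len(store))  # 3 unique blocks stored for 4 written

# Compression, by contrast, removes redundancy inside the data itself.
print(len(zlib.compress(b"AAAAAAAA" * 100)) < 800)  # True: 800 bytes shrink
```

Real arrays combine both steps inline, but the division of labor is the same: deduplication works across blocks, compression works within them.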
For many years, data reduction was generally relegated to secondary storage that supported backups and archives, where the emphasis was on effective resource use rather than on application performance.
IT teams have been reluctant to implement either compression or deduplication on their primary storage systems because of the potential impact on application performance. For many organizations, this continued to be true even as they moved to all-flash arrays for their primary storage.
The issues with SSD data reduction
Data reduction operations could eat up both memory and CPU resources, add I/O overhead, increase latency and reduce overall performance. Even if organizations wanted to adopt data reduction, they might have been running workloads that couldn't benefit from these technologies, and any attempt to use them resulted in unnecessary overhead. For example, their data might have had a low rate of redundancy, so there was little to gain from trying to deduplicate the data.
Some storage systems took an all-or-nothing approach to data reduction, even though always-on reduction could do more harm than good for certain workloads. The all-or-nothing approach also presented challenges for organizations that had to comply with regional data regulations, which might require data to be stored in its original format.
IT teams might not have fully appreciated their products' limitations until they were running production workloads. For example, they might discover too late that their storage systems didn't support lossless compression, failed to meet evolving scaling requirements or couldn't apply data reduction globally across an entire array.
Advancing SSD data reduction and how it helps enterprises
As all-flash arrays have continued to proliferate in the data center, data reduction techniques have made important inroads into storage environments, where cost per gigabyte remains a primary consideration, second only to performance. However, advancements in SSDs have resulted in a new generation of devices that can deliver exceptional IOPS and microsecond latency, with performance that can more easily absorb the data reduction overhead.
It's not just the storage devices themselves that have improved. SSDs that conform to PCIe 4.0 are now common, offering faster data rates than were possible only a few years ago. In addition, PCIe 5.0 drives have hit the market, and the PCIe 6.0 specification was released in 2022. Each new PCIe generation doubles the data rate from the previous one.
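The generational doubling is easy to see in rough numbers. The sketch below uses the published per-lane transfer rates for each PCIe generation and a simplified bandwidth estimate that ignores encoding overhead, which differs between generations:

```python
# Per-lane raw transfer rates (GT/s) by PCIe generation.
rates = {3: 8, 4: 16, 5: 32, 6: 64}

for gen, rate in sorted(rates.items()):
    # NVMe SSDs typically use a x4 link; rough usable bandwidth in GB/s,
    # ignoring encoding overhead (128b/130b for Gen 3-5, FLIT mode for Gen 6).
    print(f"PCIe {gen}.0 x4: ~{rate * 4 / 8:.0f} GB/s")
```

A PCIe 5.0 x4 drive thus has roughly four times the raw bandwidth of a PCIe 3.0 x4 drive, which is headroom that data reduction overhead can draw on.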
NVMe and NVMe-oF have helped to deliver high-performing storage systems. The NVM Express organization has published Revision 2.0c of its Base Specification, which provides a more efficient interface for achieving lower latency and greater throughput. Together with PCIe, NVMe helps unlock the full potential of SSD performance, resulting in storage systems that can better accommodate data reduction.
The data reduction technologies themselves have also evolved to minimize their performance impact and deliver more effective reduction. All-flash arrays commonly support lossless compression, in-line data reduction and global reduction across the entire array or namespace. Vendors have made data reduction more adaptive and improved their algorithms to provide smarter reduction and better performance.
Data reduction capabilities in all-flash arrays can increase the effective capacity of their storage systems, save energy and reduce the storage footprint. Data reduction also decreases the number of program-erase cycles, extending the life of the drive and reducing the data transmission load. Together, these factors help to lower overall storage costs and make SSDs affordable for more workloads.
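The capacity effect is simple arithmetic. This sketch (the function name and sample figures are mine, for illustration only) shows how a data reduction ratio translates raw capacity into effective capacity:

```python
def effective_capacity(raw_tb, reduction_ratio):
    """Logical data a drive can hold given a reduction ratio (e.g. 3.0 means 3:1)."""
    return raw_tb * reduction_ratio

# A hypothetical 15.36 TB drive at a 3:1 reduction ratio holds ~46 TB of logical data.
print(effective_capacity(15.36, 3.0))
```

The same ratio divides the effective cost per gigabyte, which is why even a modest 2:1 ratio can change the economics of putting a workload on flash.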
Although SSD data reduction operations can still affect performance, their impact is usually minimal in comparison to the performance gains offered by enterprise drives. For many workloads, the savings in capacity could be worth any performance tradeoffs.
Examples of SSD vendors, products with data reduction
SSD data reduction is a common feature of many enterprise products, but vendors take different approaches.
Dell Unity all-flash storage systems provide advanced data reduction capabilities that include both compression and deduplication. When data first enters the system, Unity segments it into 8 KB blocks and then passes it to the deduplication algorithm, which analyzes the blocks for known patterns. If Unity finds patterns, it dedupes the blocks and writes them to disk. If it doesn't find patterns, Unity passes the data to the advanced deduplication algorithm, which fingerprints each block to quickly identify duplicate data. Unity then passes the data to the compression algorithm, which applies compression only where savings are possible. Unity's data reduction occurs inline between the system cache and the storage devices.
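The staged pipeline Dell describes can be approximated in pseudocode form. The sketch below is not Dell's implementation; it is a minimal illustration, assuming zero-filled blocks as the "known pattern," SHA-256 as the fingerprint and zlib as a stand-in compressor, with compression applied only where it saves space:

```python
import hashlib
import zlib

BLOCK = 8 * 1024  # Unity segments incoming data into 8 KB blocks

def reduce_stream(data, store):
    """Illustrative pipeline: pattern dedup -> fingerprint dedup -> selective compression."""
    written = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        if block == b"\x00" * len(block):            # known pattern: nothing stored
            written.append(("pattern", "zero"))
            continue
        digest = hashlib.sha256(block).hexdigest()   # fingerprint to spot duplicates
        if digest in store:
            written.append(("dup", digest))
            continue
        packed = zlib.compress(block)
        # Compress only where it actually yields savings.
        store[digest] = packed if len(packed) < len(block) else block
        written.append(("new", digest))
    return written

store = {}
log = reduce_stream(b"\x00" * BLOCK + b"hello" * 2000, store)
print(log[0])  # ('pattern', 'zero'): the zero block never reaches the backend
```

Because each stage only sees blocks the previous stage could not reduce, cheap checks run first and the expensive compressor runs last, which is the general shape of most inline reduction pipelines.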
HPE 3PAR systems have adaptive data reduction capabilities that provide inline deduplication and compression. 3PAR also includes thin provisioning, thin conversion, thin persistence and thin copy reclamation under its data reduction umbrella. HPE designed the lossless compression algorithm specifically to operate on a flash-native block size. It stores writes in cache before acknowledging them to the host and performs compression after the acknowledgement. The 3PAR systems scan the data to identify incompressible streams. If discovered, HPE stores them in their native formats rather than wasting CPU cycles trying to compress them.
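One common way to detect an incompressible stream, shown here as a hedged sketch rather than HPE's actual method, is to trial-compress a small sample at a fast setting and skip full compression when the sample barely shrinks:

```python
import os
import zlib

def is_compressible(sample, threshold=0.95):
    """Trial-compress a sample at the fastest level; skip data that barely shrinks."""
    return len(zlib.compress(sample, level=1)) < len(sample) * threshold

print(is_compressible(b"log line repeated " * 500))  # True: redundant text shrinks
print(is_compressible(os.urandom(8192)))             # False: random data won't shrink
```

Encrypted, already-compressed or random data fails the trial and is stored in its native format, saving the CPU cycles a full compression pass would have wasted.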
Pure Storage FlashArray incorporates multiple data reduction technologies to save space in its all-flash arrays. The system identifies and removes repetitive binary patterns and provides inline deduplication that supports a variable block-size range of 4 KB to 32 KB. It ensures that only unique blocks of data are saved to storage. The product applies deduplication across the entire array and not just a single drive. FlashArray provides inline compression that uses variable addressing and an append-only write layout to eliminate wasted space. It runs a post-process compression algorithm to squeeze out even more space.
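Variable block-size deduplication generally relies on content-defined chunking, where block boundaries come from the data itself rather than fixed offsets. The sketch below is a generic illustration of that idea, not Pure's proprietary algorithm, using FlashArray's stated 4 KB to 32 KB range and a simple rolling-style hash to place boundaries:

```python
def chunk_boundaries(data, min_size=4096, max_size=32768, mask=0x7FF):
    """Split data into variable-size chunks between 4 KB and 32 KB.

    A boundary is declared when the hash of recent bytes matches a mask,
    so identical content produces identical chunks even after shifts.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF  # cheap rolling-style hash
        size = i - start + 1
        if size >= min_size and ((h & mask) == 0 or size >= max_size):
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])          # trailing partial chunk
    return chunks

chunks = chunk_boundaries(bytes(100_000))
```

The payoff over fixed 8 KB blocks is shift resistance: inserting a few bytes early in a file moves every fixed-block boundary, but content-defined boundaries realign, so downstream deduplication still finds the unchanged chunks.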
Vast Data storage systems use an adaptive chunking technique that segments data into blocks ranging from 16 KB to 64 KB. Data reduction occurs within a single realm across the entire namespace created by the cluster, with the reduction metadata retained in a storage class memory write buffer. The deduplication operation first identifies identical data blocks and then runs a set of hash functions that look for similarities. If found, Vast compresses them together using a common compression dictionary. Vast provides data-aware compression that's applied automatically and in real time. Its systems also use delta encoding to reduce the number of stored bytes even further.
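Delta encoding, the last step in Vast's pipeline, stores only the differences between a block and a similar reference block. This is a minimal sketch of the concept, assuming equal-length blocks; the function names are mine and Vast's actual encoding is more sophisticated:

```python
def delta_encode(reference, block):
    """Record only the byte positions where a similar block differs from its reference."""
    return [(i, b) for i, (a, b) in enumerate(zip(reference, block)) if a != b]

def delta_decode(reference, delta):
    """Rebuild the block by patching the reference with the stored differences."""
    out = bytearray(reference)
    for i, b in delta:
        out[i] = b
    return bytes(out)

ref = b"A" * 64
blk = b"A" * 30 + b"Z" + b"A" * 33   # differs from ref in a single byte
delta = delta_encode(ref, blk)
print(len(delta))  # 1 difference stored instead of the full 64-byte block
assert delta_decode(ref, delta) == blk
```

When similarity hashing has already grouped near-identical blocks, storing one reference plus small deltas can reduce bytes well beyond what exact-match deduplication achieves.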