Data reduction in primary storage (DRIPS) is the application of capacity optimization techniques to data that is in active use, in contrast to data held for backup, archival or other secondary storage purposes.
DRIPS uses data reduction techniques such as data deduplication, data archiving, thin provisioning and compression that have traditionally been associated with backup storage rather than primary storage. Additional data reduction methods used for primary storage include automated storage tiering, efficient clones and RAID-level selection.
The purpose of data reduction
The purpose of data reduction, whether for primary or secondary storage, is improved storage efficiency, lower costs and better use of available resources. The technology decreases the number of disks to buy and reduces support fees, which lowers operational costs associated with managing storage.
Inactive data increases at a rate several times that of active data. By removing inactive data from expensive primary storage media, data reduction in primary storage can have a positive impact on storage, application performance and cost overall. However, primary storage, unlike secondary storage, is all about performance. Because data reduction consumes system resources, it works best on primary storage when finding and removing redundant data adds as little overhead as possible.
Data reduction techniques
Data deduplication detects repeated patterns in data -- commonly based on fixed block sizes -- and reduces such patterns to a single instance. Every reference to a particular block of data then points to a single physical copy. The space savings from deduplicating primary storage can be substantial. Inline data deduplication removes data redundancies before or as data is written to storage. Post-processing deduplication -- also known as asynchronous deduplication -- analyzes and removes redundant data after the write completes.
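The fixed-block approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name DedupStore, the SHA-256 fingerprint and the 4 KB default block size are hypothetical choices, not details of any particular storage product.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size in bytes (illustrative only)


class DedupStore:
    """Toy inline deduplicating store: one physical copy per unique block."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block contents (single physical copy)
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        refs = []
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Store the block only if this fingerprint has not been seen before.
            self.blocks.setdefault(fingerprint, chunk)
            refs.append(fingerprint)
        self.files[name] = refs

    def read(self, name):
        # Rebuild the file by following its references to the shared blocks.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def logical_bytes(self):
        return sum(len(self.blocks[fp])
                   for refs in self.files.values() for fp in refs)

    def physical_bytes(self):
        return sum(len(block) for block in self.blocks.values())
```

Because the fingerprint is computed before the block is stored, this sketch corresponds to the inline variant; a post-processing variant would store every block first and merge duplicates in a later pass.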
Thin provisioning eliminates the reserve on unwritten blocks of storage, allowing overprovisioning of storage resources and enabling more logical capacity to be created than is physically available. The technique does not actually reduce data but optimizes storage. Thin provisioning is widely implemented by storage vendors.
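As a rough sketch of the idea, the following Python model allocates physical blocks only when a logical block is first written, so the total provisioned capacity can exceed the pool. The names ThinPool and ThinVolume are hypothetical, and the pool is counted in whole blocks purely for simplicity.

```python
class ThinPool:
    """Toy pool of physical blocks shared by thinly provisioned volumes."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.allocated = 0
        self.volumes = []

    def create_volume(self, logical_blocks):
        # No physical space is reserved when the volume is created.
        volume = ThinVolume(self, logical_blocks)
        self.volumes.append(volume)
        return volume

    def provisioned_blocks(self):
        return sum(v.logical_blocks for v in self.volumes)


class ThinVolume:
    def __init__(self, pool, logical_blocks):
        self.pool = pool
        self.logical_blocks = logical_blocks
        self.mapping = {}  # logical block number -> data

    def write(self, lbn, data):
        if not 0 <= lbn < self.logical_blocks:
            raise IndexError("logical block out of range")
        if lbn not in self.mapping:
            # First write to this block: take one block from the shared pool.
            if self.pool.allocated >= self.pool.physical_blocks:
                raise RuntimeError("thin pool exhausted")
            self.pool.allocated += 1
        self.mapping[lbn] = data

    def read(self, lbn):
        # Unwritten blocks have no backing storage; None stands in for zeros.
        return self.mapping.get(lbn)
```

The RuntimeError branch shows the main operational risk of overprovisioning: writes can fail once the physical pool fills, even though each volume still reports free logical capacity.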
Compression finds repeated patterns of similar information that can be reduced and replaced with an optimized data structure. The method consumes processing cycles to compress and decompress data as required. Compression is a mature and widely implemented technology that can significantly reduce storage requirements.
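The trade-off between space saved and processing cycles spent can be seen with Python's standard zlib module. This is a minimal sketch; the helper names and the sample data are illustrative assumptions, and real arrays typically use their own block-level compression schemes.

```python
import zlib


def compress_block(data: bytes, level: int = 6) -> bytes:
    # Higher levels (up to 9) spend more CPU time looking for repeated patterns.
    return zlib.compress(data, level)


def decompress_block(blob: bytes) -> bytes:
    return zlib.decompress(blob)


def reduction_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by compressed size (greater than 1 means savings)."""
    return len(data) / len(compress_block(data, level))
```

Highly repetitive data, such as a buffer built from `b"status=OK;" * 1000`, compresses very well, while data that is already compressed or encrypted may not shrink at all.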
Data archiving moves less frequently used data to slower, less-expensive storage. The data involved may be maintained for compliance or for possible future use, but quick access is not required. For DRIPS, data would be moved from primary storage to less expensive secondary media. In the case of an active archive, where archived data may be called back to primary storage for use by applications and users at a moment's notice, performance becomes a critical factor.
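A simple archiving policy can be expressed as a filter on last-access time. The sketch below is an assumption-laden illustration: the function name select_for_archive, the 180-day idle threshold and the dictionary of timestamps are hypothetical, and it only selects candidates rather than moving any data.

```python
from datetime import datetime, timedelta


def select_for_archive(last_access, now, max_idle_days=180):
    """Return names of items not accessed within the last max_idle_days.

    last_access maps an item name to the datetime of its most recent access.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(name for name, accessed in last_access.items()
                  if accessed < cutoff)
```

For example, with `now` set to 1 January 2024, an item last read on 15 January 2023 would be selected, while one read on 30 December 2023 would stay on primary storage.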
Automated storage tiering actively moves data between disk types and RAID levels to meet cost, space and performance needs. Cheaper Serial Advanced Technology Attachment (SATA) storage holds less frequently accessed data, while high-performance Serial-Attached SCSI (SAS) or solid-state drives hold more active data. It is a feature found in storage management software.
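One way to picture a tiering decision is as a placement plan driven by access counts. The following sketch is hypothetical: the function name plan_tiers, the tier labels and the threshold of three accesses are assumptions for illustration, and real tiering software weighs many more factors.

```python
from collections import Counter


def plan_tiers(all_blocks, access_log, hot_threshold=3):
    """Map each block to a tier based on how often it appears in access_log."""
    counts = Counter(access_log)  # blocks never accessed count as 0
    placement = {}
    for block in all_blocks:
        if counts[block] >= hot_threshold:
            placement[block] = "ssd"   # frequently accessed: fast tier
        else:
            placement[block] = "sata"  # rarely accessed: capacity tier
    return placement
```

Running such a plan periodically, rather than on every access, keeps the cost of moving data between tiers from competing with application I/O.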
RAID increases data redundancy for data protection purposes and, depending on the RAID level selected for a particular storage environment, can increase or decrease the capacity available for active data in primary storage. It also has an effect on disk requirements, reliability and performance. Mixed RAID support enables users to optimize application performance, availability and costs.
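The capacity effect of RAID-level selection follows from simple arithmetic. The sketch below uses the standard formulas for common levels and assumes equal-sized disks; the function name usable_capacity_tb is hypothetical, and RAID 1 is modeled as a single mirror set across all disks.

```python
def usable_capacity_tb(level, disks, disk_tb):
    """Usable capacity in TB for a RAID group of equal-sized disks."""
    if level == "raid0":      # striping, no redundancy
        min_disks, data_disks = 2, disks
    elif level == "raid1":    # every disk holds the same data
        min_disks, data_disks = 2, 1
    elif level == "raid5":    # one disk's worth of parity
        min_disks, data_disks = 3, disks - 1
    elif level == "raid6":    # two disks' worth of parity
        min_disks, data_disks = 4, disks - 2
    elif level == "raid10":   # striped mirrored pairs
        if disks % 2:
            raise ValueError("RAID 10 needs an even number of disks")
        min_disks, data_disks = 4, disks // 2
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < min_disks:
        raise ValueError(f"{level} needs at least {min_disks} disks")
    return data_disks * disk_tb
```

With eight 4 TB disks, for instance, RAID 5 leaves 28 TB usable, RAID 6 leaves 24 TB and RAID 10 leaves 16 TB, which is why the choice of level matters for both cost and protection.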
Efficient clones create an identical copy of an existing volume, and they are often used to clone virtualized operating system volumes. They can create a full copy of a source volume with the same amount of physical storage or duplicate thinly provisioned volumes with or without the same amount of storage. Most efficiently, a clone can be created without copying any data at all, by referencing blocks on a source image instead. In that case, a new clone requires only a marginal amount of physical disk space, and additional space is needed only as the clone diverges from the original image and its changes must be stored. The technique of using efficient clones has become more prevalent with server virtualization.
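The reference-based clone described above can be modeled as a volume that stores only its own changed blocks and falls back to its source for everything else. This is a simplified sketch: the Volume class is hypothetical, and it assumes the source image is treated as read-only once clones exist, as with a golden image.

```python
class Volume:
    """Toy volume whose clones share unmodified blocks with their source."""

    def __init__(self, blocks=None, parent=None):
        self.own = dict(blocks or {})  # blocks physically stored by this volume
        self.parent = parent           # source image, assumed read-only after cloning

    def clone(self):
        # No data is copied; the clone starts with zero blocks of its own.
        return Volume(parent=self)

    def read(self, lbn):
        if lbn in self.own:
            return self.own[lbn]
        if self.parent is not None:
            return self.parent.read(lbn)  # fall through to the source image
        return None

    def write(self, lbn, data):
        # Only changed blocks consume new space in the clone.
        self.own[lbn] = data

    def stored_blocks(self):
        return len(self.own)
```

A clone created this way reports zero stored blocks until it is written to, and a write to the clone leaves the source image untouched.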
The market for data reduction in primary storage is being driven by an increase in storage costs, which is, in turn, being driven by an increase in the amount of data that enterprises deal with.