How do deduplication and compression compare to RAID erasure coding?

RAID 5 and RAID 6 erasure coding, deduplication and compression are distinct concepts, but each can help conserve storage capacity in virtualized environments.

Erasure coding, deduplication and compression are three distinctly different and independent concepts, but all three can help conserve costly storage capacity.

Generally, RAID techniques add resilience to storage, but RAID 5 (single parity) and RAID 6 (double parity) demand less storage capacity than the more traditional RAID 1 (mirroring).

Mirroring demands twice the storage capacity. To mirror a 200 GB disk, you would need two 200 GB volumes, for a total of 400 GB of committed capacity.

By comparison, RAID 5 under VMware vSAN 6.2 uses only about 1.33 times the capacity of the original disk, and RAID 6 needs 1.5 times the capacity. A RAID 5 disk group would require roughly 266 GB of total storage to protect a 200 GB volume, while a RAID 6 group would require 300 GB of total storage to protect the same volume. The additional storage holds the parity, or erasure coding, data.
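
The capacity figures above are simple multiplication, which the following minimal Python sketch reproduces. The multipliers are the rounded values quoted in this article; the function and dictionary names are hypothetical and exist only for illustration.

```python
# Capacity multipliers as quoted in the article for vSAN 6.2 (rounded values).
OVERHEAD = {
    "RAID 1": 2.0,   # mirroring: two full copies
    "RAID 5": 1.33,  # single parity
    "RAID 6": 1.5,   # double parity
}


def raw_capacity_gb(usable_gb, raid_level):
    """Return the total committed capacity needed to protect usable_gb."""
    return usable_gb * OVERHEAD[raid_level]


# Compare the three protection schemes for the 200 GB volume used in the text.
for level in OVERHEAD:
    print(f"{level}: {raw_capacity_gb(200, level):.0f} GB")
```

For a 200 GB volume, the difference between mirroring and RAID 5 is about 134 GB of committed capacity, which is why erasure coding is attractive when capacity is the main constraint.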

VSAN 6.2 also supports deduplication and compression. In essence, both ideas involve the removal of redundant data content. Compression finds and removes redundant data within blocks of storage, while deduplication -- understood as a form of compression -- finds and removes redundant blocks outright.

Deduplication and compression are usually applied together when additional storage savings are desired; deduplication is typically applied first, and then the remaining blocks are compressed, if possible. The actual amount of storage savings available with deduplication and compression is hard to predict, and depends on the type of files being compressed, as well as the number of file copies -- or other duplicated blocks -- present to be deduplicated. VSAN reports the actual deduplication and compression ratio in the vSAN capacity monitor.
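
The order described above, deduplicate first and then compress what remains, can be sketched in a few lines of Python. This is an illustration of the general technique only, not vSAN's implementation: the 4 KB block size, the SHA-256 fingerprints and the use of zlib are all assumptions chosen for the example, and the function names are hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def dedupe_then_compress(data, block_size=BLOCK_SIZE):
    """Deduplicate fixed-size blocks, then compress each unique block."""
    store = {}     # fingerprint -> compressed copy of a unique block
    pointers = []  # one fingerprint per logical block, in original order
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            # First time this content is seen: keep one compressed copy.
            store[fingerprint] = zlib.compress(block)
        # Every logical block, duplicate or not, becomes a pointer.
        pointers.append(fingerprint)
    return store, pointers


def restore(store, pointers):
    """Rebuild the original data by following each pointer."""
    return b"".join(zlib.decompress(store[fp]) for fp in pointers)


# Ten logical blocks, but only two distinct block contents.
data = (b"A" * BLOCK_SIZE) * 8 + (b"B" * BLOCK_SIZE) * 2
store, pointers = dedupe_then_compress(data)
stored_bytes = sum(len(chunk) for chunk in store.values())

print(len(pointers), "logical blocks,", len(store), "unique blocks")
print("reduction ratio: %.1fx" % (len(data) / stored_bytes))
```

The ratio printed here is unrealistically high because the sample data is artificially repetitive; real workloads vary widely, which is why the savings are hard to estimate in advance.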

However, deduplication and compression don't provide resilience, so use caution when you apply them. Both processes remove redundant content; deduplication, in particular, replaces it with pointers to a single copy or instance of the content. Even though there might be many references to a duplicated block scattered across a disk group, only one full and complete instance of that content will exist within the disk group; all other duplicated instances point to the one full copy.

This means a single disk failure might render the only full copy of that redundant content inaccessible, cripple the entire disk group and require the administrator to restore from a backup.

Generally, deduplication and compression are used in conjunction with some form of RAID implementation.

RAID is a well-established and reliable way to ensure resiliency in storage. VSAN 6.2 users can opt to implement simple RAID 1 mirroring, but can also use erasure coding methods, including RAID 5 and RAID 6, which stripe data and parity information across multiple disks organized into RAID groups.
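
To show how parity provides resilience without a full second copy, here is a minimal sketch of single-parity (RAID 5-style) recovery using XOR. It is a conceptual illustration, not how vSAN lays out data: the three tiny data blocks and the function name are assumptions made for the example, and RAID 6's second, independent parity calculation is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks striped across three disks, plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held the second data block.
survivors = [data_blocks[0], data_blocks[2], parity]

# XOR of the surviving data blocks and the parity block rebuilds the lost one.
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

One parity block protects all three data blocks against a single failure, which is where the capacity advantage over mirroring comes from.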

Higher RAID levels can help organizations conserve valuable storage capacity while maintaining data availability, a critical attribute for businesses of all sizes.
