Data deduplication is a relatively new technology that has made its way into many data storage environments. But what makes it a justified expenditure in one environment will not necessarily hold true in all cases; you need to understand whether dedupe will fill a gap, help you meet a requirement or reduce costs. Storage vendors are typically better at finding a need for their technology in your environment than at finding a technology that will actually meet your needs. Beware of vendor ROI calculators that spew out fantastic dedupe savings, as your mileage will most certainly vary.
When talking about building a business case for dedupe, the term expenditure is preferred here instead of investment because we are talking about backups. Rarely does data backup technology generate revenue unless it's used by a backup service provider. For most companies, backups are a way to prevent losses so the mindset is around saving money. You don't really hear about "investing in a backup technology to increase revenue." Cost reduction is therefore a good place to start to build a solid business case for data deduplication.
What are you trying to solve with data deduplication?
What are you trying to solve with deduplication technology? This should be the first question asked. While there is nothing wrong with adopting new technologies and improving the way certain IT processes work, obtaining funding is always easier when the project is aimed at cutting costs or at fixing something that is failing to meet requirements. Here are some pros and cons that can help build a case for deduplication.
Advantages of data deduplication
Remote offices: Deduplication can help address a common situation for remote offices where there are no onsite skills to manage backups. Using a dedupe-capable disk array as the primary target to store backup data will eliminate the need to ensure a tape is always available and eliminate the need to have someone mount a tape for restores. Add to that the ability to replicate deduplicated data across the WAN and you have a low management overhead backup solution. Additionally, replicating deduplicated data across the WAN reduces the network bandwidth requirement, making this a cheaper alternative to disk mirroring. This does not necessarily translate into immediate savings over tape, but it can eliminate frequently failing or missed backups.
Data deduplication and duplicate files: Eliminating duplicate files is one of the most appealing reasons for data deduplication. Environments with large amounts of duplicate or similar files have a lot to gain from a storage cost-reduction perspective. Deduplication yields the best data reduction results when it encounters large volumes of identical data segments. In instances where full backups are frequent and data change rates are moderate to low, data reduction can be very impressive and can result in significant storage savings. A data reduction ratio between 5:1 and 10:1 is not uncommon, and ratios of 20:1 and higher have been observed in some environments.
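To make the idea of a data reduction ratio concrete, here is a minimal sketch in Python. It assumes fixed-size 4 KB segments identified by SHA-256 hashes, which is only one of several approaches (commercial products often use variable-size segments), and the names dedupe_ratio and make_backup, along with the synthetic backup data, are made up for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed segment size; real products vary


def dedupe_ratio(streams, chunk_size=CHUNK_SIZE):
    """Return logical bytes divided by unique bytes across all streams."""
    seen = set()
    logical = 0
    unique = 0
    for data in streams:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                # First time this segment is seen: it must be stored.
                seen.add(digest)
                unique += len(chunk)
    return logical / unique if unique else 1.0


def make_backup(num_chunks, changed=()):
    """Build a synthetic full backup; chunks listed in `changed` differ from the baseline."""
    parts = []
    for i in range(num_chunks):
        tag = b"v2" if i in changed else b"v1"
        seed = hashlib.sha256(tag + i.to_bytes(4, "big")).digest()
        parts.append(seed * (CHUNK_SIZE // len(seed)))  # one 4 KB chunk
    return b"".join(parts)


# Ten full backups of 100 chunks each; every backup after the first
# differs from the baseline in a single chunk (a low change rate).
backups = [make_backup(100)] + [make_backup(100, changed={k}) for k in range(1, 10)]
ratio = dedupe_ratio(backups)
print(f"{ratio:.1f}:1")
```

With these made-up inputs, 1,000 logical chunks collapse to 109 stored chunks, or roughly 9:1. A single full backup with no repeated segments yields 1:1, which is why frequent full backups with low change rates show the best results.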
Reduced media handling: For environments that still need tape operators and racks to store media because the tape library is near capacity, deduplication offers a great opportunity to reduce media handling, allowing resources to be redeployed to other areas where they are needed. Once more, the ability to replicate data to a remote site after it has been deduped can eliminate the need for offsite media handling without requiring major network bandwidth to meet backup windows. Organizations with at least two locations already connected via a network link can leverage replication of deduplicated data without significant capital expenditure while reducing their offsite storage budget and reallocating resources to more productive tasks.
Space reclamation: Given the cost of data center space, it may make a lot of sense to reclaim some of the space occupied by a very large tape library and replace it with smaller-footprint, dedupe-capable disk arrays.
Tape upgrade: Any organization considering a tape technology update should seriously consider disk deduplication. While it does not necessarily make financial sense to rip and replace a tape subsystem that is still meeting requirements, the need for a technology update always offers an opportunity to evaluate other options.
The disadvantages of data deduplication
Data type: Not all data is a good candidate for deduplication; image, video, audio and other types of compressed data will gain little from deduplication.
Encryption: For security-minded organizations that implement data encryption at the source, deduplication at the backup level is not the best choice, as encryption's first job is to make data unrecognizable without the keys. This nullifies most benefits of deduplication unless encryption is applied post-deduplication.
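The effect of source-side encryption can be sketched as follows. This assumes, as is common practice, that each backup copy is encrypted with a fresh nonce; toy_encrypt is a hypothetical stand-in built from SHA-256 purely for illustration and is not real or secure encryption. The fixed 4 KB segment size and all names are likewise assumptions.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed segment size


def dedupe_ratio(streams, chunk_size=CHUNK_SIZE):
    """Return logical bytes divided by unique bytes across all streams."""
    seen = set()
    logical = 0
    unique = 0
    for data in streams:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                unique += len(chunk)
    return logical / unique if unique else 1.0


def toy_encrypt(data, key, nonce):
    """XOR data with a SHA-256-derived keystream. Illustration only, NOT secure."""
    out = bytearray()
    for counter in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[counter:counter + 32], block))
    return bytes(out)


# Fifty distinct 4 KB chunks of synthetic "file" data.
payload = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest() * 128 for i in range(50)
)

# Two identical plaintext copies dedupe perfectly.
plain_ratio = dedupe_ratio([payload, payload])

# The same two copies, each encrypted at the source with its own nonce,
# no longer share any recognizable segments.
key = b"example-key"
encrypted_ratio = dedupe_ratio([
    toy_encrypt(payload, key, b"nonce-1"),
    toy_encrypt(payload, key, b"nonce-2"),
])

print(plain_ratio, encrypted_ratio)
```

In this sketch the plaintext copies reduce 2:1 while the encrypted copies do not reduce at all. Deduplicating first and encrypting the stored segments afterward avoids the problem, which is the post-deduplication ordering mentioned above.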
Transient data: Data with very low retention parameters will typically see a poor dedupe or reduction ratio. This is because deduplication needs to build a base of identical data segments before it really becomes effective. Pass-through or very short-term retention data does not typically reside long enough on the storage array to allow dedupe algorithms to build history. Deduplication is definitely better suited for longer term retention.
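A back-of-the-envelope model shows why retention matters. It assumes every retained backup is a full backup of the same size and that a fixed fraction of the data changes between consecutive backups; the function name expected_ratio and the 2% change rate are illustrative assumptions, not measurements.

```python
def expected_ratio(retained_fulls, change_rate):
    """Approximate dedupe ratio for a set of retained full backups.

    The first full backup is stored whole; each later one adds only the
    fraction of data that changed since the previous backup.
    """
    if retained_fulls < 1:
        raise ValueError("at least one backup must be retained")
    unique = 1 + (retained_fulls - 1) * change_rate
    return retained_fulls / unique


# Hypothetical 2% change between consecutive full backups.
for retained in (1, 2, 7, 30):
    print(retained, round(expected_ratio(retained, 0.02), 2))
```

Under these assumptions, keeping a single backup gives no reduction at all, seven retained backups give about 6:1, and 30 give about 19:1. Short-retention data never accumulates enough history to move far from 1:1.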
Deduplication-capable virtual tape libraries (VTLs) should not be considered an endless source of tape devices. While manufacturers may enable you to configure 128 logical tape drives or more, this does not automatically translate into a massive performance gain. For example, streaming data to more than 100 virtual tape drives over a gigabit link will still not exceed gigabit performance. You may find yourself with the same performance bottleneck tens of thousands of dollars later.
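The arithmetic behind that bottleneck is simple enough to sketch. The per-drive streaming rate of 30 MB/s below is a hypothetical figure chosen for illustration, and the gigabit ceiling of 125 MB/s is the theoretical maximum before protocol overhead.

```python
GIGABIT_MB_S = 1000 / 8  # 125 MB/s theoretical ceiling, before protocol overhead


def aggregate_throughput_mb_s(drives, per_drive_mb_s, link_mb_s=GIGABIT_MB_S):
    """Total throughput is capped by the slower of the drives or the link."""
    return min(drives * per_drive_mb_s, link_mb_s)


# Hypothetical 30 MB/s per virtual drive.
for drives in (2, 4, 8, 128):
    print(drives, aggregate_throughput_mb_s(drives, 30))
```

With these assumed numbers, the link is saturated by the fifth drive; going from 8 to 128 virtual drives adds nothing.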
Many vendors will leverage the fact that deduplication-capable disk arrays can be faster than tape, but there are still limitations. Data deduplication to disk is not mirroring or snapshot technology; data must be reassembled and, if managed by a backup product, it must also be written back to a file system in a format that is readable by the applications accessing it. Depending on the deduplication technology in use, performance for large restore operations can also be disappointing.
Deduplication should be presented like any other technology. Unless using it will address the shortcomings of another technology or truly help reduce operational costs beyond the initial capital cost over the solution's usable life, it will be a tough sell.
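One way to test that claim before presenting it is a simple net-savings calculation over the solution's usable life. The sketch below is deliberately crude: it ignores discounting and data growth, and every dollar figure in it is a made-up placeholder to be replaced with your own numbers.

```python
def net_savings(capital_cost, annual_savings, annual_operating_cost, years):
    """Savings minus all costs over the usable life (no discounting)."""
    return years * (annual_savings - annual_operating_cost) - capital_cost


# Placeholder figures only: $100,000 up front, $40,000 saved per year
# (media, offsite storage, handling), $10,000 per year to run.
for years in (3, 4, 5):
    print(years, net_savings(100_000, 40_000, 10_000, years))
```

With these placeholder inputs, the purchase is still $10,000 in the red after three years and only comes out ahead in year four, which is the kind of result that decides whether the proposal is a tough sell.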
About this author: Pierre Dorion is the Data Center Practice Director and a Senior Consultant with Long View Systems Inc. in Phoenix, AZ, specializing in the areas of business continuity and disaster recovery planning services, and corporate data protection.