What you will learn from this tip: This tip compares and contrasts in-band and out-of-band deduplication and offers some pros and cons of each approach.
An important differentiator among deduplication products is whether they work in-band or out-of-band. That is, do they deduplicate the data as they're writing it to the array or VTL (in-band), or is deduplication a secondary process that may run asynchronously (out-of-band). There are advantages and disadvantages to each method.
The advantage to the in-band method is that it works with the data only one time. The drawback is that, depending on the implementation, it could slow down the incoming backup. The inline camp argues that while they'll probably slow down the backup somewhat, when they're done, they're done. The out-of-band camp still has important work to do: Store the data.
The out-of-band method has to write the original data, read it, identify its redundancies, and then write one or more pointers if it's redundant. The advantage to this is that you can apply more parallel processes (and processors) to the problem, whereas the in-band method can apply only one process per backup stream. The disadvantage is that the data is written and read more than once, and the multiple reads and writes could cause contention for disk. In addition, the out-of-band method requires slightly more disk than an in-band setup because an out-of-band system must have enough disk to hold the latest set of backups before they're deduplicated. The out-of-band camp counters that slowing down the original backup is unacceptable, and that they'll be able to deduplicate the data in time for tomorrow's backup.
You probably shouldn't dismiss a vendor simply because it uses in-band or out-of-band methods, but definitely test the different deduplication methods to determine how fast they work in your environment. Remember to test the product against many slower backups as well as a smaller number of backups where speed matters. Some systems perform well for single streams, but don't scale for many streams. Some work well only when you send them many streams, but don't perform well with a very fast single stream. Finally, test the deduplication product with enough data to see whether it will handle the amount of data you back up every day. If it doesn't get the deduplication job done every day in time for the next night's backup, you're going to be in trouble.