The differences between block-based and file-based data backup

This tip examines the differences between block-based and file-based data backup and will help you choose what's best for you.

While the distinction between block-based data storage and file-based storage is worth understanding (because it helps explain the differences between a NAS filer and a storage area network, for example), backup administrators don't usually choose one over the other -- at least not directly.

For most administrators, the block vs. file question is decided at the level of the storage technology rather than the bits moving over the wire. While each approach has its advantages and disadvantages, those are usually subsumed under the overall design of each piece of equipment. In other words, you seldom choose either block-based or file-based storage. You choose a NAS gateway or a SAN based on overall characteristics, which include block- and file-based storage.

File-based storage is usually associated with NAS. The original NAS implementations aimed to provide the simplest possible method of adding storage to a network. Dealing in files eliminated management overhead in the filer and was a better fit for organizations that had previously been using direct-attached storage (DAS). Companies like NetApp started out in this space and expanded.

SANs, from companies like Brocade, are usually based on blocks. This reflects the SAN's more sophisticated roots and its early use in handling very large files, with the ability to distribute those files over multiple storage devices.

Some of the earliest users of SANs were movie, video and music production studios that were tossing around multi-gigabyte files when the rest of us still thought a 500 meg file was a moose. In some of these installations, the music would reside on one server, the video clips on another and the edited product would be stored on a third. Because a lot of the very early SANs were, in effect, custom built, these studio workloads had an outsized effect on the early architectures.

The block-based storage gave more granularity for better control, among other advantages. Consider moving a huge file over a complex SAN with alternate pathways from source to destination. If the system "speaks" block as its native language, it's a lot easier to split that file up into blocks and seek the most efficient -- least congested -- path through the network for each block or group of blocks. That gives you higher throughput and better network utilization. You can do the same thing with a big file, of course, but it's harder because there's what amounts to a translation step in there.
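The idea of splitting a file into blocks and sending each one down the least congested path can be sketched in a few lines of Python. This is only an illustration, not how any particular SAN fabric works: the function names, the path names and the "bytes already queued" measure of congestion are all assumptions made for the example.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(blocks: list[bytes], path_names: list[str]):
    """Send each block down whichever path currently has the least queued data.

    'Congestion' here is just the number of bytes already assigned to a path,
    a stand-in (assumption) for whatever a real fabric would measure.
    """
    load = {name: 0 for name in path_names}
    plan = []
    for index, block in enumerate(blocks):
        # On a tie, min() picks the path listed first.
        path = min(path_names, key=lambda name: load[name])
        load[path] += len(block)
        plan.append((index, path))
    return plan, load


if __name__ == "__main__":
    blocks = split_into_blocks(bytes(10), block_size=4)
    plan, load = assign_blocks(blocks, ["path-A", "path-B"])
    print(plan)
    print(load)
```

Because each block is scheduled on its own, the load spreads across both paths. Sending the file as a single unit would tie it to one path from start to finish.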

Finally, there is content-addressable storage (CAS), which is file-based in a special way. CAS stores metadata along with the files and uses that metadata to manage the files. In fact, CAS is a class of object-oriented storage (OOS). Although OOS is file-based at bottom, it has so many special features that it's best considered a separate class. EMC Corp. is probably the best-known vendor of CAS products.
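To make the CAS idea a little more concrete, here is a minimal sketch of a store that derives each object's address from its content and keeps metadata alongside it. It does not reflect any vendor's product: the `ContentStore` class, its method names and the choice of SHA-256 as the addressing hash are assumptions for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the address is a hash of the content."""

    def __init__(self):
        self._objects = {}  # address -> (content, metadata)

    def put(self, content: bytes, metadata: dict) -> str:
        # SHA-256 is an assumption; real products use their own schemes.
        address = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        # In this sketch the metadata from the first write is kept.
        if address not in self._objects:
            self._objects[address] = (content, dict(metadata))
        return address

    def get(self, address: str):
        content, metadata = self._objects[address]
        return content, dict(metadata)

    def find(self, key: str, value) -> list[str]:
        """Use the stored metadata to locate objects, e.g. by type."""
        return [addr for addr, (_, meta) in self._objects.items()
                if meta.get(key) == value]


if __name__ == "__main__":
    store = ContentStore()
    addr = store.put(b"scanned invoice", {"type": "invoice", "year": 2007})
    print(addr)
    print(store.find("type", "invoice") == [addr])
```

The caller never picks a file name or location. The content determines the address, and the metadata is what the system searches and manages.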

One thing that complicates this discussion is that, because all storage is ultimately in blocks, it's difficult to make hard-and-fast statements about what one approach or the other can or can't do. You can generalize, but the generalizations have a lot of exceptions.
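The point that file storage ultimately rests on blocks can be shown with a toy example: a file layer that does nothing more than keep a table mapping each file name to the block numbers that hold its data. The `BlockDevice` and `FileLayer` classes and the four-byte block size are hypothetical, chosen to keep the sketch small, and it does not handle a full device or overwriting a file.

```python
BLOCK_SIZE = 4  # unrealistically small, for illustration only


class BlockDevice:
    """Knows nothing about files: it only reads and writes numbered blocks."""

    def __init__(self, num_blocks: int):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, number: int, data: bytes) -> None:
        # Pad short data with zero bytes so every block is the same size.
        self.blocks[number] = data.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, number: int) -> bytes:
        return self.blocks[number]


class FileLayer:
    """Translates file operations into block operations on the device."""

    def __init__(self, device: BlockDevice):
        self.device = device
        self.table = {}      # file name -> (block numbers, length in bytes)
        self.next_free = 0   # simplest possible allocator

    def write_file(self, name: str, data: bytes) -> None:
        numbers = []
        for i in range(0, len(data), BLOCK_SIZE):
            self.device.write_block(self.next_free, data[i:i + BLOCK_SIZE])
            numbers.append(self.next_free)
            self.next_free += 1
        self.table[name] = (numbers, len(data))

    def read_file(self, name: str) -> bytes:
        numbers, length = self.table[name]
        raw = b"".join(self.device.read_block(n) for n in numbers)
        return raw[:length]  # drop the padding in the last block


if __name__ == "__main__":
    fs = FileLayer(BlockDevice(8))
    fs.write_file("a.txt", b"hello world")
    print(fs.table["a.txt"])
    print(fs.read_file("a.txt"))
```

The table lookup in `read_file` is the "translation step" a file-based system carries. A block-based system hands the block numbers straight to the device.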

Another complication in having a pure block vs. file debate is that devices and approaches tend to overlap as the markets develop. For example, NAS may have started out as a simple, low-cost approach, but with the development of NAS gateways and other technologies, NAS has moved into SAN territory at the high end. While most NAS devices still deal in terms of files, the distinction is very much blurred today.

Finally, you don't have to choose one or the other. Recently, vendors have started marketing storage equipment that can handle both block-based and file-based storage protocols. This offers greater flexibility at the expense of more complexity. Examples include Hewlett-Packard (HP) Co.'s StorageWorks All-In-One Storage and NetApp's StoreVault.

About the author: Rick Cook specializes in writing about issues related to storage and storage management.
