
Tape archiving tips for a better storage strategy

Studies show that tape archives are a cheaper method of long-term data retention than disk. Explore how to use active archive and other storage technologies with a tape system.

The key advantage of using tape for the lion's share of storage is its low cost of ownership.

Multiple disk archiving vs. tape archiving analyses have been conducted by organizations ranging from the Information Storage Industry Consortium (INSIC) to the Clipper Group. According to INSIC, the total cost of ownership over a five-year period for maintaining a 500 TB archive on disk vs. tape is approximately $1.5 million for disk -- in acquisition and energy costs -- compared to about $250,000 for tape archiving. Clipper Group considered the comparative costs of same-capacity disk and tape repositories and determined that, based on energy costs alone, disk products are about 76 times more expensive than comparable tape offerings.
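Normalizing the INSIC totals to a per-terabyte, per-year figure makes them easier to compare with other quotes. The short sketch below does only that arithmetic, assuming the costs are spread evenly over the 500 TB and the five years; the names are illustrative.

```python
# INSIC figures from the text: five-year TCO (acquisition + energy)
# for a 500 TB archive.
ARCHIVE_TB = 500
YEARS = 5
TCO_USD = {"disk": 1_500_000, "tape": 250_000}


def cost_per_tb_year(total_usd: float, tb: float = ARCHIVE_TB, years: int = YEARS) -> float:
    """Spread a multi-year total cost evenly over capacity and time."""
    return total_usd / (tb * years)


for medium, total in TCO_USD.items():
    print(f"{medium}: ${cost_per_tb_year(total):,.0f} per TB per year")

# Overall TCO ratio (not the energy-only ratio Clipper Group reported).
ratio = TCO_USD["disk"] / TCO_USD["tape"]
print(f"disk/tape TCO ratio: {ratio:.1f}x")
```

That works out to roughly $600 vs. $100 per TB per year, a 6x gap. It is much smaller than Clipper Group's 76x because INSIC's total includes acquisition costs, while Clipper's figure counts energy alone.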

Tape archiving with disk or flash

Still, some firms prefer to establish a disk or flash buffer in front of the tape archive to facilitate archival data usage. For example, if digitized films are stored in a tape archive that end users can access, it may be desirable to begin each movie immediately upon request, concealing the time it takes to locate the movie's tape in the library, mount it and wind it to the location where the movie file starts -- a maximum of about two minutes.

To circumvent this delay, one option is to store the first two minutes of every movie in the tape library as a file on a disk or flash repository. As soon as a movie is requested, it can begin streaming from disk while the back-end tape system finds and loads the tape and begins reading the file. The tape can then seamlessly take over from the file playing off the disk buffer. Tape's great strength here is jitter-free playback at a consistent transfer speed, which is one reason tape technology persists as a preferred storage medium in media and entertainment.
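Sizing such a buffer comes down to bitrate multiplied by worst-case tape latency. The sketch below uses the two-minute maximum from the text. The 40 Mbit/s stream rate, the 10% safety margin and the 10,000-title library are hypothetical values chosen only for illustration.

```python
def prefix_bytes(bitrate_mbps: float, tape_latency_s: float, margin: float = 1.1) -> int:
    """Bytes of each film's opening to keep on disk/flash so playback can
    start at once and continue until the tape takes over.

    bitrate_mbps: stream rate in megabits per second.
    tape_latency_s: worst-case time to locate, mount and position the tape.
    margin: hypothetical padding on top of the worst-case latency.
    """
    bits = bitrate_mbps * 1_000_000 * tape_latency_s * margin
    return int(bits // 8)


# Hypothetical inputs: 2-minute worst case (from the text), 40 Mbit/s stream,
# 10,000 titles in the tape library.
per_title = prefix_bytes(bitrate_mbps=40, tape_latency_s=120)
library_titles = 10_000
print(f"per title: {per_title / 1e9:.2f} GB")
print(f"buffer for whole library: {per_title * library_titles / 1e12:.1f} TB")
```

Under these assumptions the buffer holds well under a gigabyte per title, so a few terabytes of disk or flash can front a far larger tape library.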

Tape to cloud gains traction


In some implementations, like Fujifilm's Dternity, the storage buffer in active archives -- whether disk or flash -- remains on-premises, while tape storage goes to a cloud. There are some indications that Microsoft will be leveraging archival tape in its Azure cloud offering.

Late last year, an Azure cloud architect made the case for tape as the only option for storing the 10 to 60 zettabytes of new data expected to arrive in cloud archives by 2020. The combined manufacturing capacity of the disk, flash and optical industries would be insufficient to meet this storage demand.

Microsoft isn't alone in identifying a role for tape in the cloud. Discussions at seminars and in the trade press suggest all the major cloud brands are considering tape technology, whether for backup, large data transfers (cloud seeding) or archive.

Tape archiving: Combating the increase in data

In addition to media and entertainment, oil and gas, healthcare and cloud services industries, most scientific laboratories continue to use tape archiving as a means to keep pace with storage capacity demand. For example, Brookhaven National Laboratory (BNL), in Long Island, N.Y., uses a particle collider to conduct experiments that produce massive quantities of data. According to David Yu, a storage expert at BNL, the volume of data generated by experimental collisions and other work increased from about 2 petabytes (PB) in 2009 to upwards of 13 PB in 2014.
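To put BNL's growth in annual terms, the two data points can be turned into a compound annual growth rate. This is a back-of-the-envelope sketch that assumes smooth exponential growth between 2009 and 2014. Because 13 PB is described as "upwards of," the result is a lower bound.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two yearly values."""
    return (end / start) ** (1 / years) - 1


# BNL figures from the text: about 2 PB in 2009, upwards of 13 PB in 2014.
growth = cagr(start=2, end=13, years=2014 - 2009)
print(f"implied growth: at least {growth:.0%} per year")
```

The data volume grew by roughly 45% a year or more, meaning it doubled about every two years.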

Active archive platforms

By definition, an archive is a collection of information. So a set of backup files, a pool of objects and even a database are all archives.

An archive is more than a simple collection of data, however. It is a collection of information purposely built to facilitate long-term retention and workable search and retrieval -- all of which is hosted on resilient media and preferably at a low cost befitting limited re-reference rates. Some vendors distinguish between active archives and cold storage/deep storage archives, the former designed to support higher rates of re-reference than the latter.

To support more active use cases, active archives often span at least two tiers of storage, including a random access media tier -- flash or disk -- and a linear access media tier, such as tape. The first tier provides data access speeds in milliseconds or microseconds, while accessing data from the tape archiving tier can take anywhere from 45 seconds to two minutes depending on library size (number of media cartridges) and the number of robots and drives (to pick media from storage racks and place it into a drive for reading). Current midpoint-loading tape cartridges can usually get to the first byte of requested files in well under 45 seconds from drive insertion.
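The difference between the two tiers can be modeled as a rough time-to-first-byte function. In this sketch, files on the disk/flash tier return almost immediately. Tape reads pay for the robot pick, drive load and positioning, which is why library size and robot count matter. All step timings below are hypothetical values chosen to bracket the 45-second-to-two-minute range quoted above. They are not vendor specifications.

```python
from dataclasses import dataclass


@dataclass
class TapeTimings:
    """Hypothetical per-step timings, in seconds, for one tape library."""
    robot_pick_s: float   # robot fetches the cartridge from its rack slot
    load_thread_s: float  # drive loads and threads the tape
    locate_s: float       # drive winds to the start of the requested file


def time_to_first_byte(on_fast_tier: bool, tape: TapeTimings,
                       fast_tier_s: float = 0.005) -> float:
    """Rough time to first byte in a two-tier active archive.

    Files on the disk/flash tier are served directly; otherwise the
    cartridge must be picked, mounted and positioned before data flows.
    Queueing for a free drive or robot is ignored.
    """
    if on_fast_tier:
        return fast_tier_s
    return tape.robot_pick_s + tape.load_thread_s + tape.locate_s


# Hypothetical libraries: a compact one and a large one with longer robot travel.
small = TapeTimings(robot_pick_s=10, load_thread_s=15, locate_s=20)
large = TapeTimings(robot_pick_s=60, load_thread_s=15, locate_s=45)
for name, lib in [("small library", small), ("large library", large)]:
    print(f"{name}: {time_to_first_byte(False, lib)} s from tape")
print(f"disk/flash tier: {time_to_first_byte(True, small)} s")
```

A fuller model would add waiting time when all drives or robots are busy, which is where the number of robots and drives mentioned above comes in.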

Many active archive platforms are available, some intended for on-premises deployment, others for hybrid cloud deployments -- by integrating a gateway service for migrating less frequently accessed data to a remote repository, such as AWS Glacier.
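Gateways of this kind typically decide what to migrate with an access-age policy. The sketch below shows one such policy: it selects files not accessed within a threshold, oldest first, as candidates for the remote repository. The `ArchivedFile` record, the 90-day threshold and the file paths are hypothetical. No real gateway or AWS API is being modeled, and the actual transfer step is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class ArchivedFile:
    path: str
    last_access: datetime


def select_for_migration(files: Iterable[ArchivedFile], now: datetime,
                         cold_after: timedelta) -> List[str]:
    """Return paths of files not accessed within `cold_after`, oldest first.

    These are the candidates a gateway would move to the remote repository.
    """
    cold = [f for f in files if now - f.last_access >= cold_after]
    cold.sort(key=lambda f: f.last_access)
    return [f.path for f in cold]


# Hypothetical example data.
now = datetime(2016, 6, 1)
files = [
    ArchivedFile("projects/a.dat", now - timedelta(days=400)),
    ArchivedFile("projects/b.dat", now - timedelta(days=10)),
    ArchivedFile("projects/c.dat", now - timedelta(days=120)),
]
print(select_for_migration(files, now, cold_after=timedelta(days=90)))
```

Ordering by age means that if a migration window runs short, the coldest data moves first.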

At BNL, planners expect 20 PB to 30 PB of new raw data in 2016. Eight tape libraries, each holding 10,088 tapes, have been put into production to archive data that remains active even after it's written to tape. In 2014, BNL restored more than 7.5 million files from the archive, an average of 20,843 files per day or 868 files per hour. Those numbers are increasing exponentially.
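The BNL figures are internally consistent, as a quick check shows. It assumes a 365-day year and uses only the numbers quoted above.

```python
DAYS_PER_YEAR = 365
HOURS_PER_DAY = 24

files_per_day = 20_843  # BNL's reported 2014 restore rate
files_per_year = files_per_day * DAYS_PER_YEAR
files_per_hour = files_per_day / HOURS_PER_DAY

libraries, tapes_per_library = 8, 10_088
total_cartridges = libraries * tapes_per_library

print(f"{files_per_year:,} files restored per year")  # 7,607,695
print(f"{files_per_hour:.0f} files restored per hour")  # 868
print(f"{total_cartridges:,} cartridges across all libraries")  # 80,704
```

About 7.6 million restores a year matches "more than 7.5 million." The eight libraries hold a little over 80,000 cartridges in total.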

Tape continues to store an estimated 70% of the world's data, a percentage expected to grow as data itself grows. What remains to be seen is whether big data analytics will drive even greater usage of tape archiving. At present, object storage companies like Caringo and web-scale, software-defined storage startups like NooBaa appear to be encouraging a shelter-in-place and archive-in-place model for their data, storing data locally on hyper-converged server appliances or simple servers with direct-attached storage, then spinning down nodes when data becomes archival quality.

Advocates of this shelter-in-place approach claim that although big data files and objects have a very limited useful life, they must be retained anyway. The companies state that moving massive quantities of data spread over analytics server farms is fraught with potential data loss. Data movement is also a source of friction -- the latency that accrues to any transfer -- that customers would like to avoid. As an alternative, and to drive down disk costs, shelter-in-place advocates recommend powering down the drives that contain the data, creating a cold archive. On its face, that is certainly one strategy for addressing the Clipper Group finding that disk archives are 76 times more costly than tape based on energy costs alone.

You may want to cross your fingers when powering up those drives later on, though.
