Tape archive remains the recording medium of choice
Tape archiving offers low-cost, long-term data protection in a variety of industries, keeping it atop the list of data center backup technologies.
Once upon a time, we recorded and saved backups on tape, with each server streaming its files or volumes directly to a tape drive. To meet their "backup windows" -- typically overnight and on the weekend -- companies had to buy multiple drives to get any sort of parallelism from what was a mostly serial process and a completely serial recording medium.
This necessitated large tape libraries to house these drives and the hundreds of tape cartridges often required to contain entire backup data sets. To provide DR protection under these conditions, most companies would then transport backup tapes to off-site locations, often contracting with a service such as Iron Mountain to carry tapes to secure vaults weekly (sometimes daily).
Times have changed.
Tape is no longer the standard for corporate backup or DR systems. It's been replaced by large-capacity disk drives as the primary backup medium for most companies. Tape technology continues to evolve, however, as it's still being used as an archive tier, particularly in certain industries.
This article delves into where tape came from and where it's currently deployed. It then discusses other potential use cases for this venerable, yet much-maligned recording technology.
LTO and tape format evolution
Hewlett Packard, IBM and Seagate, which later sold its tape business to Quantum, formed the Linear Tape-Open (LTO) Consortium in the late 1990s as an alternative to Quantum's proprietary Digital Linear Tape (DLT) format. Launched in 2000, the first generation of LTO (LTO-1) held 100 GB of data per cartridge and achieved a maximum throughput speed of 20 MBps (uncompressed).
Since then, the LTO Consortium has released a new generation of LTO every two or three years, each essentially doubling storage capacity. Throughput speed has increased substantially as well. It released its latest specification, LTO-7, in 2015. LTO-7 provides 6 TB of raw capacity and 300 MBps throughput, both significantly greater than current HDDs. Using a standard 2.5:1 compression ratio, LTO-7 provides 15 TB of capacity and 750 MBps throughput.
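The compressed figures follow directly from the native ones. Here is a minimal Python sketch of that arithmetic, using only the LTO-7 numbers quoted above. The function names are illustrative, the fill-time estimate assumes decimal units and a drive that streams at its rated speed for the whole cartridge, and real compression ratios depend heavily on the data being written.

```python
def compressed_figures(native_tb, native_mbps, ratio=2.5):
    """Scale native capacity (TB) and throughput (MBps) by a compression ratio."""
    return native_tb * ratio, native_mbps * ratio


def hours_to_fill(native_tb, native_mbps):
    """Hours to write a full cartridge at the native streaming rate.

    Assumes 1 TB = 1,000,000 MB and uninterrupted streaming (a best case).
    """
    seconds = native_tb * 1_000_000 / native_mbps
    return seconds / 3600


capacity_tb, throughput_mbps = compressed_figures(6, 300)
print(capacity_tb, throughput_mbps)      # 15.0 750.0
print(round(hours_to_fill(6, 300), 1))   # 5.6
```

Under these assumptions, filling a single LTO-7 cartridge takes several hours even at full speed, which is one reason backup windows once required multiple drives running in parallel.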
The LTO Consortium has published a roadmap extending three more generations, which would put LTO-10 at well over 100 TB of capacity and over 2 GBps of throughput. Each new generation of tape drive can also read tapes back two generations and write to the previous one.
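The backward-compatibility rule can be expressed as a small lookup. This sketch encodes the rule exactly as stated above (read back two generations, write back one); the function name is hypothetical, and any given drive's documentation remains the authority on what it actually supports.

```python
def compatible_generations(drive_gen):
    """Return (readable, writable) LTO cartridge generations for a drive.

    Follows the rule as the article states it: a drive reads its own
    generation plus the two before it, and writes its own plus the one
    before it. Generation numbers start at 1.
    """
    if drive_gen < 1:
        raise ValueError("LTO generations start at 1")
    readable = list(range(max(1, drive_gen - 2), drive_gen + 1))
    writable = list(range(max(1, drive_gen - 1), drive_gen + 1))
    return readable, writable


readable, writable = compatible_generations(7)
print(readable)  # [5, 6, 7]
print(writable)  # [6, 7]
```

For an archive, the practical consequence is that cartridges need to be migrated to newer media before they age out of the read window of the drives still in service.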
Aside from capacity and throughput increases, LTO has added several other important features over the years. In 2005, LTO-3 added a write once, read many (WORM) capability. WORM makes data immutable and allows LTO to be compliant with legal recordkeeping regulations. LTO-4 in 2007 added the ability to encrypt data before it is written to tape with the AES-GCM (Advanced Encryption Standard-Galois Counter Mode) algorithm, using either proprietary or open-standard key management protocols. And, perhaps most significantly, LTO-5 in 2010 added partitioning, which enabled the Linear Tape File System (LTFS) to be incorporated into the LTO standard.
It should be noted that while LTO is far and away the industry standard for digital tape systems, IBM and Oracle (formerly StorageTek) also make proprietary single-reel cartridges and tape drives. Both of these companies were instrumental in the development of tape and tape libraries for the computer industry and maintain these products to support their installed bases. The IBM 3592 supports up to 10 TB on a cartridge (compressed) and the Oracle T10000, 8.5 TB (uncompressed).
Mainframes: A look at tape's origins
While tape in the open systems world was once the standard for backup and long-term archiving, cartridge-based tape and tape libraries actually originated in the mainframe world.
These large, monolithic computer systems ran compute jobs in batches. And since disk space was at a premium, computer operators would store a program and the data it used on tape. That way, a job could be moved off the computer for temporary storage and reloaded when it needed to be run again. Monthly or quarterly financial applications were good examples. Companies would run these jobs and then pull the program and its associated data set off for storage until the next month.
StorageTek is widely credited with inventing the first tape library as a way to handle tapes more efficiently and in a manner that was less prone to error. Its PowderHorn model 9310 was a room-sized "silo" that held over 5,000 tape cartridges. The original drives developed for these use cases were designed for maximum performance rather than high capacity as backup tapes are today. Many were dual-reel formats that allowed tapes to be loaded in the middle instead of being rewound before ejection.
An open-source program originally developed by IBM and now supported by the Storage Networking Industry Association, LTFS runs on Linux, macOS and Windows operating systems to provide rudimentary file access and handling functions to tape as a simple utility. LTFS can also be incorporated into applications, much like tape drivers. It creates an index on available partitions to make file searching more efficient and tape-based data more independent of software programs.
That said, LTO is still a serial recording medium. Data at the end of a cartridge is only accessible after the entire length has been run past the tape heads. Also, data is only appended to the end of the tape. The recording space freed up by deleted blocks can't be reused until the entire tape has been reformatted.
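A toy model makes the append-only behavior concrete. The Python sketch below is not the LTFS on-tape format. It is a simplified illustration, with a hypothetical class name and block counts, of a medium where an index tracks files, writes always land at the end, and deleting a file does not return its space.

```python
class AppendOnlyTape:
    """Toy model of a serial, append-only medium (not the LTFS format)."""

    def __init__(self, capacity):
        self.capacity = capacity   # total blocks on the cartridge
        self.end_of_data = 0       # next free block; only moves forward
        self.index = {}            # file name -> (start block, length)

    def write(self, name, length):
        if self.end_of_data + length > self.capacity:
            raise OSError("tape full")
        # New data always lands at the end, even if the name already exists.
        self.index[name] = (self.end_of_data, length)
        self.end_of_data += length

    def delete(self, name):
        # Only the index entry goes away; the blocks stay consumed.
        del self.index[name]

    def live_blocks(self):
        return sum(length for _, length in self.index.values())

    def free_blocks(self):
        return self.capacity - self.end_of_data

    def reformat(self):
        # The only way to reclaim space held by deleted files.
        self.end_of_data = 0
        self.index.clear()


tape = AppendOnlyTape(capacity=100)
tape.write("a", 40)
tape.write("b", 40)
tape.delete("a")
print(tape.live_blocks())  # 40
print(tape.free_blocks())  # 20
```

After the delete, only 40 blocks hold live data, yet just 20 are writable. The 40 blocks that belonged to the deleted file stay unavailable until the whole tape is reformatted.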
Large-capacity disk drives, data deduplication and disk-based technologies like snapshots and clones have, meanwhile, made LTO tape more or less obsolete as a direct backup medium. Moreover, the ability to capture a virtual server instance in a few discrete files and then replicate them over high-bandwidth connections has provided options to replace the off-site transportation of tape cartridges for DR.
Nonetheless, tape is still the recording medium of choice in many archive use cases, often as the most economical tier of storage due to its low operational costs over the long term (based on power and data center floor space). But use cases like public and private clouds require faster data access, something that object storage systems provide (See: "Tape in the age of object storage").
Tape in the age of object storage
The challenges of large file-based data storage, the kind required by cloud implementations, are cost per terabyte and access times. Tape provides cost-effective capacity, sure, but as a serial recording medium, it can't deliver the access speeds required by large file-based archives. Traditional NAS performance, meanwhile, can get bogged down as systems scale due to their hierarchical data organization, and may not be cost-effective -- particularly when they're required to store multiple copies for data protection.
With its flat architecture, object storage -- by contrast -- offers acceptable access latency in these extreme capacity environments and, with forward error correction and data dispersion, can provide data protection without generating multiple copies. Since it doesn't use traditional RAID, object storage can also avoid the long rebuild times associated with high-capacity disk drive failures.
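The capacity savings are easy to quantify. Forward error correction with data dispersion is commonly implemented as erasure coding, where each object is split into data fragments plus parity fragments spread across drives or nodes. The sketch below compares raw capacity needs. The three-copy policy and the 10+4 fragment layout are assumptions chosen for illustration, not figures from any particular product.

```python
def raw_tb_replicated(usable_tb, copies=3):
    """Raw capacity needed when protection means keeping full copies."""
    return usable_tb * copies


def raw_tb_erasure_coded(usable_tb, data_fragments=10, parity_fragments=4):
    """Raw capacity needed when each object is stored as data + parity fragments."""
    total_fragments = data_fragments + parity_fragments
    return usable_tb * total_fragments / data_fragments


print(raw_tb_replicated(1000))     # 3000
print(raw_tb_erasure_coded(1000))  # 1400.0
```

Under these assumed parameters, protecting 1 PB of usable data takes 3 PB of raw capacity with three copies but only 1.4 PB with a 10+4 layout, which is where the cost advantage over copy-based protection comes from.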
Object storage is becoming the go-to technology for a high-capacity, disk-based tier, and in many cases, like the cloud, is supplanting tape altogether. But in other use cases, object storage is being used in conjunction with tape to provide an archive with lower access latency and lower cost.
Where tape is used today
Tape's portability and economics have made it the standard recording format for capturing and storing digital video files created during the making of a motion picture.
The film industry began with reel-based recording technologies. And although cameras are now all digital, studios routinely convert raw recordings to tape (usually LTO). Tapes are used because they're portable, making it easy to send a "golden copy" back to a secure location and to store them in a vault. Tape capacity is also cheap enough to enable studios to create a copy after each step in the post-production process and save the work invested in the project up to that point.
LTFS, in the meantime, has enhanced LTO's utility to the movie industry by making it an easy way to share files as essentially a "medium of exchange." Production work is routinely outsourced to artists and technologists, who enhance quality, add special effects, do format conversion and so on. LTFS makes this file transfer much simpler by providing a standard file system to replace proprietary software originally required by LTO.
As digital video formats evolve, resolution and file sizes have increased, driving up the capacities needed to store all versions of motion picture segments created during the interim steps in the production process. Tape, already embedded in the industry, continues to be a cost-effective storage medium (now a tier below object storage systems) and a long-term archive for finished productions.
Like media and entertainment, the oil and gas industry has used tape for years to capture, transport and store valuable data -- in this case, seismic images typically in a Society of Exploration Geophysicists recording format. Because oil exploration occurs outside the data center, tape is a good medium for transporting data back from the field. LTO tape also provides an economical format for storing these large data sets and, with LTFS, supports file sharing with outside contractors or other groups in the production chain.
Tape's intrinsic advantages for long-term archival, low cost per terabyte and high data density make it a good storage offering for other industries that struggle with the costs of storing large files and handling growing data sets. Genomic sequencing, medical research, biotechnology and high-performance computing, for example, generate extremely large amounts of data, and the results must remain available for review over the long term. Tape provides an economical platform for storing these data sets and, when paired with an object storage system, can deliver a cost-effective, long-term archive with better data access.
Where tape is not used
Similar to the oil and gas industry -- where data sets are processed, saved, reprocessed in the future and saved again -- big data analytics would seem to be a good use case for tape. However, these sophisticated analyses typically involve large numbers of smaller files, not the large files created by video or scientific imaging processes. Handling that kind of data is better suited to disk than tape. And, unlike the film or energy industries, most enterprises don't have a requirement to transport large data sets from remote locations. For these reasons, companies are turning to object storage technologies to archive big data onsite or in the cloud.
Large public clouds like Amazon and Google have some of the most extensive archives in the world, but don't use tape either. YouTube is reported to create more than 100 petabytes of new data each year, but does so on hard disk storage because it needs faster, random access to support its user experience. Even Amazon's Glacier, which offers a tape-like time to data of several hours, is widely purported to use offline disk drives or even optical storage. Facebook, with a requirement to store a third copy of users' files, made news a few years ago by developing a Blu-ray optical jukebox instead of using tape. (Fujifilm, by contrast, does leverage tape [LTFS] as the main storage medium for Dternity Media Cloud, an off-site data protection and archiving service. The company goes so far as to call Dternity Media Cloud "tape storage as a service.")
Pairing flash with tape would seem to make sense given NAND flash's success in replacing disk drives. But tape, as a capacity tier, doesn't offer enough of an advantage to replace object storage systems today, and can be a disadvantage in latency-sensitive use cases. Also, given the continued advancements in flash capacity (as of this writing, up to 15 TB SSDs), solid-state storage could eventually replace high-capacity HDDs as well.
For now, tape enjoys a prominent position as a low-cost archive tier in industries that have very large data sets made up of large files, long retention requirements, a need to transport this data between physical locations or all of the above. In many cases, tape is being paired with object storage to address the need for lower latency file access and, in some cases, it is being replaced by object storage altogether. Given its widespread use in some industries, it's hard to see tape going away any time soon. A more likely scenario is for those use cases to more or less continue, but for newer applications to be deployed on object storage systems using disk and flash.
About the author:
Eric Slack is a senior analyst with Evaluator Group, where he focuses on scale-out architectures, virtual SAN, software-defined storage and hyper-convergence, as well as traditional storage and data protection.