Data archiving is the process of moving data that is no longer actively used to a separate storage device for long-term retention. Archive data consists of older data that remains important to the organization or must be retained for future reference or regulatory compliance reasons. Data archives are indexed and have search capabilities, so files can be located and retrieved.
Archived data is stored on a lower-cost tier of storage, serving as a way to reduce primary storage consumption and related costs. An important aspect of a business's data archiving strategy is to inventory its data and identify what data is a candidate for archiving.
Some archive systems treat archive data as read-only to protect it from modification. WORM (write once, read many) technology, for example, uses media that cannot be rewritten. Other data archiving products allow writes as well as reads.
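The read-only behavior described above can be illustrated with a small in-memory sketch. The `WormStore` class below is hypothetical and only models WORM semantics in software: the first write to a key succeeds, and any later overwrite or delete is refused. Real WORM media and retention-lock features enforce this in hardware or firmware, and their rules (for example, whether deletion becomes possible after a retention period) vary by product.

```python
class WormStore:
    """Hypothetical in-memory illustration of write-once, read-many semantics."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        # Accept only the first write for a key; refuse any overwrite.
        if key in self._objects:
            raise PermissionError(f"{key!r} is write-once and already exists")
        self._objects[key] = bytes(data)

    def read(self, key: str) -> bytes:
        # Reads are unrestricted and can be repeated any number of times.
        return self._objects[key]

    def delete(self, key: str) -> None:
        # This sketch forbids deletion entirely to keep records immutable.
        raise PermissionError(f"{key!r} cannot be deleted from a WORM store")
```

For example, after `store.write("invoice-2019.pdf", data)`, calling `store.read("invoice-2019.pdf")` returns the stored bytes, while a second `write` to the same key raises `PermissionError` and leaves the original content intact.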
Data archiving is most suitable for data that must be retained due to operational or regulatory requirements, such as document files, email messages and possibly old database records.
Data archiving benefits
The greatest benefit of archiving data is that it reduces the cost of primary storage. Primary storage is typically expensive because a storage array must deliver enough IOPS to meet operational requirements for user read/write activity. In contrast, archive storage costs less because it is typically based on a low-performance, high-capacity storage medium. Data archives can be stored on low-cost hard disk drives (HDDs), tape or optical storage, all of which are generally slower than performance disk or flash drives.
Archive storage also reduces the volume of data that must be backed up. Removing infrequently accessed data from the backup data set improves backup and restore performance. Data is also commonly deduplicated as it moves to a lower storage tier, which reduces the overall storage footprint and lowers secondary storage costs.
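To show why deduplication shrinks the storage footprint, here is a minimal sketch of hash-based deduplication. It assumes simple fixed-size chunking (the hypothetical `split_fixed` helper) and SHA-256 content hashes; many commercial products use variable-size, content-defined chunking instead, so treat this as an illustration of the idea rather than how any particular product works.

```python
import hashlib


def split_fixed(data: bytes, size: int = 4) -> list[bytes]:
    """Hypothetical helper: cut data into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(chunks: list[bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Store each distinct chunk once, keyed by its SHA-256 digest.

    Returns (store, recipe): store maps digest -> chunk bytes, and recipe is
    the ordered list of digests needed to rebuild the original data.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        # Only the first copy of identical content is kept.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original data from the unique chunks."""
    return b"".join(store[d] for d in recipe)
```

With `data = b"ABCDABCDABCDWXYZ"` and 4-byte chunks, the four chunks contain only two distinct values, so the store holds 8 bytes instead of 16, and `rebuild` still returns the original 16 bytes.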
Data archiving vs. backup
Data archives are not to be confused with data backups, which are copies of data. Although both are considered secondary storage and use a lower-performance, higher-capacity storage medium than primary storage, they serve different purposes. Archives serve a data retention purpose, whereas backups are used for data protection and disaster recovery.
Data archives can be thought of as a repository for infrequently accessed but still readily available data. Backups, on the other hand, are part of a data recovery mechanism that can be used to restore data in the event it is corrupted or destroyed. Backup data often consists of important information that must be restored quickly when lost or deleted.
Online vs. offline data storage
Data archives take a number of different forms. Some systems make use of online data storage, which places archive data onto disk systems where it is readily accessible. Archives are frequently file-based, but object storage is growing in popularity.
Other archival systems use offline data storage, in which data archiving software writes archive data to tape or other removable media rather than keeping it online. Because tape cartridges can be removed from the drive and stored on a shelf, where they draw no power, tape-based archives consume far less power than disk systems. This translates to lower archive storage costs.
Cloud storage is another possible archive target. Amazon S3 Glacier (originally launched as Amazon Glacier), for example, is designed for data archiving. This method is inexpensive, but it is a recurring expense rather than a one-time purchase, and costs grow over time as more data is added to the cloud. Cloud providers usually store archived data on tape or slower, high-capacity hard disk drives.
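The point that cloud archive costs grow as data accumulates can be made concrete with a little arithmetic. The sketch below assumes linear growth and a single flat, hypothetical price per gigabyte-month; it is not any provider's actual price list, and real bills also include retrieval, request and minimum-storage-duration charges that this model ignores.

```python
def cumulative_storage_cost(initial_gb: float, monthly_growth_gb: float,
                            price_per_gb_month: float, months: int) -> float:
    """Total spend over `months` when the archive grows by a fixed amount.

    Assumes each month is billed on the amount stored at the start of that
    month, at one flat (hypothetical) price per GB-month.
    """
    total = 0.0
    stored = initial_gb
    for _ in range(months):
        total += stored * price_per_gb_month  # this month's bill
        stored += monthly_growth_gb           # new data archived this month
    return total
```

For example, starting with 1,000 GB and adding 100 GB per month for 12 months gives 18,600 GB-months of storage. At an assumed $0.004 per GB-month that is about $74.40, compared with $48 if the archive had stayed at 1,000 GB, and each additional month costs more than the last.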
Data archiving and data lifecycle management
The archival process is almost always automated using archiving software. The capabilities of such software vary from one vendor to the next, but most archiving software automatically moves aging data to the archives according to a data archival policy set by the storage administrator. This policy may also include specific retention requirements for each type of data.
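A policy that "moves aging data to the archives" usually boils down to comparing each file's age against a threshold for its data type. The sketch below assumes last-access time as the aging signal; the `FileRecord` type, the `ARCHIVE_AFTER` thresholds and the 180-day default are all hypothetical values chosen for illustration, not settings from any real archiving product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    data_type: str
    last_accessed: datetime


# Hypothetical policy: how long each data type may sit unused on primary
# storage before it becomes a candidate for archiving.
ARCHIVE_AFTER = {
    "email": timedelta(days=90),
    "document": timedelta(days=365),
}
DEFAULT_ARCHIVE_AFTER = timedelta(days=180)


def archive_candidates(files: list[FileRecord], now: datetime) -> list[FileRecord]:
    """Return files idle for longer than the threshold for their data type."""
    return [
        f for f in files
        if now - f.last_accessed > ARCHIVE_AFTER.get(f.data_type, DEFAULT_ARCHIVE_AFTER)
    ]
```

With `now = datetime(2024, 6, 1)`, an email last opened 120 days earlier is selected (threshold 90 days), a document idle for 120 days is not (threshold 365 days), and a log file idle for 200 days is selected under the default threshold. An archiving tool would then move the selected files to the archive tier.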
Some archiving software automatically purges data from the archives once it has exceeded the life span mandated by the organization's data retention policy. Many backup and data management platforms have also added archiving functionality. Depending on your needs, this can be a cost-effective and efficient way to archive data. However, these products may not include all of the functionality found in dedicated archiving software.
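Automatic purging can be pictured as a periodic check of each archived item against its retention period. In this minimal sketch, the `RETENTION_DAYS` values are hypothetical, and the retention clock is assumed to start at each item's creation date; real policies may start it at another event, and legal holds or similar exceptions are not modeled. Items whose type has no rule are kept, so nothing is purged by accident.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days, per data type. Real values come
# from the organization's retention policy and applicable regulations.
RETENTION_DAYS = {"email": 7 * 365, "invoice": 10 * 365}


def split_expired(archive: list[tuple[str, str, date]],
                  today: date) -> tuple[list[str], list[str]]:
    """Partition archived items into (keep, purge) by retention expiry.

    `archive` holds (item_id, data_type, created_on) tuples.
    """
    keep: list[str] = []
    purge: list[str] = []
    for item_id, data_type, created_on in archive:
        days = RETENTION_DAYS.get(data_type)
        if days is not None and today > created_on + timedelta(days=days):
            purge.append(item_id)  # retention period has elapsed
        else:
            keep.append(item_id)   # still within retention, or no rule
    return keep, purge
```

Checked on 1 January 2024, an email created in 2015 falls outside its seven-year period and is purged, while an invoice from the same year is kept because its ten-year period has not yet run out.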
Some businesses are required to retain data for certain lengths of time due to regulatory compliance. Whether mandated by industry regulations or government legislation, staying within compliance guidelines is a prevalent business concern. Penalties for noncompliance can include payments for damages, fines and voided contracts.
Data archiving helps businesses meet compliance requirements both by storing data long term and by consolidating it for easy access in case of an audit. The rules dictating how long data must be retained, where it can be stored and who has access to it vary by industry and by the type of data businesses in that industry generate.
Examples of regulations with which organizations may need to comply include the Sarbanes-Oxley Act (SOX), the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR).