5 steps to creating a strong data archiving policy

There's a lot to consider in your data archive plan. From compliance to data integrity and retention, following these best practices will improve your data protection.

Almost every storage manager faces the ongoing issue of accommodating and storing ever-expanding data sets. Primary storage tends to be expensive and has a finite capacity, so a majority of organizations move older data to an archive. This practice frees up space on an organization's primary storage and makes room for new data.

On the surface, the concept of archiving data is simple. In practice, it often proves to be quite challenging. Careful planning is required before moving the first bits of data, so follow these five best practices to create a comprehensive data archiving policy.

1. Identify and sort the data to be archived

The first step is to determine which data should be archived. As a general rule, this means archiving static data that hasn't been modified in a while, perhaps several months. Some organizations start this process by looking at the date on which the data was last accessed.
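
As a rough illustration of that first pass, the following Python sketch lists files under a directory that haven't been modified for a given number of days. The function name `find_stale_files` and the 180-day threshold in the usage note are assumptions made for this example, not part of any archiving product. It checks the modification time; switching to `st_atime` would approximate "last accessed," but many systems update access times lazily or not at all, so treat that value with caution.

```python
import time
from pathlib import Path


def find_stale_files(root, min_age_days, now=None):
    """Return files under root whose last-modified time is older than min_age_days."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # 86,400 seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        # Only regular files are candidates; directories are skipped.
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)
```

Calling `find_stale_files("/srv/share", 180)` would return the candidates for review; the path is a placeholder. The sketch only reports files and moves nothing, which is usually what you want while the criteria are still being debated.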

But there are a number of other considerations that you must take into account for your data archiving policy, such as data type. For example, you'll likely find you need one method of archiving unstructured file server data, but a completely different one for structured data in SQL Server.

Unfortunately, there's no such thing as a universal archiving method that handles all data types in an equal manner. Moving file data is easy, but you usually can't get away with archiving an entire database table because an application likely requires the table. Instead, database archiving involves moving old data out of a table and into an archive database table.
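
To make the database case concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a production database. The `orders` and `orders_archive` tables, their columns and the cutoff date are all hypothetical. The point it shows is that the copy and the delete run inside one transaction, so a failure partway through leaves the live table untouched.

```python
import sqlite3


def archive_old_rows(conn, cutoff_date):
    """Move orders older than cutoff_date into orders_archive in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff_date,))
    return cur.rowcount  # number of rows removed from the live table


# Hypothetical schema and sample rows, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, total REAL)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2019-03-01", 10.0), (2, "2023-06-15", 25.0), (3, "2018-11-20", 7.5)],
)
conn.commit()

moved = archive_old_rows(conn, "2021-01-01")
```

In a real deployment the archive table would more likely live in a separate database or on cheaper storage, and the application would need a way to query it when old records are requested.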

2. Consider how data lifecycle management factors into a data archival plan

Another data archiving best practice is evaluating your overall data lifecycle management. Suppose you decide to archive data that hasn't been modified or accessed in three years. That decision leads to a number of other data management questions. For example, should all the data that meets the three-year criterion be archived, or can some types of data simply be deleted rather than archived? Likewise, will data remain in your archives forever, or will it be purged at some point?

You must have specific plans that address the exact circumstances under which data should be archived, as well as a plan for what will eventually happen to archived data. Many companies assume that having a data archiving policy means they have a deletion policy; they eventually wind up wishing they had spelled out the specifics of deletion and archival.
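
One way to force those specifics into the open is to write the lifecycle rules down as code. The sketch below is only an illustration: the three-year archive threshold comes from the example above, while the seven-year purge threshold, the category names and the `lifecycle_action` function are assumptions invented for this example. Archival, deletion and purging are separate outcomes, so none of them can be left implicit.

```python
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "leave in place"
    ARCHIVE = "move to archive"
    DELETE = "delete without archiving"
    PURGE = "purge from archive"


# Hypothetical rules: these values are placeholders, not recommendations.
ARCHIVE_AFTER = timedelta(days=3 * 365)
PURGE_AFTER = timedelta(days=7 * 365)
DELETE_INSTEAD_OF_ARCHIVE = {"temp", "cache"}


def lifecycle_action(category, last_modified, in_archive, today):
    """Decide what should happen to one item under the rules above."""
    age = today - last_modified
    if in_archive:
        # Archived data is either purged or left where it is.
        return Action.PURGE if age >= PURGE_AFTER else Action.KEEP
    if age < ARCHIVE_AFTER:
        return Action.KEEP
    if category in DELETE_INSTEAD_OF_ARCHIVE:
        return Action.DELETE
    return Action.ARCHIVE
```

For instance, `lifecycle_action("temp", date(2020, 1, 1), False, date(2024, 1, 1))` returns `Action.DELETE`, while the same call with a category of `"contract"` returns `Action.ARCHIVE`.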

Also, remember that backup and archive are not the same data protection process. Although they share some elements, backups typically hold data that an organization might need to restore quickly, whereas archives hold data that is rarely touched.

Chart comparing backup and archive

3. Account for regulatory compliance

Regulatory compliance is also critical. Not every organization is subject to federal regulatory requirements surrounding data retention policy, but those that are can face severe penalties if they fail to properly retain required data. Multinational companies also must be aware of varying regulatory policies.

Your data archiving policy must be mindful of newer regulations. For example, GDPR limits how long personal data may be kept and gives individuals the right to request its erasure, and both rules can apply to data sitting in your archives.

Because administrators can face both civil and criminal penalties for failing to properly archive data, some archive far more data than is required by law and retain those archives forever.

The problem with this approach is that it sometimes does more harm than good. Federal regulations require certain data to be retained so that it can be analyzed in the event an organization is accused of wrongdoing. Many litigation experts who represent companies undergoing e-discovery requests say that preserving data beyond what is required by law can lead to trouble. For starters, it often means more money is spent sifting through more data. In addition, more data can mean more vulnerability.

4. Establish your archiving criteria

There are a number of data archiving products available, ranging from backup applications with built-in archival capabilities to full-blown, dedicated archival data management applications. Regardless of the product you select, there are several key criteria to consider.

Search is the first essential capability. The e-discovery process typically involves examining huge amounts of archived data. An efficient search engine saves time. The software's search engine should be flexible enough that it allows you to perform granular searches based on the following:

  • data type;
  • data sources;
  • document author;
  • key pieces of data; and
  • data that matches a specific data structure rather than a specific piece of data, such as any data containing a Social Security number, rather than a specific Social Security number.
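
The last item in that list, matching a data structure rather than a specific value, can be illustrated with a regular expression. This is a simplified sketch, not how any particular archiving product implements search. The pattern recognizes only the common NNN-NN-NNNN layout, and the document IDs and text are made up for the example.

```python
import re

# Matches the NNN-NN-NNNN shape only; it does not check whether the number is valid.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_documents_with_ssn(documents):
    """Return the IDs of documents whose text contains something shaped like an SSN."""
    return [doc_id for doc_id, text in documents.items() if SSN_PATTERN.search(text)]


# Hypothetical archived documents.
docs = {
    "memo-1": "Employee SSN on file: 123-45-6789.",
    "memo-2": "Call extension 555-0100 for details.",
    "memo-3": "Order 2021-03-15 shipped.",
}
matches = find_documents_with_ssn(docs)  # ["memo-1"]
```

A pattern this simple will miss numbers written without hyphens and may flag unrelated identifiers that share the layout, so real searches usually combine it with other criteria from the list.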

Audit tracking is another important feature. For litigation holds and e-discovery, an audit trail tells you which custodian accessed the archives, when they were accessed and what specific data was accessed.

You should pick a data archive product that supports as many data platforms as possible. While there's no universal archive product, there are archival products on the market that are designed to work with a number of popular applications and platforms. Some of these even include the ability to archive social networking data, such as the contents of an organization's Facebook page.

A good data deduplication engine is an essential feature. Archives, by their very nature, can grow to be quite large. Fortunately, almost every modern archiving product supports deduplication.
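
To show the idea behind deduplication, the following toy example stores each unique piece of content once, keyed by its SHA-256 digest, and keeps a catalog that maps archived names to digests. The `DedupStore` class is an invention for this example. It deduplicates whole files in memory, whereas commercial engines commonly work on smaller blocks or chunks and persist their indexes.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}    # digest -> content bytes
        self.catalog = {}  # archived name -> digest

    def add(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store content only if it is new
        self.catalog[name] = digest
        return digest

    def get(self, name):
        return self.blobs[self.catalog[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())
```

If the same attachment is archived from 50 mailboxes, the catalog gains 50 entries but the content is stored once, which is where the space savings come from.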

Your archival product should be flexible concerning data sources and data targets. Just because an organization archives to tape today, it doesn't mean it will still do that tomorrow. A good archival product should allow you to write archives to disk, tape, the cloud or any other medium.

Similarly, diverse media should be supported for archive retrieval. When data is extracted from the archives, you might want to write that data to tape or some other medium.

Finally, the archival software should provide automation capabilities. You don't want to manually move data into or out of archives. A good archival product should adapt easily to your data archiving policy. Automation ensures data is always archived according to policy and that nothing slips through the cracks. The software should also create a detailed log of the archive process.

5. Develop your data archiving policy

Once you have a clear idea about what data you want to archive, the next step is to finalize your comprehensive data archiving policy. This is a formalized set of procedures dictating the rules for the archival process. The archive policy should contain elements such as:

  • the criteria for archiving data;
  • the mechanisms that will be used to facilitate the archival process;
  • the type of media that will be used to store archived data;
  • the duration for which data will remain in the archive; and
  • rules for who may access the archives and under what circumstances.
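
The elements above can also be captured in a machine-readable form so that tooling can check them. The sketch below is one possible shape, not a standard. The `ArchivePolicy` class, its field names and the sample values in the usage note are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArchivePolicy:
    archive_after_days: int   # criteria for archiving data
    mechanism: str            # how data is moved to the archive
    media: tuple              # media used to store archived data
    retention_days: int       # how long data remains in the archive
    authorized_roles: frozenset = field(default_factory=frozenset)  # who may access

    def validate(self):
        """Return a list of problems; an empty list means every element is filled in."""
        problems = []
        if self.archive_after_days <= 0:
            problems.append("archive_after_days must be positive")
        if not self.mechanism:
            problems.append("mechanism is not specified")
        if not self.media:
            problems.append("at least one storage medium is required")
        if self.retention_days <= 0:
            problems.append("retention_days must be positive")
        if not self.authorized_roles:
            problems.append("no roles are authorized to access the archive")
        return problems

    def may_access(self, role):
        return role in self.authorized_roles
```

A policy built as `ArchivePolicy(1095, "nightly job", ("disk", "cloud"), 2555, frozenset({"legal"}))` passes validation, and `may_access("legal")` returns `True`. The circumstances under which access is allowed are harder to encode and usually stay in the written policy.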

Another important consideration for your data archiving policy is protecting the archive's integrity. This concept has two separate aspects. First, the archives must be protected against tampering. They must be secure enough that an end user can't make modifications to archived data as a way of covering up unethical behavior.

The other data protection aspect is guarding archived data against loss. For example, if an organization moved all of its data from 2015 to a tape-based archive and then the tape became demagnetized, all the archive data from that year would be gone.

To protect your archives against data loss, you should have multiple copies of the archived data. Cloud storage gateways help. A gateway appliance can store an on-premises copy of the archives, while also replicating the archives to the cloud.
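
Keeping multiple copies helps only if each copy is still intact, so it is worth verifying them periodically. The following sketch assumes you recorded a SHA-256 checksum when the archive was written and that each copy is reachable as a file path. The function names are made up for this example, and a cloud copy would normally be checked through the provider's own tools instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 digest without loading the whole file into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(expected_digest, copies):
    """Map each copy's path to True if it exists and matches the recorded digest."""
    results = {}
    for path in copies:
        p = Path(path)
        results[str(p)] = p.is_file() and sha256_of(p) == expected_digest
    return results
```

Any copy reported as `False` is missing or damaged and should be recreated from a copy that still verifies.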

When it comes to securing archive data, your approach depends on the level of access that users will need. At the very least, archived data must be encrypted at the storage volume level, and the data must be read-only to prevent tampering. Many organizations store archived data on storage servers or on cloud storage, isolated from the rest of the production network. This isolation provides another level of security.

Regardless of how you choose to store your archives, an auditing mechanism should protect them. Auditing can alert you anytime someone accesses or attempts to access the archives. If your archives are ever called into question, your audit logs can help you prove the archived data is authentic and that the data hasn't been altered.
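
An audit log is more convincing as evidence if the log itself is hard to alter quietly. One common technique, shown here as a minimal sketch and not as a description of any specific product, is to chain entries together with hashes so that changing an earlier entry invalidates every later one. The function names and record fields are assumptions for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, record):
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(log, user, action, item, timestamp):
    """Append an access record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"user": user, "action": action, "item": item, "timestamp": timestamp}
    log.append({"record": record, "hash": _entry_hash(prev_hash, record)})


def verify_log(log):
    """Recompute the chain; return False if any entry was changed or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

This makes tampering evident, not impossible. Someone able to rewrite the entire log could recompute the chain, so the most recent hash should also be recorded somewhere the archive's administrators cannot change.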

While the concept of archiving seldom-accessed data is simple, putting that concept into practice can be a big undertaking. Having a clear and well-documented data archiving policy can make the process much smoother.
