Compliance with the European Union's General Data Protection Regulation isn't optional. Noncompliance could be costly and possibly disastrous. Find out what you need to know.
On May 25, 2018, the European Union's General Data Protection Regulation, or GDPR, goes into effect, and it's a big deal. This stringent EU regulation for protecting residents' personal data and privacy is likely to change the way data is processed, stored, protected and archived.
It's no small matter to comply with GDPR requirements. And noncompliance could put organizations out of business. The best place to begin is to understand the significant differences between GDPR and laws already on the books (see "GDPR vs. existing regs").
The new regulation specifies roles, processes and technologies required to make EU residents' personal data secure, accessible, used appropriately and documented with consent. Any organization that supplies goods or services to EU residents, or simply collects data on them, regardless of where that data is used or stored, must comply. Compliance isn't optional. Don't make the mistake of assuming that information systems and processes compliant with current regulations are GDPR compliant. They probably aren't.
The challenge
GDPR has 87 pages of detailed requirements and 99 articles of specifications. It will be difficult to comply with GDPR requirements and obligations, in some cases extraordinarily so. While IT professionals should be familiar with all the requirements, some are more important because they may require adding or adjusting processes and systems.
What follows are the GDPR articles that IT pros should be particularly concerned about:
Article 6: Managing consent. Before an organization collects personal data on an EU resident, the resident must consent to that collection. The consent must be documented, stored, protected and easily producible. In addition, personal data collection must have a defined use case. Documentation of consent has to be provided to EU regulatory authorities on demand.
Article 25: Data protection by design and default. Data protection must be built into processes and tools that collect personal data.
Article 25: Data minimization. The amount of personal data collected and stored as well as timeframes for keeping that data must be minimized. Retention should be balanced with other record-keeping regulations, such as for health and criminal records that demand longer retention periods.
Articles 25 and 32: State of the art. These articles necessitate future-proofing IT systems and processes against ongoing technological advancements. The latest technologies must be deployed to comply with GDPR requirements, or justification must be provided as to why they weren't deployed based on cost, risk and context. Adherence must be regularly reviewed.
Article 32: Security of processing. Security must be appropriate to risk level. It should include the use of pseudonyms, aliases and encryption and ensure ongoing confidentiality, integrity, availability and resilience of processing systems and services. It also requires the ability to restore availability and access to personal data in a "timely manner," meaning hours to days, not weeks to months, following an outage or failure. A documented process for regularly testing and assessing effectiveness is required.
Articles 44 to 50: Data transfers. Primarily aimed at cloud service providers, these articles mandate that personal data captured during data transfers be stored in EU countries or countries such as Canada that have similar data privacy protection policies. When personal data is stored or transferred to countries such as the U.S., binding corporate rules that match GDPR requirements must be in place.
Article 17: The right to be erased. More commonly known as the "right to be forgotten," this requirement will likely cause IT the most compliance heartburn. It requires personal data be completely deleted in each of the following circumstances:
an EU resident's request;
personal data collection purpose is no longer necessary; and
withdrawal of an EU resident's consent.
The right to be erased demands erasure in a timely manner. This again means hours to days, not weeks to months. All copies of data, including backups, archives, DevOps, test-dev, mirrors and snapshots, must be erased. Article 17 will raise huge technological and process problems in areas such as unstructured data storage, databases, data protection processes, archives, mobile and cloud service providers. Deeper dives into each of these areas reveal why.
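The erasure triggers and the list of copies above can be sketched as a simple checklist. This is a minimal illustration in Python, not a compliance tool: the function names and the `COPY_LOCATIONS` list are hypothetical, and the locations simply mirror the copy types named in this article.

```python
# Copy types named in the article; a real inventory would be organization-specific.
COPY_LOCATIONS = ["primary", "backups", "archives", "devops", "test-dev", "mirrors", "snapshots"]


def erasure_triggered(subject_requested: bool,
                      purpose_still_necessary: bool,
                      consent_withdrawn: bool) -> bool:
    """Return True if any one of the three circumstances listed above applies."""
    return subject_requested or (not purpose_still_necessary) or consent_withdrawn


def outstanding_copies(erased_locations: set) -> list:
    """Return the copy locations that still hold the data subject's personal data."""
    return [loc for loc in COPY_LOCATIONS if loc not in erased_locations]


def erasure_complete(erased_locations: set) -> bool:
    """Erasure only counts as complete once every copy location has been cleaned."""
    return not outstanding_copies(erased_locations)
```

Under these assumptions, deleting the record from the primary system alone leaves six locations outstanding, which is the gap the following sections explore.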
Unstructured data
Unstructured data is typically stored in multiple storage silos such as file servers, NAS, object storage, endpoints and cloud storage, but most organizations have limited visibility into that unstructured data. Among other things, they don't know what's in it, when it was created, who created it, when it was last opened or if it was copied, and the ability to search is limited. The metadata likely doesn't show what personal data is included in the file. Some storage systems let users add custom metadata, which will become a GDPR requirement for finding personal data in files or objects. In addition, searches must be done on every storage silo, a labor- and time-intensive process.
Solving this problem means implementing a global namespace and view over all unstructured storage with the ability to place custom metadata on every file and object that contains personal data. Files and objects with personal data can be tagged, searched, altered and erased. Alternatively, unstructured storage silos can be consolidated into a single storage system with a global namespace that allows for custom metadata. These approaches will do little to solve endpoint issues, such as the C drive or file sync-and-share cloud implementations.
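To make the tagging idea concrete, here is a small Python sketch of a search across several silos using custom metadata. It assumes a hypothetical `data_subjects` metadata key and an in-memory stand-in for the silos; real file, NAS and object systems each expose metadata through their own interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    path: str
    # Custom metadata; the "data_subjects" key is an assumption for this sketch.
    metadata: dict = field(default_factory=dict)


def find_personal_data(silos: dict, subject_id: str) -> list:
    """Search every silo for objects tagged with the given data subject."""
    hits = []
    for silo_name, objects in silos.items():
        for obj in objects:
            if subject_id in obj.metadata.get("data_subjects", ()):
                hits.append((silo_name, obj.path))
    return hits


def erase_personal_data(silos: dict, subject_id: str) -> int:
    """Remove every tagged object from every silo; return how many were removed."""
    removed = 0
    for silo_name, objects in silos.items():
        kept = [o for o in objects if subject_id not in o.metadata.get("data_subjects", ())]
        removed += len(objects) - len(kept)
        silos[silo_name] = kept
    return removed
```

The search is only as good as the tags: an untagged file containing personal data would be invisible to it, which is why the tagging has to happen on every file and object.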
Databases
Personal data is frequently part of a database, such as a customer relationship management (CRM) or e-commerce database. Finding and deleting data from a database is generally straightforward. But what happens when personal data exists in multiple databases? Databases get replicated for various reasons, including scalability, read-only applications, partner databases, DevOps, test-dev, data protection, and disaster recovery and business continuity (DR/BC). Every copy of an EU resident's personal data must be erased upon request. That means each database must go through a separate purging process -- not a trivial effort. Systems that have been built for decades to preserve data now must be able to seek out and erase that data.
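The per-copy purge can be illustrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `customers` table, its `email` column and the copy names are invented for the example, and in-memory databases stand in for the production, replica and test-dev copies.

```python
import sqlite3


def make_copy(emails):
    """Create an in-memory stand-in for one database copy (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO customers (email) VALUES (?)", [(e,) for e in emails])
    conn.commit()
    return conn


def erase_subject(copies, email):
    """Delete the subject from every copy and record rows removed per copy.

    The returned mapping doubles as simple documentation of the erasure.
    """
    deleted = {}
    for name, conn in copies.items():
        cur = conn.execute("DELETE FROM customers WHERE email = ?", (email,))
        conn.commit()
        deleted[name] = cur.rowcount
    return deleted


def remaining(copies, email):
    """Count rows for the subject still present across all copies."""
    return sum(
        conn.execute("SELECT COUNT(*) FROM customers WHERE email = ?", (email,)).fetchone()[0]
        for conn in copies.values()
    )
```

The loop is trivial here because every copy is reachable and shares a schema. In practice each copy may sit behind a different team, network or vendor, which is where the effort goes.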
Database backups exacerbate the problem. Most databases have unique methodologies for backing up data. They generally have to be quiesced to make sure the database is recoverable without corruption or missing data, usually using a variation of image backup with changed block tracking (CBT). This process creates a backup file that can't be mounted directly. It must be recovered and restored for the database to search or delete data in the backup copy. Recovering and restoring database backups is time-consuming, taking minutes to hours for each one, depending on the size of the database and soundness of the backup. After the recovery and restoration, the EU resident's personal data is erased, the erasure documented and the database returned to a backed-up condition.
The question then becomes why can't this erasure process be completed on the master backup and propagated to each virtual copy? There are two reasons this doesn't work. The first is each CBT or incremental backup creates a virtual full volume image based on previous images. Erasing data from the master or golden image creates a void in the virtual image, as it points to data that no longer exists and results in a corrupt backup. The other reason is personal data changes over time. Erasing the data in an early backup won't erase changed data in more recent ones. Both situations require recovering and restoring each database backup generation and deleting personal data from each one.
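Both failure modes can be shown with a toy model of an incremental backup chain. The Python sketch below is a deliberate simplification, not how any particular backup product stores data: each generation is a dictionary holding only the blocks that changed, and the block names and contents are made up.

```python
def build_chain():
    """Return a toy backup chain: index 0 is the master, later entries are incrementals."""
    return [
        {"b1": "alice@old.example", "b2": "order history"},  # master (full) image
        {},                                                   # incremental 1: nothing changed
        {"b1": "alice@new.example"},                          # incremental 2: address changed
    ]


def resolve(chain, generation, block_id):
    """Rebuild one block of a virtual full image by walking back toward the master."""
    for gen in range(generation, -1, -1):
        if block_id in chain[gen]:
            return chain[gen][block_id]
    raise LookupError(
        f"block {block_id!r} missing: backup generation {generation} cannot be restored"
    )
```

If `b1` is deleted from the master only, `resolve(chain, 1, "b1")` raises `LookupError` because incremental 1 pointed back at that block, while `resolve(chain, 2, "b1")` still returns the newer address. The first is the corruption problem and the second is the changed-data problem described above.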
Erasing personal data in a single backup copy is annoying, but doable. Unfortunately, there is rarely just one backup copy: database backups are generated daily, at a minimum, and retained from 30 days to years. That's a lot of backups to recover and restore in order to remove one person's personal information. Repeating this process for every request is an overwhelming task.
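A back-of-the-envelope calculation shows how quickly the work multiplies. The Python function below is a rough estimate only; its name and parameters are hypothetical, and the per-restore time has to come from your own environment.

```python
def restore_effort_hours(retention_days: int,
                         backups_per_day: int,
                         hours_per_restore: float,
                         requests: int) -> float:
    """Estimate total restore time when every retained generation must be
    recovered once for each erasure request."""
    generations = retention_days * backups_per_day
    return generations * hours_per_restore * requests
```

With daily backups kept for 30 days and an assumed one hour per restore, a single request costs `restore_effort_hours(30, 1, 1.0, 1)`, or 30 hours of restore time. Keep a year of daily backups, assume half an hour per restore and receive 10 requests, and the figure becomes 1,825 hours.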
Solving this problem requires discipline as well as new processes and systems. It likely means keeping fewer database backup generations and rolling older generations more frequently into a searchable, consolidated archive. It can also mean consolidating personal data databases to simplify erasures and implementing a content data management system, connecting a database or databases to unstructured file or object storage in-band, out-of-band or in combination. Content data management continuously replicates every write while propagating erasures across all generations.
Data protection
Image-based data protection, such as storage snapshots, is the most popular type of data protection, and it's excellent for fast recoveries, especially for virtual machines. The problem it poses for GDPR requirements is similar to that of application-specific database backups. Each incremental backup or CBT creates a virtual backup image pointing to unique data in the master or golden image. Erasing data from the master can corrupt the pointers in all subsequent virtual full backups. So each backup generation must have the personal data erased.
Erasing personal data requires mounting or recovering each backup generation, depending on the technology. Mounting backups is faster and not as arduous, but erasures are still laborious and escalate with the number of backup generations kept. The data has to be found in each mount and erased, and the backup then has to be restored to its idle state. This painstaking task takes a lot of time, especially when dealing with multiple erasure requests.
Replications are another Article 17 concern. It's common to replicate data for DR/BC purposes. However, erasing it at the primary site won't necessarily remove it from the DR/BC site.
Solving this problem requires keeping fewer generations and rolling up previous backups into a searchable, consolidated archive. Some backup systems use file backup instead of image. File backup lets files and data be found and erased one time and then propagates that to all generations without having to recover every backup generation. Some content data management systems can do this.
The use of tape is another data protection issue because erasing personal data from tape takes a lot of time. Using linear tape file system (LTFS) tape libraries with a NAS or object storage front end can solve this problem. The front end acts as a cache, and data removed from the cache gets erased when the tapes are compacted.
Archives
Archives live on less-expensive storage, including object, scale-out software-defined file and cloud storage, as well as on cold storage, such as tape and optical. Compliance with Article 17 requires that the storage system be searchable. It also must accept customizable metadata so personal data can be tagged and easily found. Many archives don't allow customizable metadata. Compounding the issue, reading or altering data in the archive may require bringing it back to the storage system from where it originated. Moving it to the archive via hierarchical storage management requires leaving a stub in the original storage. Deleting the stub doesn't delete the data; it still exists in the archive, and the process doesn't meet GDPR Article 17 compliance. Here, too, tape can compound the problem of personal data erasure.
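The stub problem is easy to see in a toy model. The Python sketch below uses two dictionaries as hypothetical stand-ins for primary storage and the archive; the `archive_key` field and function names are invented for illustration and do not reflect any specific hierarchical storage management product.

```python
def archive_file(primary: dict, archive: dict, path: str, archive_key: str) -> None:
    """Move a file's content to the archive and leave a stub behind in primary storage."""
    archive[archive_key] = primary[path]
    primary[path] = {"stub": True, "archive_key": archive_key}


def delete_stub_only(primary: dict, path: str) -> None:
    """What a naive delete does: the stub goes away, the archived data does not."""
    primary.pop(path, None)


def erase_via_stub(primary: dict, archive: dict, path: str) -> bool:
    """Follow the stub to the archive and remove both; return True if data was erased."""
    entry = primary.pop(path, None)
    if isinstance(entry, dict) and entry.get("stub"):
        return archive.pop(entry["archive_key"], None) is not None
    return entry is not None
```

Once the stub has been deleted on its own, the link to the archived copy is gone in this model, so the data can only be found again by searching the archive itself.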
These problems can be solved by implementing an archive with the aforementioned search and metadata characteristics using stub-free archival data movement. If tape is the archive target, then use LTFS tape libraries front-ended with either NAS or object storage.
Mobile
Many mobile business apps collect personal data, including apps for insurance adjusters, financial planners and marketers, as well as CRM apps. They typically upload files to the cloud, corporate data centers and archives. Erasing personal data in these apps requires programmatic adjustments to the apps. And those adjustments must make sure personal data erasures propagate to all copies.
Cloud service providers
Using a cloud service provider doesn't change an organization's GDPR responsibilities and liabilities. It's ultimately the responsibility of the organization collecting EU residents' personal data to be compliant, whether its IT is on premises or in the cloud. That obligation can't be contractually removed or amended. That means the cloud service client is liable and must ensure the provider meets all GDPR requirements.
GDPR noncompliance
Failure to comply with GDPR will have unprecedented negative financial consequences. There are two penalty levels: For minor infractions, the penalty is a nontrivial one of up to 10 million euros -- currently more than $12 million -- or 2% of worldwide revenue during the preceding financial year, whichever's greater. An example of a minor infraction is the inability to document and prove an EU resident's data has been erased or deleted as requested. For major infractions, the penalty is up to 20 million euros or 4% of worldwide revenue during the preceding financial year, whichever's greater. An example of a major infraction is not erasing or deleting an EU resident's data (see "GDPR fines and penalties").
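The "whichever's greater" rule is simple arithmetic, sketched here in Python. The tier labels follow this article's "minor" and "major" wording, the function name is hypothetical, and the result is the maximum possible penalty, not a prediction of what a regulator would actually impose.

```python
# Caps and percentages as stated in the article, in whole euros.
TIERS = {
    "minor": (10_000_000, 2),   # up to 10 million euros or 2% of worldwide revenue
    "major": (20_000_000, 4),   # up to 20 million euros or 4% of worldwide revenue
}


def max_penalty_eur(worldwide_revenue_eur: int, tier: str) -> int:
    """Return the upper bound of the fine: the fixed amount or the revenue share, whichever is greater."""
    fixed_amount, percent = TIERS[tier]
    return max(fixed_amount, worldwide_revenue_eur * percent // 100)
```

For a company with 100 million euros in revenue, the fixed amounts dominate: the ceilings are 10 million and 20 million euros. At 1 billion euros in revenue, the percentages take over and the ceilings rise to 20 million and 40 million euros.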
These fines could easily put a company out of business. Noncompliance isn't a financially rational option.
No in-between
May 25, 2018, is rapidly approaching. Meeting GDPR compliance requirements isn't optional for organizations that collect any personal data on EU residents. Failure to comply with GDPR could be financially ruinous. Every IT process that handles personal data must be evaluated for GDPR compliance. If a process isn't compliant, you must modify or replace it, and that includes processes that are technically compliant but not practically so. For instance, if it takes months to delete all copies of a resident's personal data, GDPR deems it noncompliant because it isn't timely.
Time is of the essence. There's no GDPR ramp up or grace period. On May 25, every organization will either be GDPR compliant or subject to outrageous fines. There is no in-between.