- Marc Staimer, Dragon Slayer Consulting
On May 25, 2018, the European Union's General Data Protection Regulation, or GDPR, goes into effect, and it's a big deal. This stringent EU regulation for protecting residents' personal data and privacy is likely to change the way data is processed, stored, protected and archived.
It's no small matter to comply with GDPR requirements. And noncompliance could put organizations out of business. The best place to begin is to understand the significant differences between GDPR and laws already on the books (see "GDPR vs. existing regs").
The new regulation specifies roles, processes and technologies required to make EU residents' personal data secure, accessible, used appropriately and documented with consent. Any organization that supplies goods or services to EU residents, or simply collects data on them, regardless of where that data is used or stored, must comply. Compliance isn't optional. Don't make the mistake of assuming that information systems and processes compliant with current regulations are GDPR compliant. They probably aren't.
GDPR has 87 pages of detailed requirements and 99 articles of specifications. It will be difficult to comply with GDPR requirements and obligations, in some cases extraordinarily so. While IT professionals should be familiar with all the requirements, some are more important because they may require adding or adjusting processes and systems.
GDPR vs. existing regs
Per the office of the European Data Protection Supervisor, these are the essential differences between the General Data Protection Regulation and current EU laws and regulations:
- strengthens rights for individuals;
- higher protection levels for children;
- risk-based approach to governance;
- increases documentary evidence;
- mandatory breach reporting;
- accountability for data controllers and data processors;
- legitimate interests processing condition removed for public authorities;
- new and far-reaching areas both geographically and procedurally; and
- costly fines for noncompliance.
What follows are the GDPR articles that IT pros should be particularly concerned about:
Article 6: Managing consent. Before an organization collects personal data on an EU resident, the resident must consent to that collection. The consent must be documented, stored, protected and easily producible. In addition, personal data collection must have a defined use case. Documentation of consent has to be provided to EU regulatory authorities on demand.
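As a sketch, a consent record of this kind needs, at minimum, the subject, the defined purpose and a timestamp, and it must be producible on demand. The following is a minimal, hypothetical Python model; the field names are illustrative, not mandated by the regulation:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

# Hypothetical minimal consent record; field names are illustrative,
# not prescribed by GDPR itself.
@dataclass
class ConsentRecord:
    subject_id: str              # internal identifier for the EU resident
    purpose: str                 # the defined use case for the collection
    granted_at: str              # ISO 8601 timestamp of consent
    source: str                  # where consent was captured (form, API, etc.)
    withdrawn_at: Optional[str] = None

    def to_json(self) -> str:
        """Serialize so the record can be produced to regulators on demand."""
        return json.dumps(asdict(self), sort_keys=True)

record = ConsentRecord(
    subject_id="subj-001",
    purpose="order fulfillment",
    granted_at=datetime(2018, 5, 25, tzinfo=timezone.utc).isoformat(),
    source="checkout-form",
)
print(record.to_json())
```

The point of the structure is that consent is tied to a defined purpose and a point in time, and that withdrawal is recorded rather than the record being deleted.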
Article 25: Data protection by design and default. Data protection must be built into processes and tools that collect personal data.
Article 25: Data minimization. The amount of personal data collected and stored as well as timeframes for keeping that data must be minimized. Retention should be balanced with other record-keeping regulations, such as for health and criminal records that demand longer retention periods.
Articles 25 and 32: State of the art. These articles necessitate future-proofing IT systems and processes against ongoing technological advancements. The latest technologies must be deployed to comply with GDPR requirements, or justification must be provided, based on cost, risk and context, as to why they weren't deployed. Adherence must be regularly reviewed.
Article 32: Security of processing. Security must be appropriate to risk level. It should include the use of pseudonymization and encryption and ensure ongoing confidentiality, integrity, availability and resilience of processing systems and services. The article also requires the ability to restore availability and access to personal data in a "timely manner," meaning hours to days, not weeks to months, following an outage or failure. A documented process for regularly testing and assessing effectiveness is required.
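Pseudonymization can be as simple as replacing a direct identifier with a keyed hash, so processing systems work with a stable alias rather than the identifier itself. This is one illustrative approach, not the only compliant one; the key name and values here are hypothetical:

```python
import hmac
import hashlib

# Sketch of pseudonymization: a keyed hash turns an identifier into a
# stable alias. The key must be stored separately from the pseudonymized
# data, or the aliases can be reversed by anyone holding both.
SECRET_KEY = b"rotate-and-store-me-separately"  # illustrative value

def pseudonymize(identifier: str) -> str:
    """Return a deterministic alias for the given identifier."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

alias = pseudonymize("jane.doe@example.eu")
print(alias)       # same input always yields the same alias
print(len(alias))  # 64 hex characters
```

Because the mapping is deterministic, the alias can still be used to join records across systems, while the raw identifier never leaves the system holding the key.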
Articles 44 to 50: Data transfers. Primarily aimed at cloud service providers, these articles mandate that personal data captured during data transfers be stored in EU countries or countries such as Canada that have similar data privacy protection policies. When personal data is stored or transferred to countries such as the U.S., binding corporate rules that match GDPR requirements must be in place.
Article 17: The right to be erased. More commonly known as the "right to be forgotten," this requirement will likely cause IT the most compliance heartburn. It requires personal data be completely deleted in each of the following circumstances:
- an EU resident's request;
- personal data collection purpose is no longer necessary; and
- withdrawal of an EU resident's consent.
The right to be erased demands erasure in a timely manner. This again means hours to days, not weeks to months. All copies of data, including backups, archives, DevOps, test-dev, mirrors and snapshots, must be erased. Article 17 will raise huge technological and process problems in areas such as unstructured data storage, databases, data protection processes, archives, mobile and cloud service providers. Deeper dives into each of these areas reveal why.
Unstructured data is typically stored in multiple storage silos such as file servers, NAS, object storage, endpoints and cloud storage, but most organizations have limited visibility into that unstructured data. Among other things, they don't know what's in it, when it was created, who created it, when it was last opened or if it was copied, and the ability to search is limited. The metadata likely doesn't show what personal data is included in the file. Some storage systems let users add custom metadata, which will become a GDPR requirement for finding personal data in files or objects. In addition, searches must be done on every storage silo, a labor- and time-intensive process.
Solving this problem means implementing a global namespace and view over all unstructured storage with the ability to place custom metadata on every file and object that contains personal data. Files and objects with personal data can then be tagged, searched, altered and erased. Alternatively, unstructured storage silos can be consolidated into a single storage system with a global namespace that allows for custom metadata. These approaches do little, however, to solve endpoint issues, such as data on local C drives or in file sync-and-share cloud implementations.
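The global-namespace approach can be sketched as a catalog that maps a data subject to every silo and path holding that subject's personal data, so one lookup replaces a search of each silo. This in-memory Python model stands in for what a real system would persist in the object store or a metadata catalog; all names are hypothetical:

```python
from collections import defaultdict

# Illustrative stand-in for a global namespace over multiple storage
# silos. A real deployment would persist this metadata alongside the
# files/objects themselves.
class PersonalDataIndex:
    def __init__(self):
        # subject_id -> set of (silo, path) locations
        self._locations = defaultdict(set)

    def tag(self, silo: str, path: str, subject_id: str):
        """Record that a file/object holds this subject's personal data."""
        self._locations[subject_id].add((silo, path))

    def find(self, subject_id: str):
        """Locate every copy across all registered silos in one lookup."""
        return sorted(self._locations.get(subject_id, set()))

    def erase(self, subject_id: str):
        """Return the locations to purge, then drop the index entry."""
        targets = self.find(subject_id)
        self._locations.pop(subject_id, None)
        return targets

idx = PersonalDataIndex()
idx.tag("nas-01", "/claims/2017/form-88.pdf", "subj-001")
idx.tag("s3-archive", "claims/form-88.pdf", "subj-001")
print(idx.erase("subj-001"))
```

The index only answers "where is this person's data"; the actual deletion still has to run against each silo, which is why tagging at write time matters.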
Personal data is frequently part of a database, such as a customer relationship management (CRM) or e-commerce database. Finding and deleting data from a database is generally straightforward. But what happens when personal data exists in multiple databases? Databases get replicated for various reasons, including scalability, read-only applications, partner databases, DevOps, test-dev, data protection, and disaster recovery and business continuity (DR/BC). Every copy of an EU resident's personal data must be erased upon request. That means each database must go through a separate purging process -- not a trivial effort. Systems that have been built for decades to preserve data now must be able to seek out and erase that data.
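The multi-copy purge can be sketched with SQLite standing in for the primary, a replica and a test-dev copy: the same delete must run against every copy, and each erasure should be documented. Table and column names are illustrative:

```python
import sqlite3

# Sketch: one erasure request fans out into a separate DELETE against
# every copy of the database, with each erasure recorded as evidence.
def make_copy():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (subject_id TEXT, email TEXT)")
    db.execute("INSERT INTO customers VALUES ('subj-001', 'jane@example.eu')")
    return db

# Each entry stands in for an independently maintained copy.
copies = {"primary": make_copy(), "replica": make_copy(), "test-dev": make_copy()}

erasure_log = []
for name, db in copies.items():
    cur = db.execute("DELETE FROM customers WHERE subject_id = ?", ("subj-001",))
    db.commit()
    erasure_log.append((name, cur.rowcount))  # document each erasure

print(erasure_log)  # one row deleted per copy
```

The log is the easy part; the hard part in practice is knowing that the `copies` inventory is complete.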
Database backups exacerbate the problem. Most databases have unique methodologies for backing up data. They generally have to be quiesced to make sure the database is recoverable without corruption or missing data, usually using a variation of image backup with changed block tracking (CBT). This process creates a backup file that can't be mounted directly. It must be recovered and restored for the database to search or delete data in the backup copy. Recovering and restoring database backups is time-consuming, taking minutes to hours for each one, depending on the size of the database and soundness of the backup. After the recovery and restoration, the EU resident's personal data is erased, the erasure documented and the database returned to a backed-up condition.
The question then becomes: Why can't this erasure process be completed on the master backup and propagated to each virtual copy? There are two reasons this doesn't work. The first is that each CBT or incremental backup creates a virtual full volume image based on previous images. Erasing data from the master, or golden, image creates a void in the virtual image, which now points to data that no longer exists, resulting in a corrupt backup. The other reason is that personal data changes over time. Erasing the data in an early backup won't erase changed data in more recent ones. Both situations require recovering and restoring each database backup generation and deleting personal data from each one.
Erasing personal data in a single backup copy is annoying, but doable. Unfortunately, there is rarely just one: database backups are generated daily, at a minimum, and retained from 30 days to years. That's a lot of backups to recover and restore in order to remove one person's personal information. Repeating this process for every request is an overwhelming task.
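The recover-erase-restore cycle can be modeled by treating each backup generation as an independent full copy that must be opened and purged separately. Note that the subject's data differs between generations, which is why erasing one copy isn't enough. This is a simplified sketch that skips the actual recovery step:

```python
# Simplified model: each generation behaves as a full copy that must be
# opened and purged on its own. In reality, each would first have to be
# recovered and restored to a mountable state.
generations = [
    [{"subject_id": "subj-001", "email": "old@example.eu"},
     {"subject_id": "subj-002", "email": "bob@example.eu"}],
    [{"subject_id": "subj-001", "email": "new@example.eu"},   # data changed over time
     {"subject_id": "subj-002", "email": "bob@example.eu"}],
]

def erase_from_all_generations(subject_id, generations):
    """Purge the subject from every generation; return rows removed."""
    purged = 0
    for gen in generations:
        before = len(gen)
        gen[:] = [row for row in gen if row["subject_id"] != subject_id]
        purged += before - len(gen)
    return purged

print(erase_from_all_generations("subj-001", generations))  # 2 copies purged
```

The loop's cost is linear in the number of generations retained, which is exactly why keeping fewer generations is part of the remedy discussed next.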
Solving this problem requires discipline as well as new processes and systems. It likely means keeping fewer database backup generations and rolling older generations more frequently into a searchable, consolidated archive. It can also mean consolidating personal data databases to simplify erasures and implementing a content data management system, connecting a database or databases to unstructured file or object storage in-band, out-of-band or in combination. Content data management continuously replicates every write while propagating erasures across all generations.
Image-based data protection, such as storage snapshots, is the most popular type of data protection, and it's excellent for fast recoveries, especially for virtual machines. The problem with meeting GDPR requirements is similar to that of database application-specific backups. Each incremental or CBT backup creates a virtual backup image pointing to unique data in the master, or golden, image. Erasing data from the master can corrupt the pointers in all subsequent virtual full backups. So each backup generation must have the personal data erased separately.
Erasing personal data requires mounting or recovering each backup generation, depending on the technology. Mounting backups is faster and not as arduous, but erasures are still laborious and escalate with the number of backup generations kept. The data has to be found in each mount, erased and the backup restored to its idle state. This painstaking task takes a lot of time, especially when dealing with multiple erasure requests.
Replications are another Article 17 concern. It's common to replicate data for DR/BC purposes. However, erasing it at the primary site won't necessarily remove it from the DR/BC site.
Solving this problem requires keeping fewer generations and rolling up previous backups into a searchable, consolidated archive. Some backup systems use file backup instead of image. File backup lets files and data be found and erased one time and then propagates that to all generations without having to recover every backup generation. Some content data management systems can do this.
The use of tape is another data protection issue because erasing personal data from tape takes a lot of time. Using Linear Tape File System (LTFS) tape libraries with a NAS or object storage front end can solve this problem. The front end acts as a cache, and data removed from the cache gets erased when the tapes are compacted.
Archives live on less-expensive storage, including object, scale-out software-defined file and cloud storage, as well as on cold storage, such as tape and optical. Compliance with Article 17 requires that the storage system be searchable. It also must accept customizable metadata so personal data can be tagged and easily found. Many archives don't allow customizable metadata. Compounding the issue, reading or altering data in the archive may require bringing it back to the storage system from where it originated. Moving it to the archive via hierarchical storage management requires leaving a stub in the original storage. Deleting the stub doesn't delete the data; it still exists in the archive, and the process doesn't meet GDPR Article 17 compliance. Here, too, tape can compound the problem of personal data erasure.
These problems can be solved by implementing an archive with the aforementioned search and metadata characteristics using stub-free archival data movement. If tape is the archive target, then use LTFS tape libraries front-ended with either NAS or object storage.
Many mobile business apps collect personal data, for instance, for insurance adjusters, financial planners, marketers and CRM. They typically upload files to the cloud, corporate data centers and archives. Erasing personal data in these apps requires programmatic adjustments to the apps. And those adjustments must make sure personal data erasures propagate to all copies.
Cloud service providers
Using a cloud service provider doesn't change an organization's GDPR responsibilities and liabilities. It's ultimately the responsibility of the organization collecting EU residents' personal data to be compliant, whether its IT is on premises or in the cloud. That obligation can't be contractually removed or amended. That means the cloud service client is liable and must ensure the provider meets all GDPR requirements.
Failure to comply with GDPR will have unprecedented negative financial consequences. There are two penalty levels: For minor infractions, the penalty is a nontrivial one of up to 10 million euros -- currently more than $12 million -- or 2% of worldwide revenue during the preceding financial year, whichever's greater. An example of a minor infraction is the inability to document and prove an EU resident's data has been erased or deleted as requested. For major infractions, the penalty is up to 20 million euros or 4% of worldwide revenue during the preceding financial year, whichever's greater. An example of a major infraction is not erasing or deleting an EU resident's data (see "GDPR fines and penalties").
These fines could easily put a company out of business. Noncompliance isn't a financially rational option.
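The two-tier penalty formula reduces to taking the greater of a fixed cap and a percentage of worldwide annual revenue. A quick sketch, with illustrative revenue figures:

```python
# Sketch of the two-tier penalty formula: the fine is the greater of a
# fixed cap and a percentage of worldwide annual revenue for the
# preceding financial year.
def max_fine(annual_revenue_eur: float, major: bool) -> float:
    cap = 20_000_000 if major else 10_000_000
    pct = 0.04 if major else 0.02
    return max(cap, pct * annual_revenue_eur)

# A company with 2 billion euros in annual revenue:
print(max_fine(2_000_000_000, major=False))  # 2% of revenue exceeds the 10M cap
print(max_fine(2_000_000_000, major=True))   # 4% of revenue exceeds the 20M cap
```

For large companies the percentage dominates, so the exposure scales with revenue rather than being capped at a fixed amount.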
GDPR fines and penalties
The General Data Protection Regulation is new and will likely go through major changes as implementation shows what will and won't work in the real world. The gears of bureaucratic regulatory change grind slowly, however. In the meantime, GDPR imposes stiff fines for noncompliance.
EU regulators plan to use the following 10 benchmarks to determine the amount of the fine for GDPR noncompliance:
- Nature of infringement. This includes the number of people affected, damage suffered, infringement duration and processing purpose.
- Intention. This determines whether infringement is intentional or negligent.
- Mitigation. These are actions taken to mitigate damage to data subjects.
- Preventative measures. This looks at how much technical and organizational preparation was previously implemented to prevent noncompliance.
- History. This examines past relevant infringements, which may be interpreted to include infringements under the GDPR's predecessor, the Data Protection Directive, and not just the GDPR, and past administrative corrective actions under GDPR -- from warnings to bans on processing and fines.
- Cooperation. How cooperative the organization has been with the supervisory authority to remedy the infringement will affect the fine.
- Data type. What types of data the infringement affects will also affect the fine.
- Notification. Was the infringement proactively reported to the supervisory authority by the organization itself or by a third party?
- Certification. This takes into consideration whether the organization qualified under approved certifications or adhered to approved conduct codes.
- Other. Other aggravating or mitigating factors may include financial impact on the organization from the infringement.
If an organization infringes on multiple GDPR provisions, it will be fined according to the gravest infringement, as opposed to being separately penalized for each provision. However, this may not offer much relief considering the potential fine amounts.
Organizations will be subject to up to 10 million euros or 2% of the worldwide annual revenue of the preceding financial year, whichever is higher, for infringements of the following:
- the obligations of controllers and processors under Articles 8, 11, 25 to 39, 42 and 43;
- the obligations of the certification body under Articles 42 and 43; and
- the obligations of the monitoring body under Article 41(4).
Organizations will be subject to up to 20 million euros or 4% of the worldwide annual revenue of the preceding financial year, whichever is higher, for infringements of the following:
- the basic principles for processing, including conditions for consent, under Articles 5, 6, 7 and 9;
- the data subjects' rights under Articles 12 to 22;
- the transfer of personal data to a recipient in a third country or an international organization under Articles 44 to 49;
- any obligations pursuant to member state law adopted under Chapter IX; and
- any noncompliance with an order by a supervisory authority.
May 25, 2018, is rapidly approaching. Meeting GDPR compliance requirements isn't optional for organizations that collect any personal data on EU residents, and failure to comply could be financially ruinous. Every IT process that handles personal data must be evaluated for GDPR compliance. If a process isn't compliant, you must modify or replace it. That includes processes that are technically compliant but not practically so. For instance, if it takes months to delete all copies of a resident's personal data, GDPR deems the process noncompliant because it isn't timely.
Time is of the essence. There's no GDPR ramp up or grace period. On May 25, every organization will either be GDPR compliant or subject to outrageous fines. There is no in-between.