Information security professionals who think they've got it all under control may be surprised to find that some parts of the EU's new General Data Protection Regulation, like the GDPR's right to be forgotten, may be more challenging than they expect.
Under the regulation, set to begin enforcement on May 25, the right to be forgotten means more than just deleting all of a data subject's data after they submit a request. Organizations have the option to either delete all the relevant consumer data or use anonymization techniques to strip identifying data from their databases and keep the non-identifying data for their own marketing or research.
While anonymization may seem to be preferable for organizations that are not willing to literally purge the data subject's data from their databases, it may not be enough to satisfy the intent of the GDPR right to be forgotten.
Marc French, senior vice president, chief trust officer and data protection officer for GDPR compliance at Mimecast UK Ltd., a cloud email security company headquartered in Lexington, Mass., explained how one of the approaches to forgetting data subjects who request it -- anonymization of all data -- is not quite the panacea that some make it out to be.
While it might be enough in some cases, anonymization must take into account all of the pieces of data that remain accessible and that could potentially identify an individual even without their name, phone number or other data classically considered identifying.
SearchSecurity asked French about the GDPR right to be forgotten and deleting personal data versus anonymizing it. Here is his answer:
Marc French: Every data subject that's in the European Union, anybody that we're storing data for, has a right to be forgotten, which effectively means the purging of the data that I've collected on you within the systems that I store it in. Now, there's a couple of interesting nuances to this, and I'll use the word purge.
There's been a lot of discussion about what the word purge actually means, and there's two trains of thought that are running down the track right now.
One is anonymization can equal purge, in that if I, as Marc French, I reach out to Acme Corporation, [and] you've tracked me because you're a search engine, as an example. Can I anonymize Marc French when requested so that I still get to keep the data for my own purposes, in that I need to understand it to drive my ad revenue, and the mere fact that I anonymize it is sufficient to purge, or do I actually have to physically delete the data out of the environment?
What's happening right now is, if you look at the Google train that's running in Europe today, I think they would attest that they like the anonymization piece. I think the challenge that you have with anonymization is, can you truly anonymize the data sufficiently that it can't be traced back to the data subject?
It's kind of hard; if I'm searching from where I live here in Massachusetts for a very esoteric function, you might be able to glean enough data to tie it back to me. I think the anonymization argument fails in that you can buy so many data sets and interconnect so many things that, if you take the anonymization route, it's probably not going to end well for you because, I think, ultimately, you'll be able to tie it back together.
So that leaves us with purge. Now, purge is based on the foundation that you actually know all the data that you're capturing for somebody. If you don't have that foundational piece, you'll never really have the ability to remove it, and I think most companies struggle with the act of removal. So how do I actually delete it?
The act of deletion is a very difficult thing to do. You're caught between a rock and this hard place where I think anonymization fails because you can ultimately tie it back, but you have this technical challenge of deletion. So, right now, I think the state of the art is [to] delete with the hope that we're going to get more guidance on this anonymization thing going forward to overcome some of the technical hurdles that I think deletion will bring on us.
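French's point that separate data sets can be interconnected can be illustrated with a small sketch of a linkage attack. Everything here is hypothetical: the records, the field names (`town`, `job_title`, `query`) and the `link` helper are assumptions made for illustration, not a description of any real system or data set. The sketch joins an "anonymized" log, which has no names, against a second data set that does have names, using only the attributes the two share.

```python
# Hypothetical "anonymized" search log: names removed, other attributes kept.
anonymized_log = [
    {"town": "Sturbridge", "job_title": "Chief Trust Officer", "query": "esoteric query A"},
    {"town": "Boston", "job_title": "Engineer", "query": "weather"},
    {"town": "Boston", "job_title": "Engineer", "query": "news"},
]

# Hypothetical second data set that does contain names,
# e.g. something compiled from public professional profiles.
public_profiles = [
    {"name": "Person A", "town": "Sturbridge", "job_title": "Chief Trust Officer"},
    {"name": "Person B", "town": "Boston", "job_title": "Engineer"},
    {"name": "Person C", "town": "Boston", "job_title": "Engineer"},
]

# Attributes present in both data sets (assumed for this sketch).
QUASI_IDENTIFIERS = ("town", "job_title")


def key(record):
    """Build the join key from the shared attributes."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(log, profiles):
    """Return {query: name} for log rows that match exactly one profile."""
    names_by_key = {}
    for profile in profiles:
        names_by_key.setdefault(key(profile), []).append(profile["name"])

    reidentified = {}
    for row in log:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # a unique match ties the row to one person
            reidentified[row["query"]] = candidates[0]
    return reidentified


reidentified = link(anonymized_log, public_profiles)
print(reidentified)
```

In this toy data, the two Boston rows stay ambiguous because two people share the same attributes, while the Sturbridge row matches exactly one profile and is tied back to a name even though the log never stored one.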
That's the fundamental problem with anonymization. The Census Department here in the U.S. struggles with this all the time as they try to create data sets that are abstracted, and there's a whole science around the statistics for this.
Given two or three elements of your life outside your name, you could probably discern who you are with some level of certainty. So, if I gave you the fact ... I live in Sturbridge, Mass. If I gave you that, along with my job title, you could easily figure out who I am based on the fact that my population count is only about eight thousand.
It's not just date of birth and zip code. I think there are other data elements you could aggregate together which you can freely get today. A lot of this you can just find on LinkedIn, or even here [in Mass.] the municipalities publish your real estate records. With all of that together, you could easily discern who that person is and, hence, the reason why anonymization is going to be extremely difficult to prove.
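One common way to reason about the risk French describes is k-anonymity: for a chosen set of attributes, k is the size of the smallest group of records that share the same values. A k of 1 means at least one person is unique on those attributes. The following is a minimal sketch under stated assumptions; the records, field names and the `k_anonymity` helper are hypothetical and are not drawn from the interview or from any census methodology.

```python
from collections import Counter

# Hypothetical records with names already removed.
records = [
    {"town": "Sturbridge", "job_title": "Data Protection Officer"},
    {"town": "Sturbridge", "job_title": "Teacher"},
    {"town": "Sturbridge", "job_title": "Teacher"},
    {"town": "Boston", "job_title": "Teacher"},
    {"town": "Boston", "job_title": "Teacher"},
]


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same attribute values."""
    groups = Counter(
        tuple(row[field] for field in quasi_identifiers) for row in rows
    )
    return min(groups.values())


k_town_only = k_anonymity(records, ["town"])
k_town_and_job = k_anonymity(records, ["town", "job_title"])

print("k using town only:", k_town_only)
print("k using town and job title:", k_town_and_job)
```

With the town alone, every record hides among at least one other. Adding a second attribute drops k to 1 in this toy data, which mirrors the argument that two or three elements combined can single out a person.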