A hot-button topic in tech today is privacy, especially as companies collect more and more sensitive data and then inevitably suffer devastating data breaches.
Privacy is the right of individuals to control or influence how their information may be collected, used and stored, as well as by whom and how that information may be disclosed. The data people provide should not be traceable back to them, either directly or through statistical outputs. This last requirement makes it hard for enterprises to collect and analyze user data for behavioral insights, to improve decision-making processes and to measure the performance of products, clinical trials or ad campaigns, especially as this data is often shared with third parties.
To keep using this data while complying with data privacy and protection regulations, such as CCPA and GDPR, and avoiding noncompliance fines, organizations are turning to privacy-enhancing technologies (PETs). PETs ensure personal or sensitive information stays private throughout its lifecycle. The term covers a broad range of technologies designed to support privacy and data protection principles, while maintaining the ability to extract value from user-supplied data. Most PETs do so by using cryptography and statistical techniques to obfuscate sensitive data or reduce the amount of real data processed.
Let's take a look at some of the most common cryptographic and statistical PETs and their use cases.
Cryptographic privacy-enhancing technologies
Differential privacy
Differential privacy adds calculated noise to a data set, so group patterns within the data set can still be identified, while maintaining the anonymity of individuals. This enables large data sets to be released for public research. Differential privacy is also used by tech companies to analyze and draw insights from large amounts of user data.
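To make the idea of "calculated noise" concrete, here is a minimal sketch of the Laplace mechanism, one common way to achieve differential privacy for a counting query. The function names, the sample ages and the choice of epsilon are illustrative assumptions, not something taken from a specific product.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that match the predicate.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon. A smaller
    epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

if __name__ == "__main__":
    ages = [23, 35, 41, 29, 62, 57, 38, 44, 31, 50]  # hypothetical data
    rng = random.Random()
    # The released value varies from run to run; the exact count stays hidden.
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng))
```

Any single answer is blurred, but the noise averages out over many records or queries, which is why group patterns survive while individual contributions are masked.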
Homomorphic encryption
Homomorphic encryption enables computational operations on encrypted data. The results of any analysis remain encrypted, and only the data owner can decrypt and view them. This encryption method enables companies to analyze encrypted data in cloud storage or share sensitive data with third parties. Google has released open source libraries and tools to perform fully homomorphic encryption operations on an encrypted data set.
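As a rough illustration of computing on encrypted data, the sketch below implements a toy version of the Paillier cryptosystem, which supports addition on ciphertexts. It is only partially homomorphic (addition only), not fully homomorphic like the libraries mentioned above, and the tiny primes and function names are assumptions chosen for readability; it is not secure and not how a production library works.

```python
import math
import random

def generate_keypair(p: int, q: int):
    """Build a toy Paillier key pair from two primes (far too small for real use)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
    g = n + 1                      # standard simple choice of generator
    mu = pow(lam, -1, n)           # modular inverse; valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(public_key, message: int, rng: random.Random) -> int:
    """Encrypt an integer 0 <= message < n with fresh randomness."""
    n, g = public_key
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, message, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(public_key, private_key, ciphertext: int) -> int:
    """Recover the plaintext from a ciphertext."""
    n, _ = public_key
    lam, mu = private_key
    l_value = (pow(ciphertext, lam, n * n) - 1) // n
    return (l_value * mu) % n

def add_encrypted(public_key, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = public_key
    return (c1 * c2) % (n * n)

if __name__ == "__main__":
    rng = random.Random()  # a real system would use a cryptographic RNG
    public, private = generate_keypair(1009, 1013)
    total = add_encrypted(public, encrypt(public, 1200, rng), encrypt(public, 345, rng))
    print(decrypt(public, private, total))  # the sum, computed without decrypting the inputs
```

The party doing the addition only ever sees ciphertexts; only the holder of the private key can read the result.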
Secure multiparty computation (SMPC)
SMPC is a subfield of cryptography that distributes computation across multiple systems and encrypted data sources. This technique ensures no party can see the entire data set and limits the information any party can acquire. OpenMined uses SMPC in its PyGrid peer-to-peer platform for private data science and federated learning.
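One of the simplest building blocks behind SMPC is additive secret sharing. The sketch below simulates, in a single process, several parties computing the sum of their private values without any of them seeing another party's input. The modulus, function names and salary figures are illustrative assumptions; real protocols run over a network and use a cryptographic random source.

```python
import random

MODULUS = 2**61 - 1  # all arithmetic is done modulo this prime

def share(secret: int, n_parties: int, rng: random.Random) -> list:
    """Split a secret into n random-looking shares that sum to it mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares: list) -> int:
    """Combine shares back into the value they encode."""
    return sum(shares) % MODULUS

def secure_sum(private_inputs: list, rng: random.Random) -> int:
    """Simulate each party sharing its input and summing the shares it receives."""
    n = len(private_inputs)
    # Row i holds the shares created by party i; share j is sent to party j.
    distributed = [share(value, n, rng) for value in private_inputs]
    # Each party j adds up the shares it received; this reveals nothing on its own.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the combination of all partial sums reveals the total.
    return reconstruct(partial_sums)

if __name__ == "__main__":
    salaries = [52000, 61000, 58000]  # hypothetical private inputs
    print(secure_sum(salaries, random.Random()))
```

Each individual share is a uniformly random number, so a party learns nothing about another's input unless all of the other parties pool their shares.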
Zero-knowledge proof (ZKP)
ZKP is a set of cryptographic algorithms that enable a claim to be validated without revealing the data that proves it. It plays a crucial role in identity authentication. An individual's age, for example, can be authenticated with ZKP without disclosing their actual date of birth.
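The age example above would need a range proof, which is more involved than fits here. As a simpler illustration of the same principle, the sketch below shows the Schnorr identification protocol, in which a prover convinces a verifier that it knows a secret number without revealing it. The tiny group parameters and function names are assumptions for demonstration only and offer no real security.

```python
import random

# Toy group: P = 2 * Q + 1 with P and Q prime; G generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4

def commit(rng: random.Random):
    """Prover picks a random nonce and sends only the commitment G^nonce."""
    nonce = rng.randrange(Q)
    return nonce, pow(G, nonce, P)

def respond(nonce: int, challenge: int, secret: int) -> int:
    """Prover blends the nonce with the secret; the nonce hides the secret."""
    return (nonce + challenge * secret) % Q

def verify(public_value: int, commitment: int, challenge: int, response: int) -> bool:
    """Verifier checks G^response == commitment * public_value^challenge (mod P)."""
    return pow(G, response, P) == (commitment * pow(public_value, challenge, P)) % P

if __name__ == "__main__":
    rng = random.Random()  # a real system would use a cryptographic RNG
    secret = 777                       # known only to the prover
    public_value = pow(G, secret, P)   # published in advance
    nonce, commitment = commit(rng)    # step 1: prover -> verifier
    challenge = rng.randrange(Q)       # step 2: verifier -> prover
    response = respond(nonce, challenge, secret)  # step 3: prover -> verifier
    print(verify(public_value, commitment, challenge, response))
```

The verifier ends up convinced the prover knows the secret, yet the transcript (commitment, challenge, response) does not disclose it.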
Statistical privacy-enhancing technologies
Federated learning
Federated learning is a machine learning technique that enables individual devices or systems to collaboratively learn a shared prediction model, while keeping data stored locally. A mobile phone, for example, downloads the current model, improves it by learning from data on the phone and uploads only its summarized changes to the centralized model. From there, the changes are averaged with other device updates to improve the shared model.
Using federated learning, multiple entities can build smarter machine learning models without sharing data. It also reduces the amount of data that must be stored on a centralized server or in cloud storage. Google uses federated learning in Gboard on Android to suggest improvements to the next iteration of Gboard's query suggestion model.
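The download, train locally, upload and average loop described above can be sketched with a one-parameter linear model. This is a simplified take on federated averaging under stated assumptions: the model (y = w * x), the learning rate, the device data and the function names are all hypothetical, and real systems add safeguards such as secure aggregation.

```python
def local_update(global_weight: float, data, lr: float = 0.01, epochs: int = 5) -> float:
    """Train on one device's data and return only the change in the weight."""
    w = global_weight
    for _ in range(epochs):
        # Gradient of mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w - global_weight  # the raw data never leaves the device

def federated_round(global_weight: float, device_datasets) -> float:
    """Average the device updates, weighted by how much data each device holds."""
    deltas = [local_update(global_weight, data) for data in device_datasets]
    sizes = [len(data) for data in device_datasets]
    average_delta = sum(d * n for d, n in zip(deltas, sizes)) / sum(sizes)
    return global_weight + average_delta

if __name__ == "__main__":
    # Hypothetical devices whose local data follows y = 3 * x.
    devices = [[(1, 3), (2, 6)], [(3, 9), (4, 12), (5, 15)]]
    weight = 0.0
    for _ in range(50):
        weight = federated_round(weight, devices)
    print(weight)  # approaches 3 without pooling the raw data
```

The server only ever sees weight deltas, which is the "summarized changes" idea in miniature.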
Generative adversarial networks (GANs)
GANs generate new, synthetic instances of data that mimic a real data set. This method provides analysts, researchers and machine learning systems with large amounts of high-quality synthetic data. GANs' ability to recognize complex patterns within data is being used to quickly find anomalies in medical tests and network traffic.
Pseudonymization, obfuscation and data masking
Various methods, including pseudonymization, obfuscation and data masking, can be used to replace or obscure sensitive information by interchanging sensitive data with fictitious, distracting or misleading data. This is a common practice used by businesses to protect users' sensitive data and comply with privacy laws. Certain anonymization techniques, such as removing columns containing personally identifiable information (PII) or masking data, can be susceptible to reidentification.
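As a small illustration of these techniques, the sketch below pseudonymizes an identifier with a keyed hash and masks an email address and a card number. The function names, the 16-character token length and the masking rules are assumptions chosen for the example, not a standard.

```python
import hashlib
import hmac

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a stable token derived from a keyed hash.

    The same input and key always give the same token, so records can
    still be joined, but the token cannot be reversed without the key.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * max(len(local) - 1, 0) + "@" + domain

def mask_card_number(number: str) -> str:
    """Show only the last four digits of a card number."""
    digits = [ch for ch in number if ch.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])

if __name__ == "__main__":
    key = b"store-this-key-separately"  # hypothetical key, kept apart from the data
    print(pseudonymize("alice@example.com", key))
    print(mask_email("alice@example.com"))
    print(mask_card_number("4111 1111 1111 1111"))
```

Note that masked or pseudonymized fields can still be combined with other columns, such as ZIP code and birth date, to reidentify someone, which is the weakness mentioned above.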
On-device learning
With on-device learning, users' actions are analyzed on their devices to identify patterns without sending individual data to a remote server. On-device learning can be used to make algorithms, such as autocorrect, smarter. Apple's Face ID uses on-device learning to gather data on the different ways a user's face may look so its identification methodology is more accurate and secure.
Synthetic data generation (SDG)
SDG artificially creates data from an original data set so that the synthetic data has the same statistical characteristics as the original. As SDG data sets can be far larger than the original data set, this technique is used in test environments, as well as in AI and machine learning use cases, to reduce data sharing and the amount of real data required.
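A deliberately simple sketch of the idea: fit a normal distribution to each numeric column of a small data set, then sample as many synthetic rows as needed. The column meanings, sample values and function names are assumptions, and this approach preserves only each column's mean and spread; it ignores correlations between columns, which practical SDG tools also try to capture.

```python
from statistics import NormalDist, mean

def fit_columns(rows):
    """Estimate a normal distribution for every numeric column."""
    columns = list(zip(*rows))
    return [NormalDist.from_samples(column) for column in columns]

def generate(models, n_rows: int, seed=None):
    """Draw n_rows synthetic rows, sampling each column independently."""
    columns = [
        model.samples(n_rows, seed=None if seed is None else seed + index)
        for index, model in enumerate(models)
    ]
    return list(zip(*columns))

if __name__ == "__main__":
    # Hypothetical original data: (age, salary) for five people.
    real_rows = [(34, 52000), (45, 61000), (29, 48000), (52, 75000), (41, 58000)]
    synthetic_rows = generate(fit_columns(real_rows), n_rows=5000)
    print(len(synthetic_rows))
    print(mean(row[0] for row in real_rows), mean(row[0] for row in synthetic_rows))
```

The synthetic set is a thousand times larger than the original and contains no real individual's row, while its per-column averages stay close to the source data.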
Get access to PII but keep it secure with PETs
Ensuring personal data is securely stored with encryption and strong access controls is essential to maintain the privacy and confidentiality of users' data.
PETs are one way for multiple parties to share and analyze data. This has huge potential benefits for users, organizations and society, as the accessibility and availability of high-quality data are the first step in innovation. The U.K.'s Centre for Data Ethics and Innovation published an Adoption Guide for PETs intended to help organizations consider how PETs could unlock opportunities for data-driven innovation.
PETs are already used in different areas, such as application and system testing, particularly in the fields of IoT, financial transactions and healthcare services. The European Data Protection Board, which oversees the enforcement of GDPR, and the European Union Agency for Cybersecurity have published technical guidance supporting SMPC as a valid privacy-preserving safeguard, with healthcare and cybersecurity use cases. This could revolutionize the world of medical research, as the World Economic Forum estimated hospitals produce 50 petabytes of data per year, yet 97% of it is never used.