Privacy-preserving machine learning assuages infosec fears

Implementing privacy-preserving machine learning controls, such as federated learning and homomorphic encryption, can address top cloud security and privacy concerns. Learn how.

The use of machine learning and AI technologies is rapidly expanding. This growth mirrors the increasing use of cloud service environments where mass-scale computing resources are readily available. Large cloud service providers all offer ready-made machine learning and AI capabilities and tools that make it easier than ever to build more intelligent applications and data mining scenarios.

Of course, this growth raises concerns that security and privacy professionals must address. Mining data with machine learning and AI requires staggering quantities of data, and some of that data is bound to be sensitive. On top of this, a growing number of regulations mandate data privacy measures for cloud services, making privacy-preserving machine learning techniques all the more critical.

AI and machine learning privacy concerns

Much of the privacy concern around AI and machine learning boils down to two key areas. First, many traditional data protection controls come into play for data that organizations upload into cloud service environments and analyze using compute services and specialized tools. These controls include encryption, transport security, tokenization and obfuscation. Most traditional data storage services from the major cloud providers offer some or all of these controls. This contrasts significantly with specialized AI and machine learning services, such as Amazon SageMaker; Amazon Rekognition, which uses AI to extract and analyze images and video; Azure Machine Learning; Azure Cognitive Services; and Google Cloud AI. Not all of these services integrate with the encryption key management and usage models and controls that organizations may already have deployed.

The second privacy issue focuses on the cloud service providers themselves. Much of the providers' own use of data is centered on consumer devices and digital voice assistants, such as Amazon Echo, Amazon Alexa, Apple Siri and Google Assistant. Voice transactions are analyzed to improve product and service offerings, as well as voice assistant accuracy. Beyond the services in use, the geographic location of sensitive data used in machine learning and AI operations is a major regulatory and compliance focus, too.

Privacy-preserving machine learning addresses security and privacy concerns

To combat these security and privacy concerns, a new set of security-focused controls, collectively known as privacy-preserving machine learning, has emerged. The most promising of these controls include the following:

  • Federated learning. A federated learning model of machine learning and AI uses distinctly separate nodes that cooperate to train AI algorithms but never actually share their data. Both centralized and peer-to-peer architectures can be set up this way. Google has applied this approach in its TensorFlow Federated open source machine learning framework. A minimal sketch of the underlying federated averaging idea appears after this list.
  • Differential privacy. This privacy-preserving method selectively shares aggregate or public information while deliberately withholding anything personal or sensitive. Calibrated noise or junk data is often introduced into processing algorithms as well, making it more difficult for malicious actors to infer or detect sensitive data within them. Differential privacy can be used in tandem with federated learning; each federated node simply applies these protections while participating in the distributed model. The TensorFlow Privacy module can be used within TensorFlow installations to create a differential privacy model; a simple noisy-count sketch of the core idea also follows this list.
  • Homomorphic encryption. Homomorphic encryption is a specialized type of encryption algorithm and tool set that allows computations to be performed directly on encrypted data, producing encrypted results that, once decrypted, match what the same operations would have produced on the plaintext. With homomorphic encryption, data sets can be processed for machine learning and AI without ever being exposed to the applications and systems using the data. Though this type of encryption has existed for over a decade, it has been exceedingly difficult to implement efficiently. Newer libraries, including PySyft and TF Encrypted, are helping advance the use of homomorphic encryption in neural network programming and machine learning operations. A toy additively homomorphic example rounds out the sketches after this list.
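
The sketches below are illustrative only. The data sets, model, parameter values and key sizes are assumptions chosen for brevity; they are not taken from the frameworks named above, and production systems should rely on vetted libraries such as TensorFlow Federated, TensorFlow Privacy, PySyft or TF Encrypted.

First, a minimal federated averaging (FedAvg) loop in plain NumPy. Each simulated client fits a small linear model on data that never leaves it, and the coordinating server only averages the returned weights.

```python
# Minimal federated averaging (FedAvg) sketch: a linear model trained across
# five simulated clients whose synthetic data never leaves them. All sizes
# and learning rates are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each "client" holds its own private data set.
clients = []
for _ in range(5):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

def local_update(w, X, y, lr=0.1, epochs=5):
    # Plain gradient descent on one client's private data.
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

global_w = np.zeros(2)
for _ in range(10):
    # Clients train locally; only the resulting weights are shared.
    local_weights = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_weights, axis=0)   # server-side aggregation

print(global_w)   # approaches true_w without any raw data being pooled
```

Next, a sketch of the differential privacy idea using the Laplace mechanism: a count query is answered with calibrated noise so the presence or absence of any single record is masked. The epsilon value, sensitivity and records are hypothetical.

```python
# Laplace-mechanism sketch: answer a count query with calibrated noise so the
# contribution of any single record is masked.
import numpy as np

def private_count(records, predicate, epsilon=0.5):
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1   # adding or removing one record changes a count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = [34, 29, 41, 52, 38, 27, 45]
print(private_count(ages, lambda a: a > 40))   # noisy answer near the true count of 3
```

Finally, a toy Paillier-style additively homomorphic scheme (Python 3.9+), showing how two ciphertexts can be combined so the decrypted result equals the sum of the plaintexts, without the inputs ever being decrypted.

```python
# Toy Paillier-style additively homomorphic encryption. The key size is far
# too small for real use; production systems use vetted libraries, never
# hand-rolled code like this.
import math
import random

def generate_keys(p=293, q=433):          # tiny primes, demo only
    n = p * q
    n_sq = n * n
    lam = math.lcm(p - 1, q - 1)          # Carmichael function of n
    mu = pow(lam, -1, n)                  # modular inverse of lambda mod n
    return (n, n_sq), (lam, mu, n, n_sq)

def encrypt(pub, m):
    n, n_sq = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:            # randomness must be invertible mod n
        r = random.randrange(1, n)
    g = n + 1                             # standard generator choice
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(priv, c):
    lam, mu, n, n_sq = priv
    x = pow(c, lam, n_sq)
    return ((x - 1) // n) * mu % n        # L(x) = (x - 1) / n

pub, priv = generate_keys()
c1, c2 = encrypt(pub, 42), encrypt(pub, 58)
c_sum = (c1 * c2) % pub[1]                # multiplying ciphertexts adds plaintexts
print(decrypt(priv, c_sum))               # prints 100; the inputs stayed encrypted
```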

Many of these privacy-preserving machine learning technologies are evolving rapidly. Much of the work is driven by cloud providers and technology organizations, such as Apple, Facebook and Google, which collect staggering quantities of data yet must address privacy concerns globally. More machine learning and AI use cases emerge all the time -- from commercial tools that analyze consumer purchasing behavior for marketing to law enforcement identifying suspects from photos and other biometric features captured by airport cameras. Startups such as Duality Technologies are also attracting funding to build more commercially viable controls, such as homomorphic encryption.

For enterprises, protecting privacy when using machine learning and AI in the cloud will likely require implementing both cloud-native and third-party controls. Assurances from cloud service providers will also be critical in addressing valid privacy concerns. Privacy requirements are certain to drive the rapid development of machine learning data protection measures in more ways than IT professionals can currently envision.
