Adversarial machine learning: Threats and countermeasures

As machine learning becomes widespread, threat actors are developing clever attacks to manipulate and exploit ML applications. Review potential threats and how to combat them.

Machine learning offers numerous benefits for organizations and individuals, such as automating routine tasks or discovering trends and patterns in vast data sets. However, all these benefits come with a risk: security.

Although threats such as evasion, data poisoning and model extraction can undermine the security and integrity of ML systems, researchers are rapidly developing innovative defenses. Security-conscious training procedures, algorithmic enhancements and secure development practices can harden ML systems against common adversarial ML attacks. Technical countermeasures such as differential privacy, watermarking and model encryption can all improve the security of ML systems.

Adversarial attacks pose unique security challenges that organizations must address to ensure ML systems' safe and seamless operation. Explore the primary types of cyber attacks against ML systems and potential countermeasures to mitigate them.

Motivations for attacking ML systems

There are two main motivations for attacking an ML system: stealing sensitive information and disrupting normal operations.

Some ML models learn from sensitive information, such as customers' or employees' personally identifiable information; health records; and sensitive corporate, governmental or military data. Threat actors might try to attack ML systems to gain unauthorized access to this information for reasons such as identity theft, commercial gain or espionage.

Attackers might also seek to understand how the target ML system behaves and then provide malicious input to the ML model, thus forcing it to give a preferred output. This is known as adversarial ML and encompasses four primary attack types: poisoning, evasion, inference and extraction.

How does an adversarial ML attack work?

Because ML models are data-driven, adversarial ML attacks introduce unique security challenges during model development, deployment and inference. Security teams might also intentionally simulate adversarial ML attacks and analyze the results in order to design models and applications that can withstand such attacks.

Adversarial ML attacks can be white box or black box. In a white-box attack, the attacker has a deep knowledge of the ML model, including its underlying architecture, training data and the optimization algorithm used during the training process. This information gives the attacker significant insight into the model, enabling them to craft highly targeted exploits.

In a black-box attack, the attacker has limited or no knowledge of the ML model's architecture, training data, decision boundaries or optimization algorithm. The attacker therefore must interact with the ML model as an external user, submitting inputs in a trial-and-error fashion and observing the responses to discover exploitable vulnerabilities.
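The trial-and-error nature of a black-box attack can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `black_box_predict` is a hypothetical stand-in for a deployed model, its hidden rule is invented, and `probe_for_flip` is just one simple search strategy among many.

```python
def black_box_predict(features):
    """Hypothetical target model: callable, but its internals are hidden."""
    # Invented decision rule that the attacker does not know.
    return 1 if sum(features) > 1.0 else 0


def probe_for_flip(features, step=0.05, max_queries=200):
    """Trial and error: grow a one-feature nudge until the label changes."""
    original_label = black_box_predict(features)
    queries = 1
    scale = 1
    while queries < max_queries:
        for index in range(len(features)):
            for direction in (1, -1):
                trial = list(features)
                trial[index] += direction * scale * step
                queries += 1
                if black_box_predict(trial) != original_label:
                    return trial, queries
                if queries >= max_queries:
                    return None, queries
        scale += 1
    return None, queries


adversarial, used = probe_for_flip([0.4, 0.4])
print("input that flips the label:", adversarial, "after", used, "queries")
```

The attacker in this sketch learns nothing about the model except the labels it returns, which is why the number of queries is a useful signal for defenders to monitor.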

Adversarial ML attack types and mitigations

There are four primary types of adversarial ML attacks.

1. Poisoning

In a poisoning attack, threat actors try to inject malicious data into an ML model's training data. This forces the model to learn something it should not, causing it to produce inaccurate responses based on the attacker's objectives. The earliest recorded poisoning attack was executed in 2004, when threat actors fooled spam classifiers to evade detection.

Another example of poisoning is inserting harmful files containing malicious scripts into the training data of an ML system designed to identify malware. When the corrupted ML system is deployed, it fails to recognize malicious files and allows them to pass detection. One study by Stanford researchers found that adding 3% poisoned data to an ML training data set raised test error from 12% to 23%.

To prevent poisoning attacks, follow these best practices:

  • Use appropriate security controls and tools to validate training data before training an ML model, and ensure that data comes only from trusted sources.
  • Use anomaly detection techniques on the training data sets to discover suspicious samples.
  • Use ML models that are less susceptible to poisoning attacks, such as ensembles and deep learning models.
  • Experiment with feeding malicious inputs into the ML system and observing its responses to reveal backdoor vulnerabilities.
  • Monitor the system's performance after feeding it new data. If the model's accuracy or precision notably degrades, this could be a sign of poisoned samples and should be investigated further.
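As a rough illustration of anomaly detection on training data, the following Python sketch flags values of a single numeric feature that sit far from the rest. The function name `flag_suspicious`, the sample values and the 3.5 threshold are assumptions chosen for the example; real pipelines typically examine many features at once and use more sophisticated detectors.

```python
from statistics import median


def flag_suspicious(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, d in enumerate(deviations)
            if 0.6745 * d / mad > threshold]


# Hypothetical feature column; the last value imitates an injected sample.
feature_values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 9.5]
print("suspicious indices:", flag_suspicious(feature_values))
```

Flagged samples are candidates for manual review rather than proof of poisoning, since legitimate data can also contain outliers.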

2. Evasion

Evasion attacks occur during the inference or testing phase, after the ML system has finished training. These attacks involve sending carefully crafted inputs containing a small perturbation to the ML system to encourage it to make a mistake. Evasion attacks are easier to craft when the attacker has some knowledge about the inner workings of the ML system, although black-box variants rely only on observing the model's responses.

As an example of an evasion attack, consider an image classifier designed to detect objects. An adversary could take a correctly classified photo of a dog and add a slight amount of carefully constructed noise to the image. Although this minor modification is imperceptible to the naked eye, it is noticeable to the model, causing the classifier to label the photo incorrectly -- say, as a castle instead of a dog.
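To make the idea of a small, carefully constructed perturbation concrete, here is a minimal Python sketch. It swaps the image classifier for a tiny logistic regression model with made-up weights, and it uses a fast-gradient-sign-style step, which is one well-known way to build such perturbations in a white-box setting. The weights, input and `epsilon` value are assumptions for illustration only.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    return (value > 0) - (value < 0)


def gradient_sign_perturb(weights, bias, x, true_label, epsilon):
    """Shift each feature by at most epsilon in the direction that raises the loss."""
    p = predict_proba(weights, bias, x)
    # For cross-entropy loss, d(loss)/d(x_i) = (p - y) * w_i.
    gradient = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


# Hypothetical model and input; the attacker knows the weights (white box).
weights, bias = [2.0, -1.5, 0.5], -0.2
x = [0.6, 0.2, 0.4]
x_adv = gradient_sign_perturb(weights, bias, x, true_label=1, epsilon=0.3)

print("original score:", predict_proba(weights, bias, x))
print("perturbed score:", predict_proba(weights, bias, x_adv))
```

With these particular numbers the score moves from above 0.5 to below it, so the predicted class changes even though no feature moved by more than `epsilon`.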

To prevent evasion attacks, enforce the following best practices:

  • Train the ML system with adversarial samples to strengthen its ability to recognize them and become more resilient to evasion attacks.
  • Perform input sanitization on training data.
  • Use different ML models with varied training data sets to reduce the effectiveness of evasion attacks.
  • Continually monitor ML models to detect potential evasion attack attempts.
  • Keep training data in a secure storage location with strong access controls to avoid data tampering.

3. Inference

In inference attacks, adversaries attempt to reverse-engineer an ML system by providing specific inputs to reconstruct the model's training samples. For instance, certain training data sets might contain sensitive personal information about customers, which attackers could aim to extract using specific inputs to the model.

There are three primary types of inference attacks:

  • Membership inference attacks, in which the attacker tries to determine whether a specific data record was used in model training.
  • Property inference attacks, in which the adversary attempts to guess specific properties about the training data that the system owner does not want to share, such as demographic information, financial data or sensitive personal information.
  • Recovery of training data, in which the attacker aims to reconstruct the training data itself to reveal sensitive information.
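A membership inference attack can be sketched in a few lines of Python. The toy model below is deliberately overfit: its confidence depends on how close an input is to a memorized training record. The records, the `model_confidence` function and the 0.9 threshold are all hypothetical and exist only to show the principle that unusually high confidence can leak membership.

```python
import math

# Hypothetical private training records that the attacker cannot see.
TRAINING_RECORDS = [(1.0, 2.0), (3.0, 1.0), (5.0, 4.0)]


def model_confidence(record):
    """Toy overfit model: confidence decays with distance to the nearest training record."""
    nearest = min(math.dist(record, known) for known in TRAINING_RECORDS)
    return math.exp(-nearest)


def guess_membership(record, threshold=0.9):
    """Attacker's rule: very high confidence suggests the record was in the training set."""
    return model_confidence(record) >= threshold


print("seen record flagged as member:", guess_membership((3.0, 1.0)))
print("unseen record flagged as member:", guess_membership((3.0, 2.5)))
```

Real models leak this signal less cleanly, but the same confidence gap between seen and unseen records is what such attacks try to exploit.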

Defense strategies to mitigate inference attacks include the following:

  • Use cryptography to protect ML data.
  • Remove sensitive information from inputs before it reaches the ML model.
  • Augment the data by adding nonsensitive data to training data sets, making it harder for attackers to infer information about specific data records.
  • Limit access to ML systems and their training data to authorized users only.
  • Configure the ML system to give random output to increase the difficulty of predicting ML model outputs.

4. Extraction

In an extraction attack, adversaries attempt to extract information about the ML model or the data used to train it. This type of attack aims to understand the ML model's architecture and extract the sensitive data used during the training phase.

There are several types of extraction attacks:

  • Model extraction, where the attacker extracts or replicates the entire target ML model.
  • Training data extraction, where the adversary extracts the data used to train the ML model.
  • Hyperparameter extraction, where the attacker determines the key features of the target ML model, such as architecture, learning rate and complexity.
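Model extraction is easiest to see on a very simple target. The Python sketch below assumes a hypothetical deployed model that is linear and returns raw numeric scores. Under those assumptions, an attacker can recover the parameters exactly with one query per feature plus one extra query. `target_model` and its secret values are invented for the example.

```python
def target_model(x):
    """Hypothetical deployed model; the attacker sees only its outputs."""
    secret_weights = [0.5, -2.0, 3.0]
    secret_bias = 1.0
    return sum(w * xi for w, xi in zip(secret_weights, x)) + secret_bias


def extract_linear_model(query, n_features):
    """Recover weights and bias of a linear model using n_features + 1 queries."""
    # An all-zero input reveals the bias.
    bias = query([0.0] * n_features)
    weights = []
    for i in range(n_features):
        # A unit vector isolates one weight at a time.
        unit = [0.0] * n_features
        unit[i] = 1.0
        weights.append(query(unit) - bias)
    return weights, bias


stolen_weights, stolen_bias = extract_linear_model(target_model, n_features=3)
print("recovered weights:", stolen_weights, "recovered bias:", stolen_bias)
```

Nonlinear models generally cannot be copied this precisely, but attackers can still train a substitute model on the target's responses, which is why exposing exact scores increases risk.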

Enforce the following defense strategies to mitigate extraction attacks:

  • Encrypt ML model parameters before deployment to prevent threat actors from replicating the model.
  • Use a unique watermark to prove ownership of model training data.
  • Add noise to the generated output to hide sensitive patterns.
  • Enforce access control to restrict access to the ML system and its training data.
  • Obfuscate variables and scramble source code to make reverse-engineering the ML system more difficult.
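As one possible way to add noise to generated output, the following Python sketch perturbs a model score slightly and then rounds it before returning it to the caller. The function name `harden_output` and the default `noise_scale` and `decimals` values are assumptions; suitable settings depend on how much accuracy an application can give up.

```python
import random


def harden_output(score, noise_scale=0.05, decimals=1, rng=random):
    """Add small Gaussian noise, then round, so exact scores are never exposed."""
    noisy = score + rng.gauss(0.0, noise_scale)
    return round(noisy, decimals)


# Hypothetical raw model score returned for the same input five times.
rng = random.Random(0)  # seeded only to make the demonstration repeatable
print([harden_output(0.8731, rng=rng) for _ in range(5)])
```

Coarser, noisier outputs make it harder to reconstruct model parameters or training records from repeated queries, at the cost of less precise responses for legitimate users.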
