Adversarial machine learning: Threats and countermeasures

As machine learning becomes widespread, threat actors are developing clever attacks to manipulate and exploit ML applications. Review potential threats and how to combat them.

By Nihad Hassan

Published: 30 Oct 2023

Machine learning offers numerous benefits for organizations and individuals, such as automating routine tasks or discovering trends and patterns in vast data sets. However, all these benefits come with a risk: security.

Although threats such as evasion, data poisoning and model extraction can undermine the security and integrity of ML systems, researchers are rapidly developing innovative defenses. Security-conscious training procedures, algorithmic enhancements and secure development practices can harden ML systems against common adversarial ML attacks. Technical countermeasures such as differential privacy, watermarking and model encryption can all improve the security of ML systems.

Adversarial attacks pose unique security challenges that organizations must address to ensure ML systems' safe and seamless operation. Explore the primary types of cyber attacks against ML systems and potential countermeasures to mitigate them.

Motivations for attacking ML systems

There are two main motivations for attacking an ML system: stealing sensitive information and disrupting normal operations.

Some ML models learn from sensitive information, such as customers' or employees' personally identifiable information; health records; and sensitive corporate, governmental or military data. Threat actors might try to attack ML systems to gain unauthorized access to this information for reasons such as identity theft, commercial gain or espionage.

Attackers might also seek to understand how the target ML system behaves and then provide malicious input to the ML model, thus forcing it to give a preferred output. This is known as adversarial ML and encompasses four primary attack types: poisoning, evasion, inference and extraction.

How does an adversarial ML attack work?

Because ML models are data-driven, adversarial ML attacks introduce unique security challenges during model development, deployment and inference. Security teams might also intentionally simulate adversarial ML attacks and analyze the results in order to design models and applications that can withstand such attacks.

Adversarial ML attacks can be white box or black box. In a white-box attack, the attacker has deep knowledge of the ML model, including its underlying architecture, training data and the optimization algorithm used during training. This insight enables the attacker to craft highly targeted exploits.

In a black-box attack, the attacker has limited or no knowledge of the ML model, including its architecture, training data, decision boundaries and optimization algorithm. The attacker therefore must interact with the ML model as an external user via prompts using a trial-and-error approach, attempting to discover exploitable vulnerabilities through observing its responses.

Adversarial ML attack types and mitigations

There are four primary types of adversarial ML attacks.

1. Poisoning

In a poisoning attack, threat actors try to inject malicious data into an ML model's training data. This forces the model to learn something it should not, causing it to produce inaccurate responses based on the attacker's objectives. The earliest recorded poisoning attack was executed in 2004, when threat actors fooled spam classifiers to evade detection.

Another example of poisoning is inserting harmful files containing malicious scripts, labeled as benign, into the training data of an ML system designed to identify malware. When the corrupted ML system is deployed, it fails to recognize similar malicious files and allows them to pass detection. One study by Stanford researchers found that adding 3% poisoned data to an ML training data set raised test error from 12% to 23%.

To prevent poisoning attacks, follow these best practices:

Validate training data before using it to train an ML model using appropriate security controls and tools, and ensure that data comes only from trusted sources.
Use anomaly detection techniques on the training data sets to discover suspicious samples.
Use ML models that are less susceptible to poisoning attacks, such as ensembles and deep learning models.
Experiment with feeding malicious inputs into the ML system and observing its responses to reveal backdoor vulnerabilities.
Monitor the system's performance after feeding it new data. If the model's accuracy or precision notably degrades, this could be a sign of poisoned samples and should be investigated further.
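
The anomaly detection practice above can be illustrated with a minimal sketch. It assumes numeric feature vectors and uses a simple rule: flag any sample that sits unusually far from the centroid of its labeled class. The function name `flag_suspicious_samples` and the default threshold are hypothetical choices for illustration, not a recommended production method.

```python
import math
import statistics


def flag_suspicious_samples(samples, labels, z_threshold=3.0):
    """Return indices of samples unusually far from their class centroid.

    Hypothetical helper: a sample is flagged when its distance to the
    centroid of its labeled class is more than `z_threshold` standard
    deviations above that class's mean distance.
    """
    # Group sample indices by their label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    flagged = []
    for indices in by_class.values():
        dims = len(samples[indices[0]])
        centroid = [
            statistics.fmean([samples[i][d] for i in indices])
            for d in range(dims)
        ]
        distances = [math.dist(samples[i], centroid) for i in indices]
        mean_dist = statistics.fmean(distances)
        std_dist = statistics.pstdev(distances)
        if std_dist == 0:
            continue  # all samples equally distant; nothing stands out
        for i, dist in zip(indices, distances):
            if (dist - mean_dist) / std_dist > z_threshold:
                flagged.append(i)
    return sorted(flagged)
```

Flagged samples are candidates for manual review rather than proof of poisoning. A rule this simple can also miss poisoned samples that are crafted to sit close to legitimate data.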

2. Evasion

Evasion attacks occur during the inference or testing phase, after the ML system has finished training. These attacks involve sending carefully crafted inputs containing a small perturbation to the ML system to encourage it to make a mistake. Evasion attacks are easier to mount when the attacker has some knowledge about the inner workings of the ML system, which helps them craft the malicious input samples correctly.

As an example of an evasion attack, consider an image classifier designed to detect objects. An adversary could take a correctly classified photo of a dog and add a slight amount of carefully constructed noise to the image before submitting it to the ML system. Although this minor modification is imperceptible to the naked eye, it is noticeable to the model, causing the classifier to label the photo incorrectly -- say, as a castle instead of a dog.
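
The idea can be sketched on a toy linear classifier, where the direction that changes the score fastest is simply the sign of each weight. This is a simplified illustration in the spirit of gradient-sign attacks, not an attack on a real image model; the weights, the input values and the `perturb` helper are all made up for the example.

```python
def predict(weights, bias, features):
    """Return 1 ("dog") if the linear score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def sign(value):
    """Return -1, 0 or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


def perturb(weights, features, epsilon, true_label):
    """Shift each feature by at most `epsilon` to push the score
    away from the true label (white-box: the weights are known)."""
    direction = -1 if true_label == 1 else 1
    return [
        x + direction * epsilon * sign(w)
        for w, x in zip(weights, features)
    ]


# Hypothetical toy model and input (assumed values, for illustration only).
weights = [0.5, -0.25, 0.75, -0.5]
bias = 0.0
original = [0.3, 0.2, 0.1, 0.1]

original_label = predict(weights, bias, original)
adversarial = perturb(weights, original, epsilon=0.1, true_label=original_label)
adversarial_label = predict(weights, bias, adversarial)
```

In this toy case, no feature moves by more than 0.1, yet the predicted label flips because every small shift is aligned against the model's weights.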

To prevent evasion attacks, enforce the following best practices:

Train the ML system with adversarial samples to strengthen its ability to recognize them and become more resilient to evasion attacks.
Sanitize and validate inputs before they reach the model, in addition to sanitizing training data.
Use different ML models with varied training data sets to reduce the effectiveness of evasion attacks.
Continually monitor ML models to detect potential evasion attack attempts.
Keep training data in a secure storage location with strong access controls to avoid data tampering.
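
Two of the practices above, using several different models and monitoring for evasion attempts, can be combined in a small sketch. It assumes each model is a callable that returns a label; the `majority_vote` helper and the idea of treating low agreement as a signal worth logging are illustrative assumptions.

```python
from collections import Counter


def majority_vote(models, features):
    """Return the most common label and the share of models that chose it.

    Hypothetical helper: `models` is a list of callables that each map
    a feature vector to a label.
    """
    votes = Counter(model(features) for model in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)
```

An input crafted against one model is less likely to fool all of them, so a low agreement share could be logged as a possible evasion attempt and reviewed.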

3. Inference

In inference attacks, adversaries attempt to reverse-engineer an ML system by providing specific inputs to reconstruct the model's training samples. For instance, certain training data sets might contain sensitive personal information about customers, which attackers could aim to extract using specific inputs to the model.

There are three primary types of inference attacks:

Membership inference attacks, in which the attacker tries to determine whether a specific data record was used in model training.
Property inference attacks, in which the adversary attempts to guess specific properties about the training data that the system owner does not want to share, such as demographic information, financial data or sensitive personal information.
Recovery of training data, in which the attacker aims to reconstruct the training data itself to reveal sensitive information.
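
A membership inference attack is often described as exploiting the fact that overfit models tend to be more confident on records they were trained on. The sketch below makes that concrete with a deliberately artificial stand-in: `MemorizingModel` is a hypothetical toy whose confidence decays with distance from the nearest training record, and the threshold is an assumed value.

```python
import math


class MemorizingModel:
    """Toy stand-in for an overfit model (hypothetical, for illustration).

    Confidence is 1.0 on an exact training record and decays with
    distance from the nearest one.
    """

    def __init__(self, training_records):
        self._records = list(training_records)

    def confidence(self, record):
        nearest = min(math.dist(record, r) for r in self._records)
        return math.exp(-nearest)


def infer_membership(model, record, threshold=0.95):
    """Guess that `record` was in the training set if the model is
    unusually confident about it."""
    return model.confidence(record) >= threshold
```

Real attacks are more involved, but the principle is the same: the attacker needs only query access and a way to compare the model's confidence on a candidate record with its typical behavior.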

Defense strategies to mitigate inference attacks include the following:

Use cryptography to protect ML data.
Remove sensitive information from inputs before it reaches the ML model.
Augment the data by adding nonsensitive data to training data sets, making it harder for attackers to infer information about specific data records.
Limit access to ML systems and their training data to authorized users only.
Configure the ML system to add some randomness to its output, making it harder for attackers to predict the model's outputs and infer information from them.
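
The last practice can be sketched as a small post-processing step on the model's confidence scores. This is one possible interpretation, assuming the system returns per-class probabilities; the function name, noise scale and rounding precision are hypothetical, and stronger guarantees would call for a formal technique such as differential privacy.

```python
import random


def randomized_response(probabilities, noise_scale=0.05, decimals=1, rng=None):
    """Return a noisy, coarsened copy of per-class confidence scores.

    Hypothetical helper: adds uniform noise to each score, clips at
    zero, renormalizes and rounds, so that exact confidences are not
    exposed to the caller.
    """
    rng = rng or random.Random()
    noisy = [
        max(p + rng.uniform(-noise_scale, noise_scale), 0.0)
        for p in probabilities
    ]
    total = sum(noisy)
    if total == 0:
        # Degenerate case: fall back to a uniform answer.
        return [round(1 / len(noisy), decimals)] * len(noisy)
    return [round(p / total, decimals) for p in noisy]
```

The trade-off is utility: more noise and coarser rounding leak less but also make the scores less useful to legitimate users.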

4. Extraction

In an extraction attack, adversaries attempt to extract information about the ML model or the data used to train it. This type of attack aims to understand the ML model's architecture and extract the sensitive data used during the training phase.

There are several types of extraction attacks:

Model extraction, where the attacker extracts or replicates the entire target ML model.
Training data extraction, where the adversary extracts the data used to train the ML model.
Hyperparameter extraction, where the attacker determines the key features of the target ML model, such as architecture, learning rate and complexity.
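
Model extraction is easiest to see on a very simple target. The sketch below assumes a hypothetical prediction API that returns the raw score of a hidden linear model; under that assumption, an attacker can recover every parameter with one query per feature plus one. The names `make_victim_api` and `extract_linear_model` are invented for the example, and real models generally require far more queries and yield only an approximation.

```python
def make_victim_api(weights, bias):
    """Hypothetical prediction API that exposes only raw scores."""
    def query(features):
        return sum(w * x for w, x in zip(weights, features)) + bias
    return query


def extract_linear_model(query, num_features):
    """Recover the weights and bias of a linear model using
    num_features + 1 black-box queries."""
    zero = [0.0] * num_features
    bias = query(zero)  # an all-zero input reveals the bias
    weights = []
    for i in range(num_features):
        unit = zero.copy()
        unit[i] = 1.0  # a unit vector isolates one weight
        weights.append(query(unit) - bias)
    return weights, bias
```

The attacker never sees the model's internals, only its answers, which is why limiting and perturbing those answers features in the defenses for this attack type.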

Enforce the following defense strategies to mitigate extraction attacks:

Encrypt ML model parameters before deployment to prevent threat actors from replicating the model.
Use a unique watermark to prove ownership of model training data.
Add noise to the generated output to hide sensitive patterns.
Enforce access control to restrict access to the ML system and its training data.
Obfuscate variables and scramble source code to make reverse-engineering the ML system more difficult.
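
Because extraction attacks often depend on sending many queries, one way to restrict access is to cap how many predictions each client can request. The sketch below is an assumption about how such a control might look, not a method described by any particular product; the `QueryBudget` class and its parameters are hypothetical.

```python
from collections import defaultdict


class QueryBudget:
    """Hypothetical per-client query counter for a prediction API."""

    def __init__(self, max_queries):
        self.max_queries = max_queries
        self._used = defaultdict(int)

    def allow(self, client_id):
        """Return True and count the query if the client is under budget."""
        if self._used[client_id] >= self.max_queries:
            return False
        self._used[client_id] += 1
        return True
```

A production system would also need authentication, budget resets over time and logging of clients that repeatedly hit the limit.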
