https://www.techtarget.com/searchenterpriseai/definition/anomaly-detection
Anomaly detection is the process of identifying data points, entities or events that fall outside the normal range. An anomaly is anything that deviates from what is standard or expected. Humans and animals do this habitually when they spot a ripe fruit in a tree or a rustle in the grass that stands out from the background and could represent an opportunity or threat. The concept is also sometimes framed as outlier detection or novelty detection.
Anomaly detection has a long history in statistics, driven by analysts and scientists who pored over charts to find elements that stood out. Over the last several decades, researchers have automated much of this process with machine learning techniques that detect different types of outliers more efficiently.
In practice, anomaly detection is often used to detect suspicious events, unexpected opportunities or bad data buried in time series data. A suspicious event might indicate a network breach, fraud, crime, disease or faulty equipment. An unexpected opportunity could involve finding a store, product or salesperson that's performing much better than others and should be investigated for insight into improving the business.
An anomaly could also be the result of faulty equipment, broken sensors or a disconnected network. In these instances, a data scientist might want to remove the anomalous data records from further analysis so as not to compromise the development of new algorithms.
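As a minimal sketch of that cleanup step, the following Python filters readings with a modified z-score based on the median absolute deviation (MAD). The technique, the function name drop_outliers_mad, the 3.5 cutoff and the sample readings are all illustrative assumptions rather than anything prescribed here; a real pipeline would choose a rule suited to its data.

```python
from statistics import median

def drop_outliers_mad(values, threshold=3.5):
    """Return the values whose modified z-score is within `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the scale is
        # undefined; keep everything rather than guess.
        return list(values)
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # when the underlying data is roughly normal.
    return [v for v in values if abs(0.6745 * (v - med) / mad) <= threshold]

# Hypothetical temperature readings; -999.0 mimics a broken sensor.
readings = [20.1, 19.8, 20.3, 20.0, -999.0, 20.2, 19.9]
clean = drop_outliers_mad(readings)
print(clean)  # [20.1, 19.8, 20.3, 20.0, 20.2, 19.9]
```

The median is used instead of the mean because a single extreme value such as -999.0 would otherwise distort the baseline it is being compared against.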
There are several ways of training machine learning algorithms to detect anomalies. Supervised machine learning techniques are used when you have a labeled data set indicating normal vs. abnormal conditions. For example, a bank or credit card company can develop a process for labeling fraudulent credit card transactions after those transactions have been reported. Medical researchers might similarly label images or data sets indicative of future disease diagnosis. In such instances, supervised machine learning models can be trained to detect these known anomalies.
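To make the supervised case concrete, here is a small sketch that trains a nearest-centroid classifier on labeled transactions. The classifier choice, the two features and every data value are hypothetical and chosen only for illustration; production fraud models are far more elaborate.

```python
from math import dist
from statistics import fmean

def fit_centroids(samples, labels):
    """Average the feature vectors of each class in the labeled data."""
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    return {y: tuple(fmean(col) for col in zip(*rows))
            for y, rows in by_label.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: dist(centroids[y], x))

# Hypothetical features: (amount in hundreds of dollars,
#                         transactions in the past hour)
X = [(0.2, 1), (0.5, 1), (0.3, 2), (0.4, 1), (9.0, 6), (7.5, 8), (8.2, 7)]
y = ["normal", "normal", "normal", "normal", "fraud", "fraud", "fraud"]

model = fit_centroids(X, y)
large_burst = predict(model, (8.0, 5))
small_single = predict(model, (0.3, 1))
print(large_burst, small_single)  # fraud normal
```

The model can only recognize fraud that resembles the labeled examples it was trained on, which is the limitation the supervised approach carries in general.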
Researchers might start with some previously discovered outliers but suspect that other anomalies also exist. In the scenario of fraudulent credit card transactions, consumers might fail to report suspicious small-value transactions with innocuous-sounding names. A data scientist might use the reports that do include these types of fraudulent transactions to automatically label other, similar transactions as fraud, using semi-supervised machine learning techniques.
Supervised and semi-supervised techniques can detect only the kinds of anomalies that have already been labeled. However, the vast majority of data is unlabeled. In these cases, data scientists might use unsupervised anomaly detection techniques, which can automatically identify exceptional or rare events.
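One simple way to picture this is label propagation: a handful of labeled transactions pass their label to nearby unlabeled ones, which can then pass it on in turn. The sketch below is a deliberately naive version of that idea, assuming two already-scaled features and a hypothetical radius of 1.0. It is not a description of any particular product's method.

```python
from math import dist

def propagate_labels(labeled, unlabeled, radius):
    """Give each unlabeled point the label of its nearest labeled point,
    if that point lies within `radius`; repeat until nothing changes."""
    labeled = dict(labeled)       # point -> label
    remaining = set(unlabeled)
    changed = True
    while changed:
        changed = False
        for p in sorted(remaining):   # sorted() copies, so discarding is safe
            nearest = min(labeled, key=lambda q: dist(p, q))
            if dist(p, nearest) <= radius:
                labeled[p] = labeled[nearest]
                remaining.discard(p)
                changed = True
    return labeled, remaining

# Hypothetical, pre-scaled feature pairs for each transaction.
known = {(1.0, 1.0): "fraud", (8.0, 8.0): "normal"}
unknown = [(1.5, 1.2), (2.1, 1.4), (7.6, 8.3), (4.5, 4.5)]

result, still_unlabeled = propagate_labels(known, unknown, radius=1.0)
print(result[(2.1, 1.4)])   # fraud
print(still_unlabeled)      # {(4.5, 4.5)}
```

Note that (2.1, 1.4) is too far from the original fraud report to be labeled directly; it is reached only through (1.5, 1.2). That chaining is what lets a few reports cover many similar transactions, and it is also how a single wrong label can spread.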
For example, a cloud cost estimator might look for unusual upticks in data egress charges or processing costs that could be caused by a poorly written algorithm. Similarly, an intrusion detection algorithm might look for novel network traffic patterns or a rise in authentication requests. In both cases, unsupervised machine learning techniques might be used to identify data points indicating things that are well outside the range of normal behavior. In contrast, supervised techniques would have to be explicitly trained using examples of previously known deviant behavior.
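The cloud cost example can be sketched without any labels at all by comparing each day's charge with the days just before it. The code below flags values whose z-score against a trailing window exceeds a cutoff; the window of 7 days, the cutoff of 3.0 and the cost figures are assumptions made for illustration.

```python
from statistics import mean, stdev

def flag_spikes(series, window=7, z_threshold=3.0):
    """Return indices whose value is far from the mean of the
    preceding `window` values, measured in standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily data egress charges in dollars.
daily_egress_cost = [102, 98, 101, 99, 103, 100, 97, 101, 99, 240, 102, 100]
spikes = flag_spikes(daily_egress_cost)
print(spikes)  # [9]
```

No example of a bad day was supplied; the 240 charge stands out purely because it differs from recent behavior. One known weakness of this simple approach is that a spike inflates the window's standard deviation for the following days, which can mask further anomalies until it ages out.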
Broadly speaking, there are three different types of anomalies.
Many different kinds of machine learning algorithms can be trained to detect anomalies. Some of the most popular anomaly detection methods include the following:
Anomaly detection systems can be used in various ways to improve business, IT and application performance. These systems can also enhance the detection of fraud, security incidents and opportunities for innovation. The following are some other common use cases for anomaly detection:
Challenges in anomaly detection include the following:
Data scientists, IT managers, security managers and business teams must consider several aspects when designing anomaly detection apps to provide the appropriate value.
Anomaly detection is generally baked into most modern security, IT management, and fraud detection systems and applications. However, enterprises that want to develop their own anomaly detection algorithms may wish to turn to popular statistics, data science, and mathematical packages and tools. A sampling of popular ones includes the following:
Anomaly detection is a complicated endeavor. It is one thing to experiment with new tools for detecting anomalies. But in practice, it isn't easy to reliably detect anomalies of value without inundating users with false positives.
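The false-positive problem is largely a question of where the alert threshold sits. The following sketch, which uses invented anomaly scores and ground-truth labels, shows how the same detector looks at two hypothetical thresholds.

```python
def precision_recall(scores, is_anomaly, threshold):
    """Treat every score at or above `threshold` as an alert and
    compare the alerts with the known ground truth."""
    alerts = [s >= threshold for s in scores]
    tp = sum(a and t for a, t in zip(alerts, is_anomaly))
    fp = sum(a and not t for a, t in zip(alerts, is_anomaly))
    fn = sum((not a) and t for a, t in zip(alerts, is_anomaly))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented detector scores and whether each event was truly anomalous.
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
truth = [False, False, False, True, False, False, True, True]

sensitive = precision_recall(scores, truth, threshold=0.3)
strict = precision_recall(scores, truth, threshold=0.7)
print(sensitive)  # (0.5, 1.0)
```

At the lower threshold every real anomaly is caught, but half of the alerts are false positives. At the higher threshold every alert is genuine, but one of the three real anomalies is missed.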
In most cases, it will probably be easier to take advantage of domain-specific tools with built-in anomaly detection capabilities for applications like cloud cost management, IT service management or fraud detection.
Bespoke anomaly detection development makes more sense for companies that want to add anomaly detection capabilities to their own products and services. In these cases, teams can build on open source and proprietary data science platforms like scikit-learn or Mathematica.
29 Jul 2024