Why signature-based detection isn't enough for enterprises
Signature-based detection and machine learning algorithms identify malicious code and threats. Expert Michael Cobb explains how both techniques defend networks and endpoints.
Detection of malicious code and behavior serves as the first line of defense against hackers and cybercriminals. It can be divided into two main techniques: signature-based techniques and anomaly-based techniques. Signature-based detection is the older technology, dating back to the 1990s, and is very effective at identifying known threats. Each signature is a string of code or pattern of actions that corresponds to a known attack or malicious code. Network traffic and files can be checked against a database of such signatures to see if any known threats are present; if they are, an alert is issued and mitigation processes begin. The big problem with this knowledge-based detection is that it can't detect malicious code or events that don't yet have signatures, and the volume of new threats is making it harder to keep signature lists up to date. According to Symantec, nearly a million new threats are released every day.
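The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the signature database, the threat names and the byte patterns are all invented for the example.

```python
import hashlib

# Hypothetical signature database. Names, hashes and patterns are invented.
HASH_SIGNATURES = {
    hashlib.sha256(b"EVIL_PAYLOAD_V1").hexdigest(): "Example.Trojan.A",
}
BYTE_SIGNATURES = {
    b"\xde\xad\xbe\xef\x90\x90": "Example.Dropper.B",
}


def scan(data: bytes) -> list:
    """Return the names of all known threats whose signature matches data."""
    matches = []
    # Whole-file hash signature: matches only an exact copy of a known file.
    digest = hashlib.sha256(data).hexdigest()
    if digest in HASH_SIGNATURES:
        matches.append(HASH_SIGNATURES[digest])
    # Byte-pattern signature: matches a known sequence anywhere in the data.
    for pattern, name in BYTE_SIGNATURES.items():
        if pattern in data:
            matches.append(name)
    return matches


known = b"EVIL_PAYLOAD_V1"
embedded = b"header" + b"\xde\xad\xbe\xef\x90\x90" + b"trailer"
unseen = b"EVIL_PAYLOAD_V2"  # one byte differs from the known sample

print(scan(known))     # matched by hash
print(scan(embedded))  # matched by byte pattern
print(scan(unseen))    # no signature exists, so nothing is reported
```

The last call shows the limitation in miniature: a sample that differs from the known one by a single byte has no entry in the database and passes unnoticed.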
The Einstein program, developed by the National Protection and Programs Directorate, the Department of Homeland Security's cybersecurity division, has recently been criticized for relying too heavily on this type of signature to detect and block malicious traffic. Malware developers can constantly change their code, or the way it is packaged, so that it no longer produces the signature of earlier versions that have already been added to lists of known bad code. For example, the way in which instructions in the code are written may be changed, or the syntax altered while preserving its functionality. Polymorphic malware encrypts itself with a different encryption key each time, while metamorphic malware is even more sophisticated, as it's capable of changing itself to a completely new instance with each fresh infection. This code mutation makes unique signature generation extremely difficult.
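A toy example shows why re-encrypting the same code defeats a hash-based signature. The sketch below uses a single-byte XOR as a stand-in for real encryption, and the pack and unpack helpers are hypothetical names made up for this illustration; the payload is just a harmless string.

```python
import hashlib


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key (a stand-in for real encryption)."""
    return bytes(b ^ key for b in data)


def pack(payload: bytes, key: int) -> bytes:
    """Toy packing: the key byte followed by the XOR-encoded payload."""
    return bytes([key]) + xor_bytes(payload, key)


def unpack(blob: bytes) -> bytes:
    """Recover the payload using the key stored in the first byte."""
    return xor_bytes(blob[1:], blob[0])


payload = b"harmless stand-in for a payload"

# Two "infections" of the same payload, each packed with a different key.
variant_a = pack(payload, 0x21)
variant_b = pack(payload, 0x7F)

sig_a = hashlib.sha256(variant_a).hexdigest()
sig_b = hashlib.sha256(variant_b).hexdigest()

# The stored bytes differ, so a hash signature for one variant misses the other.
print(sig_a == sig_b)

# Once unpacked, both variants are the identical payload.
print(unpack(variant_a) == unpack(variant_b) == payload)
```

A signature written for variant_a says nothing about variant_b, even though both unpack to exactly the same code.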
How machine learning algorithms can help
To combat the shortcomings of the classic approach to signature-based detection, most full-featured antimalware products now use a combination of complementary techniques, including anomaly-based behavior detection and heuristic scanning. This improves overall detection rates, particularly of unknown malware or behavior. Behavior-based techniques observe program behavior to determine whether it is harmful or not. Heuristic techniques mainly use machine learning algorithms and data mining methods to identify malicious indicators embedded in the running program.
Machine learning algorithms have the ability to learn and adapt when exposed to data. By analyzing known malware activity, a program can develop the ability to find and detect new threat patterns and determine the probability that an unknown program is in fact malware. Unlike classic signature-based detection, machine learning methods can spot malware that mutates to change its signature, as classification is based on the execution and behavior of a program, not just the static code. The use of cloud-based machine learning, as well as advanced clustering and data mining techniques, also increases the speed and efficiency of malware analysis. However, malware experts and detection engineers need to verify new inputs, interpretations and classifications; otherwise, an unstable feedback loop that can create false positives or reduce the level of positive detection may occur.
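To make the idea of learning from known malware activity concrete, here is a minimal sketch of one possible approach: a naive Bayes classifier over observed run-time behaviors. The choice of naive Bayes, the behavior labels and the six training samples are all assumptions made for illustration; real products train far richer models on far more data.

```python
import math

# Invented training data: the behaviors observed for each sample,
# plus a label (True = malware, False = benign).
TRAINING = [
    ({"writes_autorun_key", "disables_av", "encrypts_user_files"}, True),
    ({"writes_autorun_key", "injects_into_process"}, True),
    ({"encrypts_user_files", "contacts_unknown_host"}, True),
    ({"opens_document", "contacts_update_server"}, False),
    ({"opens_document", "writes_temp_file"}, False),
    ({"writes_temp_file", "contacts_update_server"}, False),
]


def train(samples):
    """Count how often each behavior appears in malware and in benign samples."""
    vocab = set().union(*(features for features, _ in samples))
    counts = {True: {f: 0 for f in vocab}, False: {f: 0 for f in vocab}}
    totals = {True: 0, False: 0}
    for features, label in samples:
        totals[label] += 1
        for f in features:
            counts[label][f] += 1
    return vocab, counts, totals


def malware_probability(features, model):
    """Estimate P(malware | observed behaviors) with Bernoulli naive Bayes."""
    vocab, counts, totals = model
    n = totals[True] + totals[False]
    log_score = {}
    for label in (True, False):
        score = math.log(totals[label] / n)  # class prior
        for f in vocab:
            # Laplace smoothing keeps unseen behaviors from zeroing the score.
            p = (counts[label][f] + 1) / (totals[label] + 2)
            score += math.log(p if f in features else 1 - p)
        log_score[label] = score
    # Convert the two log scores into a normalized probability.
    top = max(log_score.values())
    weights = {k: math.exp(v - top) for k, v in log_score.items()}
    return weights[True] / (weights[True] + weights[False])


model = train(TRAINING)

# A combination of behaviors that never appears together in the training data.
unknown_program = {"disables_av", "injects_into_process", "contacts_unknown_host"}
office_tool = {"opens_document", "writes_temp_file", "contacts_update_server"}

print(malware_probability(unknown_program, model))  # above 0.5
print(malware_probability(office_tool, model))      # below 0.5
```

Because the score depends on what the program does rather than on its bytes, repacking the file does not change the result. The sketch also hints at the feedback-loop risk: if mislabeled samples were appended to TRAINING without review, the counts and every later probability would drift with them.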
Machine learning enables a more sophisticated and effective type of signature, capable of detecting and classifying a broader range of threats. Hashes are still used to detect specific malicious binaries, but DNA signatures -- as antivirus and security software company ESET calls them -- provide more complex definitions of malicious behavior and malware characteristics. The behavior of malware is a lot harder to change than its code, and by using deep code analysis, the indicators of compromise responsible for its behavior can be used to construct a signature. This can be used to identify new variants of a known malware family, or even previously unseen or unknown malware that contains DNA indicative of malicious behavior. This data can also be fed back into machine learning algorithms to identify additional malicious genes and behavioral patterns. Part of DARPA's Cyber Genome Project is using machine learning to mine malware for insights, and score new malware samples in the wild.
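The contrast with a hash can be sketched as follows. This is not ESET's implementation, whose internals are not described here; it is a simplified illustration in which a behavioral signature is a set of invented "genes" and a sample matches a family when enough of them are observed. The family name, gene labels and threshold are all hypothetical.

```python
# Hypothetical behavioral signatures: each family is described by a set of
# "genes" (behavior indicators) and the fraction that must be present.
FAMILY_SIGNATURES = {
    "Example.Ransom": {
        "genes": {
            "enumerates_user_files",
            "encrypts_user_files",
            "deletes_shadow_copies",
            "drops_ransom_note",
        },
        "threshold": 0.75,
    },
}


def match_families(observed_genes):
    """Return each family whose gene overlap meets its threshold, with the score."""
    hits = {}
    for family, sig in FAMILY_SIGNATURES.items():
        overlap = len(sig["genes"] & observed_genes) / len(sig["genes"])
        if overlap >= sig["threshold"]:
            hits[family] = overlap
    return hits


# Behaviors extracted from three samples (invented for the example).
original = {
    "enumerates_user_files",
    "encrypts_user_files",
    "deletes_shadow_copies",
    "drops_ransom_note",
}
new_variant = {  # rewritten code: one gene dropped, one new behavior added
    "enumerates_user_files",
    "encrypts_user_files",
    "deletes_shadow_copies",
    "contacts_unknown_host",
}
benign_tool = {"enumerates_user_files", "opens_document"}

print(match_families(original))
print(match_families(new_variant))
print(match_families(benign_tool))
```

The new variant would have a different file hash, yet it still carries three of the four genes and is attributed to the same family, while the benign tool shares only one gene and is not flagged.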
Conclusion
No security system should rely on just one method of detecting malicious code or activity. Security is always about defense-in-depth and diversity, and the overall effectiveness of security controls and techniques working together is what counts. A combination of detection methods creates the most effective antimalware solution. Despite any shortcomings, signature-based detection continues to play an integral role in keeping networks and endpoints secure. In their classic form, signatures are a direct impediment to previously identified threats. With more evolved signature technology, the added intelligence makes signatures a serious line of defense, even against new threats.