Testing email security products: Challenges and methodologies
Kevin Tolly of the Tolly Group offers a look at how his company set out to test several email security products, as well as the challenges it faced to come up with sound methodologies.
Email security has been an important issue for decades now -- ever since spammers started inundating corporate email inboxes with unwanted mail. But where spam was mostly a nuisance, today's phishing attacks represent significant threats that can compromise both corporate and personal information.
A successful attack can compromise confidential information, potentially resulting in direct and indirect losses in the millions of dollars. And, as security providers build new defenses, black hats are devising new methods of attack.
There is no question that you need to find the best email security product for your company -- but doing so can be a big challenge.
Evaluating the effectiveness of potential security products is something of a black art itself. Unlike, say, benchmarking network infrastructure hardware, there is no simple and straightforward method to evaluate competing systems. Email security vendors provide precious little detail on how their systems actually operate and focus instead on the big picture, with vague claims of stopping more attacks than their competitors.
The vendors might legitimately claim that they don't want to tip off hackers on how to circumvent protections. Still, anyone who has been involved in IT for any amount of time likely knows that it isn't prudent to select a key infrastructure component based on who has a better marketing department. Evaluation is essential.
Recognizing the quandary faced by prospective customers, a major security vendor tasked the Tolly Group with shedding some light on how best to go about the evaluation process by having us put ourselves in the role of a prospective customer [Editor's note: The Tolly Group's research and findings will be presented in a forthcoming vendor-sponsored webinar]. As experts in IT testing and without external influence, we researched, built and ran a series of test scenarios against four leading email security products.
Rather than determining winners and losers, we focused on how to build and execute the tests and, naturally, observed how the various email security products responded to identical threats. In this article series, we'll give you a look at what we found.
The challenges are manifold. There are no industry-standard methodologies for email security testing in general -- and none for antiphishing testing in particular. And, given the rapid evolution of attack strategies, even if standards existed, they would likely be outdated and of little use. Thus, the first challenge is building a set of threats around which the tests can be centered.
Then, those test messages need to be introduced into the system without tipping it off in a way that causes it to shut down your test stream.
Finally, we needed to evaluate the results. Surprisingly, this was more challenging than it seemed at first glance.
Testing threats and benign messages
To test, we needed to send both threat and benign messages into our systems under test (SUTs). How many? More is always better, but where antispam tests might have message counts in the thousands, the labor-intensive nature of antiphishing tests can limit you to dozens or hundreds of messages at most.
We'll discuss both attacks and benign messages in turn, but remember that we need the products to stop threats and allow benign messages. A false positive occurs when an email security system incorrectly classifies a good message as a threat.
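To make those two requirements concrete, here is a minimal Python sketch of how results from a test run might be tallied. It is an illustration, not the method the Tolly Group used: the `Result` record, the `score` function and the metric names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One test message and what the SUT did with it (hypothetical record)."""
    is_threat: bool  # ground truth: did the message carry a confirmed threat?
    blocked: bool    # observed: did the SUT block or quarantine it?


def score(results):
    """Return the share of threats stopped and the share of benign mail blocked."""
    threats = [r for r in results if r.is_threat]
    benign = [r for r in results if not r.is_threat]
    detected = sum(r.blocked for r in threats)
    false_positives = sum(r.blocked for r in benign)
    return {
        # None signals that the category had no messages, so no rate exists.
        "detection_rate": detected / len(threats) if threats else None,
        "false_positive_rate": false_positives / len(benign) if benign else None,
    }
```

Reporting the two rates separately matters: a product that blocks everything would stop every threat, but it would also block every benign message.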
Email security threats
First and foremost, we needed to make sure that the SUT detected and stopped threats -- emails containing malicious phishing links. The simplest way to obtain links to use for email security testing is to go to a community site where users submit suspected phishing links.
For our research project, we used PhishTank. Community sites like this are one of the few sources from which you can harvest messages for email security testing. It is important to keep in mind, though, that just because an entry has been posted on PhishTank -- or other sites -- as potential phishing, that doesn't mean that it really is phishing. You will therefore need to fish around -- ahem -- in the tank and likely click through to confirm that a link does, in fact, lead to a site that is masquerading as something that it is not.
This immediately raised another issue. How does one click through a potentially malicious link without exposing one's own computer to threats? Fortunately, there are virtualized browsers like Browserling that you can use to click through to a URL in a safe, virtualized sandbox environment. Whatever tool you choose, be sure to open suspect links only in an isolated environment of this kind.
Once you find a suitable link, in the simplest case, you can paste it into the body of an email message and send it to a user in a system protected by your SUT. We decided to send only one phishing link per message. While it takes longer to test that way, it is the only way of knowing that your SUT is evaluating a specific link. If you sent a message with, say, 20 phishing links and the SUT blocked the message, you would have no way of knowing if it identified one link, all 20 or something in between.
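The one-link-per-message rule can be sketched in a few lines of Python using the standard library's `email.message.EmailMessage`. This is a hedged illustration only: the function name, the case ID scheme, the subject line and the body text are hypothetical, and the sketch builds messages without sending them.

```python
from email.message import EmailMessage


def build_test_messages(urls, sender, recipient, subject="Document for review"):
    """Build one message per URL and return (case_id, message) pairs.

    The case ID is kept outside the message so that nothing in the email
    itself marks it as test traffic (an assumption about good practice).
    """
    cases = []
    for index, url in enumerate(urls, start=1):
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = subject
        # Exactly one link in the body, so a block can be tied to that link.
        msg.set_content(f"Please review the following link:\n{url}\n")
        cases.append((f"case-{index:03d}", msg))
    return cases
```

Each returned pair can then be sent through whatever mail path feeds the SUT, with the case ID recorded alongside the SUT's verdict for that message.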
If you are concerned about using recycled threats -- and there are some legitimate concerns; for example, the vendor you are testing might use that same feed in its product -- your other option is to build your own threats. This is a very labor-intensive process and involves extra expenses. Taking this approach, you would need, at a minimum, to have or register domains that would become the targets for phishing attacks. You would then need websites available to host the attack URLs.
The biggest challenge would be building out fake websites with interactive PHP code and images to present what looks like a credible threat for the SUTs to analyze. If you were to test links to malicious code, as we did, you would need to harvest malware from the internet and then place that on your test website. Even just moving test malware around is a challenge, as it will typically be caught and quarantined by whatever endpoint security system you are running.
Your test will likely consist mostly of in-the-wild messages harvested from sites like PhishTank. Because those sites show the newest messages first, I recommend that you source the newest messages, as that is the closest that you will get to testing a zero-day attack. One would expect that most SUTs would be able to detect phishing attacks that are days or weeks old.
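If you keep your harvested candidates in a simple list, picking the freshest ones is easy to automate. The sketch below assumes a hypothetical record format, a dict with a `url` and an ISO 8601 `submitted` timestamp. It is not PhishTank's actual data format, and the 24-hour cutoff and the limit of 50 are arbitrary illustrative defaults.

```python
from datetime import datetime, timedelta, timezone


def newest_candidates(records, now, max_age_hours=24, limit=50):
    """Return the most recently submitted records, newest first.

    records: list of dicts with a "submitted" ISO 8601 timestamp that
             includes a UTC offset (hypothetical format for this sketch).
    now:     timezone-aware datetime used as the reference point.
    """
    cutoff = now - timedelta(hours=max_age_hours)

    def submitted_at(record):
        return datetime.fromisoformat(record["submitted"])

    fresh = [r for r in records if submitted_at(r) >= cutoff]
    fresh.sort(key=submitted_at, reverse=True)
    return fresh[:limit]
```

Every candidate selected this way still needs to be confirmed by hand in a sandboxed browser before it goes into a test message.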
Finally, we wanted to send messages with benign links. This, fortunately, was much easier. For this, you can find links to PDFs or Excel files hosted on non-malicious, publicly accessible sites, such as U.S. government or university sites.
Stay tuned for part two of this series, which will examine the results of the Tolly Group's email security product testing.