When Netflix introduced its streaming service almost 15 years ago, performance issues were common. To boost resilience as its customer base grew, the company developed chaos engineering, a discipline that tests systems to determine their fault tolerance under unstable circumstances.
Today, this methodology is being adapted for a security context. While it's early days for security chaos engineering, many industry professionals are interested in its potential, and there are a few tools available on the market.
"Cybersecurity teams don't always have the right situational awareness of how systems are interrelated internally. [Security chaos engineering] is insanely valuable for security teams because it would give teams better insight into their environment and what tools are doing," said Jeff Pollard, an analyst at Forrester Research.
For companies and security teams interested in security chaos engineering tools, here are some options to consider.
Verica, co-founded by Aaron Rinehart and Casey Rosenthal in January 2019, offers one of the few dedicated security chaos engineering tools. Rosenthal led the chaos engineering team at Netflix, and Rinehart used chaos engineering for security when he was chief security architect at UnitedHealth Group (UHG).
Verica's platform can be used in the cloud or on premises. It uses "Continuous Verification" to handle experiments around availability and security. The platform is based on Netflix's Chaos Automation Platform (ChAP) and integrates with Kafka and Kubernetes.
ChaoSlingr was one of the first security chaos engineering tools available. Rinehart helped develop the tool when he was at UHG. The open source tool was deprecated after Rinehart left UHG, but its code remains available on GitHub for companies to use as a starting point for their own experiments.
ChaoSlingr consists of the following four AWS Lambda functions written in Python:
- Generatr identifies what will be affected by the failure.
- Slingr injects the failure.
- Trackr provides event logs.
- Experiment description provides testing information.
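The division of labor among those four pieces can be sketched in a few lines of Python. This is an illustrative, in-memory sketch, not ChaoSlingr's actual code: the `SecurityGroup` class, the opt-in flag, the port number and the shape of the log entries are all assumptions made for the example, and no AWS calls are involved.

```python
import random
from dataclasses import dataclass, field

# Hypothetical experiment description: what is being tested and why.
EXPERIMENT = {
    "name": "unauthorized-port-change",
    "hypothesis": "An unexpected open port is detected and logged.",
}


@dataclass
class SecurityGroup:
    """Simplified stand-in for a cloud security group (assumption)."""
    name: str
    opted_in: bool
    open_ports: set = field(default_factory=set)


def generatr(groups, rng):
    """Identify what the failure will affect: one opted-in group, if any."""
    candidates = [g for g in groups if g.opted_in]
    return rng.choice(candidates) if candidates else None


def slingr(group, port):
    """Inject the failure: open a port that should stay closed."""
    group.open_ports.add(port)
    return {"group": group.name, "action": "open_port", "port": port}


def trackr(log, event):
    """Record the injected change so the team can trace it afterward."""
    log.append({"experiment": EXPERIMENT["name"], **event})


groups = [
    SecurityGroup("web", opted_in=True, open_ports={443}),
    SecurityGroup("db", opted_in=False, open_ports={5432}),
]
event_log = []
target = generatr(groups, random.Random(0))
if target is not None:
    trackr(event_log, slingr(target, port=2222))
```

Only resources that have opted in are eligible targets in this sketch, which keeps the blast radius of an experiment under the team's control.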
Kelly Shortridge, senior principal of product technology at Fastly, designed the security decision tree generator Deciduous, which provides a design phase for security chaos engineering. "Harness the scientific method, which is what chaos engineering is all about, and come up with hypotheses," Shortridge said.
Creating security decision trees enables security teams to effectively threat model systems. In a chaos engineering context, they help teams map how tools and systems are intended to work versus how they actually work. Using Deciduous, security teams can visualize potential attacker actions and defender mitigations in graph form.
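To make the graph idea concrete, here is a minimal Python sketch that models a small decision tree and prints it as Graphviz DOT text. It does not use Deciduous or its input format; the `Node` class, the node kinds, the shapes and the example scenario are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One box in the tree: an attacker goal, an attack or a mitigation."""
    label: str
    kind: str  # "goal", "attack" or "mitigation"
    children: list = field(default_factory=list)


def to_dot(root):
    """Render the tree as Graphviz DOT text, walking it depth-first."""
    shapes = {"goal": "doubleoctagon", "attack": "box", "mitigation": "ellipse"}
    lines = ["digraph decision_tree {"]
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        lines.append(f'  {node_id} [label="{node.label}", shape={shapes[node.kind]}];')
        for child in node.children:
            child_id = visit(child)
            lines.append(f"  {node_id} -> {child_id};")
        return node_id

    visit(root)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical scenario: two attacker paths, each paired with a mitigation.
tree = Node("Read customer data in storage bucket", "goal", [
    Node("Find publicly readable bucket", "attack", [
        Node("Block public access by default", "mitigation"),
    ]),
    Node("Use leaked access key", "attack", [
        Node("Rotate keys and alert on unusual use", "mitigation"),
    ]),
])
dot = to_dot(tree)
print(dot)
```

Each mitigation in such a tree is a candidate hypothesis: an experiment can then check whether the control behaves in practice the way the diagram says it should.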
Adapt chaos engineering tools for security
Companies should also consider using existing chaos engineering tools in security scenarios.
"A lot of existing chaos engineering tools conduct experiments related to availability," Shortridge said. "In theory, you can repurpose some of them for security, for example to simulate a distributed denial-of-service attack or excess traffic."
"Production systems frequently experience various levels of degradation and misconfiguration," said Jim Scheibmeir, an analyst at Gartner. Fortunately for companies, he said, the majority of chaos engineering tools that support hypothesis-driven experimentation are open source.
For example, Netflix has the following suite of tools companies can customize to their needs:
- Chaos Monkey is an open source tool that introduces random failures into applications. Netflix uses the tool to randomly terminate server instances in production and observe the resulting behavior.
- Chaos Kong takes Chaos Monkey to the next level. It simulates turning off entire AWS Regions to help engineers discover systemic issues and fix them ahead of potential real-world failures.
- ChAP tests for system failures at the microservice level.
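The core idea behind random-failure tools of this kind can be shown with a small simulation. The sketch below is not Netflix's code and makes no cloud API calls; the fleet dictionary, the instance names and the `min_healthy` threshold are assumptions for illustration.

```python
import random


def pick_victim(instances, rng):
    """Choose one running instance at random, or None if none are running."""
    running = sorted(name for name, state in instances.items() if state == "running")
    return rng.choice(running) if running else None


def run_experiment(instances, min_healthy, rng):
    """Terminate one random instance, then check the steady-state hypothesis."""
    victim = pick_victim(instances, rng)
    if victim is not None:
        instances[victim] = "terminated"
    healthy = sum(1 for state in instances.values() if state == "running")
    return {
        "victim": victim,
        "healthy": healthy,
        "hypothesis_held": healthy >= min_healthy,
    }


# Hypothetical fleet; the hypothesis is that two instances are enough to serve traffic.
fleet = {"api-1": "running", "api-2": "running", "api-3": "running"}
result = run_experiment(fleet, min_healthy=2, rng=random.Random(42))
print(result)
```

For a security experiment, the same structure applies with a different hypothesis, for example that an alert fires or a control keeps working after the instance disappears.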
Chaos Toolkit is another open source chaos engineering project that can be adapted for security. The extensible tool enables developers to create and automate experiments for their specific use cases. Developers can implement the Chaos Toolkit via Python functions, HTTP requests or separate processes. With prewritten extensions, developers can connect to a variety of systems through Open API.
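As a rough illustration of those three options, the Python sketch below builds a Chaos Toolkit experiment definition and serializes it to JSON. The layout (a steady-state hypothesis, a method and rollbacks, each using an `http`, `process` or `python` provider) follows the Chaos Toolkit experiment format, but the URL, the script path and the `security_actions` module with its `close_port` function are hypothetical placeholders, not real extensions.

```python
import json

experiment = {
    "version": "1.0.0",
    "title": "An unexpected open port is detected",
    "description": "Open a port that should be closed and verify the service stays healthy.",
    "steady-state-hypothesis": {
        "title": "Health endpoint responds normally",
        "probes": [
            {
                "type": "probe",
                "name": "health-check",
                "tolerance": 200,  # expected HTTP status code
                "provider": {
                    "type": "http",
                    "url": "https://service.example.com/health",  # placeholder URL
                    "timeout": 3,
                },
            }
        ],
    },
    "method": [
        {
            "type": "action",
            "name": "open-unauthorized-port",
            "provider": {
                "type": "process",
                "path": "./open_port.sh",  # hypothetical script
                "arguments": "2222",
            },
        }
    ],
    "rollbacks": [
        {
            "type": "action",
            "name": "close-unauthorized-port",
            "provider": {
                "type": "python",
                "module": "security_actions",  # hypothetical module
                "func": "close_port",  # hypothetical function
                "arguments": {"port": 2222},
            },
        }
    ],
}

experiment_json = json.dumps(experiment, indent=2)
print(experiment_json)
```

Saved to a file such as `experiment.json`, a definition like this is what the `chaos run` command takes as input, provided the referenced script and module actually exist in the environment.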
Write custom Python or Bash scripts
If existing security chaos engineering tools don't fit the bill, another option is to create your own. Security teams can use Python and Bash to write custom scripts that introduce failures into specific systems and pinpoint exactly where issues arise. Custom scripts also make it easier to roll back the system following experiments.
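A custom script of this kind might look like the following Python sketch, which injects a misconfiguration, checks whether a control notices it and then restores the original state. Everything here is an assumption for illustration: the configuration is a plain dictionary standing in for a real system, and `detector_flags` is a hypothetical control under test.

```python
import copy
from contextlib import contextmanager


@contextmanager
def injected_fault(config, key, bad_value):
    """Temporarily set config[key] to bad_value; restore the original on exit."""
    snapshot = copy.deepcopy(config)
    config[key] = bad_value
    try:
        yield config
    finally:
        # Roll back even if the experiment body raises an exception.
        config.clear()
        config.update(snapshot)


def detector_flags(config):
    """Hypothetical control under test: flag a disabled audit log."""
    return config.get("audit_logging") is not True


firewall_config = {"audit_logging": True, "allowed_ports": [443]}

with injected_fault(firewall_config, "audit_logging", False) as cfg:
    detected = detector_flags(cfg)

print("misconfiguration detected:", detected)
print("config after rollback:", firewall_config)
```

Putting the rollback in a `finally` block means the system returns to its starting state whether the experiment succeeds, fails or crashes partway through.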
Future of security chaos engineering tools
Because the concept is in its infancy, there aren't many security chaos engineering tools on the market. Expect to see more as the topic takes off.
Rinehart said a to-be-released tool developed by software engineer Matas Kulkovas will run Kubernetes-specific experiments for security resiliency.
Researchers at the University of Potsdam in Germany published a 2020 paper detailing CloudStrike, a tool designed to test security resiliency in cloud infrastructure. It uses security chaos engineering techniques to help security teams find misconfigurations and availability issues in AWS and Google Cloud Platform. The tool has not yet been released.