Data dumps, ransomware attacks, card skimmers and malware are regular occurrences these days, which means incident management planning is no longer optional. It isn't a matter of if an incident will occur but when. And due to the hyper-connected nature of current IT environments, cloud-based workloads are particularly vulnerable.
Without an established response process, an organization won't be able to properly react to security threats or unexpected infrastructure or application problems. Thankfully, incident management is a well-established process.
To ease the stress of putting a plan in place, review these five steps to identify, remediate and adapt to incidents as -- and before -- they occur.
Step 1: Prepare
One of the most important things IT teams can do when they establish a cloud incident response process is to prepare for the inevitability that an incident will occur. While preparation can come in different forms, it's generally broken down into three categories: training, documentation and aggregation.
Having the right people in place is only half of the equation when it comes to cloud incident response. An organization also needs its people to be well-trained, well-informed and well-supported to handle events as they occur.
Cloud-native organizations must ensure their employees understand how to navigate their chosen provider's interface to gather information and react to what they find. This also means that employees should be aware of the company's incident management plan and what's expected of them.
Every tech company needs solid documentation to operate efficiently. To support employees who might get pulled into an incident response, this means creating and maintaining accurate runbooks. A runbook is a set of routine operations and procedures that employees can carry out when reacting to predictable events in a production environment.
Runbooks aren't limited to security incidents; they may also walk employees through tasks such as how to scale a database or restart a stuck process. When it comes to incident management, runbooks are the first line of defense for any employees who may not be familiar with the company's architecture.
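As a rough illustration, a runbook can be kept as structured data so that it is easy to review, version and render as a checklist. The sketch below is a minimal example under stated assumptions: the `Runbook` class, its fields and the sample steps are all hypothetical and are not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Runbook:
    """A routine procedure for a predictable production event (hypothetical model)."""
    title: str
    trigger: str  # the condition that calls for this runbook
    steps: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the runbook as a numbered plain-text checklist."""
        lines = [f"{self.title} (use when: {self.trigger})"]
        lines += [f"{n}. {step}" for n, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)


# Illustrative content only; real steps depend on your own architecture.
restart_worker = Runbook(
    title="Restart a stuck worker process",
    trigger="queue depth grows while worker CPU stays near zero",
    steps=[
        "Confirm the worker is stuck by checking its most recent log entry.",
        "Capture a diagnostic snapshot before making changes.",
        "Restart the worker and verify that queue depth starts to fall.",
    ],
)

print(restart_worker.render())
```

Keeping the trigger next to the steps helps someone unfamiliar with the system decide quickly whether a given runbook applies.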
Data is key when it comes to an incident response team's ability to identify what happened, how it happened and why it happened. Although log aggregation and analytics can be incredibly expensive, this information is the backbone of any identification, triage and remediation efforts that will be carried out in later steps.
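To make the idea of aggregation concrete, the sketch below merges log lines from several sources into one time-ordered view and counts errors per source. It assumes each service emits JSON lines with `ts`, `level` and `msg` fields; the field names, the service names and the sample records are invented for illustration.

```python
import json
from collections import Counter

# Hypothetical JSON log lines from two made-up services.
RAW_LOGS = {
    "api": [
        '{"ts": "2024-01-01T10:00:05Z", "level": "ERROR", "msg": "upstream timeout"}',
        '{"ts": "2024-01-01T10:00:01Z", "level": "INFO", "msg": "request ok"}',
    ],
    "db": [
        '{"ts": "2024-01-01T10:00:03Z", "level": "ERROR", "msg": "connection pool exhausted"}',
    ],
}


def aggregate(raw_logs):
    """Parse every line, tag it with its source, and return one list sorted by timestamp."""
    events = []
    for source, lines in raw_logs.items():
        for line in lines:
            event = json.loads(line)
            event["source"] = source
            events.append(event)
    # ISO 8601 UTC timestamps in an identical format sort correctly as strings.
    return sorted(events, key=lambda e: e["ts"])


timeline = aggregate(RAW_LOGS)
errors_by_source = Counter(e["source"] for e in timeline if e["level"] == "ERROR")

for event in timeline:
    print(event["ts"], event["source"], event["level"], event["msg"])
```

A single ordered timeline like this is what lets a response team reconstruct what happened and in what order during the later steps.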
Step 2: Identify
Before teams can respond to an incident, they need to identify when one is occurring. This can happen in several ways, but it generally requires recognition of abnormal behavior. This is often a manual process handled by combing through user reports or reviewing log and analytics data, but the implementation of automated tools is the only scalable way to recognize aberrant behavior in large cloud environments.
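One simple form of automated detection is to compare a current measurement against a recent baseline. The sketch below is a minimal example, assuming a per-minute error count as the metric and a threshold of three standard deviations; both choices, along with the `is_anomalous` function name and the sample numbers, are illustrative assumptions, and production tools typically use more robust methods.

```python
from statistics import mean, stdev


def is_anomalous(baseline, current, threshold=3.0):
    """Return True if `current` is more than `threshold` standard
    deviations above the mean of `baseline` (needs at least two points)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any increase counts as abnormal.
        return current > mu
    return (current - mu) / sigma > threshold


# Hypothetical errors-per-minute readings from a quiet period.
baseline_errors = [4, 6, 5, 7, 5, 6, 4, 5]

print(is_anomalous(baseline_errors, 6))   # within the normal range
print(is_anomalous(baseline_errors, 40))  # far above the baseline
```

The check only flags increases, because a drop in error count is rarely what an on-call engineer needs to be paged about.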
When an incident is identified through either manual or automated processes, many organizations choose to notify their cloud provider and cross-validate the observed incident with it. This step can go a long way toward ensuring that teams only react to real incidents. A cloud provider's support can help quickly close the loop in situations where time is critical.
Step 3: Coordinate
Once an incident has been identified, the next step is to get organized. Before anything can actually be fixed, you need to understand the nature and severity of the issue, and define and engage with the response team. In this step, the on-call engineer or employee is responsible for identifying the nature of the report and making an initial assessment of its severity before passing it on to an appropriate team member.
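An initial severity assessment is easier to apply consistently when the rules are written down. The sketch below is one possible way to encode such rules; the three severity levels, the inputs and the 50% cutoff are hypothetical and would need to reflect your own incident management plan.

```python
from enum import Enum


class Severity(Enum):
    SEV1 = "critical"
    SEV2 = "major"
    SEV3 = "minor"


def initial_severity(customer_facing: bool, data_exposed: bool,
                     users_affected_pct: float) -> Severity:
    """Make a first-pass severity call from a few yes/no facts.

    The rules are illustrative assumptions, not an industry standard.
    """
    if data_exposed or (customer_facing and users_affected_pct >= 50):
        return Severity.SEV1
    if customer_facing:
        return Severity.SEV2
    return Severity.SEV3


# Example: an internal-only batch job is failing and no data was exposed.
print(initial_severity(customer_facing=False, data_exposed=False,
                       users_affected_pct=0).value)
```

The on-call engineer can still override the result; the point is to give the first assessment a shared starting point.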
Establishing a response team
Many cloud-native organizations might be small enough to engage the same set of team members for every incident. But the larger an organization gets, the more important it is to identify the relevant experts for the issue that occurred.
After the initial assessment, the incident is handed to an appropriate team member, known as the incident commander, who then identifies the cross-functional leads from relevant teams to create the cloud incident response team. This team is responsible for investigating and remediating the issue from that point on.
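In a larger organization, finding the relevant experts is simpler if component ownership is recorded somewhere the incident commander can query. The following sketch assumes a plain mapping from component to owning team; the `OWNERS` table, the team names and the `assemble_team` function are all hypothetical.

```python
# Hypothetical ownership map: component -> owning team.
OWNERS = {
    "database": "data-platform",
    "auth": "identity",
    "cdn": "edge",
}


def assemble_team(commander, affected_components, owners):
    """List the teams that should supply a lead for this incident,
    plus any affected components that nobody is recorded as owning."""
    teams = sorted({owners[c] for c in affected_components if c in owners})
    unowned = sorted(c for c in affected_components if c not in owners)
    return {"commander": commander, "leads_from": teams, "unowned": unowned}


team = assemble_team("on-call-lead", ["auth", "database", "billing"], OWNERS)
print(team)
```

Surfacing unowned components explicitly gives the commander an early signal that additional people may need to be pulled in.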
Step 4: Remediate
With the active incident identified and a response team in place, start to investigate and contain the problem. As the response team investigates, additional team members and resources may be required to gather as much information as possible.
Due to the inherent unpredictability of incidents, it's difficult to put a timeline on this process. Keep lines of communication open internally to track progress and understand the overall impact.
When it comes to cybersecurity, there's no faster way to lose customers' trust than to fail to notify them of issues that may affect them personally. It's important to provide a clear view into incidents -- as they happen -- and how to remediate them. Be aware of the optics of choosing not to report an incident to customers, and then having it come to light at a later date. When in doubt, err on the side of transparency.
Step 5: Review
Retrospectives are the cornerstone of any agile cloud incident response process. They allow teams to learn from past mistakes and take corrective action so that each response improves on the last. Highlight what went well and identify areas for improvement to help define action items, so that the response team is better prepared for the next incident.