Feb 2, 2023
This article is the capstone of a six-part effort to condense the security pillar of the AWS Well-Architected Framework. We have covered Security Foundations, IAM, Detection, Infrastructure Protection, and Data Protection; Incident Response is the missing piece of the puzzle. It says a lot about what an amazing body of work the AWS framework is that it took over 12,000 words to write a “summary” that aims to provide useful and actionable advice. Hopefully it goes without saying that this series of articles is in no way a substitute for the official, full-length AWS documentation. These blog posts are simply the words and thoughts of your humble correspondent over at Tailwarden.
The order in which the elements of cloud security appear in the Well-Architected Framework says nothing about their relative importance. They are individual topics that combine to give the cloud professional a holistic and comprehensive understanding of how cloud security should be approached.
It’s also worth pointing out that an incident can originate from many sources. From a business-value point of view, your organization should be as well equipped to react to an external attack as to an internal change or blunder. What matters is how customers are affected and how effective your team is at keeping the SLO within a comfortable range. Having said that, let’s jump into the topic at hand.
Incident response is the process of preparing for, identifying, triaging, and responding to incidents that could compromise the security of an organization's systems and data. Incidents include security breaches, data breaches, network outages, system failures, and other disruptions to an organization's IT infrastructure. A number of frameworks, such as the NIST Computer Security Incident Handling Guide, can be used to prepare and inform your incident response approach, but additional considerations apply when responding to incidents in the cloud.
The goal of incident response is to minimize the impact of incidents on the organization's business operations and protect its systems and data from further harm. To achieve this goal, organizations typically have an incident response plan in place outlining the steps to take in the event of an incident.
Typical steps in an incident response plan include:
Additionally, it’s important to couple these steps with automated processes and to use redeployment methods that match your level of expertise and your tech stack. By understanding the cloud and how your application is built, you will be in the best position to know where events and data will need to be acted upon in the case of an incident. These incident response steps must be placed in the context of your team's awareness of the cloud in general and of your environment in particular; it’s only when the terrain is familiar that the steps can be effective.
An analogy that comes to mind: you are on holiday in Japan and take the subway to downtown Tokyo, when suddenly the train stops, alarms start ringing, and people start frantically heading for the exits. Assuming you are not a Japanese speaker, you won’t be able to understand the instructions bellowing from the megaphones, and you will quickly find yourself confused, lost, and in need of help.
Developing an incident management plan is crucial because it helps organizations prepare for and respond to incidents that could compromise the security of their systems and data in a quick, systematic, and ideally rehearsed way. The safe assumption is that you will face an incident in your environment at some point, probably sooner than you think. Nobody is safe.
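To illustrate what coupling response steps with automated processes might look like, here is a minimal sketch of a runbook dispatcher. Everything in it is a hypothetical placeholder: the incident type names, the `RUNBOOK` table, and the three handler functions are invented for illustration and only return strings. In a real environment each handler would call your own cloud or deployment tooling.

```python
from typing import Callable, Dict, List

# Hypothetical handlers, for illustration only. Real ones would call
# your cloud provider's APIs or your deployment tooling.
Action = Callable[[str], str]


def isolate_resource(resource_id: str) -> str:
    return f"isolated {resource_id}"


def rollback_deployment(resource_id: str) -> str:
    return f"rolled back {resource_id}"


def page_on_call(resource_id: str) -> str:
    return f"paged on-call about {resource_id}"


# Assumed mapping from incident type to the ordered automated steps.
RUNBOOK: Dict[str, List[Action]] = {
    "compromised_instance": [isolate_resource, page_on_call],
    "bad_deployment": [rollback_deployment, page_on_call],
}


def respond(incident_type: str, resource_id: str) -> List[str]:
    """Run the automated steps for a known incident type, in order.

    Unknown incident types fall back to paging a human rather than guessing.
    """
    actions = RUNBOOK.get(incident_type, [page_on_call])
    return [action(resource_id) for action in actions]


print(respond("bad_deployment", "web-frontend"))
```

The fallback to paging a human for unrecognized incident types reflects the point about familiar terrain: automation should only act where the team already understands the environment.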
Having an incident management plan in place helps minimize the impact of incidents on business operations and protects systems and data from further harm. A well-developed incident management plan should include:
Developing an incident management plan requires organizations to consider the types of incidents that could occur, the potential impact on their business operations, and the resources needed to effectively respond to these incidents. It also requires organizations to establish an incident management team, define roles and responsibilities, and conduct training and drills to ensure that team members are prepared to respond to incidents.
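As a rough sketch of how the requirements above could be checked mechanically, the snippet below models a plan as a small data structure and reports what is missing. The role names in `REQUIRED_ROLES`, the 90-day drill threshold, and the `IncidentPlan` class itself are assumptions made for this example, not a prescribed standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical role names; adapt them to your own organization.
REQUIRED_ROLES = ("incident_commander", "communications_lead", "technical_lead")


@dataclass
class IncidentPlan:
    incident_types: List[str]                             # incidents the plan covers
    roles: Dict[str, str] = field(default_factory=dict)   # role -> person
    last_drill_days_ago: int = 0

    def gaps(self, max_drill_age_days: int = 90) -> List[str]:
        """Return a list of human-readable problems with the plan."""
        problems: List[str] = []
        for role in REQUIRED_ROLES:
            if not self.roles.get(role):
                problems.append(f"unassigned role: {role}")
        if not self.incident_types:
            problems.append("no incident types listed")
        if self.last_drill_days_ago > max_drill_age_days:
            problems.append("drill overdue")
        return problems


plan = IncidentPlan(
    incident_types=["compromised web server"],
    roles={"incident_commander": "alice", "communications_lead": "bob"},
    last_drill_days_ago=120,
)
print(plan.gaps())
```

A check like this could run on a schedule so that an unassigned role or a stale drill is noticed before an incident rather than during one.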
Most of us have been there, but if you haven’t, this is what it’s like ⬇️
Let’s break down the steps the computer engineers in the comic above went through. The story revolves around a hypothetical situation where a company's web server is compromised by a malicious actor:
Training your team in incident response methods is essential to ensure that they are prepared to effectively respond to incidents that could compromise the security of your systems and data. Without proper training, team members may not know what to do or how to proceed during an incident, leading to a slower response and longer downtime. This can result in a greater impact on your organization's business operations and a higher risk of financial loss.
One way of training is to have junior members walk more experienced senior team members through a drill rehearsal of how they would respond if a certain incident took place. Comprehensive access to well-documented post-mortems of previous incidents makes a great bundle of required reading that can be woven into the onboarding process for new team members. An emphasis on senior-to-junior knowledge transfer sessions, in the form of rehearsals or pre-mortems, can be hugely beneficial.
Effective communication is also crucial during an incident response situation. Without proper communication, team members may be unsure of their roles and responsibilities, leading to confusion and misunderstandings. This can hinder the response and prolong downtime. It is important to establish clear lines of communication within the incident response team, as well as with other stakeholders (e.g., customers, partners, regulators) to ensure that everyone is informed and aware of the situation.
At a previous company, I was most positively influenced by a senior team member who kept his cool and professionalism during a particularly hairy outage that was directly affecting the login page for a large subsection of the customer base, effectively rendering the platform inaccessible to them. He impressively stayed calm even when the AWS support team was pulled in and they were equally baffled by the peculiar networking anomaly and needed time to come up with a solution. At all times the team lead was not only calm but also had the awareness to hear everybody's opinion, and he included the whole troubleshooting party by routinely reciting out loud what had been tried, what the running hypothesis was, and what the possible ways forward might be. It was through this constant rehashing of the issue out loud that someone finally voiced something along the lines of “Have we tried looking at route x on Internet gateway y?”, which ended up pointing us in the direction of a positive outcome.
Of course, nobody goes to bed or wakes up hoping to scrap their open tasks and tackle a live incident; those of you who have spent time on an on-call rotation know this all too well. But once you accept that incidents are a matter of time, you open the door to proper preparation and planning. When you surround yourself with a well-synchronized and trained team and implement thoughtful, environment-specific security automation or deployment rollbacks, you learn the language of incident response, if you will. Then, when the time comes, you can face the issue like the chap below.
Whether you are a developer, DevOps engineer, or cloud engineer, dealing with the cloud can be tough at times, especially on your own. If you are using Tailwarden or Komiser and want to share your thoughts, doubts, and insights with other cloud practitioners, feel free to join our Tailwarden Discord server, where you will find tips, community calls, and much more.