Runbooks are a great way to define your process and ensure that all steps are considered during an incident. You can define a runbook to describe how your engineering team triages a SEV1 incident, how to troubleshoot issues with a specific service, how to perform a database restoration or even what needs to happen to bring a new service to production.
Runbooks in FireHydrant can be associated with incident severities, a specific service or environment, or an incident role. When an incident is opened or updated to include any of these criteria, your runbook will automatically be attached to the incident.
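The attachment rule described above can be pictured with a small sketch. This is only an illustration and not FireHydrant's API or implementation: the `Incident` and `Runbook` classes, their field names, and the `attached_runbooks` function are all hypothetical, and the sketch assumes that matching any single criterion is enough to attach a runbook.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Incident:
    # Hypothetical incident model; field names are assumptions for illustration.
    severity: Optional[str] = None
    services: Set[str] = field(default_factory=set)
    environments: Set[str] = field(default_factory=set)
    roles: Set[str] = field(default_factory=set)


@dataclass
class Runbook:
    # Hypothetical runbook model: each set holds the criteria it is associated with.
    name: str
    severities: Set[str] = field(default_factory=set)
    services: Set[str] = field(default_factory=set)
    environments: Set[str] = field(default_factory=set)
    roles: Set[str] = field(default_factory=set)


def matches(runbook: Runbook, incident: Incident) -> bool:
    """Return True if the incident includes any criterion the runbook is tied to."""
    return (
        incident.severity in runbook.severities
        or bool(runbook.services & incident.services)
        or bool(runbook.environments & incident.environments)
        or bool(runbook.roles & incident.roles)
    )


def attached_runbooks(runbooks: List[Runbook], incident: Incident) -> List[str]:
    """Names of the runbooks that would attach, re-evaluated on open or update."""
    return [rb.name for rb in runbooks if matches(rb, incident)]
```

In this sketch, calling `attached_runbooks` again after an incident's severity, services, environments, or roles change models the "opened or updated" behavior: a runbook that did not match before can match once the incident gains one of its criteria.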
Infrastructure runbooks define the steps to take when responding to an incident that involves a specific service or environment. You can include links to relevant graphs, GitHub commits, external guides, or anything else that would help restore your service. If you have a third-party application with a severe memory leak that isn't safe to restart without following a specific shutdown process, that's a great candidate for a runbook.
Incident runbooks let you define your incident response process so that it's executed consistently. An easy-to-follow checklist means that anyone can run an incident, and it lets you run gamedays to practice your incident response process.
Runbooks associated with an incident role list the tasks to be followed by someone assigned to that role during an incident, such as the incident commander.