Closed
Labels
outage: Issues tracking P1 PagerDuty incidents
Description
Context
It is ultimately the Team Lead's responsibility to validate that the resolution to an outage was correct and that the appropriate follow-up action items were correctly identified.
What we need to do
1. Review the action items and incident resolution
- review the resolution and the action items of the 'Outage ongoing: [FIRING:1] At least two servers failed to start in the last 30m projectpythia-binder binderhub 3 (kubeconfig immediate action needed) #7847' PD postmortem
- let the team know in the #post-outage-actions Slack channel and ask any clarifying questions
2. Publish incident report
- open a PR in https://github.com/2i2c-org/incident-reports if you're happy with the resolution
- or ask another team member to do it via the #post-outage-actions Slack channel
Definition of Done
- The PD postmortem is marked as Closed
- The postmortem was exported as a PDF and added to the https://github.com/2i2c-org/incident-reports repo
Metadata
Status
Done