Emergency Runbook
Zack Stickles edited this page Feb 24, 2021 · 11 revisions
This site is hosted on AWS Amplify and is maintained / supported by New Relic's Developer Experience team. If you have any questions or comments, feel free to create an issue or reach out to us at [email protected].
- Troubleshooting dashboard
- #help-dev-experience (for engineering requests)
- #documentation (for content requests)
- #devex-bots (alert and deployment updates)
- Alert policy
- Architecture notes
| Scenario | Severity | Resolution |
|---|---|---|
| Site is not loading | ❗ High | Rollback a release |
| Functionality is broken | | Rollback a release |
| Alert has been triggered | | Respond to an incident |
| Copy needs to be adjusted | 👀 Unknown | Create an issue or ping @hero in #documentation |
If the site is not loading, or a piece of functionality is broken, you will likely need to roll back to a stable release. There are two ways to roll back a release:
If you are an engineer who has access to the Amplify console, the following steps are the fastest way to roll back a release:
- Log in to the Amplify console via nr-prod Okta.
- Select the `developer-website` app.
- Under `Frontend environments`, select `main`.
- Click the `View build history` button to see the previous builds that have run.
- Find the appropriate build and click `Build #xxx` to select it.
- Click `Redeploy this version` to initiate a validation & deployment.
- Notify the engineering team so that the issue which caused the rollback can be addressed.
If you do not have access to Amplify, you can perform a rollback using GitHub:
- Find the pull request (into `main`) that you would like to roll back.
- Click `Revert` to create a new pull request that undoes this work.
- Have someone review the rollback and approve the pull request.
- Once the necessary checks have passed, merge into `main`.
- A build will be triggered in Amplify. Once complete, the rollback will be released.
- Notify the engineering team so that the issue which caused the rollback can be addressed.
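For anyone who prefers the command line, the effect of the revert step can be approximated locally with `git revert` and then pushed as a branch for the usual review. The sketch below only builds and prints the commands. The `revert_commands` helper, the `revert-pr-<number>` branch name, the PR number, the commit SHA, and the `origin` remote are all assumptions made for illustration.

```python
import shlex
from typing import List


def revert_commands(pr_number: int, merge_sha: str, is_merge_commit: bool = True) -> List[List[str]]:
    """Build the git commands that revert a merged pull request on a new branch.

    Assumes the remote is named "origin" and the target branch is "main".
    """
    branch = f"revert-pr-{pr_number}"  # hypothetical branch naming scheme
    revert = ["git", "revert", "--no-edit"]
    if is_merge_commit:
        # A merge commit has two parents; "-m 1" keeps the main-line side.
        revert += ["-m", "1"]
    revert.append(merge_sha)
    return [
        ["git", "fetch", "origin", "main"],
        ["git", "switch", "-c", branch, "origin/main"],
        revert,
        ["git", "push", "-u", "origin", branch],
    ]


# Hypothetical PR number and merge commit SHA.
for command in revert_commands(123, "abc1234"):
    print(shlex.join(command))
```

Pass `is_merge_commit=False` when the pull request was squash-merged, since a squash commit has a single parent and `-m` would be rejected. The review, merge, and notification steps above still apply to the pushed branch.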
The following steps are for on-call engineers working at New Relic:
- Don't panic, you've got this!
- Check to see if there is already an ongoing incident in #emergency-room (or in 2, 3, and 4).
- If there is not an ongoing incident, start one by following the steps in the Incident Commander Runbook.
- Refer to the troubleshooting dashboard to get an idea of what could be going on.
- Look at the recent deployments to production to identify a PR that can be reverted.
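One way to narrow down the last step is to list the pull requests merged shortly before the alert fired, newest first, since the most recent change is often the first suspect. A minimal sketch, assuming each PR is a dict with `number`, `title`, and a timezone-aware `merged_at`; the `revert_candidates` helper, the 24-hour window, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def revert_candidates(
    merged_prs: List[Dict[str, Any]],
    alert_time: datetime,
    window: timedelta = timedelta(hours=24),
) -> List[Dict[str, Any]]:
    """Return PRs merged within `window` before the alert, newest first."""
    earliest = alert_time - window
    recent = [pr for pr in merged_prs if earliest <= pr["merged_at"] <= alert_time]
    return sorted(recent, key=lambda pr: pr["merged_at"], reverse=True)


# Hypothetical data: the alert fired at 15:00 UTC on Feb 24, 2021.
alert = datetime(2021, 2, 24, 15, 0, tzinfo=timezone.utc)
prs = [
    {"number": 100, "title": "Update footer", "merged_at": alert - timedelta(days=3)},
    {"number": 101, "title": "Refactor search", "merged_at": alert - timedelta(hours=5)},
    {"number": 102, "title": "Bump dependencies", "merged_at": alert - timedelta(hours=1)},
    {"number": 103, "title": "Fix typo", "merged_at": alert + timedelta(hours=2)},
]

for pr in revert_candidates(prs, alert):
    print(pr["number"], pr["title"])
```

PRs merged after the alert, or well before the window, are excluded because they are unlikely to have caused it.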