Emergency Runbook
This site is hosted on Gatsby Cloud and is maintained / supported by New Relic's Developer Experience team. If you have any questions or comments, feel free to create an issue or reach out to us at [email protected].
- Troubleshooting dashboard
- #help-dev-experience (for engineering requests)
- #documentation (for content requests)
- #devex-bots (alert and deployment updates)
- Alert policy
- Architecture notes
| Scenario | Severity | Resolution |
|---|---|---|
| Site is not loading | ❗ High | Rollback a release |
| All localized pages are throwing 500s | ❗ High | Rollback a release |
| Functionality is broken | | Rollback a release |
| Alert has been triggered | | Respond to an incident |
| Copy needs to be adjusted | 👀 Unknown | Create an issue or ping @hero in #documentation |
If the site is not loading, or a piece of functionality is broken, you will likely need to roll back to a stable release. There are two ways to roll back a release: through Gatsby Cloud or through GitHub. To roll back in Gatsby Cloud:
- Log in to Gatsby Cloud using GitHub two-factor authentication.
- Select the developer-website site.
- Click View production history to see all the previous builds that have run.
- Find the appropriate build to roll back to. Click Publish to deploy that build of the site.
- Once the deploy has finished, validate that the site loads and that the broken functionality is restored.
- Notify the Developer Experience team and the @hero in #help-deven-websites of the rollback so we can address the underlying issue.
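The validation step above can be partly automated. The following is a minimal sketch, not part of the team's tooling: it requests a handful of pages and reports any that do not answer with HTTP 200. `BASE_URL`, `PATHS`, and both function names are placeholders chosen for illustration; substitute the real production origin and a few representative pages, including at least one localized page.

```python
import urllib.error
import urllib.request

# Placeholder values (assumptions): replace with the production origin and
# a few representative paths, including a localized page.
BASE_URL = "https://example.com"
PATHS = ["/", "/jp/"]


def fetch_status(url, timeout=10):
    """Return the final HTTP status code for url, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as exceptions
    except OSError:
        return None  # DNS failure, refused connection, timeout, and similar


def failing_pages(base_url, paths, fetch=fetch_status):
    """Return (path, status) pairs for every page that did not answer 200."""
    results = [(path, fetch(base_url.rstrip("/") + path)) for path in paths]
    return [(path, status) for path, status in results if status != 200]
```

Calling `failing_pages(BASE_URL, PATHS)` returns an empty list when every page answers 200. The `fetch` parameter exists only so the check can be exercised without network access.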
If you do not have access to Gatsby Cloud, you can perform a rollback using GitHub:
- Find the pull request (into `main`) that you would like to roll back.
- Click Revert to create a new pull request that undoes this work.
- Have someone review the rollback and approve the pull request.
- Once the necessary checks have passed, merge into `main`.
- A build will be triggered in Gatsby Cloud. Once complete, the rollback will be released.
- Notify the engineering team so that the issue which caused the rollback can be addressed.
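The Revert button is the documented path. For anyone who prefers the command line, the sketch below builds a roughly equivalent sequence of git commands; it is an illustration under stated assumptions, not the team's process. The function names, the `revert-<sha>` branch naming, and the `origin` remote are all hypothetical, and the resulting branch still needs a pull request, a review, and passing checks before it is merged into `main`.

```python
import subprocess


def revert_commands(commit_sha, is_merge_commit, base_branch="main", remote="origin"):
    """Build the git commands that revert commit_sha on a fresh branch."""
    branch = f"revert-{commit_sha[:7]}"  # hypothetical branch naming scheme
    revert = ["git", "revert", "--no-edit"]
    if is_merge_commit:
        # A merge commit has two parents; -m 1 treats the base branch as mainline.
        revert += ["-m", "1"]
    revert.append(commit_sha)
    return [
        ["git", "fetch", remote, base_branch],
        ["git", "switch", "-c", branch, f"{remote}/{base_branch}"],
        revert,
        ["git", "push", remote, branch],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Pass `is_merge_commit=False` when the pull request was squash-merged, since a squash merge lands as an ordinary single-parent commit.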
The following steps are for on-call engineers working at New Relic:
- Don't panic, you've got this!
- Check to see if there is already an ongoing incident in #emergency-room (or in 2, 3, and 4).
- If there is not an ongoing incident, start one by following the steps in the Incident Commander Runbook.
- Refer to the troubleshooting dashboard to get an idea of what could be going on.
- Look at the recent deployments to production to identify a PR that can be reverted.
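When looking for a pull request to revert, it can help to list what recently landed on `main`. The sketch below is one possible approach, assuming the repository uses GitHub's default commit subjects ("Merge pull request #N from ..." for merge commits, "Title (#N)" for squash merges). It parses the output of `git log --first-parent --pretty=format:%h%x09%s`; the function names are illustrative.

```python
import re

# GitHub's default subjects: "Merge pull request #123 from owner/branch"
# for merge commits and "Some title (#123)" for squash merges.
MERGE_SUBJECT = re.compile(r"^Merge pull request #(\d+) ")
SQUASH_SUBJECT = re.compile(r"\(#(\d+)\)$")


def pull_request_number(subject):
    """Return the pull request number found in a commit subject, or None."""
    subject = subject.strip()
    match = MERGE_SUBJECT.search(subject) or SQUASH_SUBJECT.search(subject)
    return int(match.group(1)) if match else None


def recent_pull_requests(log_output):
    """Parse tab-separated "<sha>\t<subject>" lines, keeping git log's order."""
    found = []
    for line in log_output.splitlines():
        sha, _, subject = line.partition("\t")
        number = pull_request_number(subject)
        if number is not None:
            found.append((number, sha, subject))
    return found
```

Commits whose subjects match neither pattern are skipped, so a custom merge message would need its own pattern.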
TypeError: Cannot read property 'slug' of undefined
If you run into this error, it is most likely the generic error coming from Gatsby (#1164). The fix is to "reset" the cache by committing a change to one of the gatsby-* files.
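As an illustration of the cache "reset", the sketch below appends a dated comment to one of those files so that its contents change. It is only a sketch: `bump_cache_marker` is a hypothetical helper, and whether a comment-only change is enough to invalidate the cache is an assumption, not something this runbook confirms. The change still has to be committed and merged for a new build to run.

```python
from datetime import datetime, timezone
from pathlib import Path


def bump_cache_marker(path, now=None):
    """Append a dated comment to a JavaScript file so its contents change."""
    now = now or datetime.now(timezone.utc)
    marker = f"// cache reset: {now.isoformat(timespec='seconds')}\n"
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    if not text.endswith("\n"):
        text += "\n"  # keep the marker on its own line
    file.write_text(text + marker, encoding="utf-8")
    return marker
```

For example, `bump_cache_marker("gatsby-config.js")` run from the repository root would add one comment line at the end of that file; the file name here is an assumed example.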
We once had a build succeed and deploy despite a DSG (Deferred Static Generation) error, which caused all of our localized pages to return 500s. This was a fluke on Gatsby Cloud, but if it happens again, you can roll back to a build in which the localized pages load correctly. You can also try rebuilding the current build, since the error is unlikely to occur twice in a row.