
Emergency Runbook

LizBaker edited this page Apr 14, 2022 · 11 revisions

This site is hosted on Gatsby Cloud and is maintained and supported by New Relic's Developer Experience team. If you have any questions or comments, feel free to create an issue or reach out to us at [email protected].

Helpful links

Troubleshooting

Scenario | Severity | Resolution
Site is not loading | ❗ High | Rollback a release
All localized pages are throwing 500s | ❗ High | Rollback a release
Functionality is broken | ⚠️ Medium | Rollback a release
Alert has been triggered | ⚠️ Medium | Respond to an incident
Copy needs to be adjusted | 👀 Unknown | Create an issue or ping @hero in #documentation

Rollback a release

If the site is not loading, or a piece of functionality is broken, you will likely need to roll back to a stable release. There are two ways to roll back a release:

Via Gatsby Cloud

  1. Log in to Gatsby Cloud with GitHub two-factor authentication.
  2. Select the developer-website site.
  3. Click View production history to see all the previous builds that have run.
  4. Find the appropriate build to roll back to. Click Publish to deploy that build of the site.
  5. Once the deploy has finished, validate that the site is working.
  6. Notify the Developer Experience team and the @hero in #help-deven-websites of the rollback so we can address the underlying issue.
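The validation in step 5 can be partly automated. The following is a minimal sketch, not an official tool: it requests a few pages and reports any that do not return 200. `BASE_URL` and the entries in `PATHS` (including the localized paths) are hypothetical placeholders and need to be replaced with the real production host and representative pages.

```python
from typing import Callable, Dict, Iterable, List
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Hypothetical values: replace with the real production host and a few
# representative paths, including at least one localized page.
BASE_URL = "https://example.com"
PATHS = ["/", "/jp/", "/kr/"]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urlopen(Request(url, method="GET"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still useful.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return 0


def check_pages(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Map each path to the status code returned for base_url + path."""
    root = base_url.rstrip("/")
    return {path: fetch(root + path) for path in paths}


def failing_pages(results: Dict[str, int]) -> List[str]:
    """Return the paths whose status is anything other than 200, sorted."""
    return sorted(path for path, status in results.items() if status != 200)
```

Calling `failing_pages(check_pages(BASE_URL, PATHS))` returns an empty list when every page responds with 200. The `fetch` parameter exists so the check can be exercised without network access. A passing check does not replace a manual look at the site.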

Via GitHub

If you do not have access to Gatsby Cloud, you can perform a rollback using GitHub:

  1. Find the pull request (into main) that you would like to roll back.
  2. Click Revert to create a new pull request that undoes this work.
  3. Have someone review the rollback and approve the pull request.
  4. Once the necessary checks have passed, merge into main.
  5. A build will be triggered in Gatsby Cloud. Once complete, the rollback will be released.
  6. Notify the engineering team so that the issue which caused the rollback can be addressed.

Respond to an incident

The following steps are for on-call engineers working at New Relic:

  1. Don't panic, you've got this!
  2. Check to see if there is already an ongoing incident in #emergency-room (or in emergency rooms 2, 3, and 4).
  3. If there is not an ongoing incident, start one by following the steps in the Incident Commander Runbook.
  4. Refer to the troubleshooting dashboard to get an idea of what could be going on.
  5. Look at the recent deployments to production to identify a PR that can be reverted.
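For step 5, one way to get a quick list of candidate PRs is the GitHub REST API's pull request listing. The sketch below is illustrative only: `OWNER` and `REPO` are placeholders (the repository name is not stated here), and unauthenticated requests are subject to GitHub's rate limits. It lists closed pull requests into main and keeps the most recently merged ones.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request, urlopen

# Placeholders: the real owner and repository are assumptions to fill in.
OWNER = "your-org"
REPO = "your-repo"


def fetch_closed_pulls(
    owner: str, repo: str, base: str = "main", per_page: int = 30
) -> List[Dict[str, Any]]:
    """Fetch recently updated closed pull requests targeting the base branch."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/pulls"
        f"?state=closed&base={base}&sort=updated&direction=desc"
        f"&per_page={per_page}"
    )
    req = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def recently_merged(
    pulls: List[Dict[str, Any]], limit: int = 5
) -> List[Dict[str, Any]]:
    """Keep only merged PRs, newest merge first.

    Closed-but-unmerged PRs have merged_at set to null, so they are dropped.
    merged_at is an ISO 8601 UTC timestamp, so string order is time order.
    """
    merged = [p for p in pulls if p.get("merged_at")]
    merged.sort(key=lambda p: p["merged_at"], reverse=True)
    return merged[:limit]
```

For example, `recently_merged(fetch_closed_pulls(OWNER, REPO))` yields up to five entries whose `number`, `title`, and `html_url` fields point at the PRs most likely to be worth reverting.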

Previous Issues

Build Failing

TypeError: Cannot read property 'slug' of undefined

If you run into this error, it is likely the generic error coming from Gatsby (#1164). The fix is to "reset" the cache by making an update to one of the gatsby- files (for example, gatsby-config.js).
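As a rough sketch of what "making an update" could look like, the helper below appends a dated comment line to a gatsby- file so that its content changes. It assumes that any content change is enough to reset the cache, which the note above implies but does not spell out. The function name and comment format are invented for illustration, and the change still needs to be committed and merged for a build to pick it up.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def touch_gatsby_file(path: str, now: Optional[datetime] = None) -> str:
    """Append a dated JavaScript comment to the file and return that line.

    Hypothetical helper: the only goal is to change the file's content
    without changing its behavior.
    """
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"// cache reset {stamp}\n"
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    # Make sure the comment starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    file.write_text(text + line, encoding="utf-8")
    return line
```

Run from the repository root, `touch_gatsby_file("gatsby-config.js")` would add one trailing comment line, assuming that file exists in the project.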

All localized pages are throwing 500s

We once had a build succeed and deploy despite some kind of DSG (Deferred Static Generation) error, which caused all of our localized pages to throw 500s. This was a fluke on Gatsby Cloud, but if it occurs again, you can roll back to a build in which the localized pages load correctly. You can also try rebuilding the current build, as it's unlikely this would happen twice in a row.
