Emergency Runbook

Zack Stickles edited this page Mar 5, 2021 · 11 revisions

This site is hosted on AWS Amplify and is maintained and supported by New Relic's Developer Experience team. If you have any questions or comments, feel free to create an issue or reach out to us at [email protected].

Helpful links

Troubleshooting

| Scenario | Severity | Resolution |
| --- | --- | --- |
| Site is not loading | ❗ High | Rollback a release |
| Functionality is broken | ⚠️ Medium | Rollback a release |
| Alert has been triggered | ⚠️ Medium | Respond to an incident |
| Copy needs to be adjusted | 👀 Unknown | Create an issue or ping @hero in #documentation |

Rollback a release

If the site is not loading, or a piece of functionality is broken, you will likely need to roll back to a stable release. There are two ways to roll back a release:

Via Amplify

If you are an engineer who has access to the Amplify console, the following steps are the fastest way to roll back a release:

  1. Log in to the Amplify console via nr-prod Okta.
  2. Select the developer-website app.
  3. Under Frontend environments, select main.
  4. Click the View build history button to see the previous builds that have run.
  5. Find the appropriate build and click Build #xxx to select it.
  6. Click Redeploy this version to initiate a validation & deployment.
  7. Notify the engineering team so that the issue which caused the rollback can be addressed.
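
For step 5, "the appropriate build" is usually the most recent successful build that ran before the broken one. The sketch below shows that selection rule on a list of build records. It assumes each record has `jobId`, `status`, and `startTime` fields, which mirrors the shape of the job summaries returned by boto3's Amplify `list_jobs` call; the function name and the sample data are hypothetical, and the redeploy itself is still done in the console as described above.

```python
from datetime import datetime
from typing import Dict, List, Optional


def pick_rollback_build(job_summaries: List[Dict], bad_job_id: str) -> Optional[Dict]:
    """Return the most recent successful build that started before the bad one.

    Returns None if the bad build is not in the list or if no earlier
    successful build exists.
    """
    bad = next((j for j in job_summaries if j["jobId"] == bad_job_id), None)
    if bad is None:
        return None
    candidates = [
        j for j in job_summaries
        if j["status"] == "SUCCEED" and j["startTime"] < bad["startTime"]
    ]
    # max() with default=None handles the "nothing to roll back to" case.
    return max(candidates, key=lambda j: j["startTime"], default=None)


if __name__ == "__main__":
    # Made-up build history, for illustration only.
    sample_builds = [
        {"jobId": "10", "status": "SUCCEED", "startTime": datetime(2021, 3, 1, 9, 0)},
        {"jobId": "11", "status": "FAILED", "startTime": datetime(2021, 3, 2, 9, 0)},
        {"jobId": "12", "status": "SUCCEED", "startTime": datetime(2021, 3, 3, 9, 0)},
    ]
    print(pick_rollback_build(sample_builds, "12"))
```

A build can succeed and still ship broken functionality, so treat the result as a starting point and confirm the commit it was built from before redeploying.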

Via GitHub

If you do not have access to Amplify, you can perform a rollback using GitHub:

  1. Find the pull request (into main) that you would like to roll back.
  2. Click Revert to create a new pull request that undoes this work.
  3. Have someone review the rollback and approve the pull request.
  4. Once the necessary checks have passed, merge into main.
  5. A build will be triggered in Amplify. Once complete, the rollback will be released.
  6. Notify the engineering team so that the issue which caused the rollback can be addressed.
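
If the Revert button is unavailable, steps 1 and 2 can be approximated from a local clone. The following is a minimal sketch, not the team's official tooling: the function names, the branch name, and the commit SHA are hypothetical, and it assumes the remote is called `origin`. It only builds standard git commands and prints them unless `dry_run` is turned off.

```python
import subprocess
from typing import List


def build_revert_commands(commit_sha: str, is_merge_commit: bool, branch: str) -> List[List[str]]:
    """Build the git commands that revert one commit on a new branch off main."""
    revert = ["git", "revert", "--no-edit"]
    if is_merge_commit:
        # A merge commit has two parents; "-m 1" treats main's side as the mainline.
        revert += ["-m", "1"]
    revert.append(commit_sha)
    return [
        ["git", "checkout", "main"],
        ["git", "pull", "origin", "main"],
        ["git", "checkout", "-b", branch],
        revert,
        ["git", "push", "origin", branch],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder SHA and branch name; replace with the real values.
    run(build_revert_commands("abc1234", is_merge_commit=True, branch="revert-abc1234"))
```

After the branch is pushed, open a pull request from it into main and continue from step 3. Use `is_merge_commit=False` when the pull request was squash-merged, since a squashed commit has a single parent.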

Respond to an incident

The following steps are for on-call engineers working at New Relic:

  1. Don't panic, you've got this!
  2. Check whether there is already an ongoing incident in #emergency-room (or in emergency rooms 2, 3, and 4).
  3. If there is not an ongoing incident, start one by following the steps in the Incident Commander Runbook.
  4. Refer to the troubleshooting dashboard to get an idea of what could be going on.
  5. Look at the recent deployments to production to identify a PR that can be reverted.

Common Issues

Build Failing

TypeError: Cannot read property 'slug' of undefined

If you run into this error, it is most likely a generic error coming from Gatsby (#1164). To resolve it, "reset" the cache by making an update to one of the gatsby-* files, then trigger a new build.
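
A small edit is enough to change one of these files. The sketch below appends a dated comment to a file so that its contents differ from the previous build. It is an illustration under assumptions: the function name is hypothetical, `gatsby-config.js` in the usage note is just one standard Gatsby file name, and whether a comment-only change is sufficient to reset the cache in this project has not been confirmed by this page.

```python
from datetime import datetime, timezone
from pathlib import Path


def bump_cache_marker(path: Path) -> str:
    """Append a dated JavaScript comment to the file and return the line added."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    marker = f"// cache reset {stamp}\n"
    text = path.read_text(encoding="utf-8")
    # Keep the marker on its own line even if the file lacks a trailing newline.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + marker, encoding="utf-8")
    return marker
```

For example, `bump_cache_marker(Path("gatsby-config.js"))` run from the repository root adds one comment line. Commit the change and merge it into main through a pull request so that Amplify runs a fresh build.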