Incident Response Plan (IRP)

Scope

This IRP covers incidents affecting Node.js web properties and supporting services operated by the @nodejs/web team.

For a list of covered services and repositories, refer to PERMISSIONS.md.

IC & Escalation

Incident Commander (IC): Any @nodejs/web member who first takes charge.

Escalation: IC → @nodejs/web-infra → @nodejs/web-admins → @nodejs/build (Cloudflare account- or zone-critical issues) and/or @nodejs/security-wg (security incidents) → @nodejs/tsc.
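
The escalation order above can be written down as data, which makes it easy to reuse in a paging script or bot. The following is a minimal sketch, not existing project tooling: the function name `escalation_path` and the two boolean flags are assumptions made for illustration, and it assumes @nodejs/tsc is always the final step even when neither @nodejs/build nor @nodejs/security-wg is involved.

```python
from typing import List

# Teams every escalation passes through first, in the order given in this plan.
BASE_CHAIN = ["@nodejs/web-infra", "@nodejs/web-admins"]


def escalation_path(needs_cloudflare: bool = False, is_security: bool = False) -> List[str]:
    """Return the ordered teams an IC would escalate through (hypothetical helper)."""
    path = list(BASE_CHAIN)
    if needs_cloudflare:
        # Cloudflare account- or zone-critical actions involve the Build team.
        path.append("@nodejs/build")
    if is_security:
        # Security incidents involve the Security WG.
        path.append("@nodejs/security-wg")
    # Assumption: the TSC is the last step in every case.
    path.append("@nodejs/tsc")
    return path
```

For example, `escalation_path(is_security=True)` lists web-infra, web-admins, security-wg, and then the TSC.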

Severity Levels & SLAs

P0 – Critical user impact (global outage/defacement/security breach):
- Acknowledge: TBD
P1 – Major degradation (partial outage, broken downloads/docs on a locale/route):
- Acknowledge: TBD
P2 – Minor (noncritical errors, single integration down):
- Acknowledge: TBD

When in doubt, start at higher severity and downgrade later.
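
The "start higher, downgrade later" rule can be sketched in a few lines. This is an illustrative example only: the `Severity` enum and `initial_severity` function are hypothetical names, and the acknowledgement SLAs are left out because they are still TBD in this plan.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    # Lower number means more severe, matching the P0/P1/P2 naming.
    P0 = 0
    P1 = 1
    P2 = 2


def initial_severity(candidates: Iterable[Severity]) -> Severity:
    """When several levels seem plausible, start at the most severe one."""
    options = list(candidates)
    if not options:
        raise ValueError("at least one candidate severity is required")
    # min() picks the lowest number, which is the highest severity.
    return min(options)
```

If responders are unsure whether an incident is P1 or P2, `initial_severity([Severity.P2, Severity.P1])` returns `Severity.P1`; the IC can downgrade once the impact is understood.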

Canonical Response Workflow

Declare severity; assign IC and Comms Lead.
Stabilize users first:
- Roll back to the last known-good deploy.
- If needed, enable Cloudflare “Under Attack” mode or WAF rules, and emergency caching on critical paths.
Communicate: post an initial status summary and known impact; repeat per SLA. (Use the blog/announcements or an org channel as appropriate; precedent: public post-mortem for the March 17 incident.)
Contain & eradicate: revoke keys/tokens, disable compromised deploy hooks, patch, and purge caches safely.
Recover: redeploy clean artifact, validate, then progressively relax mitigations.
Review: draft a blameless post-mortem covering impact, root cause, and follow-up engineering actions and process fixes.
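
The workflow above can be tracked as a simple ordered checklist so that no step is skipped under pressure. The sketch below is an assumption-laden illustration: the step identifiers and the `IncidentChecklist` class are invented for this example, and it enforces strict ordering even though in practice communication repeats throughout the incident.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Short identifiers for the six workflow steps, in the order listed in this plan.
STEPS = ["declare", "stabilize", "communicate", "contain", "recover", "review"]


@dataclass
class IncidentChecklist:
    """Tracks which workflow steps have been completed (hypothetical helper)."""

    completed: List[str] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        # Return the first step not yet completed, or None when all are done.
        for step in STEPS:
            if step not in self.completed:
                return step
        return None

    def complete(self, step: str) -> None:
        # Simplification: steps must be completed in the listed order.
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)
```

An IC could call `complete("declare")` after assigning roles, and `next_step()` would then point at `"stabilize"`.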

Common Incidents — What Happens & What They Cause

Incident	Likely Cause	What users see	Immediate actions	Primary owner
Token/secret leak	Accidental commit or exposed CI logs.	Subsequent unauthorized changes/deploys.	Invalidate in provider; rotate in 1Password; hunt for usage in audit logs; force redeploy clean.	Service owner + Web-Admins.
Expired TLS/SSL certificate	Missed renewal or misconfigured auto-renew.	Browser warnings (“Connection not secure”), failed API calls.	Renew/redeploy certificate; validate chain; confirm monitoring alerts.	Infra + Build.
Outage due to misconfigured DNS	Incorrect DNS update or provider outage.	Users can’t reach service; domain not resolving.	Roll back DNS change; verify propagation; coordinate with DNS provider.	Infra + Build.
Compromised admin account	Phishing or weak MFA.	Unauthorized changes in systems.	Disable account; rotate credentials; audit changes; notify security.	Security WG + Account owner.
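
For the expired-certificate case in the table above, an early-warning check can catch a missed renewal before users see browser warnings. The following is a minimal sketch using only the Python standard library; the function names and the idea of running it on a schedule are assumptions, not part of the team's current monitoring.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter string (negative if expired)."""
    # ssl.cert_time_to_seconds parses the "Jan 11 00:00:00 2030 GMT" format
    # returned by getpeercert() into seconds since the epoch.
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Fetch the notAfter field of the certificate served by `host`."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Note: with default verification, an already-expired certificate makes
        # the handshake raise ssl.SSLCertVerificationError instead of returning.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `days_until_expiry(fetch_not_after(host), datetime.now(timezone.utc))` for each covered hostname and alert when the result drops below a threshold chosen by the team.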

Communications

Internal (private): @nodejs/web or @nodejs/web-infra channel/thread; if Cloudflare account action is required, loop in @nodejs/build.

Public (as needed): short status updates; if user impact was material, publish a brief blog post or addendum to an incident page (example precedent exists).

Notes on authority & ownership

Cloudflare account-level actions (e.g., role changes) are coordinated with @nodejs/build. The web teams hold write or admin access depending on the team (web-infra vs. web-admins). Keep this in mind when planning mitigations that require account-level scope.
