This IRP covers incidents affecting Node.js web properties and supporting services operated by the @nodejs/web team.
For a list of covered services and repositories, refer to PERMISSIONS.md.
- Incident Commander (IC): Any @nodejs/web member who first takes charge.
- Escalation: IC → @nodejs/web-infra → @nodejs/web-admins → @nodejs/build (Cloudflare account/zone-critical) and/or @nodejs/security-wg (security incidents) → @nodejs/tsc.
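
As an illustration only, the escalation path above can be written down as ordered levels. The function name `next_escalation` and the list layout are assumptions made for this sketch, not existing tooling; @nodejs/build and @nodejs/security-wg share one level because the plan lists them as alternatives that depend on the incident type.

```python
# Escalation levels in the order listed above. The fourth level holds two
# groups: which one applies depends on the incident type (Cloudflare
# account/zone-critical vs. security incident).
ESCALATION_CHAIN = [
    ["IC"],
    ["@nodejs/web-infra"],
    ["@nodejs/web-admins"],
    ["@nodejs/build", "@nodejs/security-wg"],
    ["@nodejs/tsc"],
]


def next_escalation(current: str) -> list[str]:
    """Return the group(s) to contact after `current` (empty at the top)."""
    for level, groups in enumerate(ESCALATION_CHAIN):
        if current in groups:
            if level + 1 < len(ESCALATION_CHAIN):
                return list(ESCALATION_CHAIN[level + 1])
            return []
    raise ValueError(f"unknown escalation level: {current!r}")
```

For example, `next_escalation("@nodejs/web-admins")` returns both @nodejs/build and @nodejs/security-wg, leaving the choice between them to the IC.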
- P0 – Critical user impact (global outage/defacement/security breach):
  - Acknowledge: TBD
- P1 – Major degradation (partial outage, broken downloads/docs on a locale/route):
  - Acknowledge: TBD
- P2 – Minor (noncritical errors, single integration down):
  - Acknowledge: TBD

When in doubt, start at higher severity and downgrade later.
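
The "start higher, downgrade later" rule can be sketched as picking the most severe of the levels under consideration. The `Severity` enum and `initial_severity` helper below are hypothetical names for this example; acknowledgement times are left out because the plan still lists them as TBD.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the list above; a lower number is more severe."""
    P0 = 0  # critical user impact
    P1 = 1  # major degradation
    P2 = 2  # minor


def initial_severity(candidates: list[Severity]) -> Severity:
    """When unsure between several levels, start at the most severe one."""
    if not candidates:
        raise ValueError("at least one candidate severity is required")
    return min(candidates)
```

An IC torn between P1 and P2 would call `initial_severity([Severity.P2, Severity.P1])` and declare P1, then downgrade once the impact is better understood.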
- Declare severity; assign IC and Comms Lead.
- Stabilize users first:
  - Roll back to the last good deploy.
  - If needed, enable Cloudflare “Under Attack” mode and/or WAF rules, plus emergency caching on critical paths.
- Communicate: post an initial status summary and known impact; repeat per SLA. (Use blog/announcements or an org channel as appropriate; precedent: public post-mortem for the March 17 incident.)
- Contain & eradicate: revoke keys/tokens, disable compromised deploy hooks, patch, and purge caches safely.
- Recover: redeploy a clean artifact, validate, then progressively relax mitigations.
- Review: draft a blameless post-mortem covering impact, root cause, and follow-up engineering actions and process fixes.
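
The "roll back to last good deploy" step assumes someone can identify that deploy quickly. A minimal sketch follows, assuming a deploy history ordered oldest to newest whose last entry is the currently live (suspect) deploy. The `Deploy` record and its `healthy` flag are hypothetical and do not reflect any real deploy provider's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Deploy:
    deploy_id: str  # hypothetical identifier, e.g. a commit SHA
    healthy: bool   # hypothetical flag: passed validation while it was live


def last_good_deploy(history: Sequence[Deploy]) -> Optional[Deploy]:
    """Return the newest healthy deploy before the live one, or None."""
    # Skip the last entry (the live, suspect deploy) and walk backwards.
    for deploy in reversed(history[:-1]):
        if deploy.healthy:
            return deploy
    return None
```

Returning `None` rather than raising keeps the "no known-good deploy" case explicit, so the IC can fall back to other mitigations.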
| Incident | Likely Cause | What users see | Immediate actions | Primary owner |
|---|---|---|---|---|
| Token/secret leak | Accidental commit or exposed CI logs. | Subsequent unauthorized changes/deploys. | Invalidate in provider; rotate in 1Password; hunt for usage in audit logs; force redeploy clean. | Service owner + Web-Admins. |
| Expired TLS/SSL certificate | Missed renewal or misconfigured auto-renew. | Browser warnings (“Connection not secure”), failed API calls. | Renew/redeploy certificate; validate chain; confirm monitoring alerts. | Infra + Build. |
| Outage due to misconfigured DNS | Incorrect DNS update or provider outage. | Users can’t reach service; domain not resolving. | Roll back DNS change; verify propagation; coordinate with DNS provider. | Infra + Build. |
| Compromised admin account | Phishing or weak MFA. | Unauthorized changes in systems. | Disable account; rotate credentials; audit changes; notify security. | Security WG + Account owner. |
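
For the expired-certificate scenario in the table above, a monitoring check could compare a certificate's expiry date against a renewal threshold. The sketch below uses Python's standard `ssl.cert_time_to_seconds`. The helper names and the 14-day default threshold are assumptions for illustration, not values taken from this plan.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining until a certificate's notAfter time.

    `not_after` uses the format found in the "notAfter" field returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jan 11 00:00:00 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def needs_renewal(not_after: str, threshold_days: float = 14.0,
                  now: Optional[float] = None) -> bool:
    """True when the certificate expires within `threshold_days` (assumed default)."""
    return days_until_expiry(not_after, now) <= threshold_days
```

The `now` parameter exists only so the check can be exercised with a fixed clock; in a real monitor it would be left unset.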
Internal (private): @nodejs/web or @nodejs/web-infra channel/thread; if Cloudflare account action is required, loop in @nodejs/build.
Public (as needed): short status updates; if user impact was material, publish a brief blog post or an addendum to an incident page (precedent: the public post-mortem for the March 17 incident).
- Cloudflare account-level actions (e.g., role changes) are coordinated with @nodejs/build; Web-Infra holds write/admin depending on team (web-infra vs. web-admins). Keep this in mind when planning mitigations that require account scope.