1 | 1 | # pi-sre-mode
2 | 2 |
3 | | -A Pi-native incident investigation package for Pi, with support for private overlays.
| 3 | +An incident investigation mode for [Pi](https://github.com/mariozechner/pi-coding-agent). Open a terminal, start an incident, investigate with real tools, write a report — all without leaving Pi.
4 | 4 |
5 | | -## What this is
| 5 | +## Why
6 | 6 |
7 | | -`pi-sre-mode` aims to distill the best parts of the llmduck idea into a Pi package:
| 7 | +During an incident you're juggling metrics dashboards, log viewers, SSH sessions, and a dozen browser tabs. `pi-sre-mode` puts the investigation loop inside Pi so you can query metrics, grep logs, check service health, and build a timeline in one place.
8 | 8 |
9 | | -- guided incident workflow inside Pi
10 | | -- reusable SRE skills and prompts
11 | | -- read-only safety guardrails
12 | | -- connector / environment preflight checks
13 | | -- support for private organization overlays without forking the public package
| 9 | +It ships read-only guardrails by default so you don't accidentally `rm` or `systemctl restart` something mid-investigation.
14 | 10 |
15 | | -This repo started docs-first and now includes the initial working scaffold:
| 11 | +## What you get
16 | 12 |
17 | | -- `extensions/incident-mode.ts` — main public extension
18 | | -- `src/` — overlay types, state, checks, report helpers, template catalog
19 | | -- `skills/` — generic SRE skills
20 | | -- `prompts/` — generic incident prompt templates
21 | | -- `examples/local-overlay/` — sample overlay package for local testing
| 13 | +- **`/incident`** — set up investigation context: pick a template (5xx spike, high latency, OOM, etc.), name the service, set a time window. That context follows every subsequent prompt automatically.
| 14 | +- **`/check-connectors`** — preflight check that your CLIs, auth, and environment are ready before you start digging.
| 15 | +- **`/report`** — turn the investigation into a markdown report.
| 16 | +- **`/sudo`** / **`/sudo-off`** — bypass or re-enable the read-only guardrails when you need to.
| 17 | +- **Built-in investigation skills** — SRE methodology and a generic investigation playbook that guide Pi's reasoning.
| 18 | +- **7 incident templates** — 5xx spike, high latency, OOM/crash loop, broker issues, service down, deploy regression, resource exhaustion, plus a blank "custom" template.
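To illustrate roughly what a `/check-connectors`-style preflight involves, here is a minimal TypeScript sketch built on a narrow assumption: it only checks that named CLIs resolve on `PATH` and that named environment variables are non-empty. The functions `preflight` and `onPath`, along with the example CLIs and variables, are hypothetical. They are not what `pi-sre-mode` actually runs.

```typescript
import { existsSync } from "node:fs";
import { delimiter, join } from "node:path";

// Hypothetical /check-connectors-style preflight: verifies that CLIs are on
// PATH and that env vars are set. Names and example checks are illustrative,
// not what pi-sre-mode actually runs.

export interface Check {
  name: string;
  ok: boolean;
  detail: string;
}

export interface PreflightSpec {
  binaries: string[]; // CLIs that must resolve on PATH
  envVars: string[]; // credentials/config that must be non-empty
}

type Env = Record<string, string | undefined>;

// POSIX-style PATH lookup; ignores Windows PATHEXT and executable bits.
function onPath(binary: string, env: Env, exists: (p: string) => boolean): boolean {
  const dirs = (env.PATH ?? "").split(delimiter).filter((d) => d.length > 0);
  return dirs.some((dir) => exists(join(dir, binary)));
}

export function preflight(
  spec: PreflightSpec,
  env: Env = process.env,
  exists: (p: string) => boolean = existsSync,
): Check[] {
  const checks: Check[] = [];
  for (const bin of spec.binaries) {
    const ok = onPath(bin, env, exists);
    checks.push({ name: `cli:${bin}`, ok, detail: ok ? "found on PATH" : "not found on PATH" });
  }
  for (const key of spec.envVars) {
    const ok = Boolean(env[key]);
    checks.push({ name: `env:${key}`, ok, detail: ok ? "set" : "missing or empty" });
  }
  return checks;
}

// Example: print one PASS/FAIL line per check against the real environment.
for (const c of preflight({ binaries: ["aws", "nomad"], envVars: ["AWS_PROFILE"] })) {
  console.log(`${c.ok ? "PASS" : "FAIL"} ${c.name}: ${c.detail}`);
}
```

Passing `exists` in as a parameter lets you test the check without touching the real filesystem. A real preflight would probably also confirm that auth works, for example with a cheap read-only API call. This sketch leaves that out.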
22 | 19 |
23 | | -## Documentation map
| 20 | +## Quick start
24 | 21 |
25 | | -- `docs/README.md` — doc index
26 | | -- `docs/000-overview.md` — project thesis, goals, and non-goals
27 | | -- `docs/001-product-shape.md` — what the public package should feel like
28 | | -- `docs/002-overlay-model.md` — how private overlays layer on top
29 | | -- `docs/003-public-package-architecture.md` — public package structure and runtime behavior
30 | | -- `docs/004-private-overlay-architecture.md` — private overlay package model
31 | | -- `docs/005-mvp.md` — initial build scope
32 | | -- `docs/006-build-plan.md` — phased implementation plan
33 | | -- `docs/007-installation.md` — installation patterns for global public package + project overlay
34 | | -- `docs/008-ecosystem-notes.md` — notes from the Pi package ecosystem
35 | | -- `docs/009-release-checklist.md` — first public release checklist
| 22 | +Install globally:
36 | 23 |
37 | | -## Installation
38 | | -
39 | | -### Recommended real usage
| 24 | +```bash
| 25 | +pi install npm:pi-sre-mode
| 26 | +```
40 | 27 |
41 | | -Install the public package globally in `~/.pi/agent/settings.json`:
| 28 | +Or add to `~/.pi/agent/settings.json`:
42 | 29 |
43 | 30 | ```json
44 | 31 | {
45 | | - "packages": [
46 | | - "npm:pi-sre-mode"
47 | | - ]
| 32 | + "packages": ["npm:pi-sre-mode"]
48 | 33 | }
49 | 34 | ```
50 | 35 |
51 | | -Install a private overlay project-locally in `.pi/settings.json`:
| 36 | +Then in Pi:
52 | 37 |
53 | | -```json
54 | | -{
55 | | - "packages": [
56 | | - "git:git@github.com:your-org/pi-sre-overlay-zerodha.git"
57 | | - ]
58 | | -}
| 38 | +```
| 39 | +/check-connectors # verify your environment
| 40 | +/incident # set up context — pick a template, name the service
| 41 | +investigate elevated p99 for payments-api, start with the timeline
| 42 | +/report # generate a markdown report
59 | 43 | ```
60 | 44 |
61 | | -### Local development
| 45 | +## You don't always need `/incident`
62 | 46 |
63 | | -```json |
64 | | -{ |
65 | | - "packages": [ |
66 | | - "/path/to/pi-sre-mode", |
67 | | - "/path/to/pi-sre-overlay-zerodha" |
68 | | - ] |
69 | | -} |
| 47 | +Use plain Pi for quick questions: |
| 48 | + |
| 49 | +- "check p99 latency for payments-api over the last 2h" |
| 50 | +- "compare error rates before and after the last deploy" |
| 51 | +- "summarize the Nomad allocation restarts today" |
| 52 | + |
| 53 | +Use `/incident` when you want persistent context, a structured template, guardrails, and a report at the end. |
| 54 | + |
| 55 | +## Private overlays |
| 56 | + |
| 57 | +The public package is generic on purpose. Your team's topology, runbooks, and internal tooling live in a **private overlay** — a separate Pi package that layers org-specific templates, skills, prompts, connector checks, and report paths on top. |
| 58 | + |
| 59 | +Install an overlay per-project: |
| 60 | + |
| 61 | +```bash |
| 62 | +pi install -l git:git@github.com:your-org/pi-sre-overlay.git |
70 | 63 | ``` |
71 | 64 |
72 | | -More detailed installation examples are in `docs/007-installation.md`. |
| 65 | +See the [overlay guide](./docs/overlay-guide.md) for how to build one. |
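One way to think about overlay layering is as a template catalog keyed by id, where an overlay's entries either replace or extend the public ones. The sketch below assumes that model. Only the event name `incident-mode:register-overlay` comes from the package's feature list. The `EventBus` class, the `IncidentTemplate` shape, and `createCatalog` are hypothetical stand-ins and are not Pi's inter-extension event API.

```typescript
// Hypothetical sketch of overlay layering. The event name matches the one in
// pi-sre-mode's feature list; EventBus and everything else are stand-ins,
// not Pi's actual inter-extension API.

export interface IncidentTemplate {
  id: string;
  title: string;
  prompt: string;
}

export interface OverlayRegistration {
  name: string;
  templates: IncidentTemplate[];
}

type Listener<T> = (payload: T) => void;

// Minimal in-process pub/sub standing in for Pi's event channel.
export class EventBus {
  private listeners = new Map<string, Listener<unknown>[]>();

  on<T>(event: string, listener: Listener<T>): void {
    const list = this.listeners.get(event) ?? [];
    list.push(listener as Listener<unknown>);
    this.listeners.set(event, list);
  }

  emit<T>(event: string, payload: T): void {
    for (const listener of this.listeners.get(event) ?? []) listener(payload);
  }
}

export const REGISTER_OVERLAY = "incident-mode:register-overlay";

// Public package side: seed the catalog with generic templates, then let any
// overlay that registers replace (same id) or add (new id) templates.
export function createCatalog(bus: EventBus, base: IncidentTemplate[]): Map<string, IncidentTemplate> {
  const catalog = new Map<string, IncidentTemplate>(
    base.map((t): [string, IncidentTemplate] => [t.id, t]),
  );
  bus.on<OverlayRegistration>(REGISTER_OVERLAY, (reg) => {
    for (const t of reg.templates) catalog.set(t.id, t);
  });
  return catalog;
}
```

This sketch only works if the listener is attached before the overlay emits. A real implementation would need to deal with load order between packages.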
73 | 66 |
74 | | -## Status |
| 67 | +## Read-only by default |
75 | 68 |
76 | | -MVP scaffold implemented and validated with both public-only and overlay smoke tests. |
| 69 | +During an active incident, `pi-sre-mode` blocks: |
77 | 70 |
78 | | -## Smoke test |
| 71 | +- file writes and edits |
| 72 | +- `rm`, `mv`, `sudo`, `kill`, `chmod`, `chown` |
| 73 | +- `systemctl restart/stop`, `nomad job run/stop` |
| 74 | +- mutating AWS CLI commands (create, delete, terminate, etc.) |
| 75 | +- shell trampolines (`bash -c`, `eval`, subshells) |
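A read-only guardrail like this amounts to classifying each command before it runs. The TypeScript sketch below approximates the blocklist above using a set of blocked binaries plus a few regular expressions. `checkCommand` and the exact patterns are hypothetical. The string matching is deliberately conservative and is not a full shell parser: it ignores quoting, environment-variable prefixes, and aliases.

```typescript
// Hypothetical sketch of a read-only command check. It approximates the
// blocklist above and is NOT pi-sre-mode's actual implementation.

export type Verdict = { blocked: boolean; reason?: string };

// Commands blocked outright, matched on the first word of each segment.
const BLOCKED_BINARIES = new Set(["rm", "mv", "sudo", "kill", "chmod", "chown", "eval"]);

// Multi-word mutations, matched against the start of each segment.
const BLOCKED_PATTERNS: RegExp[] = [
  /^systemctl\s+(restart|stop)\b/,
  /^nomad\s+job\s+(run|stop)\b/,
  /^aws\s+\S+\s+(create|delete|terminate|put|update|modify)/, // mutating AWS verbs (approximate)
  /^(bash|sh|zsh)\s+-c\b/, // shell trampolines
];

export function checkCommand(command: string): Verdict {
  // Command substitution anywhere in the line counts as a trampoline.
  if (command.includes("$(") || command.includes("`")) {
    return { blocked: true, reason: "command substitution" };
  }
  // Split on control operators so `ls && rm x` is checked segment by segment.
  const segments = command
    .split(/&&|\|\||;|\|/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  for (const segment of segments) {
    if (segment.startsWith("(")) {
      return { blocked: true, reason: "subshell" };
    }
    // Compare on the basename so `/bin/rm` is caught too.
    const binary = segment.split(/\s+/)[0].split("/").pop() ?? "";
    if (BLOCKED_BINARIES.has(binary)) {
      return { blocked: true, reason: `blocked command: ${binary}` };
    }
    const hit = BLOCKED_PATTERNS.find((p) => p.test(segment));
    if (hit) {
      return { blocked: true, reason: `blocked pattern: ${hit.source}` };
    }
  }
  return { blocked: false };
}
```

The check runs on each segment independently, so a harmless prefix like `ls &&` cannot hide a blocked command that follows it.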
79 | 76 |
80 | | -A local smoke test is included at: |
| 77 | +Use `/sudo` to temporarily disable these guardrails. `/sudo-off` re-enables them. |
81 | 78 |
82 | | -- `examples/smoke-test/smoke-test.mjs` |
| 79 | +## Commands |
83 | 80 |
84 | | -Example: |
| 81 | +| Command | Purpose | |
| 82 | +|---|---| |
| 83 | +| `/incident` | Start or update investigation context | |
| 84 | +| `/incident-reset` | Clear incident context | |
| 85 | +| `/check-connectors` | Run environment preflight checks | |
| 86 | +| `/report` | Generate a markdown investigation report | |
| 87 | +| `/sudo` | Bypass read-only guardrails | |
| 88 | +| `/sudo-off` | Re-enable read-only guardrails | |
85 | 89 |
86 | | -```bash |
87 | | -cd /path/to/pi-sre-mode |
| 90 | +## How it works |
| 91 | + |
| 92 | +`pi-sre-mode` is built entirely on Pi's extension API — no external server, no separate UI, no agent framework. Everything runs inside your Pi session. |
| 93 | + |
| 94 | +- **Prompt injection** — when incident mode is active, a `before_agent_start` hook prepends the incident context (template, service, time window, guardrails) to every prompt, so Pi always knows which service and time window it is investigating.
| 95 | +- **Tool interception** — `tool_call` hooks inspect every tool call (file writes and edits as well as bash commands) before it runs and block unsafe ones. This is how the read-only guardrails work without a custom sandbox.
| 96 | +- **Session state** — incident context is persisted in Pi's session entries, so it survives reloads, branches, and forks. Navigate the session tree and your incident follows. |
| 97 | +- **Interactive UI** — `/incident` uses Pi's built-in `select`, `input`, and `confirm` primitives for the setup wizard. A status line and a widget show the active incident at a glance.
| 98 | +- **Inter-extension events** — overlays register themselves by emitting an `incident-mode:register-overlay` event that the public package listens for. No tight coupling, no imports between packages.
| 99 | +- **Skills and prompts** — shipped as standard Pi skills/prompts in the package manifest. Pi discovers them automatically. |
| 100 | + |
| 101 | +This keeps the extension a thin orchestration layer. The real value is in the skills, prompts, and templates — content that's easy to write and easy to override.
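To make the session-state and prompt-injection bullets more concrete, here is a minimal TypeScript sketch. It assumes a session branch is an append-only list of entries that gets replayed in order. The `SessionEntry` and `IncidentState` shapes and the `restoreState` / `injectContext` functions are hypothetical illustrations and are not Pi's session or hook API.

```typescript
// Hypothetical sketch of session-state restore + prompt injection. Entry and
// state shapes are illustrative; this is not Pi's session or hook API.

export interface IncidentState {
  active: boolean;
  template: string;
  service: string;
  timeWindow: string; // e.g. "last 2h"
  guardrails: boolean; // false while /sudo is on
}

// A session branch modeled as an append-only list of entries.
export type SessionEntry =
  | { kind: "message"; text: string }
  | { kind: "incident-state"; state: IncidentState | null }; // null = /incident-reset

// Replay the branch in order; the latest incident-state entry wins.
export function restoreState(entries: SessionEntry[]): IncidentState | null {
  let state: IncidentState | null = null;
  for (const entry of entries) {
    if (entry.kind === "incident-state") state = entry.state;
  }
  return state;
}

// Prepend the incident context to a prompt while incident mode is active.
export function injectContext(prompt: string, state: IncidentState | null): string {
  if (state === null || !state.active) return prompt;
  const header = [
    "## Active incident",
    `Template: ${state.template}`,
    `Service: ${state.service}`,
    `Time window: ${state.timeWindow}`,
    `Guardrails: ${state.guardrails ? "read-only" : "off (/sudo)"}`,
  ].join("\n");
  return `${header}\n\n${prompt}`;
}
```

In this model, state follows branches and forks because it is rebuilt by replaying entries rather than kept in a mutable global. Each branch has its own entry list, so restoring from a branch yields that branch's incident.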
| 102 | + |
| 103 | +## Docs |
| 104 | + |
| 105 | +- [Getting started](./docs/getting-started.md) |
| 106 | +- [Building an overlay](./docs/overlay-guide.md) |
| 107 | +- [Installation patterns](./docs/installation.md) |
| 108 | +- [Troubleshooting](./docs/troubleshooting.md) |
| 109 | + |
| 110 | +## Examples |
| 111 | + |
| 112 | +- [`examples/local-overlay/`](./examples/local-overlay/) — minimal overlay for testing |
| 113 | +- [`examples/smoke-test/`](./examples/smoke-test/) — automated smoke test via Pi RPC |
| 114 | + |
| 115 | +## Local development |
88 | 116 |
89 | | -# Public package only |
| 117 | +```json |
| 118 | +{ |
| 119 | + "packages": [ |
| 120 | + "/path/to/pi-sre-mode", |
| 121 | + "/path/to/your-overlay" |
| 122 | + ] |
| 123 | +} |
| 124 | +``` |
| 125 | + |
| 126 | +```bash |
| 127 | +# public package only |
90 | 128 | bun run smoke-test -- --public-only |
91 | 129 |
92 | | -# Public package + overlay |
| 130 | +# with an overlay |
93 | 131 | bun run smoke-test -- --overlay /path/to/private-overlay |
94 | 132 | ``` |
95 | | - |
96 | | -Current commands in the public extension: |
97 | | -- `/incident` |
98 | | -- `/incident-reset` |
99 | | -- `/sudo` |
100 | | -- `/sudo-off` |
101 | | -- `/check-connectors` |
102 | | -- `/report` |
103 | | - |
104 | | -Current features: |
105 | | -- persisted incident-mode session state |
106 | | -- prompt injection via `before_agent_start` |
107 | | -- read-only blocking for `write` / `edit` and unsafe bash patterns while incident guardrails are active |
108 | | -- separate `/sudo` mode to bypass incident permission checks when needed |
109 | | -- connector preflight checks |
110 | | -- markdown report generation |
111 | | -- overlay registration via `incident-mode:register-overlay` |
112 | | -- RPC-based smoke test for package + overlay integration |