Commit bc0dd4c

ci(design): extract design documents (#1212)
* ci(design): extract design documents
* refactor(design): review comments
* refactor(design): replace gh calls
* refactor(design): review comments
* refactor(design): review comments
* fix(design): add setup-node step for self-hosted runner
1 parent e8f7fa9 commit bc0dd4c

2 files changed, +487 -0 lines changed
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
1+
---
2+
name: design-extraction
3+
description: Extract architectural and design decisions from a GitHub PR into ADR (Architecture Decision Record) files in design/docs/. Use this skill whenever the user wants to extract design decisions from a PR, create ADRs from PR discussions, document architectural choices from pull requests, or mentions "design extraction" or "design decisions" in the context of a PR. Also triggers for "extract ADRs", "document decisions from PR", or "pull design docs from PR".
4+
---
5+
6+
# Design Extraction
7+
8+
Extract architectural and design decisions from a GitHub PR and write them as ADR files in `design/docs/`.
9+
10+
## Command
11+
12+
```text
/design-extraction <pr-number>
```
15+
16+
## Arguments
17+
18+
- `pr-number` (required): The GitHub PR number to extract decisions from.
19+
20+
## What This Does
21+
22+
PRs often contain important architectural decisions buried in descriptions and comment threads. This skill pulls those decisions out and writes them as structured ADR (Architecture Decision Record) files — short documents that capture *what* was decided, *why*, and *what follows from it*. Future contributors can then understand the reasoning behind the system's design without trawling through old PRs.
23+
24+
## Execution Protocol
25+
26+
### 1. Resolve PR Details
27+
28+
Fetch the PR metadata, body, and all comments using curl and jq (the `gh` CLI is not available on the runner):
29+
30+
```bash
curl -fsSL \
  -H "Authorization: Bearer $GH_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/{owner}/{repo}/pulls/<pr-number>"
```
36+
37+
Extract:
38+
- PR number, title
39+
- PR body (the description)
40+
- Head ref, base ref
41+
42+
Then fetch all three comment sources (paginate by capturing response headers with `curl -D`, extracting the `Link: <url>; rel="next"` URL, and repeating until no `rel="next"` link is present):
43+
- **Issue comments**: `https://api.github.com/repos/{owner}/{repo}/issues/<pr-number>/comments?per_page=100`
44+
- **Review comments** (inline on code): `https://api.github.com/repos/{owner}/{repo}/pulls/<pr-number>/comments?per_page=100`
45+
- **Review bodies**: `https://api.github.com/repos/{owner}/{repo}/pulls/<pr-number>/reviews?per_page=100`
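
The pagination loop described above could be sketched roughly as follows. This is a minimal sketch rather than the skill's actual implementation: the helper names `next_link` and `fetch_all_pages` are hypothetical, and it assumes `GH_TOKEN` is set and that no URL in the `Link` header contains a comma.

```shell
# Hypothetical helper: print the rel="next" URL found in a header dump written
# by `curl -D <file>`, or print nothing when there is no further page.
next_link() {
  # Header names are case-insensitive and lines end in CRLF, so normalise first.
  # With -L the dump can hold several responses; the last Link header wins.
  tr -d '\r' < "$1" \
    | grep -i '^link:' \
    | tr ',' '\n' \
    | sed -n 's/.*<\([^>]*\)>; *rel="next".*/\1/p' \
    | tail -n 1
}

# Hypothetical helper: write every page of a list endpoint to stdout,
# one JSON array per page. Variables are global because POSIX sh has no `local`.
fetch_all_pages() {
  url="$1"
  headers="$(mktemp)"
  while [ -n "$url" ]; do
    curl -fsSL -D "$headers" \
      -H "Authorization: Bearer $GH_TOKEN" \
      -H "Accept: application/vnd.github+json" \
      "$url" || { rm -f "$headers"; return 1; }
    url="$(next_link "$headers")"
  done
  rm -f "$headers"
}
```

Piping the output through `jq -s 'add'` would merge the per-page arrays into a single array, for example when collecting all issue comments for one PR.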
46+
47+
Display the PR title and URL for context.
48+
49+
### 2. Extract the Design Plan
50+
51+
Search for a design plan between HTML comment markers in **both** the PR body and all PR comments:
52+
53+
```html
<!-- design plan -->
... design content here ...
<!-- end of design plan -->
```
58+
59+
Check for these markers in this order:
60+
1. **PR body** — check first
61+
2. **Issue comments** — check all comments chronologically
62+
3. **Review comments** (inline on code) — check all
63+
4. **Review bodies** — check all
64+
65+
Extract content from **every** location where markers are found. If markers appear in multiple places, concatenate the extracted sections in the order listed above — later sections may refine or extend earlier ones.
66+
67+
This combined extracted content is the **primary** source of decisions.
68+
69+
If no design plan markers exist anywhere, use the full PR body as the source material — but apply a higher bar for what counts as a "decision" (skip vague descriptions, feature lists without rationale, etc.).
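
As an illustration of the marker extraction, the following sketch prints the text between each pair of markers read from stdin. The function name `extract_design_plan` is hypothetical, and the sketch assumes each marker sits on its own line; a marker embedded mid-line would need extra handling.

```shell
# Hypothetical helper: print the lines between each
# <!-- design plan --> / <!-- end of design plan --> pair read from stdin.
extract_design_plan() {
  awk '
    /<!-- end of design plan -->/ { inside = 0; next }  # closing marker: stop copying
    /<!-- design plan -->/        { inside = 1; next }  # opening marker: start copying
    inside                        { print }             # copy only the lines in between
  '
}
```

With a PR response saved to a file (say `pr.json`, an assumed name), `jq -r '.body // ""' pr.json | extract_design_plan` would yield the plan from the PR body. Running the same function over each comment body in the order listed above and concatenating the results gives the combined source.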
70+
71+
### 3. Assess Whether There Are Decisions to Extract
72+
73+
If the design plan (or PR body) contains no concrete decisions — it's a placeholder like "TBD", "TODO", "see Slack", "WIP", or just a feature description without architectural rationale — stop and tell the user:
74+
75+
```text
No actionable design decisions found in PR #<number>.
```
78+
79+
A "decision" means a deliberate choice between alternatives with stated rationale — not just a description of what was built.
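
A cheap pre-filter for the obvious placeholder cases might look like the sketch below. It is only a heuristic under stated assumptions: the function name `is_placeholder_plan` is hypothetical, it recognises only the literal placeholders named above, and it cannot judge whether longer text actually contains a decision with rationale.

```shell
# Hypothetical helper: succeed when the plan text is empty or is just one of
# the placeholder strings named in this step (case-insensitive).
is_placeholder_plan() {
  # Lowercase, collapse whitespace, trim, and drop trailing punctuation.
  normalized="$(printf '%s' "$1" \
    | tr -d '\r' \
    | tr 'A-Z' 'a-z' \
    | tr -s ' \t\n' ' ' \
    | sed -e 's/^ //' -e 's/ $//' -e 's/[.!]*$//')"
  case "$normalized" in
    ''|tbd|todo|wip|'see slack') return 0 ;;
    *) return 1 ;;
  esac
}
```

Text that passes this filter still needs the judgement described in this step; the check only short-circuits the trivial cases.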
80+
81+
### 4. Read Existing Design Docs
82+
83+
```text
Glob: design/docs/*.md
```
86+
87+
Read every existing file to understand what's already documented. This is essential for deduplication and for deciding whether to create new files or append to existing ones.
88+
89+
If `design/docs/` doesn't exist, create it:
90+
91+
```bash
mkdir -p design/docs
```
94+
95+
### 5. Analyse PR Comments
96+
97+
PR comments may contain amendments, clarifications, or explicit decisions that refine or supersede the design plan. When processing comments:
98+
99+
- **Redact sensitive data** before writing to ADR files — strip API keys, tokens, secrets, credentials, and PII. Never persist sensitive values to `design/docs/`.
100+
- **Override the plan** only when a comment contains a clear resolution, correction, or final decision (e.g., "We decided to go with X instead", "After discussion, the approach is Y")
101+
- **Ignore** questions, speculative remarks, and casual discussion
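
The redaction rule could be approximated with a stream filter like the one below. This is an illustrative sketch only: the function name `redact_secrets` is hypothetical, and the two patterns (GitHub-style `ghp_`/`gho_`/`ghu_`/`ghs_`/`ghr_` tokens and `Bearer` header values) are examples, not an exhaustive list of secret or PII formats.

```shell
# Hypothetical helper: replace a few well-known credential shapes read from
# stdin with [REDACTED]. The patterns are examples, not a complete secret scanner.
redact_secrets() {
  sed -E \
    -e 's/gh[pousr]_[A-Za-z0-9]{36,}/[REDACTED]/g' \
    -e 's|(Bearer )[A-Za-z0-9._~+/-]+=*|\1[REDACTED]|g'
}
```

Anything the filter does not recognise still needs the manual check this step calls for before it is written to `design/docs/`.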
102+
103+
#### CodeRabbit Comment Filtering
104+
105+
Comments from `@coderabbitai` (CodeRabbit) have a specific structure. Only the **issue description** at the top of the comment is useful input — the rest is machine-generated scaffolding that must be stripped. Specifically:
106+
107+
- **Keep**: The initial description of the issue (everything before the first section marker below)
108+
- **Strip entirely**:
  - `🧩 Analysis chain` section and all content under it
  - `🤖 Prompt for AI Agents` section and all content under it
  - Any `🏁 Scripts executed` section
112+
113+
For example, given a CodeRabbit comment like:
114+
115+
```text
⚠️ Potential issue | 🟡 Minor

Improve handling of large integers in JSON-to-TOML conversion.

The fallback to as_f64() can lose precision for large integers...

🧩 Analysis chain
<... strip everything from here down ...>

🤖 Prompt for AI Agents
<... strip everything from here down ...>
```
128+
129+
Only feed the text **above** the first `🧩` or `🤖` marker into the decision extraction pipeline. The analysis chain and AI agent prompts are CodeRabbit internals, not human design decisions.
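
One way to sketch this truncation is to stop copying at the first line that carries any of the section markers. The function name `strip_coderabbit_scaffolding` is hypothetical, and the sketch assumes each marker text appears on a line of its own, as in the example above.

```shell
# Hypothetical helper: copy a CodeRabbit comment from stdin up to, but not
# including, the first line containing one of the section markers.
strip_coderabbit_scaffolding() {
  awk '
    index($0, "🧩 Analysis chain") ||
    index($0, "🤖 Prompt for AI Agents") ||
    index($0, "🏁 Scripts executed") { exit }  # first marker: drop the rest
    { print }
  '
}
```

Only the output of this filter would then be passed on as decision input.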
130+
131+
#### Other Bot Comments
132+
133+
Ignore comments from other bots (GitHub Actions, CI bots, etc.) unless they contain design plan markers (`<!-- design plan -->`).
134+
135+
### 6. Deduplicate
136+
137+
Before writing anything, check each extracted decision against existing design docs. Match by **topic and substance** — if the same decision exists with different wording, skip it. This prevents duplication when the skill is run multiple times or across related PRs.
138+
139+
### 7. Determine Actions
140+
141+
For each decision or coherent set of related decisions:
142+
143+
- **New distinct topic** → `create` a new file
144+
- **Extends an existing doc** → `append` to that file
145+
- **Relates to multiple existing docs** → update the most closely related one and add cross-references: `See also: [Related Topic](related-topic.md)`
146+
147+
### 8. Write ADR Files
148+
149+
Each ADR follows this structure:
150+
151+
```markdown
# <Descriptive Title>

## Status
Accepted

## Context
<What problem or requirement prompted this decision>

## Decision
<What was decided and why>

## Consequences
- <Positive, negative, and neutral consequences>

## Source
PR #<number> — <title>
```
169+
170+
**Filename rules:**
171+
- Kebab-case derived from the topic (e.g., `authentication-strategy.md`, `data-pipeline-architecture.md`)
172+
- No dates or sequence numbers
173+
- Only lowercase letters, digits, and hyphens; must start with a letter or digit
174+
- Must match: `^[a-z0-9][a-z0-9-]*\.md$`
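
The filename rules above can be checked mechanically. The sketch below derives a candidate name from a topic and validates it against the stated pattern; both function names are hypothetical, and the slug derivation (lowercase ASCII, any other run of characters becomes a single hyphen) is one assumed way of producing kebab-case.

```shell
# Hypothetical helper: turn a topic such as "Authentication Strategy" into a
# kebab-case filename. Non-alphanumeric runs collapse into a single hyphen.
adr_filename_from_topic() {
  slug="$(printf '%s' "$1" \
    | LC_ALL=C tr 'A-Z' 'a-z' \
    | LC_ALL=C sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-*//' -e 's/-*$//')"
  printf '%s.md\n' "$slug"
}

# Hypothetical helper: succeed only if the name matches the pattern given above.
is_valid_adr_filename() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9-]*\.md$'
}
```

Validating after deriving catches edge cases such as a topic with no ASCII letters or digits, which would otherwise produce the invalid name `.md`.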
175+
176+
**Content limits:**
177+
- 500 lines maximum per file. If a topic needs more, split into multiple files with cross-references.
178+
179+
Before any write or append, verify the resulting file will not exceed 500 lines. If it would, split the content into a new related ADR file with cross-references instead.
180+
181+
**For `create` actions:** Write the full ADR to a new file in `design/docs/`.
182+
183+
**For `append` actions:** Add two blank lines then the new ADR section to the end of the existing file.
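
The line-limit check and the append rule could be combined as in the sketch below. The names `MAX_ADR_LINES`, `fits_after_append`, and `append_adr_section` are hypothetical, and the sketch assumes the new section has already been written to a temporary file and that the existing ADR ends with a newline.

```shell
# Limit taken from the content rules above; the variable name is hypothetical.
MAX_ADR_LINES=500

# Hypothetical helper: succeed if appending section file $2 to ADR file $1,
# plus the two separating blank lines, stays within the limit.
fits_after_append() {
  existing=0
  [ -f "$1" ] && existing="$(wc -l < "$1")"
  incoming="$(wc -l < "$2")"
  [ $((existing + incoming + 2)) -le "$MAX_ADR_LINES" ]
}

# Hypothetical helper: append two blank lines and the new section,
# refusing (and leaving the file untouched) when the limit would be exceeded.
append_adr_section() {
  fits_after_append "$1" "$2" || return 1
  printf '\n\n' >> "$1"
  cat "$2" >> "$1"
}
```

When `append_adr_section` fails, the section would instead go into a new related ADR file with cross-references, as described above.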
184+
185+
### 9. Present Summary
186+
187+
After writing, display:
188+
189+
```text
Design Extraction Complete — PR #<number>: <title>

Created: <list of new files>
Updated: <list of appended files>
Skipped: <count> (already documented)

Summary: <one-line description of what was extracted>
```
198+
199+
## ADR Example
200+
201+
For reference, here's what a well-formed ADR looks like:
202+
203+
```markdown
# Tmpfs for Test Isolation

## Status
Accepted

## Context
Test runners share the host filesystem, causing test pollution between concurrent jobs.

## Decision
Mount an 8 GB tmpfs at `/workspace/tmp` for each test runner container. Size is configurable via the `TMPFS_SIZE` environment variable. Regular storage at `/workspace` is preserved for cross-step artifacts.

## Consequences
- Eliminates test pollution between concurrent jobs
- RAM-backed storage improves I/O performance for test artifacts
- Reduces available host memory by the configured tmpfs size per runner

## Source
PR #42 — Add tmpfs for test runners
```
223+
224+
## Important Guidelines
225+
226+
1. **Decisions, not descriptions**: Only extract deliberate architectural choices with rationale. "We use Redis" is not a decision. "We chose Redis over Memcached because we need pub/sub for real-time invalidation" is.
227+
2. **Preserve intent**: Capture the *why* faithfully. Don't paraphrase away the reasoning.
228+
3. **Be concise**: ADRs should be short and scannable. A few paragraphs per section, not essays.
229+
4. **No git operations**: Write files only. The user handles staging and committing.
230+
5. **Idempotent**: Running twice on the same PR should not create duplicates if the docs already exist.
