Skip to content

ci(design): extract design documents#1212

Merged
glottologist merged 6 commits intomasterfrom
jason/design_extraction_new
Mar 19, 2026
Merged

ci(design): extract design documents#1212
glottologist merged 6 commits intomasterfrom
jason/design_extraction_new

Conversation

@glottologist
Copy link
Contributor

@glottologist glottologist commented Mar 13, 2026

Describe the changes
A clear and concise description of what the bug fix, feature, or improvement is.

Related Issue(s)
Please link to the issue(s) that will be closed with this PR.

Checklist

  • Tests have been added/updated for the changes.
  • Documentation has been updated for the changes (if applicable).
  • The code follows Rust's style guidelines.

Additional Context
Add any other context about the pull request here.

Summary by CodeRabbit

  • New Features

    • Added a /design-extraction command to extract design and architecture decisions from a pull request and generate ADRs.
    • Added an automated workflow that runs on PR approval or via manual trigger to generate ADRs, push them to the PR, and mark the PR as design-extracted; includes delimiter-collision guards and fork/label safeguards.
  • Documentation

    • Added user-facing docs describing the command, required input, ADR format and filename/content constraints, idempotency rules, and expected behavior.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 13, 2026

Important

Review skipped

This PR was authored by the user configured for CodeRabbit reviews. CodeRabbit does not review PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b30c6fc4-5105-4ce9-a15d-5a76568dcdb4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Design Extraction skill and a GitHub Actions workflow that locate a PR design plan, aggregate PR body/comments/reviews, invoke Claude to generate ADR markdown files under design/docs/, validate changes, push ADRs to the PR head, and apply a design-extracted label. (50 words)

Changes

Cohort / File(s) Summary
Design Extraction Skill
​.claude/skills/design-extraction/SKILL.md
New skill document declaring the public /design-extraction <pr-number> command and a detailed execution protocol: fetch PR metadata/comments, extract marker-delimited design plan, redact/filter comments, deduplicate against design/docs/*.md, decide create/append/update actions, enforce ADR filename/content rules and file-size limits, and emit a post-run summary.
GitHub Actions Workflow
.github/workflows/design-extraction.yml
New workflow triggered on PR review approval and manual dispatch (with guards). Resolves PR metadata, checks fork/label/delimiter collisions, checks out PR head and a trusted base snapshot (sparse), fetches paginated comments/reviews, invokes npx claude using ANTHROPIC_API_KEY to produce ADRs in design/docs/, validates only design/docs/ changed, commits/pushes to PR head, and applies the design-extracted label.

Sequence Diagram

sequenceDiagram
    actor Developer as Developer
    participant GitHub as GitHub
    participant Workflow as "GitHub Actions"
    participant API as "GitHub API"
    participant Git as "Git Operations"
    participant Claude as "Claude CLI"

    Developer->>GitHub: Open PR / Approve review
    GitHub->>Workflow: Trigger workflow
    Workflow->>API: Resolve PR details (number, SHAs, refs, labels, body)
    API-->>Workflow: PR metadata
    Workflow->>Workflow: Guards (fork, existing label, delimiter collisions)
    Workflow->>Git: Checkout PR head and trusted base (sparse)
    Workflow->>Workflow: Extract design plan from PR body
    Workflow->>API: Fetch PR comments & reviews (paginated)
    API-->>Workflow: Aggregated comments blob
    Workflow->>Claude: Send extraction prompt (plan + comments)
    Claude-->>Workflow: Generated ADR files in design/docs/
    Workflow->>Workflow: Validate changed paths (only design/docs/)
    alt changes valid
      Workflow->>Git: Commit & push ADRs to PR head
      Workflow->>API: Apply `design-extracted` label
      API-->>GitHub: Label updated
    else invalid changes
      Workflow->>Workflow: Fail workflow
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • DanMacDonald
  • antouhou
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'ci(design): extract design documents' directly summarizes the main change: a CI workflow that extracts design documentation from PRs into ADR files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/design-extraction/SKILL.md:
- Around line 109-123: The fenced code block showing the CodeRabbit comment (the
triple-backtick block that begins with "⚠️ Potential issue | 🟡 Minor") should
include a language specifier to improve rendering consistency; update that
opening fence from ``` to ```text (or ```markdown) so the block reads as a
labeled code block and keep the content unchanged inside the block.

In @.github/workflows/design-extraction.yml:
- Around line 199-209: The validation step "Validate output" currently excludes
.claude/skills/ from the INVALID detection (see the INVALID assignment and the
grep -v '^\.claude/skills/'); remove that exclusion so any changes outside
design/docs/ — including .claude/skills/ — are treated as invalid, or explicitly
add a guard that fails if git diff or git ls-files shows changes under
.claude/skills/ (i.e., update the INVALID command or add a separate check
referencing the same git commands and the .claude/skills/ path) so only files in
design/docs/ are permitted.
- Around line 142-156: The current curl queries that populate COMMENTS,
REVIEW_COMMENTS, and REVIEWS use per_page=100 and will drop results when there
are more than 100 items; update these three fetches to handle pagination (either
replace the curl+jq calls with gh api --paginate for each endpoint or implement
Link-header pagination loop with curl) so all issue comments, PR review
comments, and reviews are aggregated; ensure the final variable assembly and jq
processing remain the same (still producing the combined markdown strings) after
switching to paginated fetching.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f7b26458-fe99-4daa-a99c-de854982f5ac

📥 Commits

Reviewing files that changed from the base of the PR and between faeb1bd and 33fcc93.

📒 Files selected for processing (2)
  • .claude/skills/design-extraction/SKILL.md
  • .github/workflows/design-extraction.yml

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/design-extraction.yml:
- Around line 87-94: Remove the nested "Checkout trusted skill from base branch"
checkout (the step that uses actions/checkout@v4 with path: trusted-base and
sparse-checkout: .claude/skills/design-extraction) that runs before the output
validation; this nested checkout causes git ls-files --others --exclude-standard
to see trusted-base/ as untracked and fail validation. Move or remove that
checkout so validation runs against only the main worktree (e.g., perform the
trusted-base checkout after validation or use a checkout strategy that does not
place files inside the current worktree), and ensure any references to
trusted-base in subsequent steps are updated accordingly.
- Around line 234-243: The label step is conditioned on
steps.design-plan.outputs.content but can run even when no files were committed;
update the "Commit and push" step (or its equivalent) to detect whether a commit
actually occurred (reuse the existing git diff --staged --quiet check) and set a
step output like design_changed=true only when git commit/push runs, then change
the label step's if condition to depend on that output (e.g.,
steps.commit-and-push.outputs.design_changed == 'true') so the label is only
applied when ADR files were actually written; apply the same pattern to the
other similar commit/label block as well.
- Around line 118-120: The workflow's PR_BODY env is only set from
steps.pr.outputs.body so comment-only design plans posted as issue/review
comments (steps.design-plan.outputs.content) are ignored; update the env
assignment for PR_BODY to prefer/merge steps.design-plan.outputs.content when it
is non-empty (fall back to steps.pr.outputs.body /
github.event.pull_request.body otherwise) and also adjust the gating condition
that checks steps.design-plan.outputs.content so it accepts the combined PR_BODY
or checks both sources (references: PR_BODY, steps.pr.outputs.body,
steps.design-plan.outputs.content and the gate around the step at the later
block currently checking steps.design-plan.outputs.content).
- Around line 106-114: The "Install trusted skill" step currently blindly
removes and copies to .claude/skills/design-extraction and can follow symlinks;
before running rm -rf or cp -r, reject any symlinked paths by verifying .claude
and .claude/skills are not symlinks (use a test like -L) and/or resolve their
absolute paths (readlink -f) and confirm they reside inside the current checkout
(GITHUB_WORKSPACE); if either check fails, abort this step with a warning
instead of performing rm or cp to avoid deleting or writing outside the repo.
- Around line 142-168: paginate_curl builds an invalid JSON array by streaming
each item with jq -c '.[]' into a temp file that only inserts commas between
pages, not items; fix it by stopping the per-item emit and instead append each
page as a JSON array (preserve the raw "$body" for each page) and then combine
pages with jq -s 'add' (or equivalent) to flatten arrays correctly; update calls
inside paginate_curl (replace the jq -c '.[]' write and the final jq -s
'flatten' step) so the temp file contains valid JSON arrays per page and they
are merged safely at the end.
- Around line 121-126: The embedded Python passed to `python3 -c` (used to set
PLAN) is indented and will raise an IndentationError; replace this with a
heredoc invocation so the script is provided as a raw multi-line block (read
from stdin) instead of a quoted string. Update the block that sets PLAN (the
command using `python3 -c` and referencing os.environ['PR_BODY'] and the regex
search for '<!-- design plan -->...<!-- end of design plan -->') to use a
heredoc to feed the same Python code (no leading-space issues) and preserve the
existing logic that prints m.group(1).strip() if m else body.strip().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5dc140be-827b-47c8-91ab-5cde926ce04e

📥 Commits

Reviewing files that changed from the base of the PR and between 33fcc93 and 0498982.

📒 Files selected for processing (2)
  • .claude/skills/design-extraction/SKILL.md
  • .github/workflows/design-extraction.yml

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/design-extraction/SKILL.md:
- Around line 42-45: The pagination doc is incomplete: update the fetch flow for
the three sources
(`https://api.github.com/repos/{owner}/{repo}/issues/<pr-number>/comments?per_page=100`,
`https://api.github.com/repos/{owner}/{repo}/pulls/<pr-number>/comments?per_page=100`,
and
`https://api.github.com/repos/{owner}/{repo}/pulls/<pr-number>/reviews?per_page=100`)
to explicitly show capturing the HTTP response headers, parsing the Link header
for rel="next", and looping until no next link; specify example pseudocode
steps: perform request, read the response `Link` header, extract the URL for
rel="next" (if present), append results, then request the next URL repeatedly
until `rel="next"` is absent, and mention handling rate limits / empty Link
gracefully.
- Around line 175-180: Update the SKILL.md policy to explicitly require a
validation/enforcement step that counts lines before performing create/append
actions: for "create" ensure the new ADR content is <=500 lines and reject or
split into multiple cross-referenced files if it exceeds the limit; for "append"
compute current file length + new section length and either split the appended
content into new files with cross-references or reject the append if it would
exceed 500 lines. Add wording describing the enforcement behavior and expected
error/response when the limit is exceeded so implementations of the "create" and
"append" actions know to validate line counts prior to writing.
- Around line 95-104: Add a redaction guard to the PR comment processing flow
described under "Analyse PR Comments" so that any content written to ADRs
(design/docs/) is sanitized: before persisting, scan the PR comment body for
secrets/tokens/credentials/PII and either redact or remove matching values, and
ensure the existing "CodeRabbit Comment Filtering" logic still strips scaffolded
sections and only keeps the issue description; implement this sanitization step
immediately prior to the write-to-disk routine that creates files in
design/docs/ so no raw sensitive values are committed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0893b7e9-4c9f-4558-b7e5-03310018fce6

📥 Commits

Reviewing files that changed from the base of the PR and between 0498982 and 2909463.

📒 Files selected for processing (1)
  • .claude/skills/design-extraction/SKILL.md

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
.github/workflows/design-extraction.yml (2)

126-131: ⚠️ Potential issue | 🔴 Critical

The embedded Python code will fail with IndentationError due to leading whitespace.

The string passed to python3 -c preserves YAML indentation, causing Python to see leading spaces before import os, re. This will fail at runtime with "IndentationError: unexpected indent".

🐛 Proposed fix using heredoc
-          PLAN=$(python3 -c "
-          import os, re
-          body = os.environ['PR_BODY']
-          m = re.search(r'<!--\s*design plan\s*-->(.*?)<!--\s*end of design plan\s*-->', body, re.DOTALL)
-          print(m.group(1).strip() if m else body.strip())
-          ")
+          PLAN=$(python3 <<'PY'
+import os, re
+body = os.environ['PR_BODY']
+m = re.search(r'<!--\s*design plan\s*-->(.*?)<!--\s*end of design plan\s*-->', body, re.DOTALL)
+print(m.group(1).strip() if m else body.strip())
+PY
+          )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/design-extraction.yml around lines 126 - 131, The embedded
Python snippet assigned to PLAN is failing due to preserved YAML indentation
causing an IndentationError; replace the inline python3 -c string with a heredoc
so Python sees no leading spaces: run python3 with a here-document (e.g.,
python3 <<'PY' ... PY) or use a YAML block scalar that left-aligns the code,
keep the same logic (import os, re; read os.environ['PR_BODY']; regex search)
and print the matched group or body, and capture its stdout into PLAN to
preserve behavior; update the invocation that sets PLAN to use this
heredoc-based python invocation.

121-139: 🧹 Nitpick | 🔵 Trivial

Design plans in PR comments are not extracted.

The design plan extraction only searches PR_BODY for <!-- design plan --> markers. If a design plan is posted in a PR comment rather than the body, it will be included in PR_COMMENTS but not recognized as the primary design source. This may be intentional, but consider documenting this behavior or extending the search to comments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/design-extraction.yml around lines 121 - 139, The current
extraction step only reads PR_BODY (steps.pr.outputs.body) into PLAN and runs
the regex to find <!-- design plan --> blocks, so design plans posted in
comments (e.g., PR_COMMENTS) are ignored; update the job to also read and search
PR_COMMENTS (or concatenate steps.pr.outputs.comments with
steps.pr.outputs.body) before running the regex, then apply the same regex logic
to the combined text (the PLAN variable and regex/search block) and preserve the
DELIM collision check and GITHUB_OUTPUT writes; alternatively document in the
workflow that only the PR body is scanned if you choose not to change behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.github/workflows/design-extraction.yml:
- Around line 126-131: The embedded Python snippet assigned to PLAN is failing
due to preserved YAML indentation causing an IndentationError; replace the
inline python3 -c string with a heredoc so Python sees no leading spaces: run
python3 with a here-document (e.g., python3 <<'PY' ... PY) or use a YAML block
scalar that left-aligns the code, keep the same logic (import os, re; read
os.environ['PR_BODY']; regex search) and print the matched group or body, and
capture its stdout into PLAN to preserve behavior; update the invocation that
sets PLAN to use this heredoc-based python invocation.
- Around line 121-139: The current extraction step only reads PR_BODY
(steps.pr.outputs.body) into PLAN and runs the regex to find <!-- design plan
--> blocks, so design plans posted in comments (e.g., PR_COMMENTS) are ignored;
update the job to also read and search PR_COMMENTS (or concatenate
steps.pr.outputs.comments with steps.pr.outputs.body) before running the regex,
then apply the same regex logic to the combined text (the PLAN variable and
regex/search block) and preserve the DELIM collision check and GITHUB_OUTPUT
writes; alternatively document in the workflow that only the PR body is scanned
if you choose not to change behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6abcd407-ab6d-47c6-9d67-0f4fd1be7470

📥 Commits

Reviewing files that changed from the base of the PR and between 2909463 and f75f40a.

📒 Files selected for processing (1)
  • .github/workflows/design-extraction.yml

Copy link
Member

@JesseTheRobot JesseTheRobot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

@glottologist glottologist merged commit bc0dd4c into master Mar 19, 2026
28 of 29 checks passed
@glottologist glottologist deleted the jason/design_extraction_new branch March 19, 2026 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants