chore: comprehensive repo cleanup + evidence assessor #384
Implements the Portable Agentic Evidence Standard for ColdVox:
- docs/reviews/portable_standard_critique.md: Philosophy document explaining why tautological unit tests are insufficient and making the case for empirical, evidence-based PR review.
- docs/reviews/reviewer_driven_evidence.md: Workflow strategy describing how the reviewer-driven evidence process works, the evidence tiers (1-5), and semantic drift detection patterns.
- docs/plans/agentic-evidence-preview.md: System architecture spec for the shadow-mode assessor: permissions, git diff strategy, token budget, failure modes, and Phase 2 considerations.
- .github/prompts/evidence-assessor.md: The hardened CoT prompt that Gemini executes in CI. Includes explicit anti-hallucination constraints, a structured output format, and ColdVox-specific ground truths (Moonshine fragile, Parakeet not ready, stubs dead).
- .github/workflows/agentic-evidence-preview.yml: GitHub Actions workflow triggered on PR events. Uses fetch-depth: 0 for a correct git diff, truncates diffs at 2000 lines, composes the full prompt with pre-gathered context, runs gemini-cli in non-interactive mode, and pipes the report to GITHUB_STEP_SUMMARY.

Shadow mode: never blocks merges. The GEMINI_API_KEY secret must be configured in the repo settings.
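The shadow-mode publishing step can be sketched as follows, assuming a Python helper runs after gemini-cli. The PR itself does this in shell, and the name publish_advisory_report is hypothetical. GitHub Actions exposes the Step Summary as a file whose path is stored in GITHUB_STEP_SUMMARY, and steps append Markdown to that file. The helper swallows write errors, so the step cannot fail the job. That matches "never blocks merges".

```python
import os
import sys
from typing import Optional


def publish_advisory_report(report: str, summary_path: Optional[str] = None) -> bool:
    """Append the assessor report to the GitHub Step Summary (hypothetical helper).

    Shadow mode: any failure is logged to stderr and swallowed, so this
    step never fails the job and never blocks a merge.
    Returns True if the report was written, False otherwise.
    """
    path = summary_path or os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        print("GITHUB_STEP_SUMMARY is not set; skipping report.", file=sys.stderr)
        return False
    try:
        # Append, never truncate: earlier steps may already have written to the summary.
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(report.rstrip("\n") + "\n")
        return True
    except OSError as exc:
        print(f"Could not write Step Summary: {exc}", file=sys.stderr)
        return False
```

A wrapper script would call this and then exit with status 0 whatever the return value, so the report stays advisory.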
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 56750304e4

    FULL_PROMPT="${FULL_PROMPT/\{PR_TITLE_PLACEHOLDER\}/${PR_TITLE}}"
    FULL_PROMPT="${FULL_PROMPT/\{PR_BODY_PLACEHOLDER\}/${PR_BODY}}"
    FULL_PROMPT="${FULL_PROMPT/\{GIT_DIFF_PLACEHOLDER\}/${GIT_DIFF}}"
    FULL_PROMPT="${FULL_PROMPT/\{DOCS_INDEX_PLACEHOLDER\}/${DOCS_INDEX}}"
    FULL_PROMPT="${FULL_PROMPT/\{NORTHSTAR_EXCERPT_PLACEHOLDER\}/${NORTHSTAR}}"

Escape replacements when interpolating PR fields
The prompt composition uses Bash pattern substitution (${FULL_PROMPT/.../${PR_TITLE}} etc.) at lines 80–84. In Bash 5.2, where patsub_replacement is enabled by default, any unquoted & in the replacement text is replaced by the matched pattern. A common PR title or body such as A & B therefore corrupts the composed prompt by re-inserting placeholder text. That can break claim extraction and produce incorrect evidence reports.
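One way to catch this kind of corruption before calling the model is a guard that refuses to run if any placeholder token survives composition. The following is a minimal sketch. It assumes tokens follow the {NAME_PLACEHOLDER} shape used in the template, and find_leftover_placeholders is a hypothetical helper. The guard only detects leftover or re-inserted tokens and does not fix the & expansion. It will also flag PRs whose diff legitimately contains these tokens, such as edits to the prompt template itself.

```python
import re
from typing import List

# Matches tokens shaped like {PR_TITLE_PLACEHOLDER} or {GIT_DIFF_PLACEHOLDER}.
PLACEHOLDER_RE = re.compile(r"\{[A-Z0-9_]+_PLACEHOLDER\}")


def find_leftover_placeholders(prompt: str) -> List[str]:
    """Return placeholder tokens still present after composition, deduplicated, in order."""
    seen: List[str] = []
    for token in PLACEHOLDER_RE.findall(prompt):
        if token not in seen:
            seen.append(token)
    return seen


# With patsub_replacement on, the title "A & B" can turn into this:
corrupted = "Title: A {PR_TITLE_PLACEHOLDER} B"
leftovers = find_leftover_placeholders(corrupted)
```

A non-empty result means the prompt should not be sent as-is.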

    ---
    doc_type: plan
    subsystem: general

Restore removed canonical plan or migrate existing references
This commit deletes docs/plans/windows-multi-agent-recovery.md, but many repository anchors still point to that exact path (for example in README.md, docs/architecture.md, and docs/standards/agent-rules.md). Removing the file without updating those references leaves broken canonical guidance links for contributors and agents, which is a maintainability regression introduced by this change.
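Before deleting a canonical doc, it helps to list every file that still points at it. The following is a minimal sketch. It assumes references appear as plain substrings in Markdown files, and find_references is a hypothetical helper. From the shell, `git grep windows-multi-agent-recovery.md` does much the same job.

```python
from pathlib import Path
from typing import List, Tuple


def find_references(root: str, target: str, suffixes=(".md",)) -> List[Tuple[str, int]]:
    """List (relative_path, line_number) pairs whose text mentions `target`."""
    hits: List[Tuple[str, int]] = []
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        # Only scan regular files with a matching extension (Markdown by default).
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if target in line:
                hits.append((path.relative_to(root_path).as_posix(), lineno))
    return hits
```

An empty result for the deleted path is a reasonable precondition for merging the deletion.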
Pull request overview
Adds a “shadow mode” CI reviewer that uses Gemini to assess whether PR descriptions’ material claims are backed by evidence in the diff (and to flag potential semantic drift), outputting an advisory report to the GitHub Actions Step Summary.
Changes:
- Adds a new GitHub Actions workflow to gather PR context, run @google/gemini-cli, and publish a Step Summary report (non-blocking).
- Introduces a hardened prompt template for the evidence assessor and accompanying docs/reviews describing the evidence standard.
- Updates planning/docs around the assessor architecture and reviewer-driven evidence workflow (and removes the existing Windows recovery plan file).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docs/reviews/reviewer_driven_evidence.md | Defines the reviewer-driven evidence workflow and evidence tiers. |
| docs/reviews/portable_standard_critique.md | Rationale/critique motivating the evidence standard approach. |
| docs/prompts/review-and-implement-evidence-assessor.md | Meta-prompt documenting how the implementation was reviewed/derived. |
| docs/plans/windows-multi-agent-recovery.md | Removes the prior Windows recovery plan (repo execution anchor). |
| docs/plans/agentic-evidence-preview.md | Specifies the assessor system architecture and intended workflow behavior. |
| .github/workflows/agentic-evidence-preview.yml | Implements the shadow-mode assessor workflow that runs Gemini CLI and writes Step Summary output. |
| .github/prompts/evidence-assessor.md | Prompt template the workflow fills with PR/diff/docs context. |

    FULL_PROMPT="${FULL_PROMPT/\{PR_TITLE_PLACEHOLDER\}/${PR_TITLE}}"
    FULL_PROMPT="${FULL_PROMPT/\{PR_BODY_PLACEHOLDER\}/${PR_BODY}}"
    FULL_PROMPT="${FULL_PROMPT/\{GIT_DIFF_PLACEHOLDER\}/${GIT_DIFF}}"
    FULL_PROMPT="${FULL_PROMPT/\{DOCS_INDEX_PLACEHOLDER\}/${DOCS_INDEX}}"
    FULL_PROMPT="${FULL_PROMPT/\{NORTHSTAR_EXCERPT_PLACEHOLDER\}/${NORTHSTAR}}"

The placeholder substitutions won’t match the template tokens because the pattern includes literal backslashes (e.g., \{PR_TITLE_PLACEHOLDER\}) while the prompt file contains {PR_TITLE_PLACEHOLDER}. As a result, the composed prompt will still contain placeholders and Gemini won’t receive the PR title/body/diff/docs context. Fix by matching the exact token strings (remove the backslashes) or switch to a more robust templating approach (e.g., a small Python script that reads the template and replaces tokens).

Suggested change:

    FULL_PROMPT="${FULL_PROMPT/{PR_TITLE_PLACEHOLDER}/${PR_TITLE}}"
    FULL_PROMPT="${FULL_PROMPT/{PR_BODY_PLACEHOLDER}/${PR_BODY}}"
    FULL_PROMPT="${FULL_PROMPT/{GIT_DIFF_PLACEHOLDER}/${GIT_DIFF}}"
    FULL_PROMPT="${FULL_PROMPT/{DOCS_INDEX_PLACEHOLDER}/${DOCS_INDEX}}"
    FULL_PROMPT="${FULL_PROMPT/{NORTHSTAR_EXCERPT_PLACEHOLDER}/${NORTHSTAR}}"

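The review above suggests "a small Python script that reads the template and replaces tokens". A minimal sketch of that approach follows. fill_template is a hypothetical name, and the token names are the ones the workflow uses. re.sub takes a function as its replacement here, and re.sub inserts the function's return value verbatim. As a result, &, backslashes, and $ in PR titles or diffs are never interpreted. Because the substitution is a single pass, text inserted from one field is also never re-scanned for other tokens.

```python
import re
from typing import Mapping

# Tokens in the prompt template look like {PR_TITLE_PLACEHOLDER}.
TOKEN_RE = re.compile(r"\{([A-Z0-9_]+_PLACEHOLDER)\}")


def fill_template(template: str, values: Mapping[str, str]) -> str:
    """Substitute every {NAME_PLACEHOLDER} token in one pass.

    Unknown tokens are left in place so a later check can report them.
    """
    return TOKEN_RE.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```

In the workflow, such a script would presumably read .github/prompts/evidence-assessor.md and take the field values from environment variables. It would then write the composed prompt to a file for gemini-cli. Those wiring details are assumptions, not part of this PR.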

    on:
      pull_request:
        types: [opened, synchronize, ready_for_review]
        branches:
          - main

This workflow is restricted to PRs targeting main via on.pull_request.branches. The spec/docs in this PR describe running on “every PR” and also call out supporting PRs targeting release branches (by using github.event.pull_request.base.ref). If you want the assessor to run for non-main base branches, remove the branches: [main] filter (or update the spec/docs to match the intended scope).

    # --model: gemini-2.5-flash balances quality and cost for this use case.
    # Input is piped from the composed prompt file; stdout is the report.
    npx --yes @google/gemini-cli \
      --model gemini-2.0-flash \

The inline comment and the plan/spec say the workflow uses gemini-2.5-flash, but the actual CLI invocation uses --model gemini-2.0-flash. Please align the implementation and documentation (either update the workflow to the intended model, or update the docs/comment so reviewers know what is actually running).

Suggested change:

      --model gemini-2.5-flash \

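Drift of this kind, where a comment names one model and the flag names another, can be checked mechanically. The following is a rough sketch. It assumes the workflow text is available as a string and that both the comment and the flag use the --model spelling shown above. model_mentions is a hypothetical helper, and a regex scan like this is a heuristic rather than a YAML parser.

```python
import re
from typing import Set, Tuple

# Matches "--model gemini-x" and "--model: gemini-x".
MODEL_RE = re.compile(r"--model[:\s]+(gemini-[\w.\-]+)")


def model_mentions(workflow_text: str) -> Tuple[Set[str], Set[str]]:
    """Split --model mentions into (models named in comments, models passed as flags)."""
    in_comments: Set[str] = set()
    in_flags: Set[str] = set()
    for line in workflow_text.splitlines():
        stripped = line.lstrip()
        bucket = in_comments if stripped.startswith("#") else in_flags
        bucket.update(MODEL_RE.findall(stripped))
    return in_comments, in_flags
```

A CI step could warn when the two sets differ, which fits the shadow-mode posture.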

    │ Step 1: actions/checkout (fetch-depth: 0) │
    │ Step 2: Fetch base branch ref explicitly │
    │ Step 3: Generate git diff (base...HEAD), truncate at 2000 ln │
    │ Step 4: Collect docs index (ls docs/ recursive, head 100ln) │
    │ Step 5: Extract anchor docs (northstar.md, AGENTS.md) │
    │ Step 6: Compose full prompt (instructions + all context) │
    │ Step 7: npx @google/gemini-cli --model gemini-2.5-flash │
    │ Step 8: Write report → $GITHUB_STEP_SUMMARY │
    └─────────────────────────────────────────────────────────────────┘
    ```

The spec claims the runner extracts anchor docs including AGENTS.md (and the token budget table includes an AGENTS excerpt), but the workflow currently only embeds docs/northstar.md plus a docs file list. This reduces the assessor’s ability to apply the repo’s stated “ground truth” rules. Consider embedding AGENTS.md (and, if intended, docs/plans/current-status.md) excerpts into the composed prompt and adding corresponding placeholders to the prompt template.
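The spec's context-gathering steps reduce to a few pure helpers: the diff truncated at 2000 lines, the docs index capped at 100 lines, and anchor-doc excerpts such as AGENTS.md. This is a sketch under stated assumptions. The helper names, the truncation marker text, and the default excerpt length are hypothetical, and the git/ls invocations are left to the workflow. If the prompt marks truncation explicitly, the model knows it is seeing a partial diff. It then does not read missing hunks as missing evidence.

```python
from pathlib import Path
from typing import Optional


def truncate_lines(text: str, max_lines: int, label: str) -> str:
    """Keep the first max_lines lines and append an explicit marker if anything was cut."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    kept = "\n".join(lines[:max_lines])
    omitted = len(lines) - max_lines
    return f"{kept}\n[{label} truncated: {omitted} more lines omitted]"


def anchor_excerpt(path: str, max_lines: int = 200) -> Optional[str]:
    """Return a labelled excerpt of an anchor doc (e.g. AGENTS.md), or None if absent."""
    p = Path(path)
    if not p.is_file():
        return None
    body = truncate_lines(p.read_text(encoding="utf-8"), max_lines, p.name)
    return f"--- {p.name} ---\n{body}"
```

For example, truncate_lines(diff_text, 2000, "diff") would cover Step 3, and anchor_excerpt("AGENTS.md") would supply the missing AGENTS.md context.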

    ## Overview

    This document specifies the implementation of a "Shadow Mode Agentic Evidence Assessor" for ColdVox. A Gemini-powered CI agent runs on every PR to audit whether the PR provides empirical evidence for its material claims and detects semantic drift between code and documentation.

    **Shadow Mode**: The assessor is non-blocking in Phase 1. It writes a Markdown report to the GitHub Step Summary. It does not comment on the PR, does not set a check status, and cannot block merges. It is advisory only.

The PR description says this change “adds 6 new files” and implies no existing content changes, but the diff also deletes docs/plans/windows-multi-agent-recovery.md (the repo’s current execution plan referenced by README/docs/architecture/agent instructions). If this deletion is intentional, you’ll need to update the many in-repo references and regenerate/commit docs/index.md (docs-ci enforces scripts/build_docs_index.py). If it’s not intentional, restore the plan file.
…nces

Root cleanup:
- Delete CLAUDE.md, GEMINI.md (byte-identical copies of AGENTS.md)
- Delete root junk: plugins.json, pr_365_details.json, test_enigo_live.rs
- Archive 6 root reports to docs/archive/root/

Dead backend code:
- Replace WHISPER_MODEL_PATH with STT_MODEL_PATH in types.rs and tests
- Fix integration tests: whisper -> moonshine as preferred plugin
- Update doc comments in plugin.rs, plugin_types.rs
- Delete crates/app/plugins.json (had preferred_plugin: whisper)
- Remove stale faster-whisper comment from Cargo.toml

Dead reference fixes:
- Replace ALL 20+ windows-multi-agent-recovery.md refs with current-status.md
- Remove 'absolute truth' language from agent rules
- Fix AGENTS.md pointer to nonexistent CI/policy.md -> CI/architecture.md
- Fix README.md: remove CLAUDE.md reference
- Update drive-project.prompt.md, gui-design-overview, todo.md

Doc pruning:
- Delete 15 empty/expired docs (stubs, chat transcripts, past-retention)
- Archive 8 stale docs (Linux-only, org-wide, superseded)
- Fix stt-overview.md: remove Whisper from Supported Backends
- Fix aud-user-config-design.md: Moonshine is PyO3, not pure Rust
- Fix fdn-testing-guide.md: add Parakeet validation warning

Agent instruction restructure:
- Sync AGENTS.md from .github/copilot-instructions.md (full content)
- Update ensure_agent_hardlinks.sh: source is now copilot-instructions.md
- Update check_markdown_placement.py: CLAUDE.md -> AGENTS.md
- Update standards.md: remove CLAUDE.md/GEMINI.md references

Dead vendor/scripts:
- Delete vendor/vosk/ (stubs to dead Linux runner cache)
- Delete scripts: setup-vosk-cache.sh, verify_vosk_model.sh, ensure_venv.sh, start-headless.sh
- SttRemoteAuthSettings: use #[derive(Default)] instead of a manual impl (clippy::derivable_impls error in CI)
- deny.toml: add RUSTSEC ignores for unmaintained transitive deps from Tauri (gtk3-rs, fxhash, unic-*, proc-macro-error; all from the wry/tauri GUI layer, no safe upgrade available, no security impact)
- docs/index.md: regenerate after the doc cleanup changed the file count/structure
Previous implementation failed because:
1. The 'Gather PR context' step failed with bash string substitution bugs
2. \{PLACEHOLDER\} patterns in bash expansion don't match {PLACEHOLDER} tokens
3. Large PRs caused the diff to be unavailable or truncated incorrectly
New approach:
- Use gemini-cli --approval-mode=yolo to give the agent autonomous tools
- Agent reads its instructions from the prompt file directly
- Agent runs git diff, reads files, and explores the repo itself
- No more brittle bash string replacement for prompt composition
- Combines two steps into one to avoid compose/run split failures
- Still uses gemini-2.0-flash (fixes model name/docs mismatch)
- Agent writes its report to /tmp/report.md, which the workflow always checks
Addresses Copilot reviewer comments on bash substitution bugs and
model name mismatch between workflow comments and actual --model flag.
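One reading of "always checked" is that the workflow treats a missing or empty /tmp/report.md as a reportable outcome rather than a step failure. The following is a minimal sketch under that assumption. load_report and the fallback wording are hypothetical.

```python
from pathlib import Path

# Hypothetical notice shown when the agent leaves no usable report.
FALLBACK = (
    "Evidence assessor produced no report for this run. "
    "Shadow mode: this does not affect merging."
)


def load_report(path: str = "/tmp/report.md") -> str:
    """Return the agent's report, or a fallback notice if it is missing or blank."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except OSError:
        # Missing file, directory in its place, permission error: all non-fatal.
        return FALLBACK
    return text if text.strip() else FALLBACK
```

The returned string can then be appended to the Step Summary unchanged, so every run leaves a visible trace.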
CI Feedback 🧐
A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
Summary
Two sets of changes on this branch:
1. Agentic Evidence Assessor (shadow mode PR reviewer)
2. Comprehensive Repo Cleanup (70 files, -2,946 lines)
Systematic cleanup of dead code, dead documentation, dead references, and stale agent instructions. Three independent subagent audits confirmed every deletion.
Root Cleanup
Dead Backend Code (Rust)
Dead Reference Fixes (20+ occurrences)
Doc Pruning
Agent Instruction Restructure
Dead Vendor/Scripts
Evidence