feat: add reference verification pipeline skill#4
Conversation
Add a complete reference verification system for markdown documents with inline citations. The pipeline catches both accuracy errors (cited claims that don't match sources) and completeness gaps (unsourced assertions and overclaims).

Components:
- 5 scripts: ref_extract, ref_triage, ref_verify, ref_reconcile, ref_audit
- 3 agents: fact-checker (isolated), claim-reviewer, claims-auditor
- /verify-refs slash command with 6-phase workflow
- Installable skill metadata in skills/verify-refs/

Key design: fact-checker agents have zero project context (only URL + claims), preventing circular verification. Phase 0 (completeness audit) runs in parallel with Phase 1 (extraction) for efficiency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a8001f5030
```python
new_source = r.get("correct_source_id", "")
if new_source:
    claim["source_id"] = new_source
    # Keep as PENDING for verification with corrected source
```
Validate reattributed source IDs before updating claims
apply_triage writes correct_source_id directly into claim["source_id"], but the claim-reviewer contract allows returning a source name as well as an ID. If a name is returned, ref_verify.generate_all_prompts cannot find it in sources.csv and skips the claim, so that claim remains unverified (still pending) and silently drops out of the verification batches except for a warning. Please map/validate reattributions against known IDs before overwriting source_id.
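One way to implement the suggested validation, sketched below. Everything here is illustrative, not from the PR: `resolve_source_id`, `known_ids`, and `name_to_id` are hypothetical names for an ID set loaded from sources.csv and a name-to-ID lookup built alongside it.

```python
# Hypothetical fix sketch: resolve a reviewer-returned value (ID *or*
# source name) to a known source ID before overwriting claim["source_id"].
# `known_ids` and `name_to_id` are illustrative, not part of the PR.
def resolve_source_id(value, known_ids, name_to_id):
    """Return a valid source ID for `value`, or None if unresolvable."""
    if value in known_ids:
        return value               # already a valid ID, use as-is
    return name_to_id.get(value)   # reviewer returned a name; map or fail

# Toy data standing in for sources.csv contents
known_ids = {"S1", "S2"}
name_to_id = {"Smith 2023": "S2"}

claim = {"source_id": "S1"}
new_source = resolve_source_id("Smith 2023", known_ids, name_to_id)
if new_source:
    # Safe: new_source is guaranteed to be a known ID, so
    # ref_verify.generate_all_prompts will not silently skip the claim.
    claim["source_id"] = new_source
```

Unresolvable values return `None`, so the claim keeps its original `source_id` instead of dropping out of the verification batches.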
```python
if classification == "EDITORIAL":
    claim["verification_status"] = "NOT_FOUND"
    claim["source_evidence"] = f"Triage: EDITORIAL — {r.get('reason', '')}"
    claim["discrepancy_detail"] = "Filtered by claim-reviewer: not a sourced claim."
elif classification == "SYNTHESIS":
    claim["verification_status"] = "NOT_FOUND"
    claim["source_evidence"] = f"Triage: SYNTHESIS — {r.get('reason', '')}"
    claim["discrepancy_detail"] = "Filtered by claim-reviewer: original synthesis, not verifiable against single source."
```
Keep filtered triage outcomes out of NOT_FOUND status
Setting EDITORIAL and SYNTHESIS claims to verification_status = "NOT_FOUND" makes reconciliation misreport them as source-verification misses. ref_reconcile.py interprets NOT_FOUND as “source accessible but claim absent,” so filtered non-source claims inflate failure counts and appear in the Not Found warning section as if fact-checking failed. Use a distinct triage-filtered status (or exclude these rows from reconcile) to preserve report accuracy.
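A minimal sketch of the distinct-status approach. The status name `TRIAGE_FILTERED` is an illustrative choice, not from the PR; any value that reconciliation can distinguish from `NOT_FOUND` would do.

```python
# Sketch: give triage-filtered claims their own status so ref_reconcile
# does not count them as source-verification misses.
# "TRIAGE_FILTERED" is an illustrative status name, not part of the PR.
def apply_classification(claim, classification, reason):
    if classification in ("EDITORIAL", "SYNTHESIS"):
        claim["verification_status"] = "TRIAGE_FILTERED"
        claim["source_evidence"] = f"Triage: {classification} — {reason}"
    return claim

claim = apply_classification({}, "EDITORIAL", "opinion, not a sourced claim")
# Reconcile can then report TRIAGE_FILTERED rows separately (or skip them),
# keeping the NOT_FOUND section limited to genuine fact-check misses.
```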
```python
dedup_key = f"{draft_name}:{ratio}"
if dedup_key in seen_claims:
    continue
```
Deduplicate ratio claims with contextual keys
The ratio dedup key only uses {draft_name}:{ratio}, so repeated ratios in the same draft collapse to one claim even when they are different assertions on different lines or tied to different citations. That drops valid claims from verification and can hide real contradictions/misattributions whenever multiple statements share the same numeric pattern (for example, two separate 50:1 claims). Include line/context/source in the dedup key instead of just the ratio value.
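A sketch of the contextual key, assuming each extracted claim carries a line number and source ID (field names here are illustrative):

```python
# Sketch: include line and source in the dedup key so two distinct 50:1
# claims in the same draft both survive extraction. Field names are
# illustrative, not taken from ref_extract.py.
seen_claims = set()
kept = []
for c in [
    {"draft": "d1", "ratio": "50:1", "line": 10, "source_id": "S1"},
    {"draft": "d1", "ratio": "50:1", "line": 42, "source_id": "S2"},
]:
    dedup_key = f"{c['draft']}:{c['ratio']}:{c['line']}:{c['source_id']}"
    if dedup_key in seen_claims:
        continue
    seen_claims.add(dedup_key)
    kept.append(c)
# Both claims survive: same ratio, but different lines and citations.
```

With the draft-plus-ratio key alone, the second claim would be dropped and any contradiction between the two citations would go unchecked.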
Summary
Components
- `skills/verify-refs/SKILL.md`, `agents/openai.yaml` — installable skill metadata
- `scripts/ref_{extract,triage,verify,reconcile,audit}.py` — 5 pipeline scripts
- `.claude/agents/{fact-checker,claim-reviewer,claims-auditor}.md` — 3 agents
- `.claude/commands/verify-refs.md` — `/verify-refs` slash command with 6-phase workflow

How it works
Two complementary passes run in parallel:
Phase 0 — Completeness audit: LLM agents read each document and flag unsourced assertions and overclaims (e.g., specific stats without citations, claims that overstate evidence).
Phases 1-3 — Citation verification: Regex extraction → LLM triage → isolated fact-checking against original sources. Fact-checker agents have zero project context (only a URL and claims), preventing circular verification.
Phases 4-5: Register results in kb/ and apply fixes (with user approval).
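The "zero project context" property in Phases 1-3 can be illustrated with a sketch of how a fact-checker prompt might be assembled. The function and field names below are hypothetical, not the PR's actual prompt builder:

```python
# Illustrative sketch (not the PR's code): the fact-checker prompt is
# built from only the source URL and the claim texts, so the agent has
# nothing from the project to "confirm" claims against circularly.
def build_fact_check_prompt(url, claims):
    lines = [f"Source URL: {url}", "Verify each claim against this source:"]
    for i, c in enumerate(claims, 1):
        lines.append(f"{i}. {c['claim_text']}")
    return "\n".join(lines)

prompt = build_fact_check_prompt(
    "https://example.com/paper",
    [{"claim_text": "Latency drops 40% under load."}],
)
```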
Test plan
- `python3 scripts/ref_extract.py --help` — all 5 scripts should show usage
- `python3 -c "import py_compile; py_compile.compile('scripts/ref_extract.py', doraise=True)"` for each script — all compile without errors
- `/verify-refs` command appears in Claude Code slash commands

🤖 Generated with Claude Code