
feat: add reference verification pipeline skill #4

Open
javiertoledo wants to merge 1 commit into `main` from `feat/verify-refs-skill`

Conversation

@javiertoledo (Member)

Summary

  • Adds a complete reference verification system for markdown documents with inline citations
  • The pipeline catches both accuracy errors (cited claims that don't match sources) and completeness gaps (unsourced assertions and overclaims)
  • Battle-tested on 7 whitepapers with 192+ cited claims and 75 completeness flags across 5 verification passes

Components

| Type | Files | Purpose |
| --- | --- | --- |
| Skill | `skills/verify-refs/SKILL.md`, `agents/openai.yaml` | Installable skill metadata + Codex interface |
| Scripts | `scripts/ref_{extract,triage,verify,reconcile,audit}.py` | 5 stdlib-only pipeline scripts (no external deps) |
| Agents | `.claude/agents/{fact-checker,claim-reviewer,claims-auditor}.md` | 3 specialized agent definitions |
| Command | `.claude/commands/verify-refs.md` | `/verify-refs` slash command with a 6-phase workflow |

How it works

Two complementary passes run in parallel, followed by reconciliation and fixes:

Phase 0 — Completeness audit: LLM agents read each document and flag unsourced assertions and overclaims (e.g., specific stats without citations, claims that overstate evidence).

Phases 1-3 — Citation verification: Regex extraction → LLM triage → isolated fact-checking against original sources. Fact-checker agents have zero project context (only a URL and claims), preventing circular verification.

Phases 4-5: Register results in kb/ and apply fixes (with user approval).
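As a rough illustration of the Phase 1 extraction step, a minimal regex pass might look like the sketch below. The citation syntax, pattern, and field names are assumptions for illustration only; the actual `ref_extract.py` interface is not shown in this PR description.

```python
import re

# Hypothetical footnote-style citation marker, e.g. "…at high rates.[^src-1]"
CITATION_RE = re.compile(r"\[\^(?P<source_id>[\w-]+)\]")

def extract_claims(markdown_text):
    """Return one pending claim record per inline citation found."""
    claims = []
    for line_no, line in enumerate(markdown_text.splitlines(), start=1):
        for match in CITATION_RE.finditer(line):
            claims.append({
                "line": line_no,
                "text": line.strip(),
                "source_id": match.group("source_id"),
                "verification_status": "PENDING",
            })
    return claims

doc = "LLMs can fabricate citations.[^src-1]\nAn unsourced sentence."
print(extract_claims(doc))
```

Records like these would then flow into triage and batched fact-checking; the unsourced second line is what Phase 0 exists to catch.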

Test plan

  • Run each script with `--help` (e.g. `python3 scripts/ref_extract.py --help`) — all 5 should print usage
  • Run `python3 -c "import py_compile; py_compile.compile('scripts/ref_extract.py', doraise=True)"` for each script — all 5 should compile without errors
  • Verify the `/verify-refs` command appears in Claude Code's slash commands
  • Run Phase 0 on a sample markdown doc with references to validate end-to-end
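The per-script compile check above can be looped in one shell pass. The `scripts/ref_*.py` paths follow the Components table; adjust if your checkout differs.

```shell
# Compile-check all five pipeline scripts; report any that are missing.
for s in extract triage verify reconcile audit; do
  f="scripts/ref_${s}.py"
  if [ -f "$f" ]; then
    python3 -c "import py_compile, sys; py_compile.compile(sys.argv[1], doraise=True)" "$f" \
      && echo "$f compiles"
  else
    echo "missing: $f"
  fi
done
```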

🤖 Generated with Claude Code

Add a complete reference verification system for markdown documents with
inline citations. The pipeline catches both accuracy errors (cited claims
that don't match sources) and completeness gaps (unsourced assertions
and overclaims).

Components:
- 5 scripts: ref_extract, ref_triage, ref_verify, ref_reconcile, ref_audit
- 3 agents: fact-checker (isolated), claim-reviewer, claims-auditor
- /verify-refs slash command with 6-phase workflow
- Installable skill metadata in skills/verify-refs/

Key design: fact-checker agents have zero project context (only URL + claims),
preventing circular verification. Phase 0 (completeness audit) runs in
parallel with Phase 1 (extraction) for efficiency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8001f5030


Comment on lines +146 to +149
```python
new_source = r.get("correct_source_id", "")
if new_source:
    claim["source_id"] = new_source
    # Keep as PENDING for verification with corrected source
```

P1: Validate reattributed source IDs before updating claims

`apply_triage` writes `correct_source_id` directly into `claim["source_id"]`, but the claim-reviewer contract allows returning a source name as well as an ID. If a name is returned, `ref_verify.generate_all_prompts` cannot find it in `sources.csv` and skips the claim, so the claim remains unverified (still pending) and silently drops out of the verification batches, leaving only a warning. Please map/validate reattributions against known IDs before overwriting `source_id`.
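A minimal sketch of that guard might look like this. The function and variable names are hypothetical; in the real script the name-to-ID mapping would be built from `sources.csv`.

```python
def resolve_source_id(value, known_ids, name_to_id):
    """Map a reviewer-returned source reference (ID or name) to a known ID.

    Returns None when the value matches neither, so the caller can flag
    the claim for manual review instead of silently orphaning it.
    """
    if value in known_ids:
        return value
    return name_to_id.get(value.strip().lower())

known_ids = {"S1", "S2"}
name_to_id = {"who 2023 report": "S2"}

print(resolve_source_id("S1", known_ids, name_to_id))               # ID passes through
print(resolve_source_id("WHO 2023 Report", known_ids, name_to_id))  # name mapped to ID
print(resolve_source_id("Unknown Source", known_ids, name_to_id))   # unresolved -> None
```

`apply_triage` would then overwrite `claim["source_id"]` only when the resolver returns a non-None ID.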


Comment on lines +137 to +144
```python
if classification == "EDITORIAL":
    claim["verification_status"] = "NOT_FOUND"
    claim["source_evidence"] = f"Triage: EDITORIAL — {r.get('reason', '')}"
    claim["discrepancy_detail"] = "Filtered by claim-reviewer: not a sourced claim."
elif classification == "SYNTHESIS":
    claim["verification_status"] = "NOT_FOUND"
    claim["source_evidence"] = f"Triage: SYNTHESIS — {r.get('reason', '')}"
    claim["discrepancy_detail"] = "Filtered by claim-reviewer: original synthesis, not verifiable against single source."
```

P2: Keep filtered triage outcomes out of NOT_FOUND status

Setting EDITORIAL and SYNTHESIS claims to `verification_status = "NOT_FOUND"` makes reconciliation misreport them as source-verification misses. `ref_reconcile.py` interprets NOT_FOUND as "source accessible but claim absent," so filtered non-source claims inflate failure counts and appear in the Not Found warning section as if fact-checking failed. Use a distinct triage-filtered status (or exclude these rows from reconcile) to preserve report accuracy.
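One way to sketch the suggested fix (the `FILTERED_*` status values are hypothetical; the field names follow the diff above):

```python
# Triage classifications that mean "not a verifiable sourced claim" get
# their own statuses, distinct from NOT_FOUND (which means the source
# was checked and the claim was absent).
TRIAGE_FILTERED = {
    "EDITORIAL": "FILTERED_EDITORIAL",
    "SYNTHESIS": "FILTERED_SYNTHESIS",
}

def apply_classification(claim, classification, reason):
    if classification in TRIAGE_FILTERED:
        claim["verification_status"] = TRIAGE_FILTERED[classification]
        claim["source_evidence"] = f"Triage: {classification} — {reason}"
    return claim

claim = apply_classification({}, "EDITORIAL", "tone, not a sourced claim")
print(claim["verification_status"])  # FILTERED_EDITORIAL
```

`ref_reconcile.py` could then skip any status starting with `FILTERED_` when tallying verification failures.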


Comment on lines +485 to +487
```python
dedup_key = f"{draft_name}:{ratio}"
if dedup_key in seen_claims:
    continue
```

P2: Deduplicate ratio claims with contextual keys

The ratio dedup key only uses `{draft_name}:{ratio}`, so repeated ratios in the same draft collapse to one claim even when they are different assertions on different lines or tied to different citations. That drops valid claims from verification and can hide real contradictions/misattributions whenever multiple statements share the same numeric pattern (for example, two separate 50:1 claims). Include line/context/source in the dedup key instead of just the ratio value.
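A contextual key might be sketched like this. That `line_no` and `source_id` are available at this point is an assumption about `ref_extract`'s data model, not something shown in the diff.

```python
seen_claims = set()

def dedup_key(draft_name, ratio, line_no, source_id):
    # Keying on where the ratio appears and which source it cites keeps
    # two separate "50:1" claims in one draft as distinct claims.
    return (draft_name, ratio, line_no, source_id)

k1 = dedup_key("draft.md", "50:1", 12, "S1")
k2 = dedup_key("draft.md", "50:1", 87, "S2")
seen_claims.add(k1)
print(k2 in seen_claims)  # False: the second 50:1 claim survives dedup
```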

