doc-verification-bridge

Linking theorems to what they assume, prove, validate, and depend on

A doc-gen4 plugin that automatically classifies theorems by their role in bridging mathematical specifications and computational implementations.

NFM 2026 Paper Submission

Overview

When writing verified software, proofs exist at different conceptual levels:

Classification	Role
Mathematical Property	Proves within the specification layer
Computational Property	Proves within the implementation layer
Bridging Property	Connects spec ↔ impl (soundness/completeness)
Soundness Property	Shows impl → spec (embedding)
Completeness Property	Shows spec → impl (representation)

This tool:

Automatically infers assumes/proves/dependsOn relationships from theorem types and proof terms
Classifies definitions by their ontological category (mathematical vs computational)
Identifies bridging theorems that connect Bool computations to Prop specifications
Tracks proof dependencies — which theorems a proof uses
Generates documentation showing verification coverage and proof relationships

Installation

Add to your lakefile.lean:

require «doc-verification-bridge» from git
  "https://github.com/NicolasRouquette/doc-verification-bridge" @ "main"

Using Annotations in Your Code

To use the @[api_type], @[api_def], @[api_theorem], and @[api_lemma] attributes in your Lean files, import the attributes module:

import DocVerificationBridge.Attributes

@[api_type { category := .mathematicalAbstraction, coverage := .complete }]
inductive PathWithLength {α : Type*} (r : α → α → Prop) : α → α → Nat → Prop
  | single {a b} : r a b → PathWithLength r a b 1
  | cons {a b c n} : r a b → PathWithLength r b c n → PathWithLength r a c (n + 1)

@[api_theorem {
  theoremKind := .soundnessProperty,
  assumes := #[`PathWithLength],
  proves := #[`Relation.TransGen]
}]
theorem PathWithLength_soundness {a b : α} {n : Nat} (h : PathWithLength r a b n) :
    Relation.TransGen r a b := by
  induction h <;> simp_all [Relation.TransGen.single, Relation.TransGen.trans]

Adding as a Dependency (Path-based)

For local development, add doc-verification-bridge as a path-based dependency in your lakefile.toml:

[[require]]
name = "doc-verification-bridge"
path = "../path/to/doc-verification-bridge"

Or in lakefile.lean:

require «doc-verification-bridge» from "../path/to/doc-verification-bridge"

Running on External Projects

To analyze a project you haven't modified (e.g., batteries, mathlib4), you need a nested docbuild directory similar to doc-gen4. This is required because:

The target project's compiled modules must be available to import
doc-verification-bridge and its dependencies (doc-gen4, Cli) must be in scope
The nested project approach allows both requirements without modifying the target

Setup Instructions

Create a docbuild subdirectory inside the target project:

cd /path/to/target-project  # e.g., batteries
mkdir docbuild
cd docbuild

Create lakefile.toml with the following content:

name = "docbuild"
reservoir = false
version = "0.1.0"
packagesDir = "../.lake/packages"

[[require]]
name = "batteries"       # Replace with your target library name
path = "../"

[[require]]
name = "doc-verification-bridge"
git = "https://github.com/NicolasRouquette/doc-verification-bridge"
rev = "main"

Copy the lean-toolchain from the parent project:

cp ../lean-toolchain .

Update dependencies:

lake update doc-verification-bridge

Build the target project (if not already built):

cd ..
lake build
cd docbuild

Run doc-verification-bridge:

lake exe doc-verification-bridge --output docs Batteries Batteries

Example: Analyzing batteries

# From batteries root
cd /path/to/batteries
mkdir -p docbuild
cd docbuild

# Create lakefile.toml
cat > lakefile.toml << 'EOF'
name = "docbuild"
reservoir = false
version = "0.1.0"
packagesDir = "../.lake/packages"

[[require]]
name = "batteries"
path = "../"

[[require]]
name = "doc-verification-bridge"
git = "https://github.com/NicolasRouquette/doc-verification-bridge"
rev = "main"
EOF

cp ../lean-toolchain .
lake update doc-verification-bridge

# Run with automatic classification (default)
lake exe unified-doc unified --auto --output docs Batteries

# Or with annotation-based classification
lake exe unified-doc unified --annotated --output docs Batteries

The generated documentation will be in docbuild/docs/.

Performance Tuning for Large Projects

For very large projects like mathlib4, proof dependency extraction can be slow because it traverses proof terms to find which lemmas each theorem uses.

CLI Flags

Flag	Effect
`--skip-proof-deps`	Skip proof dependency extraction entirely (fastest, no `dependsOn` data)
`--proof-dep-workers N`	Use up to N worker threads for parallel proof extraction
`--save-classification PATH`	Save classification results to cache (creates PATH.json + PATH.jsonl)
`--load-classification PATH`	Load classification from cache, skip classification phase
`--html-workers N`	Use N parallel workers for HTML file generation

Example:

# Skip proof deps entirely (fastest)
lake exe unified-doc unified --auto --skip-proof-deps --output docs Mathlib

# Parallel proof extraction with 8 workers
lake exe unified-doc unified --auto --proof-dep-workers 8 --output docs Mathlib

# Parallel HTML generation with 20 workers (speeds up file writing for large projects)
lake exe unified-doc unified --auto --html-workers 20 --output docs Mathlib

# Combined: parallel proof extraction + parallel HTML generation
lake exe unified-doc unified --auto --proof-dep-workers 50 --html-workers 20 --output docs Mathlib

# Save classification to cache (for large projects like mathlib4)
# Creates /tmp/mathlib-cache.json (metadata) and /tmp/mathlib-cache.jsonl (entries)
lake exe unified-doc unified --auto --save-classification /tmp/mathlib-cache --output docs Mathlib

# Load from cache and regenerate HTML only (fast iteration)
lake exe unified-doc unified --auto --load-classification /tmp/mathlib-cache --html-workers 20 --output docs Mathlib

Classification Cache Format

The classification cache uses a split format for streaming I/O with large projects:

<path>.json: Small metadata file with version and entry count
<path>.jsonl: Pure JSON Lines with one entry per line

This enables standard JSONL tooling (jq, wc -l, head, tail) and avoids stack overflow when serializing/deserializing 280K+ entries.

How Parallel Extraction Works

When --proof-dep-workers N is specified with N > 0, the classifier uses a two-phase approach:

Phase 1 (Sequential in MetaM): Extract type information, infer theorem kinds, and detect sorry usage
Phase 2 (Parallel in IO): Extract proof dependencies using worker threads

This provides good speedup while maintaining correctness, since proof term traversal is pure and can be safely parallelized.

Tip: For batch analysis of multiple projects, see the experiments pipeline which handles this configuration via TOML. Global settings like proof_dep_workers and html_workers can be set once and overridden per-project.

Two Modes of Operation

doc-verification-bridge supports two complementary classification modes:

Mode	Flag	Effort	Precision	Best For
Automatic	`--auto`	Zero annotations	Good (heuristic-based)	Quick overview, existing codebases
Annotated	`--annotated`	Explicit annotations	Exact	Production documentation, precise control

Mode 1: Automatic (No Annotations Required)

Run doc-verification-bridge on any Lean 4 project without modifying source code.

Batch Analysis: To automatically analyze multiple Lean 4 projects in parallel, see the experiments/README.md for an automated pipeline that clones, builds, and generates documentation for configured repositories.

Note: Run these commands from inside the docbuild directory after completing the setup instructions above.

lake exe unified-doc unified --auto --output docs MyProject.Core MyProject.Theorems

With source links:

lake exe unified-doc unified --auto \
  --repo https://github.com/org/repo \
  --output docs \
  MyProject.Core MyProject.Theorems

For projects with a single top-level module:

lake exe unified-doc unified --auto --output docs --project "Batteries" Batteries

How Automatic Inference Works

The inference engine analyzes theorem types to classify names into assumes, proves, validates, and dependsOn:

H1: Predicate Hypotheses → `assumes`

theorem foo (h : IsAcyclic g) : ...
--            ^^^^^^^^^^^^^ predicate hypothesis → assumes

H2: Equation Hypotheses → `proves`

theorem map_val (h : strings.mapM Identifier.mk? = some ids) : ...
--                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ equation → proves

When a hypothesis characterizes a function's behavior, the theorem proves a property about that function.

H3: Conclusion Analysis → `proves`

theorem foo ... : NoSelfLoop g ∧ IsTree g
--                ^^^^^^^^^^^^^^^^^^^^^^^ conclusion → proves

H4: Bool Functions in Hypotheses/Conclusions → `validates`

theorem sound : isAcyclicBool g = true → IsAcyclic g
--              ^^^^^^^^^^^^^                        → validates

H5: Proof Term Analysis → `dependsOn`

theorem bar : P := by
  apply foo  -- bar depends on foo
  exact baz  -- bar depends on baz

The tool examines the proof term to find all theorems and lemmas used in the proof. This creates a "depends on" relationship: if theorem A uses theorem B in its proof, then A depends on B. This is useful for understanding proof structure and identifying which foundational lemmas are most widely used.

Theorem Kind Inference

Pattern	Inferred Kind
`BoolFunc = true → PropSpec`	`bridgingProperty` (sound)
`PropSpec → BoolFunc = true`	`bridgingProperty` (complete)
`BoolFunc = true ↔ PropSpec`	`bridgingProperty` (iff)
`UserType → ExternalType`	`soundnessProperty`
`ExternalType → ∃..., UserType`	`completenessProperty`
Internal properties only	`mathematicalProperty`
Algebraic laws (BEq, Hashable)	`computationalProperty`

Mode 2: Annotated Mode (Precise Control)

For production documentation, use explicit annotations to ensure accuracy. Run with --annotated flag to only classify declarations with explicit @[api_*] attributes:

lake exe unified-doc unified --annotated --output docs MyProject

Three Attributes for the Four-Category Ontology

Attribute	For	Category Detection
`@[api_type]`	`structure`, `inductive`, `class`	User specifies
`@[api_def]`	`def`	Auto-detected from return type
`@[api_theorem]` / `@[api_lemma]`	`theorem`, `lemma`	N/A (use `theoremKind`)

`@[api_type]` — For Types

Category must be specified (abstract vs concrete can't be auto-detected):

@[api_type { category := .mathematicalAbstraction }]
structure PackageRegistrySpec where ...

@[api_type { category := .computationalDatatype }]
inductive DependencyKind where ...

`@[api_def]` — For Definitions

Category is auto-detected from return type:

@[api_def]  -- Returns Prop → mathematicalDefinition
def IsAcyclic (g : Graph) : Prop := ...

@[api_def]  -- Returns Option Value → computationalOperation
def lookup (m : Map) (k : Key) : Option Value := ...

@[api_def { coverage := .complete }]  -- Can specify other fields
def isEmpty (xs : List α) : Bool := ...

`@[api_theorem]` / `@[api_lemma]` — For Proofs

Use theoremKind to classify:

@[api_theorem {
  theoremKind := .bridgingProperty,
  bridgingDirection := .sound,
  proves := #[`IsPositive],
  validates := #[`isPositiveBool]
}]
theorem isPositiveBool_sound : isPositiveBool n = true → IsPositive n := ...

@[api_lemma {
  theoremKind := .mathematicalProperty,
  proves := #[`IsAcyclic]
}]
lemma acyclic_of_tree : IsTree g → IsAcyclic g := ...

Annotation Fields

`assumes` / `proves` / `validates`

Field	Meaning	Example
`assumes`	Preconditions the theorem relies on	`assumes := #[\`IsWellFormed]`
`proves`	What the theorem establishes	`proves := #[\`IsAcyclic]`
`validates`	Bool functions validated (bridging only)	`validates := #[\`isAcyclicBool]`

`bridgingDirection` (for `bridgingProperty`)

Direction	Pattern	Meaning
`.sound`	`comp = true → prop`	"If algorithm says yes, spec agrees"
`.complete`	`prop → comp = true`	"If spec says yes, algorithm finds it"
`.iff`	`comp = true ↔ prop`	Full decidability

Four-Category Ontology

Based on E.J. Lowe's Four-Category Ontology (Oxford, 2006):

	Mathematical (Universal, Prop)	Computational (Particular, data)
Substantial	`mathematicalAbstraction` — Kinds	`computationalDatatype` — Objects
Non-substantial	`mathematicalDefinition` — Attributes	`computationalOperation` — Modes

Theorem Classification Summary

Pattern	TheoremKind	proves	validates
`UserType → ExternalType`	`soundnessProperty`	ExternalType	∅
`ExternalType → ∃..., UserType`	`completenessProperty`	UserType	∅
Internal closure properties	`mathematicalProperty`	internal types	∅
`BoolFunc = true ↔ PropSpec`	`bridgingProperty`	PropSpec	BoolFunc
Algebraic laws (reflexivity, etc.)	`computationalProperty`	internal types	∅

Key distinctions:

soundnessProperty: Every UserType is a valid ExternalType ("PathWithLength is a valid TransGen")
completenessProperty: Every ExternalType can be represented as UserType
bridgingProperty: Links Bool computations to Prop specifications
mathematicalProperty: Internal closure/structural properties
computationalProperty: Algebraic laws about BEq, Hashable, Ord instances

Validation Rules

The annotation system validates at compile time:

Code	Severity	Description
ACE1	❌ error	`theoremKind` without `proves`
ACE2	❌ error	Unresolved `proves` reference
ACE3	❌ error	Unresolved `validates` reference
ACE13	❌ error	Missing `category` on type
ACE18	❌ error	`validates` on non-bridging theorem
ACE19	❌ error	Missing `validates` on bridging theorem
ACW9	⚠️ warning	Inference suggests different values
ACW10	⚠️ warning	`proves` without `theoremKind`
ACW19	⚠️ warning	Naming convention suggestion

Suppressing Warnings

@[api_theorem {
  theoremKind := .mathematicalProperty,
  proves := #[`MyDef],
  suppress := #["ACW15", "ACW19"]  -- Use string array
}]
theorem myTheorem : ... := ...

Output

Generates static HTML documentation (Python-free) with:

Cross-referenced definitions and theorems
Source code links with line numbers
Coverage status (✅ complete, ⚠️ axiom-dependent, 🔄 partial, ❌ unverified)
Theorem classification breakdown
Bidirectional verifiedBy links

Example Output Structure

site/
├── index.html
├── style.css
├── modules/
│   ├── index.html
│   └── <module>.html
└── api/
    └── (doc-gen4 output with verification badges)

Bidirectional Navigation

The unified pipeline enables true bidirectional navigation between API documentation and verification coverage:

From API docs → Coverage reports: Each declaration in the doc-gen4 API documentation displays a verification badge indicating its classification:

🔷 Mathematical abstraction type
🔶 Computational datatype
🔹 Mathematical definition
🔸 Computational operation
📐 Mathematical property (theorem)
⚙️ Computational property (theorem)
⬇️ Soundness theorem
⬆️ Completeness theorem
⇕ Equivalence theorem

Clicking a badge navigates to the corresponding coverage report page.

From Coverage reports → API docs: Each entry in the verification coverage reports links back to the doc-gen4 API page for that declaration.

This bidirectional linking is achieved through a custom DeclarationDecoratorFn hook in doc-gen4 (added in PR #344).

Serve locally:

python3 -m http.server -d site 8000

Comparison with Other Tools

Feature	doc-gen4	blueprint	doc-verification-bridge
Purpose	API docs	Proof dependency graphs	Semantic coverage tracking
Tracks	Types, signatures	Theorem dependencies	Ontological categories
Semantic classification	❌	❌	✅ Four-Category Ontology
Spec↔Impl links	❌	❌	✅ `proves`/`validates`
Coverage metrics	❌	✅ (sorry tracking)	✅ (unverified/partial/complete)

License

Apache 2.0

Name		Name	Last commit message	Last commit date
Latest commit History 101 Commits
.github/workflows		.github/workflows
DocVerificationBridge		DocVerificationBridge
experiments		experiments
paper		paper
scripts		scripts
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
DocVerificationBridge.lean		DocVerificationBridge.lean
ExperimentsMain.lean		ExperimentsMain.lean
README.md		README.md
SECURITY.md		SECURITY.md
UnifiedMain.lean		UnifiedMain.lean
lake-manifest.json		lake-manifest.json
lakefile.lean		lakefile.lean
lean-toolchain		lean-toolchain

NicolasRouquette/doc-verification-bridge

Folders and files

Latest commit

History

Repository files navigation

doc-verification-bridge

Overview

Installation

Using Annotations in Your Code

Adding as a Dependency (Path-based)

Running on External Projects

Setup Instructions

Example: Analyzing batteries

Performance Tuning for Large Projects

CLI Flags

Classification Cache Format

How Parallel Extraction Works

Two Modes of Operation

Mode 1: Automatic (No Annotations Required)

How Automatic Inference Works

H1: Predicate Hypotheses → assumes

H2: Equation Hypotheses → proves

H3: Conclusion Analysis → proves

H4: Bool Functions in Hypotheses/Conclusions → validates

H5: Proof Term Analysis → dependsOn

Theorem Kind Inference

Mode 2: Annotated Mode (Precise Control)

Three Attributes for the Four-Category Ontology

@[api_type] — For Types

@[api_def] — For Definitions

@[api_theorem] / @[api_lemma] — For Proofs

Annotation Fields

assumes / proves / validates

bridgingDirection (for bridgingProperty)

Four-Category Ontology

Theorem Classification Summary

Validation Rules

Suppressing Warnings

Output

Example Output Structure

Bidirectional Navigation

Comparison with Other Tools

License

About

Topics

Resources

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0