Linking theorems to what they assume, prove, validate, and depend on
A doc-gen4 plugin that automatically classifies theorems by their role in bridging mathematical specifications and computational implementations.
When writing verified software, proofs exist at different conceptual levels:
| Classification | Role |
|---|---|
| Mathematical Property | Proves within the specification layer |
| Computational Property | Proves within the implementation layer |
| Bridging Property | Connects spec ↔ impl (soundness/completeness) |
| Soundness Property | Shows impl → spec (embedding) |
| Completeness Property | Shows spec → impl (representation) |
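Here is a concrete illustration of these levels. It is a hypothetical spec/impl pair in plain Lean 4, using core Lean only and no annotations, with one theorem per level. `IsEven`, `isEvenBool`, and the theorem names are made up for this sketch.

```lean
-- Hypothetical spec/impl pair, for illustration only.
def IsEven (n : Nat) : Prop := n % 2 = 0       -- specification layer (Prop)
def isEvenBool (n : Nat) : Bool := n % 2 == 0  -- implementation layer (Bool)

-- Mathematical property: stated and proved entirely in the spec layer.
theorem IsEven.add {m n : Nat} (hm : IsEven m) (hn : IsEven n) : IsEven (m + n) := by
  unfold IsEven at hm hn ⊢
  omega

-- Computational property: stated and proved entirely in the impl layer.
theorem isEvenBool_add_two (n : Nat) : isEvenBool (n + 2) = isEvenBool n := by
  unfold isEvenBool
  rw [Nat.add_mod_right]

-- Bridging property: the Bool check agrees with the Prop spec in both directions.
theorem isEvenBool_iff (n : Nat) : isEvenBool n = true ↔ IsEven n := by
  simp [isEvenBool, IsEven]
```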
This tool:
- Automatically infers `assumes`/`proves`/`validates`/`dependsOn` relationships from theorem types and proof terms
- Classifies definitions by their ontological category (mathematical vs computational)
- Identifies bridging theorems that connect Bool computations to Prop specifications
- Tracks proof dependencies — which theorems a proof uses
- Generates documentation showing verification coverage and proof relationships
Add to your `lakefile.lean`:

```lean
require «doc-verification-bridge» from git
  "https://github.com/NicolasRouquette/doc-verification-bridge" @ "main"
```

To use the `@[api_type]`, `@[api_def]`, `@[api_theorem]`, and `@[api_lemma]` attributes in your Lean files, import the attributes module:
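```lean
import DocVerificationBridge.Attributes
```

For example (this snippet also imports Mathlib, which provides the `Type*` binder and the `Relation.TransGen` API):

```lean
import Mathlib.Logic.Relation
import DocVerificationBridge.Attributes

@[api_type { category := .mathematicalAbstraction, coverage := .complete }]
inductive PathWithLength {α : Type*} (r : α → α → Prop) : α → α → Nat → Prop
  | single {a b} : r a b → PathWithLength r a b 1
  | cons {a b c n} : r a b → PathWithLength r b c n → PathWithLength r a c (n + 1)

variable {α : Type*} {r : α → α → Prop}

@[api_theorem {
  theoremKind := .soundnessProperty,
  assumes := #[`PathWithLength],
  proves := #[`Relation.TransGen]
}]
theorem PathWithLength_soundness {a b : α} {n : Nat} (h : PathWithLength r a b n) :
    Relation.TransGen r a b := by
  induction h with
  | single hab => exact Relation.TransGen.single hab
  -- prepend the first step `hab` to the path given by the induction hypothesis
  | cons hab _ ih => exact (Relation.TransGen.single hab).trans ih
```

For local development, add doc-verification-bridge as a path-based dependency in your `lakefile.toml`: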
```toml
[[require]]
name = "doc-verification-bridge"
path = "../path/to/doc-verification-bridge"
```

Or in `lakefile.lean`:

```lean
require «doc-verification-bridge» from "../path/to/doc-verification-bridge"
```

To analyze a project you haven't modified (e.g., batteries, mathlib4), you need a nested `docbuild` directory, similar to the one doc-gen4 uses. This is required because:
- The target project's compiled modules must be available to import
- doc-verification-bridge and its dependencies (doc-gen4, Cli) must be in scope
- The nested project approach allows both requirements without modifying the target
1. Create a `docbuild` subdirectory inside the target project:

   ```bash
   cd /path/to/target-project  # e.g., batteries
   mkdir docbuild
   cd docbuild
   ```

2. Create `lakefile.toml` with the following content:

   ```toml
   name = "docbuild"
   reservoir = false
   version = "0.1.0"
   packagesDir = "../.lake/packages"

   [[require]]
   name = "batteries"  # Replace with your target library name
   path = "../"

   [[require]]
   name = "doc-verification-bridge"
   git = "https://github.com/NicolasRouquette/doc-verification-bridge"
   rev = "main"
   ```

3. Copy the `lean-toolchain` from the parent project:

   ```bash
   cp ../lean-toolchain .
   ```

4. Update dependencies:

   ```bash
   lake update doc-verification-bridge
   ```

5. Build the target project (if not already built):

   ```bash
   cd ..
   lake build
   cd docbuild
   ```

6. Run doc-verification-bridge:

   ```bash
   lake exe doc-verification-bridge --output docs Batteries
   ```

Complete example for Batteries:

```bash
# From batteries root
cd /path/to/batteries
mkdir -p docbuild
cd docbuild

# Create lakefile.toml
cat > lakefile.toml << 'EOF'
name = "docbuild"
reservoir = false
version = "0.1.0"
packagesDir = "../.lake/packages"

[[require]]
name = "batteries"
path = "../"

[[require]]
name = "doc-verification-bridge"
git = "https://github.com/NicolasRouquette/doc-verification-bridge"
rev = "main"
EOF

cp ../lean-toolchain .
lake update doc-verification-bridge

# Run with automatic classification (default)
lake exe unified-doc unified --auto --output docs Batteries

# Or with annotation-based classification
lake exe unified-doc unified --annotated --output docs Batteries
```

The generated documentation will be in `docbuild/docs/`.
For very large projects like mathlib4, proof dependency extraction can be slow because it traverses proof terms to find which lemmas each theorem uses. These flags control performance:

| Flag | Effect |
|---|---|
| `--skip-proof-deps` | Skip proof dependency extraction entirely (fastest; no `dependsOn` data) |
| `--proof-dep-workers N` | Use up to N worker threads for parallel proof extraction |
| `--save-classification PATH` | Save classification results to a cache (creates `PATH.json` + `PATH.jsonl`) |
| `--load-classification PATH` | Load classification from the cache and skip the classification phase |
| `--html-workers N` | Use N parallel workers for HTML file generation |
Examples:

```bash
# Skip proof deps entirely (fastest)
lake exe unified-doc unified --auto --skip-proof-deps --output docs Mathlib

# Parallel proof extraction with 8 workers
lake exe unified-doc unified --auto --proof-dep-workers 8 --output docs Mathlib

# Parallel HTML generation with 20 workers (speeds up file writing for large projects)
lake exe unified-doc unified --auto --html-workers 20 --output docs Mathlib

# Combined: parallel proof extraction + parallel HTML generation
lake exe unified-doc unified --auto --proof-dep-workers 50 --html-workers 20 --output docs Mathlib

# Save classification to cache (for large projects like mathlib4)
# Creates /tmp/mathlib-cache.json (metadata) and /tmp/mathlib-cache.jsonl (entries)
lake exe unified-doc unified --auto --save-classification /tmp/mathlib-cache --output docs Mathlib

# Load from cache and regenerate HTML only (fast iteration)
lake exe unified-doc unified --auto --load-classification /tmp/mathlib-cache --html-workers 20 --output docs Mathlib
```

The classification cache uses a split format for streaming I/O with large projects:

- `<path>.json`: small metadata file with the version and entry count
- `<path>.jsonl`: pure JSON Lines, with one entry per line

This enables standard JSONL tooling (`jq`, `wc -l`, `head`, `tail`) and avoids stack overflows when serializing or deserializing 280K+ entries.
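To see why the split format suits streaming, here is a minimal Lean sketch that writes and reads such a pair of files using Lean's built-in JSON support. The `Entry` fields, the metadata keys `version` and `count`, and the helpers `saveCache`/`loadEntries` are all hypothetical. The tool's actual cache schema isn't documented here.

```lean
import Lean
open Lean

/-- Hypothetical entry shape; the real cache schema may differ. -/
structure Entry where
  name : String
  kind : String
  deriving ToJson, FromJson

/-- Write `<path>.json` (small metadata) and `<path>.jsonl` (one entry per line). -/
def saveCache (path : System.FilePath) (entries : Array Entry) : IO Unit := do
  let header := Json.mkObj [("version", toJson (1 : Nat)), ("count", toJson entries.size)]
  IO.FS.writeFile (path.addExtension "json") header.compress
  IO.FS.withFile (path.addExtension "jsonl") .write fun h => do
    for e in entries do
      -- each entry is serialized on its own, so no single giant Json value is built
      h.putStrLn (toJson e).compress

/-- Read the entries back, parsing each line independently. -/
def loadEntries (path : System.FilePath) : IO (Array Entry) := do
  let lines ← IO.FS.lines (path.addExtension "jsonl")
  (lines.filter (fun l => !l.isEmpty)).mapM fun line =>
    IO.ofExcept (Json.parse line >>= fromJson? : Except String Entry)
```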
When `--proof-dep-workers N` is specified with N > 0, the classifier uses a two-phase approach:

- Phase 1 (sequential, in `MetaM`): extract type information, infer theorem kinds, and detect `sorry` usage
- Phase 2 (parallel, in `IO`): extract proof dependencies using worker threads

This gives a good speedup without sacrificing correctness, because proof-term traversal is pure and can therefore be safely parallelized.
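The idea behind Phase 2 can be shown in a small, self-contained Lean sketch. A pure function is mapped over chunks of work items, and each chunk runs on its own `Task`. `parallelMapChunks` is a hypothetical helper written for this illustration, not the tool's implementation.

```lean
/-- Hypothetical helper: split `items` into at most `n` chunks and map the
    pure function `f` over each chunk on its own task. -/
def parallelMapChunks {α β : Type} (n : Nat) (items : Array α) (f : α → β) : Array β :=
  let workers := max n 1
  -- ceiling division so that `workers` chunks cover every item
  let chunkSize := (items.size + workers - 1) / workers
  let tasks := (List.range workers).map fun i =>
    -- `Array.extract` clamps out-of-range bounds, so trailing chunks may be empty
    let chunk := items.extract (i * chunkSize) ((i + 1) * chunkSize)
    Task.spawn fun _ => chunk.map f
  -- collecting results in chunk order preserves the input order
  tasks.foldl (fun acc t => acc ++ t.get) #[]

#eval parallelMapChunks 4 #[1, 2, 3, 4, 5] (· * 2)  -- #[2, 4, 6, 8, 10]
```

Because `f` is pure, the workers share no mutable state, so no locking is needed. Bounding the number of tasks by N, rather than spawning one task per item, keeps scheduling overhead small when there are hundreds of thousands of declarations.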
Tip: For batch analysis of multiple projects, see the experiments pipeline, which handles this configuration via TOML. Global settings like `proof_dep_workers` and `html_workers` can be set once and overridden per project.
doc-verification-bridge supports two complementary classification modes:
| Mode | Flag | Effort | Precision | Best For |
|---|---|---|---|---|
| Automatic | `--auto` | Zero annotations | Good (heuristic-based) | Quick overview, existing codebases |
| Annotated | `--annotated` | Explicit annotations | Exact | Production documentation, precise control |
Automatic mode (`--auto`) runs doc-verification-bridge on any Lean 4 project without modifying its source code.
Batch Analysis: To automatically analyze multiple Lean 4 projects in parallel, see the experiments/README.md for an automated pipeline that clones, builds, and generates documentation for configured repositories.
Note: Run these commands from inside the `docbuild` directory after completing the setup instructions above.

```bash
lake exe unified-doc unified --auto --output docs MyProject.Core MyProject.Theorems
```

With source links:

```bash
lake exe unified-doc unified --auto \
  --repo https://github.com/org/repo \
  --output docs \
  MyProject.Core MyProject.Theorems
```

For projects with a single top-level module:

```bash
lake exe unified-doc unified --auto --output docs --project "Batteries" Batteries
```

The inference engine analyzes theorem types to classify names into `assumes`, `proves`, and `validates`, and analyzes proof terms to find `dependsOn` relationships:
Predicate hypotheses → `assumes`:

```lean
theorem foo (h : IsAcyclic g) : ...
--               ^^^^^^^^^^^ predicate hypothesis → assumes
```

Equation hypotheses → `proves`:

```lean
theorem map_val (h : strings.mapM Identifier.mk? = some ids) : ...
--                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ equation → proves
```

When a hypothesis characterizes a function's behavior, the theorem proves a property about that function.

Conclusions → `proves`:

```lean
theorem foo ... : NoSelfLoop g ∧ IsTree g
--                ^^^^^^^^^^^^^^^^^^^^^^^ conclusion → proves
```

Bool functions in a bridging statement → `validates`:

```lean
theorem sound : isAcyclicBool g = true → IsAcyclic g
--              ^^^^^^^^^^^^^ → validates
```

Lemmas used in the proof → `dependsOn`:

```lean
theorem bar : P := by
  apply foo  -- bar depends on foo
  exact baz  -- bar depends on baz
```

The tool examines the proof term to find all theorems and lemmas used in the proof. This creates a "depends on" relationship: if theorem A uses theorem B in its proof, then A depends on B. This is useful for understanding proof structure and for identifying which foundational lemmas are most widely used.
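The rules above can be applied by hand to a small hypothetical example. The predicate `IsBig` and both theorems are made up for illustration. The comments give our reading of the rules, not captured tool output.

```lean
-- Hypothetical predicate, for illustration only.
def IsBig (n : Nat) : Prop := 10 ≤ n

-- `h : IsBig n` is a predicate hypothesis → assumes IsBig
-- the conclusion `IsBig (n + 1)`          → proves IsBig
theorem IsBig.succ {n : Nat} (h : IsBig n) : IsBig (n + 1) := by
  unfold IsBig at h ⊢
  omega

-- the proof term applies `IsBig.succ` twice → dependsOn IsBig.succ
theorem IsBig.add_two {n : Nat} (h : IsBig n) : IsBig (n + 1 + 1) :=
  IsBig.succ (IsBig.succ h)
```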
Theorem kinds are inferred from the shape of the statement:

| Pattern | Inferred Kind |
|---|---|
| `BoolFunc = true → PropSpec` | `bridgingProperty` (sound) |
| `PropSpec → BoolFunc = true` | `bridgingProperty` (complete) |
| `BoolFunc = true ↔ PropSpec` | `bridgingProperty` (iff) |
| `UserType → ExternalType` | `soundnessProperty` |
| `ExternalType → ∃..., UserType` | `completenessProperty` |
| Internal properties only | `mathematicalProperty` |
| Algebraic laws (`BEq`, `Hashable`) | `computationalProperty` |
For production documentation, use explicit annotations to ensure accuracy.
Run with the `--annotated` flag to classify only declarations that carry explicit `@[api_*]` attributes:

```bash
lake exe unified-doc unified --annotated --output docs MyProject
```

| Attribute | For | Category Detection |
|---|---|---|
| `@[api_type]` | `structure`, `inductive`, `class` | User specifies |
| `@[api_def]` | `def` | Auto-detected from return type |
| `@[api_theorem]` / `@[api_lemma]` | `theorem`, `lemma` | N/A (use `theoremKind`) |
For types, the category must be specified (abstract vs. concrete can't be auto-detected):

```lean
@[api_type { category := .mathematicalAbstraction }]
structure PackageRegistrySpec where ...

@[api_type { category := .computationalDatatype }]
inductive DependencyKind where ...
```

For definitions, the category is auto-detected from the return type:

```lean
@[api_def]  -- Returns Prop → mathematicalDefinition
def IsAcyclic (g : Graph) : Prop := ...

@[api_def]  -- Returns Option Value → computationalOperation
def lookup (m : Map) (k : Key) : Option Value := ...

@[api_def { coverage := .complete }]  -- Other fields can also be specified
def isEmpty (xs : List α) : Bool := ...
```

For theorems and lemmas, use `theoremKind` to classify:
```lean
@[api_theorem {
  theoremKind := .bridgingProperty,
  bridgingDirection := .sound,
  proves := #[`IsPositive],
  validates := #[`isPositiveBool]
}]
theorem isPositiveBool_sound : isPositiveBool n = true → IsPositive n := ...

@[api_lemma {
  theoremKind := .mathematicalProperty,
  proves := #[`IsAcyclic]
}]
lemma acyclic_of_tree : IsTree g → IsAcyclic g := ...
```

| Field | Meaning | Example |
|---|---|---|
| `assumes` | Preconditions the theorem relies on | ``assumes := #[`IsWellFormed]`` |
| `proves` | What the theorem establishes | ``proves := #[`IsAcyclic]`` |
| `validates` | Bool functions validated (bridging only) | ``validates := #[`isAcyclicBool]`` |
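Putting these pieces together, a minimal self-contained sketch of an annotated bridging theorem might look like the following. `IsPositive` and `isPositiveBool` are the placeholder names from the snippets above, given concrete (hypothetical) bodies here. The bridging theorem sets both `proves` and `validates`, because the compile-time checks reject a bridging theorem without `validates` (ACE19).

```lean
import DocVerificationBridge.Attributes

@[api_def]  -- returns Prop, so auto-detected as mathematicalDefinition
def IsPositive (n : Nat) : Prop := 0 < n

@[api_def]  -- returns Bool: the computation checked by the bridge
def isPositiveBool (n : Nat) : Bool := decide (0 < n)

@[api_theorem {
  theoremKind := .bridgingProperty,
  bridgingDirection := .sound,
  proves := #[`IsPositive],        -- the Prop specification
  validates := #[`isPositiveBool]  -- the Bool function being validated
}]
theorem isPositiveBool_sound {n : Nat} : isPositiveBool n = true → IsPositive n := by
  simp [isPositiveBool, IsPositive]
```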
The `bridgingDirection` field takes one of three values:

| Direction | Pattern | Meaning |
|---|---|---|
| `.sound` | `comp = true → prop` | "If algorithm says yes, spec agrees" |
| `.complete` | `prop → comp = true` | "If spec says yes, algorithm finds it" |
| `.iff` | `comp = true ↔ prop` | Full decidability |
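Here are the three directions for a Bool computation from core Lean, `List.isEmpty`, checked against the Prop specification `xs = []`. The theorem names in the `Example` namespace are hypothetical. The snippet omits annotations so that it compiles with core Lean alone.

```lean
namespace Example

-- .sound:    comp = true → prop
theorem isEmpty_sound {α : Type} (xs : List α) : xs.isEmpty = true → xs = [] := by
  cases xs <;> simp [List.isEmpty]

-- .complete: prop → comp = true
theorem isEmpty_complete {α : Type} (xs : List α) : xs = [] → xs.isEmpty = true := by
  intro h
  subst h
  rfl

-- .iff:      comp = true ↔ prop, assembled from the two one-way directions
theorem isEmpty_iff {α : Type} (xs : List α) : xs.isEmpty = true ↔ xs = [] :=
  ⟨isEmpty_sound xs, isEmpty_complete xs⟩

end Example
```

Proving the two one-way directions separately and pairing them with `⟨_, _⟩` is often the simplest route to an `.iff` bridge.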
The definition categories are based on E. J. Lowe's Four-Category Ontology (Oxford, 2006):

| | Mathematical (Universal, Prop) | Computational (Particular, data) |
|---|---|---|
| Substantial | `mathematicalAbstraction` (Kinds) | `computationalDatatype` (Objects) |
| Non-substantial | `mathematicalDefinition` (Attributes) | `computationalOperation` (Modes) |
| Pattern | TheoremKind | proves | validates |
|---|---|---|---|
| `UserType → ExternalType` | `soundnessProperty` | ExternalType | ∅ |
| `ExternalType → ∃..., UserType` | `completenessProperty` | UserType | ∅ |
| Internal closure properties | `mathematicalProperty` | internal types | ∅ |
| `BoolFunc = true ↔ PropSpec` | `bridgingProperty` | PropSpec | BoolFunc |
| Algebraic laws (reflexivity, etc.) | `computationalProperty` | internal types | ∅ |
Key distinctions:
- soundnessProperty: Every UserType is a valid ExternalType ("PathWithLength is a valid TransGen")
- completenessProperty: Every ExternalType can be represented as UserType
- bridgingProperty: Links Bool computations to Prop specifications
- mathematicalProperty: Internal closure/structural properties
- computationalProperty: Algebraic laws about BEq, Hashable, Ord instances
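A completeness property for `PathWithLength` complements the soundness example earlier and could be stated and proved as below. This is a sketch under assumptions: it needs Mathlib, the helper `PathWithLength.snoc` and the name `PathWithLength_completeness` are hypothetical, and the annotation follows the `completenessProperty` row of the table above (it `proves` the user type).

```lean
import Mathlib.Logic.Relation
import DocVerificationBridge.Attributes

@[api_type { category := .mathematicalAbstraction, coverage := .complete }]
inductive PathWithLength {α : Type*} (r : α → α → Prop) : α → α → Nat → Prop
  | single {a b} : r a b → PathWithLength r a b 1
  | cons {a b c n} : r a b → PathWithLength r b c n → PathWithLength r a c (n + 1)

variable {α : Type*} {r : α → α → Prop}

/-- Hypothetical helper: extend a path by one final step. -/
theorem PathWithLength.snoc {a b d : α} {n : Nat} (h : PathWithLength r a b n) :
    r b d → PathWithLength r a d (n + 1) := by
  induction h with
  | single hab => exact fun hbd => .cons hab (.single hbd)
  | cons hab _ ih => exact fun hcd => .cons hab (ih hcd)

-- completenessProperty: every TransGen chain is represented by some PathWithLength
@[api_theorem {
  theoremKind := .completenessProperty,
  assumes := #[`Relation.TransGen],
  proves := #[`PathWithLength]
}]
theorem PathWithLength_completeness {a b : α} (h : Relation.TransGen r a b) :
    ∃ n, PathWithLength r a b n := by
  induction h with
  | single hab => exact ⟨1, .single hab⟩
  | tail _ hbc ih =>
    obtain ⟨n, hp⟩ := ih
    exact ⟨n + 1, hp.snoc hbc⟩
```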
The annotation system validates annotations at compile time and reports these diagnostics:

| Code | Severity | Description |
|---|---|---|
| ACE1 | ❌ error | `theoremKind` without `proves` |
| ACE2 | ❌ error | Unresolved `proves` reference |
| ACE3 | ❌ error | Unresolved `validates` reference |
| ACE13 | ❌ error | Missing category on type |
| ACE18 | ❌ error | `validates` on non-bridging theorem |
| ACE19 | ❌ error | Missing `validates` on bridging theorem |
| ACW9 | ⚠️ warning | Inference suggests different values |
| ACW10 | ⚠️ warning | `proves` without `theoremKind` |
| ACW19 | ⚠️ warning | Naming convention suggestion |
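Individual diagnostics can be suppressed per declaration with the `suppress` field (here, two warning codes):

```lean
@[api_theorem {
  theoremKind := .mathematicalProperty,
  proves := #[`MyDef],
  suppress := #["ACW15", "ACW19"]  -- diagnostic codes as a string array
}]
theorem myTheorem : ... := ...
```

doc-verification-bridge generates static HTML documentation (no Python required) with: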
- Cross-referenced definitions and theorems
- Source code links with line numbers
- Coverage status (✅ complete, ⚠️ axiom-dependent, 🔄 partial, ❌ unverified)
- Theorem classification breakdown
- Bidirectional `verifiedBy` links
```
site/
├── index.html
├── style.css
├── modules/
│   ├── index.html
│   └── <module>.html
└── api/
    └── (doc-gen4 output with verification badges)
```
The unified pipeline enables true bidirectional navigation between API documentation and verification coverage:
From API docs → Coverage reports: Each declaration in the doc-gen4 API documentation displays a verification badge indicating its classification:
- 🔷 Mathematical abstraction type
- 🔶 Computational datatype
- 🔹 Mathematical definition
- 🔸 Computational operation
- 📐 Mathematical property (theorem)
- ⚙️ Computational property (theorem)
- ⬇️ Soundness theorem
- ⬆️ Completeness theorem
- ⇕ Equivalence theorem
Clicking a badge navigates to the corresponding coverage report page.
From Coverage reports → API docs: Each entry in the verification coverage reports links back to the doc-gen4 API page for that declaration.
This bidirectional linking is achieved through a custom `DeclarationDecoratorFn` hook in doc-gen4 (added in PR #344).

Serve locally:

```bash
python3 -m http.server -d site 8000
```

| Feature | doc-gen4 | blueprint | doc-verification-bridge |
|---|---|---|---|
| Purpose | API docs | Proof dependency graphs | Semantic coverage tracking |
| Tracks | Types, signatures | Theorem dependencies | Ontological categories |
| Semantic classification | ❌ | ❌ | ✅ Four-Category Ontology |
| Spec↔Impl links | ❌ | ❌ | ✅ proves/validates |
| Coverage metrics | ❌ | ✅ (sorry tracking) | ✅ (unverified/partial/complete) |
License: Apache 2.0