Status: Proposed
Date: 2026-01-16
We need to add formal specification and verification capabilities to rlm-core. The goals are:
- Interactive theorem proving - Allow users to develop and verify formal proofs
- Specification validation - Verify that formal specs are consistent and type-check
- Implementation verification - Prove properties about algorithms and systems
- AI-assisted specification - Help users create and refine formal specifications
Current state:
- rlm-core has a Python REPL with a `ReplEnvironment` trait for extension
- We have Topos as a semantic contract language for human-AI collaboration
- AI proof assistants (DeepSeek-Prover, LeanDojo) show LLMs can generate proofs effectively
Decisions:
Decision: Add a Lean 4 REPL as a new ReplEnvironment implementation using the same subprocess + JSON-RPC pattern as the Python REPL.
Rationale:
- leanprover-community/repl already provides JSON protocol
- Same architecture as Python REPL ensures consistency
- Environment pickling allows proof state persistence
- Subprocess isolation prevents crashes from affecting host
Alternatives considered:
- Embedded Lean via FFI: Too complex, Lean's runtime is substantial
- HTTP API: More overhead, no environment persistence
- LSP only: Less flexible for programmatic interaction
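As a sketch of the subprocess pattern, the client below speaks a blank-line-delimited JSON protocol over injected streams, so it can be exercised without a Lean toolchain. The class name and state-threading details are illustrative; the `cmd` and `env` fields follow the leanprover-community/repl README, but the exact framing should be verified against the pinned repl version.

```python
import json
from typing import IO, Optional

class LeanReplClient:
    """Minimal client sketch for the leanprover-community/repl JSON protocol.

    Commands and responses are JSON objects separated by blank lines. The
    `env` id returned by the repl is threaded into the next command so that
    definitions persist across calls (this is what enables proof state
    persistence and, via the repl's pickling commands, saving environments).
    """

    def __init__(self, stdin: IO[str], stdout: IO[str]) -> None:
        self._stdin = stdin    # the repl's stdin (we write commands here)
        self._stdout = stdout  # the repl's stdout (we read responses here)
        self._env: Optional[int] = None

    def run(self, lean_code: str) -> dict:
        cmd = {"cmd": lean_code}
        if self._env is not None:
            cmd["env"] = self._env  # continue from the previous environment
        self._stdin.write(json.dumps(cmd) + "\n\n")
        self._stdin.flush()
        response = self._read_response()
        self._env = response.get("env", self._env)
        return response

    def _read_response(self) -> dict:
        # Responses may be pretty-printed over several lines and are
        # terminated by a blank line.
        lines = []
        for line in self._stdout:
            if line.strip() == "":
                if lines:
                    break
                continue
            lines.append(line)
        return json.loads("".join(lines))
```

In production the two streams would come from a `subprocess.Popen` of the repl executable; injecting them keeps the class testable and mirrors the isolation rationale above.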
Decision: Create a specialized agent for specification creation and refinement with first-class Topos integration.
Rationale:
- Specifications are the bottleneck; proofs can be automated
- Human-AI collaboration on specs is more valuable than pure AI generation
- Topos provides the semantic contract layer; Lean provides formal verification
- Dual-track approach gives both human readability and machine verifiability
Alternatives considered:
- Lean-only specs: Too hard for non-experts to read/review
- Topos-only specs: No machine verification
- Separate tools: Loses traceability and sync benefits
Decision: Maintain bidirectional links between Topos specs and Lean formalizations with drift detection.
Rationale:
- Humans review Topos (readable), machines verify Lean (precise)
- Links ensure they stay in sync
- Drift detection catches divergence early
- Progressive formalization allows starting simple
Implementation:
- `@lean` annotations in Topos link to Lean artifacts
- `@topos` comments in Lean link back
- Sync engine generates Lean from Topos changes
- Drift detection compares specs semantically rather than textually
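The link-extraction side of drift detection can be sketched as follows. The concrete `@lean(Name)` syntax and the paragraph-scoping rule are assumptions (the real Topos grammar may differ), and true semantic comparison is simplified here to a whitespace-insensitive fingerprint:

```python
import hashlib
import re
from typing import Dict

# Hypothetical annotation syntax: "@lean(Namespace.name)" on its own line,
# scoping over the paragraph that follows it, until the next annotation
# or end of file.
LEAN_LINK = re.compile(r"@lean\(([\w.]+)\)\s*\n(.*?)(?=\n@lean\(|\Z)", re.S)

def extract_links(topos_src: str) -> Dict[str, str]:
    """Map each linked Lean name to a fingerprint of its Topos paragraph."""
    links = {}
    for name, body in LEAN_LINK.findall(topos_src):
        # Whitespace-insensitive normalization stands in for real
        # semantic comparison.
        normalized = " ".join(body.split())
        links[name] = hashlib.sha256(normalized.encode()).hexdigest()
    return links

def detect_drift(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Classify each linked name as 'added', 'removed', or 'changed'."""
    drift = {}
    for name in old.keys() | new.keys():
        if name not in new:
            drift[name] = "removed"
        elif name not in old:
            drift[name] = "added"
        elif old[name] != new[name]:
            drift[name] = "changed"
    return drift
```

A `changed` entry would trigger the sync engine to regenerate the Lean side (or flag the pair for human review), which is how divergence gets caught early.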
Decision: Four-tier automation strategy: decidable → automation → AI-assisted → human-in-loop.
Rationale:
- Many proofs are trivial (decidable tactics handle instantly)
- Automation tactics (aesop, linarith) handle common patterns
- AI-generated proof attempts are safe because the type checker rejects invalid ones
- Human fallback ensures no blocking on hard proofs
Strategy:
- Decidable: decide, native_decide, omega, simp (instant)
- Automation: aesop, linarith, ring (seconds)
- AI-assisted: LLM generates tactics, type checker validates (seconds-minutes)
- Human-in-loop: sorry with TODO, create task (async)
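The escalation logic above can be sketched as a single loop. Here `try_tactic` is a stand-in for "run this tactic against the goal via the Lean REPL and report whether it closed it", and `llm_candidates` is a stand-in for the AI proof-search step; both names are invented for illustration:

```python
from typing import Callable, List, Tuple

# Tactic tiers from the strategy above, cheapest first.
TIERS: List[Tuple[str, List[str]]] = [
    ("decidable",  ["decide", "native_decide", "omega", "simp"]),
    ("automation", ["aesop", "linarith", "ring"]),
]

def prove(goal: str,
          try_tactic: Callable[[str, str], bool],
          llm_candidates: Callable[[str], List[str]]) -> Tuple[str, str]:
    """Escalate through the four tiers; never block on a hard proof."""
    for tier_name, tactics in TIERS:
        for tactic in tactics:
            if try_tactic(goal, tactic):
                return (tier_name, tactic)
    # Tier 3: AI-assisted. Invalid candidates are simply rejected by the
    # type checker, so trying several is cheap and safe.
    for candidate in llm_candidates(goal):
        if try_tactic(goal, candidate):
            return ("ai-assisted", candidate)
    # Tier 4: human-in-the-loop. Emit `sorry` with a TODO and open a task.
    return ("human", "sorry -- TODO: open proof task")
```

The ordering matters: each tier is strictly more expensive than the last, so most goals never reach the AI or human tiers.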
Decision: Support configurable formalization levels: Types → Types+Invariants → Contracts → Full Proofs.
Rationale:
- Not all specs need full proofs initially
- Progressive refinement matches real development
- Default to Full Proofs per user requirement
- Allow relaxation when proofs are too expensive
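As an illustration of the levels, one definition can move from a bare signature to a fully proved invariant. This is a Lean 4 sketch with invented names; exact `simp` lemma behavior may vary slightly across toolchain versions:

```lean
-- Level 1 (Types): definition with a precise signature, nothing proved.
def revAppend : List α → List α → List α
  | [],      acc => acc
  | x :: xs, acc => revAppend xs (x :: acc)

-- Level 2 (Types + Invariants): state the invariant, defer the proof.
-- (Level 3, Contracts, would phrase pre/postconditions the same way.)
theorem revAppend_length (xs acc : List α) :
    (revAppend xs acc).length = xs.length + acc.length := sorry

-- Level 4 (Full Proofs): discharge it by structural induction.
theorem revAppend_length' (xs acc : List α) :
    (revAppend xs acc).length = xs.length + acc.length := by
  induction xs generalizing acc with
  | nil => simp [revAppend]
  | cons x xs ih => simp only [revAppend, ih, List.length_cons]; omega
```

Relaxing a spec from Full Proofs back to Invariants is then just replacing a proof body with `sorry`, which drift detection can track as outstanding work.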
Decision: Support multiple domains with specialized proof strategies.
Domains:
- Algorithms & data structures (induction, recursion)
- Distributed systems (bisimulation, invariants)
- APIs & protocols (state transitions, session types)
- Security properties (information flow, access control)
- Application flow (termination, safety)
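For the protocol and state-transition domains, for example, a session can be modeled as an inductive state machine whose safety property falls out of case analysis. A minimal Lean 4 sketch with invented names:

```lean
-- A toy session protocol: idle → active → closed.
inductive SessionState | idle | active | closed

-- Legal transitions, as an inductive relation.
inductive Step : SessionState → SessionState → Prop
  | connect : Step .idle .active
  | close   : Step .active .closed

-- Safety: no transition leaves the closed state, so a closed
-- session can never be re-opened.
theorem closed_is_final (s : SessionState) : ¬ Step .closed s := by
  intro h
  cases h
```

`cases h` closes the goal because no constructor of `Step` has `.closed` as its source state; the same pattern scales to invariant preservation over multi-step traces.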
Positive consequences:
- Formal verification becomes accessible through AI assistance
- Specifications are both human-readable (Topos) and machine-verifiable (Lean)
- Traceability from requirements to proofs
- Progressive automation handles most proofs automatically
- Learning from successful proof strategies improves over time
Negative consequences:
- Lean dependency adds complexity (build system, versions, mathlib)
- Dual-track sync requires maintenance
- AI proof generation has latency and cost
- Some proofs will always need human expertise
| Risk | Mitigation |
|---|---|
| Lean version fragmentation | Per-project version via elan |
| Mathlib size (2GB+) | Shared cache, lazy download |
| Proof maintenance burden | Drift detection, automated updates |
| AI proof quality | Type checker validation, human review |
See the Lean Formal Verification Design document for the detailed design.
Implementation phases:
- Lean REPL Foundation (2-3 weeks) - Core REPL, project management
- Topos Integration (1-2 weeks) - MCP client, link annotations
- Dual-Track Sync (2 weeks) - Drift detection, sync engine
- Spec Agent (2-3 weeks) - Intake, refine, formalize phases
- Proof Automation (2-3 weeks) - Progressive automation tiers
- DP Integration (1-2 weeks) - Spec coverage, proof tracking
Success criteria:
- Lean REPL executes commands correctly
- 70% auto-proof rate on simple theorems
- Drift detection catches spec divergence
- DP integration tracks formal spec coverage