Status: Proposed
Date: 2026-01-16
We need to add formal specification and verification capabilities to rlm-core. The goals are:
- Interactive theorem proving - Allow users to develop and verify formal proofs
- Specification validation - Verify that formal specs are consistent and type-check
- Implementation verification - Prove properties about algorithms and systems
- AI-assisted specification - Help users create and refine formal specifications
Current state:
- rlm-core has a Python REPL with a `ReplEnvironment` trait for extension
- We have Topos as a semantic contract language for human-AI collaboration
- AI proof assistants (DeepSeek-Prover, LeanDojo) show LLMs can generate proofs effectively
Decisions:
Decision: Add a Lean 4 REPL as a new ReplEnvironment implementation using the same subprocess + JSON-RPC pattern as the Python REPL.
Rationale:
- leanprover-community/repl already provides JSON protocol
- Same architecture as Python REPL ensures consistency
- Environment pickling allows proof state persistence
- Subprocess isolation prevents crashes from affecting host
Alternatives considered:
- Embedded Lean via FFI: Too complex, Lean's runtime is substantial
- HTTP API: More overhead, no environment persistence
- LSP only: Less flexible for programmatic interaction
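As a sketch of the subprocess pattern, the client below speaks a blank-line-delimited JSON protocol over injected streams, so it can be exercised without a Lean toolchain. The class name and state-threading details are illustrative; the `cmd` and `env` fields follow the leanprover-community/repl README, but the exact framing should be verified against the pinned repl version.

```python
import json
from typing import IO, Optional

class LeanReplClient:
    """Minimal client sketch for the leanprover-community/repl JSON protocol.

    Commands and responses are JSON objects separated by blank lines. The
    `env` id returned by the repl is threaded into the next command so that
    definitions persist across calls (this is what enables proof state
    persistence and, via the repl's pickling commands, saving environments).
    """

    def __init__(self, stdin: IO[str], stdout: IO[str]) -> None:
        self._stdin = stdin    # the repl's stdin (we write commands here)
        self._stdout = stdout  # the repl's stdout (we read responses here)
        self._env: Optional[int] = None

    def run(self, lean_code: str) -> dict:
        cmd = {"cmd": lean_code}
        if self._env is not None:
            cmd["env"] = self._env  # continue from the previous environment
        self._stdin.write(json.dumps(cmd) + "\n\n")
        self._stdin.flush()
        response = self._read_response()
        self._env = response.get("env", self._env)
        return response

    def _read_response(self) -> dict:
        # Responses may be pretty-printed over several lines and are
        # terminated by a blank line.
        lines = []
        for line in self._stdout:
            if line.strip() == "":
                if lines:
                    break
                continue
            lines.append(line)
        return json.loads("".join(lines))
```

In production the two streams would come from a `subprocess.Popen` of the repl executable; injecting them keeps the class testable and mirrors the isolation rationale above.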
Decision: Create a specialized agent for specification creation and refinement with first-class Topos integration.
Rationale:
- Specifications are the bottleneck; proofs can be automated
- Human-AI collaboration on specs is more valuable than pure AI generation
- Topos provides the semantic contract layer; Lean provides formal verification
- Dual-track approach gives both human readability and machine verifiability
Alternatives considered:
- Lean-only specs: Too hard for non-experts to read/review
- Topos-only specs: No machine verification
- Separate tools: Loses traceability and sync benefits
Decision: Maintain bidirectional links between Topos specs and Lean formalizations with drift detection.
Rationale:
- Humans review Topos (readable), machines verify Lean (precise)
- Links ensure they stay in sync
- Drift detection catches divergence early
- Progressive formalization allows starting simple
Implementation:
- `@lean` annotations in Topos link to Lean artifacts
- `@topos` comments in Lean link back
- Sync engine generates Lean from Topos changes
- Drift detection compares specs semantically rather than textually
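The link-extraction side of drift detection can be sketched as follows. The concrete `@lean(Name)` syntax and the paragraph-scoping rule are assumptions (the real Topos grammar may differ), and true semantic comparison is simplified here to a whitespace-insensitive fingerprint:

```python
import hashlib
import re
from typing import Dict

# Hypothetical annotation syntax: "@lean(Namespace.name)" on its own line,
# scoping over the paragraph that follows it, until the next annotation
# or end of file.
LEAN_LINK = re.compile(r"@lean\(([\w.]+)\)\s*\n(.*?)(?=\n@lean\(|\Z)", re.S)

def extract_links(topos_src: str) -> Dict[str, str]:
    """Map each linked Lean name to a fingerprint of its Topos paragraph."""
    links = {}
    for name, body in LEAN_LINK.findall(topos_src):
        # Whitespace-insensitive normalization stands in for real
        # semantic comparison.
        normalized = " ".join(body.split())
        links[name] = hashlib.sha256(normalized.encode()).hexdigest()
    return links

def detect_drift(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Classify each linked name as 'added', 'removed', or 'changed'."""
    drift = {}
    for name in old.keys() | new.keys():
        if name not in new:
            drift[name] = "removed"
        elif name not in old:
            drift[name] = "added"
        elif old[name] != new[name]:
            drift[name] = "changed"
    return drift
```

A `changed` entry would trigger the sync engine to regenerate the Lean side (or flag the pair for human review), which is how divergence gets caught early.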
Decision: Four-tier automation strategy: decidable → automation → AI-assisted → human-in-loop.
Rationale:
- Many proofs are trivial (decidable tactics handle instantly)
- Automation tactics (aesop, linarith) handle common patterns
- AI-generated proof attempts are safe because the type checker rejects invalid ones
- Human fallback ensures no blocking on hard proofs
Strategy:
- Decidable: decide, native_decide, omega, simp (instant)
- Automation: aesop, linarith, ring (seconds)
- AI-assisted: LLM generates tactics, type checker validates (seconds-minutes)
- Human-in-loop: sorry with TODO, create task (async)
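The escalation logic above can be sketched as a single loop. Here `try_tactic` is a stand-in for "run this tactic against the goal via the Lean REPL and report whether it closed it", and `llm_candidates` is a stand-in for the AI proof-search step; both names are invented for illustration:

```python
from typing import Callable, List, Tuple

# Tactic tiers from the strategy above, cheapest first.
TIERS: List[Tuple[str, List[str]]] = [
    ("decidable",  ["decide", "native_decide", "omega", "simp"]),
    ("automation", ["aesop", "linarith", "ring"]),
]

def prove(goal: str,
          try_tactic: Callable[[str, str], bool],
          llm_candidates: Callable[[str], List[str]]) -> Tuple[str, str]:
    """Escalate through the four tiers; never block on a hard proof."""
    for tier_name, tactics in TIERS:
        for tactic in tactics:
            if try_tactic(goal, tactic):
                return (tier_name, tactic)
    # Tier 3: AI-assisted. Invalid candidates are simply rejected by the
    # type checker, so trying several is cheap and safe.
    for candidate in llm_candidates(goal):
        if try_tactic(goal, candidate):
            return ("ai-assisted", candidate)
    # Tier 4: human-in-the-loop. Emit `sorry` with a TODO and open a task.
    return ("human", "sorry -- TODO: open proof task")
```

The ordering matters: each tier is strictly more expensive than the last, so most goals never reach the AI or human tiers.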
Decision: Support configurable formalization levels: Types → Types+Invariants → Contracts → Full Proofs.
Rationale:
- Not all specs need full proofs initially
- Progressive refinement matches real development
- Default to Full Proofs per user requirement
- Allow relaxation when proofs are too expensive
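As an illustration of the levels, one definition can move from a bare signature to a fully proved invariant. This is a Lean 4 sketch with invented names; exact `simp` lemma behavior may vary slightly across toolchain versions:

```lean
-- Level 1 (Types): definition with a precise signature, nothing proved.
def revAppend : List α → List α → List α
  | [],      acc => acc
  | x :: xs, acc => revAppend xs (x :: acc)

-- Level 2 (Types + Invariants): state the invariant, defer the proof.
-- (Level 3, Contracts, would phrase pre/postconditions the same way.)
theorem revAppend_length (xs acc : List α) :
    (revAppend xs acc).length = xs.length + acc.length := sorry

-- Level 4 (Full Proofs): discharge it by structural induction.
theorem revAppend_length' (xs acc : List α) :
    (revAppend xs acc).length = xs.length + acc.length := by
  induction xs generalizing acc with
  | nil => simp [revAppend]
  | cons x xs ih => simp only [revAppend, ih, List.length_cons]; omega
```

Relaxing a spec from Full Proofs back to Invariants is then just replacing a proof body with `sorry`, which drift detection can track as outstanding work.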
Decision: Support multiple domains with specialized proof strategies.
Domains:
- Algorithms & data structures (induction, recursion)
- Distributed systems (bisimulation, invariants)
- APIs & protocols (state transitions, session types)
- Security properties (information flow, access control)
- Application flow (termination, safety)
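For the protocol and state-transition domains, for example, a session can be modeled as an inductive state machine whose safety property falls out of case analysis. A minimal Lean 4 sketch with invented names:

```lean
-- A toy session protocol: idle → active → closed.
inductive SessionState | idle | active | closed

-- Legal transitions, as an inductive relation.
inductive Step : SessionState → SessionState → Prop
  | connect : Step .idle .active
  | close   : Step .active .closed

-- Safety: no transition leaves the closed state, so a closed
-- session can never be re-opened.
theorem closed_is_final (s : SessionState) : ¬ Step .closed s := by
  intro h
  cases h
```

`cases h` closes the goal because no constructor of `Step` has `.closed` as its source state; the same pattern scales to invariant preservation over multi-step traces.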
Positive consequences:
- Formal verification becomes accessible through AI assistance
- Specifications are both human-readable (Topos) and machine-verifiable (Lean)
- Traceability from requirements to proofs
- Progressive automation handles most proofs automatically
- Learning from successful proof strategies improves over time
Negative consequences:
- Lean dependency adds complexity (build system, versions, mathlib)
- Dual-track sync requires maintenance
- AI proof generation has latency and cost
- Some proofs will always need human expertise
| Risk | Mitigation |
|---|---|
| Lean version fragmentation | Per-project version via elan |
| Mathlib size (2GB+) | Shared cache, lazy download |
| Proof maintenance burden | Drift detection, automated updates |
| AI proof quality | Type checker validation, human review |
See the Lean Formal Verification Design document for the detailed design.
Implementation phases:
- Lean REPL Foundation (2-3 weeks) - Core REPL, project management
- Topos Integration (1-2 weeks) - MCP client, link annotations
- Dual-Track Sync (2 weeks) - Drift detection, sync engine
- Spec Agent (2-3 weeks) - Intake, refine, formalize phases
- Proof Automation (2-3 weeks) - Progressive automation tiers
- DP Integration (1-2 weeks) - Spec coverage, proof tracking
Success criteria:
- Lean REPL executes commands correctly
- 70% auto-proof rate on simple theorems
- Drift detection catches spec divergence
- DP integration tracks formal spec coverage