A minimal agent for automated theorem proving in Lean 4
A simple, modular agent that proves Lean 4 theorems through iterative refinement. It uses off-the-shelf LLMs (no fine-tuning) with a feedback loop, a memory system, and library search tools to achieve competitive results against highly-engineered systems that rely on specialized training and orders of magnitude more compute.
| Benchmark | AxProverBase | Best Comparable |
|---|---|---|
| PutnamBench | 54.7% (pass@1) | 13.0% (Goedel V2, pass@184) |
| FATE-M | 98.0% | 62.7% (DeepSeek V2, pass@64) |
| FATE-H | 66.0% | 3.0% (DeepSeek V2) |
| FATE-X | 24.0% | 0.0% (all others) |
| LeanCat | 59.0% | 14.0% (Gemini 3 Pro) |
All results with Claude Opus 4.5, 50 iterations, pass@1. See our paper for full details and comparisons.
The agent runs an iterative loop:
- Proposer — An LLM writes Lean 4 proof code, optionally using tools (LeanSearch, web search) to find relevant Mathlib lemmas
- Compiler — Builds the code with
lake; extracts goal states atsorrylocations to provide structured feedback - Reviewer — Verifies statement preservation and proof validity (no
sorry, no cheating tactics) - Memory — Summarizes lessons from failed attempts into a concise "lab notebook" to prevent repeating mistakes
The loop continues until the proof is complete or the iteration budget is exhausted (default: 50).
pip install ax-prover# Configure your API keys
ax-prover configure
# Navigate to a Lean 4 project
cd /path/to/lean4-project
# Prove a theorem
ax-prover prove MyModule:my_theorempip install ax-prover
# or
uv add ax-prover
# For development (includes ruff, pytest, pre-commit)
pip install -e ".[dev]"Prerequisites
- Python 3.11+
- Lean 4 with
lakeavailable on PATH (installation guide) - LLM API key — at least one of:
ANTHROPIC_API_KEY(recommended — Claude Opus 4.5 gives best results)OPENAI_API_KEYGOOGLE_API_KEY
- Tavily API key (optional, for web search) —
TAVILY_API_KEY
Set up your API keys interactively:
ax-prover configureOr export them directly in your shell:
export ANTHROPIC_API_KEY=sk-ant-...# Prove a specific theorem by module path
ax-prover prove MyModule.Path:theorem_name
# Prove a specific theorem by file path
ax-prover prove MyProject/Algebra/Ring.lean:theorem_name
# Prove the theorem at a specific line
ax-prover prove MyProject/Algebra/Ring.lean#L42
# Prove all unproven theorems in a file
ax-prover prove MyProject/Algebra/Ring.lean
# Skip lake build (if repo is already built)
ax-prover prove MyModule:theorem_name --skip-build
# Save JSON output to file (for scripting/automation)
ax-prover prove MyModule:theorem_name -o result.jsonRun batch evaluations on LangSmith datasets:
# Run experiment on a dataset
ax-prover experiment dataset_name
# With custom concurrency
ax-prover experiment dataset_name --max-concurrency 8Configuration
Customize behavior with YAML config files and CLI overrides:
# my_config.yaml
prover:
max_iterations: 75
prover_llm:
model: "anthropic:claude-opus-4-20250514"
temperature: 0.5
thinking:
type: enabled
budget_tokens: 32000# Use a config file
ax-prover --config my_config.yaml prove MyModule:theorem
# Override values from the CLI
ax-prover prove MyModule:theorem prover.max_iterations=100
# Save your current configuration for later reuse
ax-prover --save-config my_setup prove MyModule:theoremWe welcome contributions of all kinds — bug reports, feature requests, documentation, and code. See our Contributing Guide to get started.
This project is licensed under the AGPL-3.0.
If you use ax-prover in your research, please cite:
@article{axproverbase2026,
title={A Minimal Agent for Automated Theorem Proving},
author={Requena Pozo, Borja and Letson, Austin and Nowakowski, Krystian and Beltran Ferreiro, Izan and Sarra, Leopoldo},
year={2026}
}