Skip to content

Commit b8e039a

Browse files
lsarra-axaustinletson-axBorjaRequenaAX-izanKrystianNowakowski
committed
A Minimal Agent for Automated Theorem Proving
Co-authored-by: Austin Letson <austin.letson@axiomatic-ai.com> Co-authored-by: Borja Requena <borja.requena.pozo@gmail.com> Co-authored-by: Izan Beltran Ferreiro <izan@axiomatic-ai.com> Co-authored-by: Krystian Nowakowski <krystian@axiomatic-ai.com> Co-authored-by: Leopoldo Sarra <leopoldo@axiomatic-ai.com>
1 parent b51a01a commit b8e039a

File tree

100 files changed

+1170
-345
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

100 files changed

+1170
-345
lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -164,4 +164,4 @@ outputs/
164164
outputs/experiment_analysis/
165165

166166
# Generated version file (setuptools-scm)
167-
src/ax_agent/_version.py
167+
src/ax_prover/_version.py

CLAUDE.md

Lines changed: 41 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
44

55
## Project Overview
66

7-
ax-agent is a minimal LangGraph-based agent for automated Lean 4 theorem proving. It uses off-the-shelf LLMs (no fine-tuning) with iterative proof refinement, a memory system, and library search tools to prove theorems.
7+
ax-prover is a minimal LangGraph-based agent for automated Lean 4 theorem proving. It uses off-the-shelf LLMs (no fine-tuning) with iterative proof refinement, a memory system, and library search tools to prove theorems.
88

99
The agent runs a 4-node loop: Proposer → Compiler → Reviewer → Memory, iterating until the proof is complete or the iteration budget is exhausted.
1010

@@ -47,80 +47,81 @@ ruff check --fix .
4747

4848
```bash
4949
# Prove a specific theorem by location (module path)
50-
ax-agent prove MyModule.Path:theorem_name
50+
ax-prover prove MyModule.Path:theorem_name
5151

5252
# Prove a specific theorem by file path
53-
ax-agent prove MyProject/Algebra/Ring.lean:theorem_name
53+
ax-prover prove MyProject/Algebra/Ring.lean:theorem_name
5454

5555
# Prove the theorem at a specific line
56-
ax-agent prove MyProject/Algebra/Ring.lean#L42
56+
ax-prover prove MyProject/Algebra/Ring.lean#L42
5757

5858
# Prove all unproven theorems in a file
59-
ax-agent prove MyProject/Algebra/Ring.lean
59+
ax-prover prove MyProject/Algebra/Ring.lean
6060

6161
# Skip lake build (if repo is already built)
62-
ax-agent prove MyModule:theorem_name --skip-build
62+
ax-prover prove MyModule:theorem_name --skip-build
6363

6464
# Force re-proving
65-
ax-agent prove MyModule:theorem_name --overwrite
65+
ax-prover prove MyModule:theorem_name --overwrite
6666

6767
# Run batch experiments on a LangSmith dataset
68-
ax-agent experiment dataset_name --max-concurrency 8
68+
ax-prover experiment dataset_name --max-concurrency 8
6969
```
7070

7171
**Note on `--skip-build` flag:**
7272
- By default, `prove` runs `lake exe cache get && lake build` before starting
7373
- Use `--skip-build` when the repo is already built and up-to-date
74-
- The build step is defined in `src/ax_agent/utils/build.py:build_lean_repo()`
74+
- The build step is defined in `src/ax_prover/utils/build.py:build_lean_repo()`
7575

7676
## Architecture
7777

78-
### Prover Agent Loop (`src/ax_agent/prover/agent.py`)
78+
### Prover Agent Loop (`src/ax_prover/prover/agent.py`)
7979

8080
The agent uses a 4-node iterative LangGraph workflow:
8181

8282
1. **Proposer** — A ReAct-style LLM agent that writes Lean 4 proof code. Can optionally use tools (LeanSearch, web search) to find relevant Mathlib lemmas before proposing.
8383
2. **Compiler (Builder)** — Applies the proposed code via `TemporaryProposal`, builds with `lake env lean`, and extracts goal states at `sorry` locations using `lean_interact`. Returns `BuildSuccessFeedback` or `BuildFailedFeedback`.
8484
3. **Reviewer** — Verifies statement preservation and proof validity (no `sorry`, no cheating tactics like `native_decide`). Returns `ReviewApprovedFeedback` or `ReviewRejectedFeedback`.
85-
4. **Memory** (`src/ax_agent/prover/memory.py`) — Summarizes lessons from failed attempts into a concise context ("lab notebook") to prevent repeating mistakes. Default strategy: `ExperienceProcessor` (self-reflection).
85+
4. **Memory** (`src/ax_prover/prover/memory.py`) — Summarizes lessons from failed attempts into a concise context ("lab notebook") to prevent repeating mistakes. Default strategy: `ExperienceProcessor` (self-reflection).
8686

8787
Loop: Proposer → Builder → (Reviewer if build succeeds) → Memory → back to Proposer. Terminates on review approval, max iterations, or build timeout.
8888

8989
### Key Abstractions
9090

91-
**State Models** (`src/ax_agent/models/`):
91+
**State Models** (`src/ax_prover/models/`):
9292
- `ProverAgentState` (`proving.py`): Main state for the prover workflow — messages, item, metrics, iteration tracking
9393
- `TargetItem` (`proving.py`): A theorem to prove — title, location, proven status
9494
- `Location` (`files.py`): Where code lives — `Module.Path:function_name` or `path/to/file.lean:function_name`
9595
- `Declaration` (`declaration.py`): A parsed Lean declaration with name, type, body, and line info
9696

97-
**Messages** (`src/ax_agent/models/messages.py`):
97+
**Messages** (`src/ax_prover/models/messages.py`):
9898
- `ProposalMessage`: Code proposals with reasoning, imports, opens, and updated theorem
9999
- `FeedbackMessage`: Base class for feedback — `BuildSuccessFeedback`, `BuildFailedFeedback`, `ReviewApprovedFeedback`, `ReviewRejectedFeedback`, `SorriesGoalStateFeedback`, etc.
100100

101-
**Configuration** (`src/ax_agent/config.py`):
101+
**Configuration** (`src/ax_prover/config.py`):
102102
- `Config`: Root config with `ProverConfig` and `ToolsConfig`
103103
- `ProverConfig`: LLM config, tools list, max iterations, memory config
104104
- OmegaConf-based: supports YAML files, CLI overrides, config merging
105105

106106
### Tools
107107

108-
**Lean Search** (`src/ax_agent/tools/lean_search.py`):
108+
**Lean Search** (`src/ax_prover/tools/lean_search.py`):
109109
- Searches Lean 4/Mathlib theorems via LeanSearch API
110110
- Default: `https://leansearch.net` (public, no setup)
111111

112-
**Web Search** (`src/ax_agent/tools/web_search.py`):
112+
**Web Search** (`src/ax_prover/tools/web_search.py`):
113113
- Tavily API for finding proof strategies online
114114

115-
**Lean Build** (`src/ax_agent/utils/build.py`):
115+
**Lean Build** (`src/ax_prover/utils/build.py`):
116116
- `build_lean_repo()`: Runs `lake exe cache get && lake build`
117117
- `check_lean_file()`: Compiles a single file with `lake env lean`
118118
- `TemporaryProposal`: Context manager that applies code changes to a temp file, tests compilation, and can commit permanently
119119

120120
### Commands
121121

122-
- `prove` (`src/ax_agent/commands/prove.py`): Prove theorems by location or all unproven in a file
123-
- `experiment` (`src/ax_agent/commands/experiment.py`): Run batch experiments on LangSmith datasets with evaluation metrics
122+
- `prove` (`src/ax_prover/commands/prove.py`): Prove theorems by location or all unproven in a file
123+
- `experiment` (`src/ax_prover/commands/experiment.py`): Run batch experiments on LangSmith datasets with evaluation metrics
124+
- `configure` (`src/ax_prover/commands/configure.py`): Interactive setup for API keys (writes to platform config dir via `platformdirs`)
124125

125126
### LangSmith Integration
126127

@@ -151,7 +152,7 @@ with TemporaryProposal(base_folder, location, proposal) as applier:
151152
Locations support both file paths and module paths:
152153

153154
```python
154-
from ax_agent.models.files import Location
155+
from ax_prover.models.files import Location
155156

156157
# Both formats work
157158
loc = Location.from_string("MyProject.Path:theorem")
@@ -248,18 +249,31 @@ counter += 1
248249

249250
## Configuration
250251

251-
Environment variables (in `.env.secrets`):
252-
- `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY` (at least one required)
253-
- `TAVILY_API_KEY` (optional, for web search)
254-
- `LANGSMITH_API_KEY`, `LANGSMITH_TRACING` (optional, for observability)
252+
### API Keys
253+
254+
Set up API keys interactively with `ax-prover configure`, or via environment variables.
255+
256+
Secrets cascade (first found wins, shell env always takes priority):
257+
1. CWD `.env.secrets`
258+
2. `--folder` `.env.secrets`
259+
3. `<platformdirs config>/.env.secrets` (written by `ax-prover configure`)
260+
4. Package root `.env.secrets` (editable installs)
261+
262+
Required: at least one of `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`
263+
Optional: `TAVILY_API_KEY` (web search), `LANGSMITH_API_KEY` (tracing)
264+
265+
### YAML Config
255266

256267
LLM model is configured via YAML config files (no hardcoded default).
268+
Default config is bundled in the package at `src/ax_prover/configs/default.yaml`.
269+
Config resolution: CWD > `--folder` > bundled package configs > package root `configs/`.
257270

258271
## Repository Structure
259272

260273
```
261-
src/ax_agent/
262-
├── commands/ # CLI command implementations (prove, experiment)
274+
src/ax_prover/
275+
├── commands/ # CLI command implementations (prove, experiment, configure)
276+
├── configs/ # Bundled default YAML configs and secrets template
263277
├── models/ # Pydantic state models (proving, messages, files, declaration)
264278
├── prover/ # Prover agent (agent, memory, prompts)
265279
├── tools/ # LangChain tools (lean_search, web_search)
@@ -268,6 +282,7 @@ src/ax_agent/
268282
├── evaluators.py # LangSmith experiment evaluators
269283
└── main.py # CLI entry point
270284
285+
configs/ # Experimental/ablation configs (not shipped in package)
271286
tests/ # Pytest tests mirroring src structure
272287
```
273288

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# Contributing to ax-agent
1+
# Contributing to ax-prover
22

33
Thanks for your interest in contributing! Whether it's a bug report, a feature idea, or a pull request — all contributions are welcome.
44

@@ -14,7 +14,7 @@ Thanks for your interest in contributing! Whether it's a bug report, a feature i
1414
```bash
1515
# Clone the repository
1616
git clone https://github.com/Axiomatic-AI/ax-prover-base.git
17-
cd ax-agent
17+
cd ax-prover
1818

1919
# Create and activate a virtual environment
2020
python -m venv .venv

README.md

Lines changed: 31 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
1-
# ax-agent
1+
# ax-prover
22

33
**A minimal agent for automated theorem proving in Lean 4**
44

5-
[![CI](https://github.com/Axiomatic-AI/ax-prover-base/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/Axiomatic-AI/ax-prover-base/actions/workflows/unit_tests.yml)
5+
[![CI](https://github.com/Axiomatic-AI/ax-prover/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/Axiomatic-AI/ax-prover/actions/workflows/unit_tests.yml)
66
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue)](https://www.python.org/downloads/)
77
[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](LICENSE)
8-
[![PyPI version](https://img.shields.io/pypi/v/ax-agent)](https://pypi.org/project/ax-agent/)
8+
[![PyPI version](https://img.shields.io/pypi/v/ax-prover)](https://pypi.org/project/ax-prover/)
99

1010
A simple, modular agent that proves Lean 4 theorems through iterative refinement.
1111
It uses off-the-shelf LLMs (no fine-tuning) with a feedback loop, a memory system, and library search tools to achieve competitive results against highly-engineered systems that rely on specialized training and orders of magnitude more compute.
@@ -25,7 +25,7 @@ All results with Claude Opus 4.5, 50 iterations, pass@1. See our [paper](#citati
2525
## How It Works
2626

2727
<p align="center">
28-
<img src="assets/figure1.png" alt="ax-agent architecture" width="500">
28+
<img src="assets/figure1.png" alt="ax-prover architecture" width="500">
2929
</p>
3030

3131
The agent runs an iterative loop:
@@ -40,27 +40,26 @@ The loop continues until the proof is complete or the iteration budget is exhaus
4040
## Quick Start
4141

4242
```bash
43-
pip install ax-agent
43+
pip install ax-prover
4444
```
4545

4646
```bash
4747
# Configure your API keys
48-
cp .env.secrets.example .env.secrets
49-
# Edit .env.secrets and add at least one LLM key (Anthropic recommended)
48+
ax-prover configure
5049

5150
# Navigate to a Lean 4 project
5251
cd /path/to/lean4-project
5352

5453
# Prove a theorem
55-
ax-agent prove MyModule:my_theorem
54+
ax-prover prove MyModule:my_theorem
5655
```
5756

5857
## Installation
5958

6059
```bash
61-
pip install ax-agent
60+
pip install ax-prover
6261
# or
63-
uv add ax-agent
62+
uv add ax-prover
6463

6564
# For development (includes ruff, pytest, pre-commit)
6665
pip install -e ".[dev]"
@@ -77,11 +76,16 @@ pip install -e ".[dev]"
7776
- `GOOGLE_API_KEY`
7877
- **Tavily API key** (optional, for web search) — `TAVILY_API_KEY`
7978

80-
Create a `.env.secrets` file or export the keys in your shell:
79+
Set up your API keys interactively:
8180

8281
```bash
83-
cp .env.secrets.example .env.secrets
84-
# Edit .env.secrets with your keys
82+
ax-prover configure
83+
```
84+
85+
Or export them directly in your shell:
86+
87+
```bash
88+
export ANTHROPIC_API_KEY=sk-ant-...
8589
```
8690

8791
</details>
@@ -92,22 +96,22 @@ cp .env.secrets.example .env.secrets
9296

9397
```bash
9498
# Prove a specific theorem by module path
95-
ax-agent prove MyModule.Path:theorem_name
99+
ax-prover prove MyModule.Path:theorem_name
96100

97101
# Prove a specific theorem by file path
98-
ax-agent prove MyProject/Algebra/Ring.lean:theorem_name
102+
ax-prover prove MyProject/Algebra/Ring.lean:theorem_name
99103

100104
# Prove the theorem at a specific line
101-
ax-agent prove MyProject/Algebra/Ring.lean#L42
105+
ax-prover prove MyProject/Algebra/Ring.lean#L42
102106

103107
# Prove all unproven theorems in a file
104-
ax-agent prove MyProject/Algebra/Ring.lean
108+
ax-prover prove MyProject/Algebra/Ring.lean
105109

106110
# Skip lake build (if repo is already built)
107-
ax-agent prove MyModule:theorem_name --skip-build
111+
ax-prover prove MyModule:theorem_name --skip-build
108112

109113
# Save JSON output to file (for scripting/automation)
110-
ax-agent prove MyModule:theorem_name -o result.json
114+
ax-prover prove MyModule:theorem_name -o result.json
111115
```
112116

113117
### Running experiments
@@ -116,10 +120,10 @@ Run batch evaluations on [LangSmith](https://smith.langchain.com) datasets:
116120

117121
```bash
118122
# Run experiment on a dataset
119-
ax-agent experiment dataset_name
123+
ax-prover experiment dataset_name
120124

121125
# With custom concurrency
122-
ax-agent experiment dataset_name --max-concurrency 8
126+
ax-prover experiment dataset_name --max-concurrency 8
123127
```
124128

125129
<details>
@@ -141,10 +145,13 @@ prover:
141145
142146
```bash
143147
# Use a config file
144-
ax-agent --config my_config.yaml prove MyModule:theorem
148+
ax-prover --config my_config.yaml prove MyModule:theorem
145149

146150
# Override values from the CLI
147-
ax-agent prove MyModule:theorem prover.max_iterations=100
151+
ax-prover prove MyModule:theorem prover.max_iterations=100
152+
153+
# Save your current configuration for later reuse
154+
ax-prover --save-config my_setup prove MyModule:theorem
148155
```
149156

150157
</details>
@@ -160,7 +167,7 @@ This project is licensed under the [AGPL-3.0](LICENSE).
160167

161168
## Citation
162169

163-
If you use ax-agent in your research, please cite:
170+
If you use ax-prover in your research, please cite:
164171

165172
```bibtex
166173
@article{axproverbase2026,

configs/claude_ablations/claude4-5-fast-memory.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
import:
2-
- configs/llms.yaml
3-
- configs/default.yaml
2+
- ../llms.yaml
3+
- ../default.yaml
44

55
prover:
66
prover_llm: ${llm_configs.claude_opus_4_5}
Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
import:
2-
- configs/llms.yaml
3-
- configs/default.yaml
2+
- ../llms.yaml
3+
- ../default.yaml
44

55
prover:
66
prover_llm: ${llm_configs.claude_opus_4_5}

configs/claude_ablations/claude4-6-fast-memory.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
import:
2-
- configs/llms.yaml
3-
- configs/default.yaml
2+
- ../llms.yaml
3+
- ../default.yaml
44

55
prover:
66
prover_llm: ${llm_configs.claude_opus_4_6}
Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
import:
2-
- configs/llms.yaml
3-
- configs/default.yaml
2+
- ../llms.yaml
3+
- ../default.yaml
44

55
prover:
66
prover_llm: ${llm_configs.claude_opus_4_6}

configs/default.yaml

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
import:
2-
- configs/llms.yaml
3-
- configs/tools.yaml
2+
- llms.yaml
3+
- tools.yaml
44

55
prover:
66
prover_llm: ${llm_configs.claude_opus_4_5}
77
proposer_tools:
8-
- ${tool_configs.search_lean_search}
9-
- ${tool_configs.search_web}
8+
search_lean: ${tool_configs.search_lean_search}
9+
search_web: ${tool_configs.search_web}
1010
max_iterations: 50
1111
memory_config:
1212
class_name: ExperienceProcessor

configs/default_base_llm.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
# Remove iterations, memory and tools
22
import:
3-
- configs/default.yaml
3+
- default.yaml
44

55
prover:
6-
proposer_tools: []
6+
proposer_tools: {}
77
max_iterations: 1
88
memory_config:
99
class_name: MemorylessProcessor

0 commit comments

Comments
 (0)