CLAUDE.md: 41 additions & 26 deletions
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project Overview
-ax-agent is a minimal LangGraph-based agent for automated Lean 4 theorem proving. It uses off-the-shelf LLMs (no fine-tuning) with iterative proof refinement, a memory system, and library search tools to prove theorems.
+ax-prover is a minimal LangGraph-based agent for automated Lean 4 theorem proving. It uses off-the-shelf LLMs (no fine-tuning) with iterative proof refinement, a memory system, and library search tools to prove theorems.
The agent runs a 4-node loop: Proposer → Compiler → Reviewer → Memory, iterating until the proof is complete or the iteration budget is exhausted.
@@ -47,80 +47,81 @@ ruff check --fix .
```bash
# Prove a specific theorem by location (module path)
```
The agent uses a 4-node iterative LangGraph workflow:
1. **Proposer** — A ReAct-style LLM agent that writes Lean 4 proof code. Can optionally use tools (LeanSearch, web search) to find relevant Mathlib lemmas before proposing.
2. **Compiler (Builder)** — Applies the proposed code via `TemporaryProposal`, builds with `lake env lean`, and extracts goal states at `sorry` locations using `lean_interact`. Returns `BuildSuccessFeedback` or `BuildFailedFeedback`.
3. **Reviewer** — Verifies statement preservation and proof validity (no `sorry`, no cheating tactics like `native_decide`). Returns `ReviewApprovedFeedback` or `ReviewRejectedFeedback`.
-4. **Memory** (`src/ax_agent/prover/memory.py`) — Summarizes lessons from failed attempts into a concise context ("lab notebook") to prevent repeating mistakes. Default strategy: `ExperienceProcessor` (self-reflection).
+4. **Memory** (`src/ax_prover/prover/memory.py`) — Summarizes lessons from failed attempts into a concise context ("lab notebook") to prevent repeating mistakes. Default strategy: `ExperienceProcessor` (self-reflection).
Loop: Proposer → Builder → (Reviewer if build succeeds) → Memory → back to Proposer. Terminates on review approval, max iterations, or build timeout.
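The control flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's LangGraph graph: `LoopState`, `run_loop`, the callback signatures, and the string outcomes (`"ok"`, `"failed"`, `"timeout"`) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopState:
    """Hypothetical stand-in for the prover state; not the real ProverAgentState."""
    iteration: int = 0
    notes: list[str] = field(default_factory=list)  # the "lab notebook"
    proven: bool = False
    stop_reason: Optional[str] = None


def run_loop(
    propose: Callable[[list[str]], str],
    build: Callable[[str], str],      # assumed to return "ok", "failed", or "timeout"
    review: Callable[[str], bool],    # True means the review approved the proof
    summarize: Callable[[list[str], str, str], list[str]],
    max_iterations: int,
) -> LoopState:
    state = LoopState()
    while state.iteration < max_iterations:
        state.iteration += 1
        code = propose(state.notes)           # Proposer
        outcome = build(code)                 # Builder
        if outcome == "timeout":
            state.stop_reason = "build_timeout"
            return state
        if outcome == "ok":
            if review(code):                  # Reviewer runs only after a successful build
                state.proven = True
                state.stop_reason = "approved"
                return state
            outcome = "rejected"
        # Memory: fold the failed attempt into the notes for the next proposal
        state.notes = summarize(state.notes, code, outcome)
    state.stop_reason = "max_iterations"
    return state


# Toy run: the first proposal still contains `sorry`, the second does not.
attempts = iter([
    "theorem t : 1 + 1 = 2 := by sorry",
    "theorem t : 1 + 1 = 2 := by rfl",
])
result = run_loop(
    propose=lambda notes: next(attempts),
    build=lambda code: "ok",
    review=lambda code: "sorry" not in code,
    summarize=lambda notes, code, outcome: notes + [f"{outcome}: {code}"],
    max_iterations=5,
)
```

In this toy run the loop stops on the second iteration with review approval, and the notes hold one entry for the rejected first attempt.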
### Key Abstractions
-**State Models** (`src/ax_agent/models/`):
+**State Models** (`src/ax_prover/models/`):
- `ProverAgentState` (`proving.py`): Main state for the prover workflow — messages, item, metrics, iteration tracking
- `TargetItem` (`proving.py`): A theorem to prove — title, location, proven status
- `Location` (`files.py`): Where code lives — `Module.Path:function_name` or `path/to/file.lean:function_name`
- `Declaration` (`declaration.py`): A parsed Lean declaration with name, type, body, and line info
- `ProposalMessage`: Code proposals with reasoning, imports, opens, and updated theorem
- `FeedbackMessage`: Base class for feedback — `BuildSuccessFeedback`, `BuildFailedFeedback`, `ReviewApprovedFeedback`, `ReviewRejectedFeedback`, `SorriesGoalStateFeedback`, etc.
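To make the two `Location` spellings concrete, here is a minimal parsing sketch. It is an illustration only: `ParsedLocation` and `parse_location` are hypothetical names, and the real `Location` model in `files.py` may parse and validate differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedLocation:
    """Hypothetical illustration; the real `Location` model may differ."""
    target: str    # module path (Module.Path) or file path (path/to/file.lean)
    name: str      # declaration name after the colon
    is_file: bool  # True when the target looks like a .lean file path


def parse_location(spec: str) -> ParsedLocation:
    # Split on the last colon so any earlier colon in a file path is left untouched.
    target, sep, name = spec.rpartition(":")
    if not sep or not target or not name:
        raise ValueError(f"expected '<module-or-file>:<name>', got {spec!r}")
    return ParsedLocation(target=target, name=name, is_file=target.endswith(".lean"))


by_module = parse_location("Module.Path:function_name")
by_file = parse_location("path/to/file.lean:function_name")
```

Both calls yield the same declaration name; only `is_file` distinguishes the module form from the file form.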
-**Configuration** (`src/ax_agent/config.py`):
+**Configuration** (`src/ax_prover/config.py`):
- `Config`: Root config with `ProverConfig` and `ToolsConfig`
- `ProverConfig`: LLM config, tools list, max iterations, memory config
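The nesting described above could look roughly like the following. This is a sketch with plain dataclasses; every field name, type, and value here is an assumption made for illustration, and the actual classes in `config.py` may use a different library and different fields.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolsConfig:
    # Hypothetical: per-tool settings keyed by tool name.
    settings: dict[str, dict[str, Any]] = field(default_factory=dict)


@dataclass
class ProverConfig:
    llm: dict[str, Any]                                   # LLM config (assumed shape)
    max_iterations: int                                   # iteration budget for the loop
    tools: list[str] = field(default_factory=list)        # names of enabled tools
    memory: dict[str, Any] = field(default_factory=dict)  # memory config (assumed shape)

    def __post_init__(self) -> None:
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")


@dataclass
class Config:
    prover: ProverConfig
    tools: ToolsConfig = field(default_factory=ToolsConfig)


# Illustrative values only; none of these are confirmed project defaults.
cfg = Config(
    prover=ProverConfig(
        llm={"model": "example-model"},
        max_iterations=50,
        tools=["lean_search"],
        memory={"strategy": "ExperienceProcessor"},
    )
)
```

Validating the iteration budget at construction time is a design choice of this sketch, so a misconfigured run fails before any LLM call is made.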
A simple, modular agent that proves Lean 4 theorems through iterative refinement.
It uses off-the-shelf LLMs (no fine-tuning) with a feedback loop, a memory system, and library search tools to achieve competitive results against highly-engineered systems that rely on specialized training and orders of magnitude more compute.
@@ -25,7 +25,7 @@ All results with Claude Opus 4.5, 50 iterations, pass@1. See our [paper](#citati