GitHub - stepfun-ai/StepFun-Prover-Preview: Large language models designed for formal theorem proving through tool-integrated reasoning.

Introduction

We introduce StepFun-Prover-Preview, large language models designed for formal theorem proving through tool-integrated reasoning. Using a reinforcement learning pipeline that incorporates tool-based interactions, StepFun-Prover Preview can achieve strong performance in generating Lean 4 proofs with minimal sampling. Our approach enables the model to emulate human-like problem-solving strategies by iteratively refining proofs based on real-time environment feedback. On the miniF2F-test benchmark, StepFun-Prover-Preview-32B achieves a pass@1 success rate of 70%. Please refer to our technical report for more details.

Figure 1: Performance comparison on MiniF2F-test. y-axis shows the pass@1, which is computed by averaging 32 trials; while x-axis denotes the maximum number the provers are allowed to interact with Lean4-REPL before getting successful proof. Note both DeepSeek-Prover and Kimina-Prover utilize at least 32K token context length. Stepfun-Prover was evaluated using 20K context window including feedback from Lean4-REPL.

Methodology

Figure 2: Our training pipeline. Left. Tool-integrated RL and iterative RL-SFT cycle. The upper left illustrates data preparation. Right. Tool-integrated reasoning pattern.

Experimental Results

Model	MiniF2F-test (Pass@1)
DeepSeek-Prover-V2-7B	58.6%
DeepSeek-Prover-V2-671B	61.9%
Kimina-Prover-8B	61.1%
Kimina-Prover-72B	63.9%
StepFun-Prover-Preview-7B	66.0%
StepFun-Prover-Preview-32B	70.0%

Tabel 1: Performance comparison of theorem proving models on the miniF2F-test dataset under minimal sampling budgets. For StepFun-Provers, we generate 32 responses per query to estimate Pass@1.

Maximum generation length	MiniF2F-test (Pass@1)
4096	58.3%
8192	66.5%
12288	68.9%
16384	69.9%
20480	70.0%

Table 2: Performance of StepFun-Prover-Preview-32B on MiniF2F-test with various maximum generation lengths. Longer response can get better results.

Model Download

Model	Download
StepFun-Prover-Preview-7B	🤗HuggingFace
StepFun-Prover-Preview-32B	🤗HuggingFace

License

Both the code repository and the model weights are released under the Apache License (Version 2.0).

Citation

@misc{stepfunprover2025,
  title={StepFun-Prover Preview: Let's Think and Verify Step by Step},
  author={Shang, Shijie and Wan, Ruosi and Peng, Yue and Wu, Yutong and Chen, Xiong-hui and Yan, Jie and Zhang, Xiangyu},
  year={2025},
  eprint={2507.20199},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.20199},
}

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
assets		assets
demo		demo
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Introduction

Methodology

Experimental Results

Model Download

License

Citation

About

Uh oh!

Releases

Packages

License

stepfun-ai/StepFun-Prover-Preview

Folders and files

Latest commit

History

Repository files navigation

Introduction

Methodology

Experimental Results

Model Download

License

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages