StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion

Introduction

We introduce StepFun-Formalizer, a family of large language models designed to translate natural-language mathematical problems into formal statements in Lean 4. Through the fusion of formal knowledge and informal-to-formal reasoning capability, StepFun-Formalizer achieves strong performance on autoformalization tasks. Evaluated with BEq verification on mainstream benchmarks including FormalMATH-Lite, ProverBench, and CombiBench, StepFun-Formalizer matches or exceeds all prior general-purpose and specialized autoformalization models of comparable scale. Please refer to our paper for more details.

Figure 1: A case study demonstrating the impact of formal knowledge and informal-to-formal reasoning capability on autoformalization models. General-purpose models without formal knowledge make mistakes in code implementation, while specialized models without reasoning capability struggle with problem understanding and informal-formal alignment. StepFun-Formalizer improves autoformalization performance by combining these two capabilities.
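To make the task concrete, here is a small illustrative sketch of an informal problem and one possible Lean 4 formalization. It is not taken from the paper, the benchmarks, or the model's output; the problem, the theorem name `sum_of_even_is_even`, and the use of Mathlib's `Even` are assumptions chosen for illustration. Autoformalization targets the statement only, so the proof is left as `sorry`.

```lean
import Mathlib

-- Informal problem (hypothetical example):
--   "Prove that the sum of two even integers is even."
--
-- One possible formal statement. The proof is intentionally omitted,
-- since autoformalization produces the statement, not the proof.
theorem sum_of_even_is_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  sorry
```

Even in a case this small, both capabilities matter: choosing `ℤ` and `Even` correctly relies on formal knowledge, while deciding that "two even integers" becomes two hypotheses and that the goal concerns `a + b` relies on informal-to-formal reasoning.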

Method

Figure 2: An illustration of our method. It shows the construction of the knowledge and reasoning datasets (① and ②), as well as the training process, which consists of SFT and RL (③ and ④).

Evaluation Results

Table 1: BEq@1 and BEq@16 (%) results of StepFun-Formalizer and baselines on three benchmarks. See src/eval_benchmarks.py for the evaluation code.
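As a rough guide to reading the BEq@k columns, the sketch below shows one way such a metric could be aggregated. It assumes that BEq@k counts a problem as solved when at least one of its first k sampled formalizations is judged equivalent to the reference statement; see the paper for the exact definition. The function `beq_at_k` and the sample verdicts are hypothetical, and this is not the code in src/eval_benchmarks.py.

```python
from typing import Sequence


def beq_at_k(verdicts: Sequence[Sequence[bool]], k: int) -> float:
    """Percentage of problems with an equivalent candidate among the first k.

    verdicts[i][j] is True if the j-th sampled formalization of problem i
    was judged equivalent to the reference (assumed to come from a BEq check
    run elsewhere; this function only aggregates the results).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not verdicts:
        return 0.0
    solved = sum(1 for per_problem in verdicts if any(per_problem[:k]))
    return 100.0 * solved / len(verdicts)


# Hypothetical verdicts: 3 problems, 4 sampled formalizations each.
example = [
    [False, True, False, False],
    [False, False, False, False],
    [True, True, False, True],
]

print(f"BEq@1 = {beq_at_k(example, 1):.1f}%")
print(f"BEq@4 = {beq_at_k(example, 4):.1f}%")
```

Under this assumption BEq@k cannot decrease as k grows, which is why BEq@16 is expected to be at least as high as BEq@1 for the same model.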

Model Download

Model	Download
StepFun-Formalizer-7B	🤗HuggingFace
StepFun-Formalizer-32B	🤗HuggingFace

License

Both the code repository and the model weights are released under the Apache License (Version 2.0).

Citation

@misc{stepfunformalizer2025,
      title={StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion}, 
      author={Yutong Wu and Di Huang and Ruosi Wan and Yue Peng and Shijie Shang and Chenrui Cao and Lei Qi and Rui Zhang and Zidong Du and Jie Yan and Xing Hu},
      year={2025},
      eprint={2508.04440},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.04440}, 
}
