
Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision

🌟 Introduction

We tackle a key challenge in reinforcement learning (RL) for LLMs: 🧩 how to train models when there is no explicit final answer and no reliable outcome reward, such as unit tests or exact-match checks. This is common in 🔬 scientific reasoning and 📐 mathematical proofs, where solutions are non-unique and expressed in natural language. In these settings, traditional reward signals ❌ fail, making standard RL ineffective.

To address this challenge, we propose Sci-CoE, a two-stage co-evolving framework for scientific reasoning that enables a model to self-evolve as both 🧠 solver and 🔍 verifier by transitioning from sparse supervision to unsupervised learning.

  • Stage 1: Using a small set of annotated data 🏷️, the model establishes fundamental correctness-judgment anchors for the Verifier.
  • Stage 2: We introduce a geometric reward mechanism 📐 that jointly models ✅ consensus, 🔒 reliability, and 🌱 diversity, enabling stable and scalable self-iteration on unlabeled data without relying on explicit ground-truth answers (a rough sketch of this reward appears below).

Sci-CoE thus transforms the absence of outcome rewards into a structured self-evolving learning signal. 🚀
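
To give a concrete sense of what such a geometric reward could look like, the sketch below combines the three signals with a geometric mean, so the reward stays high only when consensus, reliability, and diversity are all high. The component definitions here (majority-vote consensus, mean verifier confidence, embedding-based diversity) and the function name `geometric_reward` are illustrative assumptions, not the exact formulation from the paper.

```python
import numpy as np

def geometric_reward(agreement_votes, verifier_scores, rollout_embeddings, eps=1e-8):
    """Toy geometric reward over consensus, reliability, and diversity (illustrative only)."""
    # Consensus: fraction of sampled solutions that agree with the majority answer.
    consensus = float(np.mean(agreement_votes))
    # Reliability: mean verifier confidence, anchored by the Stage-1 supervision.
    reliability = float(np.mean(verifier_scores))
    # Diversity: 1 minus the mean pairwise cosine similarity between rollouts.
    X = np.asarray(rollout_embeddings, dtype=float)
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    sims = X @ X.T
    n = len(X)
    diversity = float(1.0 - (sims.sum() - np.trace(sims)) / max(n * (n - 1), 1))
    # Geometric mean: the reward collapses if any single component collapses.
    parts = np.clip([consensus, reliability, diversity], eps, 1.0)
    return float(np.exp(np.log(parts).mean()))

# Example: four rollouts, three of which agree with the majority answer.
reward = geometric_reward(
    agreement_votes=[1, 1, 0, 1],
    verifier_scores=[0.9, 0.8, 0.4, 0.85],
    rollout_embeddings=np.random.rand(4, 16),
)
```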

🔧 Training

To start training, set the configurations in ./optimization/optimization_config.py and run the following command:

python run.py
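
For orientation, the block below sketches the kind of fields such a configuration file might contain. The field names and values are illustrative placeholders; consult ./optimization/optimization_config.py for the actual options and defaults.

```python
# Illustrative placeholders only -- not the real fields of optimization_config.py.
MODEL_PATH = "Qwen/Qwen2.5-7B-Instruct"            # base model to co-evolve
STAGE = 1                                          # 1: sparse-supervised anchoring, 2: unsupervised self-iteration
TRAIN_DATA_PATH = "./data/stage1_annotated.jsonl"  # annotated (Stage 1) or unlabeled (Stage 2) problems
NUM_ROLLOUTS_PER_PROBLEM = 8                       # candidate solutions sampled per problem
OUTPUT_DIR = "./checkpoints/sci-coe"               # where checkpoints are written
```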

🧪 Evaluation

The accuracy of generated solutions, the verification strategies, and Best-of-N (BoN) performance during training can be evaluated with the following script:

cd evaluation
python eval.py

You can modify the model path and evaluation configurations in ./evaluation/evaluation_config.py.
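
For context, Best-of-N evaluation amounts to sampling N candidate solutions and keeping the one the verifier scores highest. The sketch below assumes generic `generate` and `score` callables; it is not the API exposed by eval.py.

```python
from typing import Callable, List

def best_of_n(problem: str,
              generate: Callable[[str, int], List[str]],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Return the candidate solution that the verifier scores highest (BoN)."""
    candidates = generate(problem, n)                  # sample N candidate solutions from the solver
    scores = [score(problem, c) for c in candidates]   # verifier score for each candidate
    return candidates[scores.index(max(scores))]       # keep the top-scoring candidate
```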

📊 Benchmarks and Results

For final benchmark evaluation, we use the official evaluation scripts provided by each dataset. The evaluation scripts for MMLU-Pro, GPQA-Diamond and UGPhysics are provided in ./benchmarks.

The results of Sci-CoE on the above three benchmarks are as follows:

| Model | MMLU-Pro | Bio. | Bus. | Che. | C.S. | Eco. | Eng. | Hea. | His. | Law | Math | Phi. | Phy. | Psy. | Oth. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Qwen2.5-7B-Instruct** | | | | | | | | | | | | | | | |
| Base Model | 57.39 | 72.11 | 64.89 | 57.16 | 60.49 | 68.84 | 39.94 | 56.85 | 48.29 | 32.52 | 71.87 | 49.90 | 58.35 | 65.91 | 54.33 |
| Sci-CoE-Stage 1 | 57.68 | 74.34 | 67.17 | 56.89 | 60.73 | 68.72 | 40.04 | 56.60 | 47.77 | 33.33 | 72.09 | 48.50 | 59.05 | 65.54 | 53.90 |
| Sci-CoE-Stage 2-18k | 58.05 | 73.50 | 68.69 | 57.16 | 61.22 | 68.01 | 40.04 | 58.07 | 49.08 | 32.61 | 72.54 | 50.50 | 58.97 | 66.67 | 54.65 |
| Sci-CoE-Stage 2-30k | 58.51 | 73.92 | 68.19 | 55.39 | 61.71 | 70.62 | 42.31 | 58.19 | 48.82 | 34.06 | 72.76 | 50.50 | 59.35 | 67.17 | 54.87 |
| **Qwen3-8B** | | | | | | | | | | | | | | | |
| Base Model | 63.19 | 78.80 | 69.71 | 68.02 | 66.10 | 72.27 | 53.04 | 62.47 | 51.97 | 31.52 | 78.53 | 51.50 | 67.67 | 69.30 | 55.95 |
| Sci-CoE-Stage 1 | 63.27 | 78.94 | 69.20 | 66.78 | 65.85 | 72.63 | 53.77 | 63.08 | 50.92 | 32.61 | 78.09 | 52.51 | 68.44 | 69.42 | 55.41 |
| Sci-CoE-Stage 2-18k | 63.56 | 79.22 | 70.85 | 68.02 | 66.34 | 72.87 | 53.35 | 62.71 | 51.44 | 32.52 | 79.42 | 52.91 | 67.74 | 69.17 | 55.19 |
| Sci-CoE-Stage 2-30k | 64.34 | 80.20 | 70.72 | 68.20 | 68.05 | 73.93 | 54.59 | 63.33 | 54.07 | 33.42 | 79.79 | 53.51 | 68.36 | 70.30 | 56.06 |

| Model | Data Scale | UGPhysics | Mec. and Ther. | Elec. | Modern Physics |
|---|---|---|---|---|---|
| **Qwen2.5-7B-Instruct** | | | | | |
| Base Model | -- | 20.67 | 18.88 | 18.52 | 23.34 |
| Sci-CoE-Stage 1 | 4k | 21.07 | 20.14 | 19.81 | 22.51 |
| Sci-CoE-Stage 2 | 18k | 21.92 | 20.92 | 21.31 | 23.17 |
| Sci-CoE-Stage 2 | 30k | 22.64 | 21.84 | 23.13 | 24.91 |
| **Qwen3-8B** | | | | | |
| Base Model | -- | 31.76 | 30.73 | 29.98 | 33.51 |
| Sci-CoE-Stage 1 | 4k | 32.03 | 30.25 | 30.62 | 34.38 |
| Sci-CoE-Stage 2 | 18k | 32.46 | 30.21 | 33.30 | 34.38 |
| Sci-CoE-Stage 2 | 30k | 33.10 | 30.51 | 34.80 | 34.99 |

| Model | Data Scale | GPQA-Diamond | Physics | Chemistry | Biology |
|---|---|---|---|---|---|
| **Qwen2.5-7B-Instruct** | | | | | |
| Base Model | -- | 30.81 | 33.73 | 24.73 | 47.37 |
| Sci-CoE-Stage 1 | 4k | 31.31 | 34.88 | 24.73 | 47.37 |
| Sci-CoE-Stage 2 | 18k | 33.33 | 41.86 | 23.66 | 42.11 |
| Sci-CoE-Stage 2 | 30k | 35.35 | 41.86 | 26.88 | 47.37 |
| **Qwen3-8B** | | | | | |
| Base Model | -- | 36.87 | 39.53 | 33.33 | 42.11 |
| Sci-CoE-Stage 1 | 4k | 37.88 | 45.35 | 29.03 | 47.37 |
| Sci-CoE-Stage 2 | 18k | 38.89 | 41.86 | 33.33 | 52.63 |
| Sci-CoE-Stage 2 | 30k | 40.91 | 43.02 | 35.48 | 57.89 |

🤝 Citation

@misc{he2026scicoecoevolvingscientificreasoning,
      title={Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision}, 
      author={Xiaohan He and Shiyang Feng and Songtao Huang and Lei Bai and Bin Wang and Bo Zhang},
      year={2026},
      eprint={2602.12164},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.12164}, 
}

🤝 Acknowledgement

We sincerely thank the authors of CURE for laying the foundation of the co-evolving mechanism. Sci-CoE is developed on top of the CURE framework, inheriting its training pipeline while adapting it to scientific reasoning settings.
