
ChainForge

ChainForge demonstrates a multi-stage workflow for training Qwen2.5 models using ideas from the DeepSeek-R1 paper. It combines DeepSeek Reasoner's chain-of-thought (CoT) output with optional Anthropic Claude expansions. The current revision adds a unified call_model() helper and support for Claude 4, the latest DeepSeek-R1 checkpoint, and Mistral's Devstral model.
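
The actual helper lives in deepseek_qwen2_5_integration_r1.py; the sketch below shows one way a unified call_model() could route between the two providers. The signature, model identifiers, and routing logic are assumptions rather than the repository's exact code; only the underlying API calls (DeepSeek's OpenAI-compatible endpoint and Anthropic's messages API) are real.

import os
from openai import OpenAI
from anthropic import Anthropic

# DeepSeek exposes an OpenAI-compatible endpoint; Anthropic ships its own SDK.
_deepseek = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                   base_url="https://api.deepseek.com")
_claude = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def call_model(provider: str, prompt: str, model: str, max_tokens: int = 1024) -> str:
    """Route a single-prompt request to DeepSeek or Claude and return the text."""
    if provider == "deepseek":
        resp = _deepseek.chat.completions.create(
            model=model,  # e.g. "deepseek-reasoner"
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return resp.choices[0].message.content
    if provider == "anthropic":
        resp = _claude.messages.create(
            model=model,  # e.g. a Claude 4 model identifier
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text
    raise ValueError(f"unknown provider: {provider}")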

Key stages include:

  1. Hybrid CoT Collection – gather reasoning traces from DeepSeek and expand uncertain steps with Claude (a sketch follows this list).
  2. Cold-Start SFT – fine-tune on the collected CoT data.
  3. Reasoning-Oriented RL – train the model with a GRPO-style algorithm (see the advantage sketch below).
  4. Rejection Sampling – filter the best RL completions and run an additional SFT pass (see the sketch below).
  5. Final RL & Optional Distillation – further improve the model and optionally distill to smaller checkpoints.
  6. Diffusion Refinement (placeholder) – a diffusion_refine() hook reserved for future Diffusion-of-Thought reasoning refinements.
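
For stage 1, here is a minimal sketch of hybrid collection built on the call_model() helper sketched above. The hedging-phrase heuristic, the expansion prompt, and the model identifiers are illustrative assumptions; the repository may flag uncertain steps differently.

CLAUDE_MODEL = "claude-sonnet-4-20250514"  # illustrative Claude 4 model id
HEDGES = ("maybe", "not sure", "guess", "possibly", "unclear")

def collect_hybrid_cot(question: str) -> str:
    """Gather a DeepSeek CoT trace, then have Claude rewrite uncertain steps."""
    trace = call_model("deepseek", question, model="deepseek-reasoner")
    steps = []
    for step in trace.splitlines():
        if any(h in step.lower() for h in HEDGES):
            # Send the shaky step to Claude with enough context to firm it up.
            prompt = (f"Question: {question}\n"
                      f"Uncertain reasoning step: {step}\n"
                      "Rewrite this step with a clearer justification.")
            step = call_model("anthropic", prompt, model=CLAUDE_MODEL)
        steps.append(step)
    return "\n".join(steps)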
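For stage 3, GRPO's defining trick is that advantages come from normalising rewards within a group of completions for the same prompt, so no learned value function is needed. A minimal sketch of that computation (not the repository's exact code):

import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages, as in GRPO: normalise each group's rewards
    to zero mean and unit variance. rewards has shape (groups, group_size),
    one row per prompt; above-average completions get positive advantages."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)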
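For stage 4, rejection sampling reduces to: draw several completions per prompt, score them, and keep only the best for the next SFT pass. A sketch assuming generate and reward_fn callables (both hypothetical names):

def rejection_sample(prompts, generate, reward_fn, k=8, keep=1):
    """For each prompt, draw k completions and keep the top `keep` by reward;
    the survivors become the dataset for the follow-up SFT pass."""
    dataset = []
    for prompt in prompts:
        completions = [generate(prompt) for _ in range(k)]
        scored = sorted(completions, key=reward_fn, reverse=True)
        dataset.extend((prompt, c) for c in scored[:keep])
    return dataset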

Requirements

  • Python 3.8+
  • A GPU is recommended for the RL stages
  • DEEPSEEK_API_KEY and ANTHROPIC_API_KEY environment variables
  • pip install -r requirements.txt

Usage

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export DEEPSEEK_API_KEY="..."
export ANTHROPIC_API_KEY="..."
python deepseek_qwen2_5_integration_r1.py

Project Structure

.
├── deepseek_qwen2_5_integration_r1.py  # Main pipeline
├── requirements.txt
└── README.md

MLX-GRPO Enhancements

The training script now includes features inspired by the MLX-GRPO project:

  • Dataclass configs – TrainingArgs and RewardConfig centralise hyper-parameters and reward weights (sketched after this list).
  • Modular rewards – format and content rewards can be combined for verifiable tasks.
  • Adaptive KL penalty and atomic checkpointing ensure stable RL runs that can be resumed from the last checkpoint (see the sketches below).
  • Optional KV cache quantisation and basic speculative decoding speed up generation on Apple Silicon.
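
A sketch of what such configs and a combined reward could look like; every field name and weight here is illustrative, not the repository's actual definition:

from dataclasses import dataclass

@dataclass
class RewardConfig:
    format_weight: float = 0.2   # weight for well-formed <think>...</think> output
    content_weight: float = 0.8  # weight for a verifiably correct answer

@dataclass
class TrainingArgs:
    lr: float = 1e-6
    group_size: int = 8          # completions per prompt for GRPO
    kl_coef: float = 0.05        # initial KL penalty, adapted during training
    checkpoint_dir: str = "checkpoints"

@dataclass
class Sample:
    text: str        # full model output, including the CoT block
    answer: str      # extracted final answer
    reference: str   # gold answer for a verifiable task

def combined_reward(sample: Sample, cfg: RewardConfig) -> float:
    """Weighted sum of a format check and a verifiable content check."""
    fmt = float("<think>" in sample.text and "</think>" in sample.text)
    correct = float(sample.answer == sample.reference)
    return cfg.format_weight * fmt + cfg.content_weight * correct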
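The adaptive KL penalty follows the standard controller pattern: raise the coefficient when the observed divergence overshoots a target, lower it when it undershoots. A minimal sketch (the target and step factor are assumptions):

def adapt_kl_coef(kl_coef: float, observed_kl: float,
                  target_kl: float = 0.05, factor: float = 1.5) -> float:
    """Multiplicatively adjust the KL coefficient toward a target divergence."""
    if observed_kl > 1.5 * target_kl:
        return kl_coef * factor      # policy drifting too fast: penalise more
    if observed_kl < target_kl / 1.5:
        return kl_coef / factor      # penalty too strong: relax it
    return kl_coef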
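Atomic checkpointing means a crash mid-save can never corrupt the latest checkpoint: the state is written to a temporary file and atomically renamed into place, so resuming always finds a complete file. A sketch using torch.save (the repository may serialise differently):

import os
import tempfile
import torch

def save_checkpoint_atomic(state: dict, path: str) -> None:
    """Write the checkpoint to a temp file in the target directory, then
    os.replace() it into place; the rename is atomic on POSIX filesystems."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            torch.save(state, f)
            f.flush()
            os.fsync(f.fileno())     # make sure bytes hit disk before the swap
        os.replace(tmp_path, path)   # atomic: readers never see partial data
    except BaseException:
        os.unlink(tmp_path)
        raise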

Citation

If you use this project, please cite the DeepSeek-R1 paper:

@misc{deepseekai2025deepseekr1,
  title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
  author={DeepSeek-AI},
  year={2025},
  eprint={2501.12948},
  archivePrefix={arXiv}
}

License

This repository is licensed under the MIT License. See LICENSE for details.
