This repository contains the algorithm implementation for our paper "Black-Box On-Policy Distillation of Large Language Models".
📄 Paper: arXiv:2511.10643
💾 Data: LMSYS-Chat-GPT-5-Chat-Response
🤖 Models: GAD Models
We use two repos so that different branches can be installed easily for different experiments. Check the GAD Repo for environment setup and scripts for running experiments; check this repo for the algorithm implementation.
Our implementation is based on VeRL. We repurpose the critic module in VeRL as our discriminator.
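As a minimal sketch of that idea (function names here are ours, not VeRL's API): a critic normally emits a per-token value, so reusing it as a discriminator means pooling those per-token outputs into one sequence-level score, counting only response positions.

```python
def sequence_score(token_scores, response_mask):
    """Pool per-token critic outputs into a single sequence-level
    discriminator score, averaging over response tokens only.
    `response_mask` is 1 at response positions, 0 at prompt positions."""
    total = sum(s * m for s, m in zip(token_scores, response_mask))
    n = sum(response_mask)
    return total / max(n, 1)

# Toy example: 2 prompt tokens (masked out), 3 response tokens.
scores = [0.9, -0.2, 0.5, 0.1, 0.6]
mask = [0, 0, 1, 1, 1]
print(sequence_score(scores, mask))  # mean over response tokens, ~0.4
```

Other pooling choices (e.g. taking the value at the final token) would work the same way; the point is that one scalar per rollout is what the BT loss below operates on.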
There are four branches in this repo:
- seqkd: runs the SeqKD baseline
- warmup: warmup stage of our method
- gad: GAD training stage of our method
- eval: uses an already-trained model to perform generation only
For SeqKD and the warmup stage of GAD, the student is supervised-finetuned on teacher responses (corresponding code at sft_seqkd and sft_warmup). We implement both in this VeRL-based repo so that all stages stay closely aligned.
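The supervised-finetuning objective in both cases reduces to token-level cross-entropy on the teacher response, with prompt positions masked out. A minimal sketch (names are ours, for illustration only):

```python
import math

def sft_loss(token_logprobs, response_mask):
    """Negative log-likelihood of the teacher response under the student,
    averaged over response tokens; prompt positions (mask = 0) are ignored."""
    nll = -sum(lp * m for lp, m in zip(token_logprobs, response_mask))
    return nll / max(sum(response_mask), 1)

# Student log-probs at [prompt, prompt, resp, resp, resp] positions.
logprobs = [math.log(0.9), math.log(0.8),
            math.log(0.5), math.log(0.25), math.log(0.5)]
mask = [0, 0, 1, 1, 1]
print(sft_loss(logprobs, mask))  # average NLL over the 3 response tokens
```

In practice the real code operates on batched tensors, but the masking logic is the same.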
We provide a code walk-through of the gad branch:
- Training Entrance
- Student Rollout: Entrance and Implementation
- Discriminator Update with BT Loss: Entrance and Implementation
- Student Update with Discriminator Score: Entrance and Implementation
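To make the two update steps above concrete, here is a hedged sketch of the Bradley-Terry (BT) objective for the discriminator; function names are ours, and the linked entrance/implementation files contain the actual code. Given scalar discriminator scores for a teacher response and a student rollout on the same prompt, the BT loss pushes the teacher's score above the student's; the student is then updated to maximize the discriminator's score on its own rollouts.

```python
import math

def bt_loss(teacher_score, student_score):
    """Bradley-Terry loss on a (teacher, student) response pair:
    -log sigmoid(s_teacher - s_student), written in a numerically
    direct form as log(1 + exp(-(margin)))."""
    margin = teacher_score - student_score
    return math.log(1.0 + math.exp(-margin))

# Discriminator already prefers the teacher -> small loss.
print(bt_loss(2.0, 0.5))
# Teacher scored below the student -> large loss.
print(bt_loss(0.5, 2.0))
```

The student update then treats the discriminator score of each rollout as a sequence-level reward inside the usual VeRL policy-gradient machinery.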
If you find this work useful, please cite our paper:
```bibtex
@article{ye2025blackboxonpolicydistillationlarge,
  title={Black-Box On-Policy Distillation of Large Language Models},
  author={Tianzhu Ye and Li Dong and Zewen Chi and Xun Wu and Shaohan Huang and Furu Wei},
  journal={arXiv preprint arXiv:2511.10643},
  year={2025},
  url={https://arxiv.org/abs/2511.10643}
}
```

For any questions or issues, please open an issue in this repository.