Releases · TIGER-AI-Lab/verl-tool

Summary of Change

In this version, we have updated verl to 0.6.0 and vLLM to 0.11.0 to support more models and incorporate the latest verl features. You can now train Qwen3-VL and Qwen3-Omni with your own customized tools.

Below are the key changes and instructions for upgrading your existing VerlTool setup to be compatible with these new versions.

verl-tool's codebase has been completely reorganized. Thanks to verl's agent loop abstraction, all of verl-tool's agentic logic now lives in a single file, verl_tool/agent_loop/verltool_agent_loop.py, and the main agent loop is less than 200 lines of code. This greatly improves the modularity and maintainability of the codebase. Please refer to the new code structure when making any custom modifications.
verl-tool continues to support training both text-only LLMs and multi-modal models, with math_tir and pixel_reasoner as the respective examples.
We strictly enforce the "tokens-in" and "tokens-out" design to avoid potential off-policy issues introduced by tokenization.
All of verl-tool's custom replacements for verl's classes and functions are collected in verl_tool/trainer/ppo/ray_trainer.py for better maintainability. If you want to understand how verl-tool replaces verl's default implementations, please refer to this file.
Step records are saved via verl's native trainer.rollout_data_dir argument (e.g. trainer.rollout_data_dir=$(pwd)/verl_step_records/$run_name). You need to set it in your training scripts to save the rollout data.
verl-tool now supports hybrid training with and without tools. When preparing the data, set the use_tool field in each data sample to indicate whether that sample requires tool usage. The agent loop automatically decides whether to call the tool server based on this field.
The old verl_tool with verl 0.4.1.dev is archived in the verl-0.4.1 branch for backward compatibility.
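To illustrate why the "tokens-in" and "tokens-out" design noted above matters, here is a minimal sketch with a toy tokenizer. The vocabulary and the encode/decode helpers are hypothetical and are not part of verl-tool or verl; they only show that decoding sampled token IDs to text and encoding the text again can yield different IDs than the policy actually sampled.

```python
# Toy illustration only: a tiny greedy longest-match tokenizer.
# VOCAB, encode, and decode are hypothetical, not verl-tool or verl APIs.
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_PIECE = {token_id: piece for piece, token_id in VOCAB.items()}


def decode(token_ids):
    """Concatenate the text pieces of the given token IDs."""
    return "".join(ID_TO_PIECE[token_id] for token_id in token_ids)


def encode(text):
    """Greedily pick the longest vocabulary piece at each position."""
    token_ids, pos = [], 0
    while pos < len(text):
        piece = max((p for p in VOCAB if text.startswith(p, pos)), key=len)
        token_ids.append(VOCAB[piece])
        pos += len(piece)
    return token_ids


# Suppose the policy sampled "a" and then "b" as two separate tokens.
sampled_ids = [0, 1]

# Text round trip: the tokenizer merges "a" + "b" into the single token "ab",
# so the IDs no longer match what the policy generated.
retokenized_ids = encode(decode(sampled_ids))

# Tokens-in, tokens-out: keep the sampled IDs untouched and only append the
# IDs of newly added text (here a pretend tool observation "b").
observation_ids = encode("b")
trajectory_ids = sampled_ids + observation_ids

# Re-encoding the whole conversation text instead changes the prefix.
reencoded_trajectory_ids = encode(decode(sampled_ids) + "b")
```

In this toy case the round trip turns the two sampled tokens into one merged token, so log-probabilities computed on the re-encoded sequence would refer to tokens the policy never sampled. Keeping the sampled IDs and appending only new IDs avoids that mismatch.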
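The use_tool field for hybrid training could be set during data preparation roughly as sketched below. This is an assumption-laden illustration: only the field name use_tool comes from these notes. The sample layout (a top-level key next to a prompt), the default value, and the needs_tool_server helper are hypothetical, so check the math_tir and pixel_reasoner examples for the layout verl-tool actually expects.

```python
from typing import Any

# Hypothetical sample layout: only the "use_tool" field name comes from the
# release notes; where it lives in a real dataset row is an assumption.
samples = [
    {"prompt": "Compute 17 * 23 by running Python code.", "use_tool": True},
    {"prompt": "What is the capital of France?", "use_tool": False},
    {"prompt": "A sample prepared before the field existed."},
]


def needs_tool_server(sample: dict[str, Any], default: bool = True) -> bool:
    """Return whether this sample should be routed through the tool server.

    The default for samples without the field is an assumption made for
    this sketch, not documented verl-tool behavior.
    """
    return bool(sample.get("use_tool", default))


# Split the data the way an agent loop might branch per sample.
tool_samples = [s for s in samples if needs_tool_server(s)]
plain_samples = [s for s in samples if not needs_tool_server(s)]
```

Setting the field explicitly on every sample avoids relying on any default behavior.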

VERLTOOL Ver.	VERL Ver.	VLLM Ver.	Modality Support	Main Code Lines
0.1.0	0.4.1	0.8.4	Text, Image, Video	~1300
0.2.0	0.6.0	0.11.0	Text, Image, Video, Audio	~500

What's Changed

add piston for multiple code execution by @erenup in #1
[feature] add multi-tool support in a single server, matching tool us… by @jdf-prog in #2
init commit of text browser; change extra_data into extra_field by @cogito233 in #3
[fix] response str length doesn't match with active masks by @jdf-prog in #5
Fix format of payload by @cogito233 in #6
feat: added information mask by @EigenTom in #7
Torl by @jdf-prog in #10
[feature] add a reliable firejail sandbox by @Zhuofeng-Li in #9
feat: reimplemented tool-call penalty and python timeout penalty, etc by @EigenTom in #11
fix reward during validation by @Zhuofeng-Li in #14
verl-tool math without server is ready by @Zhuofeng-Li in #18
feat: implemented MVP tool-use evaluation API service by @EigenTom in #19
Dev/serverdebug by @jdf-prog in #20
feat: refined acecoder data preprocessing prompts and removed extra rewards by @EigenTom in #21
code-math without server init by @Zhuofeng-Li in #22
Dev/serverdebug by @jdf-prog in #24
Dev/train by @jdf-prog in #27
fix server ``` ending problem by @Zhuofeng-Li in #28
eval service ready aligning to openAI client.chat.completions.create API by @Zhuofeng-Li in #29
Browser env dev clean by @cogito233 in #25
Modify truncate method during rollout by @cogito233 in #16
add multi node training support by @jdf-prog in #30
0425 zhiheng wiki rl debug by @cogito233 in #31
update browser by @cogito233 in #32
Dev/train by @jdf-prog in #33
merge Dev/train by @jdf-prog in #44
Update README.md by @Zhuofeng-Li in #43
chore: added READMEs for eval_service, examples and verl_tool by @EigenTom in #46
merge Dev/train to main update readme by @Zhuofeng-Li in #45
update doc and move benchmark to the top by @Zhuofeng-Li in #47
Add deepmath dataset training by @Zhuofeng-Li in #48
Add math results and ckpt on readme by @Zhuofeng-Li in #49
update readme docs by @Zhuofeng-Li in #50
Add math benchmark as submodule by @Zhuofeng-Li in #51
chore: added coding benchmarks as submodules + fix README by @EigenTom in #52
Dev/train by @jdf-prog in #53
Dev/updatedoc by @jdf-prog in #59
Dev/update verl by @jdf-prog in #67
feat: implemented and tested Google/SERP search tool by @EigenTom in #68
Dev/dapo by @jdf-prog in #69
fix some bugs in dapo by @jdf-prog in #70
[WIP] Search-R1 Adaptation & Reproduction by @EigenTom in #71
Dev/haozhe sql by @HaozheH3 in #60
[trainer] support for fallback to original async server by @rhinosaur0 in #73
chore: aded one-stop training script for search-r1 reproduction by @EigenTom in #74
Add MseeP.ai badge by @lwsinclair in #80
Fix double repeat for val n by @rlorigro in #77
Update serve.py by @cogito233 in #78
Multimodal support by @cjakfskvnad in #63
Dev/entropy by @jdf-prog in #92
update by @wenhuchen in #93
Luyi 0807 nl2sql by @jdf-prog in #94
Luyi 0807 nl2sql by @jdf-prog in #95
fix links by @bamos in #99
Refactor file writing to include pre-import libraries by @LinkCut-World in #100
Add Megatron Support by @jdf-prog in #107
adding the MCP-Universe evaluation feature to verl-tool by @ShenzheZhu in #112
multi-node support by @jdf-prog in #119
fixing small error in mcp_interface.py and change the mcp_call to tool_call by @ShenzheZhu in #118
Support audio modality for qwen2.5-omni by @Eric2i in #132
fix typo by @bingshuailiu in #135
Adding MCP Agent Tutorial by @ShenzheZhu in #138