Releases: TIGER-AI-Lab/verl-tool
Release of v0.2.0
Summary of Change
In this version, we have updated verl to 0.6.0 and vllm to 0.11.0 to support more models and incorporate the latest features of verl. It can now train Qwen3-VL and Qwen3-Omni with your customized tools.
Below are the key changes and instructions for upgrading your existing VerlTool setup to be compatible with these new versions.
- `verl-tool`'s codebase has been completely re-organized. Thanks to verl's agent loop abstraction design, we are able to put all of `verl-tool`'s agentic logic in a single file, `verl_tool/agent_loop/verltool_agent_loop.py`, with the main agent loop logic in less than 200 lines of code. This greatly improves the modularity and maintainability of the codebase. Please refer to the new code structure when making any custom modifications.
- `verl-tool` keeps its support for training both text-only LLMs and multi-modal models, with `math_tir` and `pixel_reasoner` as the respective examples.
- We strictly enforce the "tokens-in" and "tokens-out" design to avoid potential off-policy issues introduced by tokenization.
- We put all of `verl-tool`'s custom replacements of classes and functions in `verl_tool/trainer/ppo/ray_trainer.py` for better maintainability. If you are trying to understand how `verl-tool` replaces verl's default implementations, please refer to this file.
- The step records are saved via verl's native `trainer.rollout_data_dir` argument (e.g. `trainer.rollout_data_dir=$(pwd)/verl_step_records/$run_name`). You need to set it in your training scripts to save the rollout data.
- `verl-tool` now supports hybrid training with and without tools. When preparing the data, simply set the `use_tool` field in the data samples to indicate whether the sample requires tool usage. The agent loop will automatically decide whether to call the tool server based on this field.
- The old `verl_tool` with verl `0.4.1.dev` is archived in the `verl-0.4.1` branch for backward compatibility.
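
The hybrid-training item above can be illustrated with a small data-preparation sketch. It only shows the idea of tagging each sample with a boolean `use_tool` field. The helper name `tag_tool_usage`, the `plain_qa` source name, and the placement of `use_tool` at the top level of each sample are assumptions for illustration, so please check the data preprocessing scripts in the repository for the exact schema.

```python
from typing import Any


def tag_tool_usage(
    samples: list[dict[str, Any]], tool_sources: set[str]
) -> list[dict[str, Any]]:
    """Return copies of `samples` with a boolean `use_tool` field added.

    Hypothetical helper: a sample is marked as tool-using when its
    `data_source` is listed in `tool_sources`.
    """
    tagged = []
    for sample in samples:
        row = dict(sample)  # shallow copy so the input list stays unchanged
        row["use_tool"] = row.get("data_source") in tool_sources
        tagged.append(row)
    return tagged


# Two toy samples: one meant to be solved with a tool, one without.
samples = [
    {"data_source": "math_tir", "prompt": "Compute 17 * 23 using Python."},
    {"data_source": "plain_qa", "prompt": "What is the capital of France?"},
]
tagged = tag_tool_usage(samples, tool_sources={"math_tir"})
for row in tagged:
    print(row["data_source"], row["use_tool"])
```

Mixing both kinds of samples in one dataset is what lets a single run train with and without tools. How the agent loop reads the field is defined in `verl_tool/agent_loop/verltool_agent_loop.py`, not in this sketch.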
| VERLTOOL Ver. | VERL Ver. | VLLM Ver. | Modality Support | Main Code Lines |
|---|---|---|---|---|
| 0.1.0 | 0.4.1 | 0.8.4 | Text, Image, Video | ~1300 |
| 0.2.0 | 0.6.0 | 0.11.0 | Text, Image, Video, Audio | ~500 |
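
As a rough illustration of why a "tokens-in, tokens-out" design matters, the toy example below shows how decoding sampled token IDs to text and re-encoding them can yield a different token sequence than the one the policy actually sampled. The three-entry vocabulary and the greedy longest-match `encode` function are made up for this sketch. They are not verl-tool's tokenizer or any real one, and the link to the off-policy issue mentioned above is our reading of the release notes.

```python
# Toy vocabulary (an assumption for illustration): "ab" is a single token,
# but it can also be produced as the two tokens "a" and "b".
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {token_id: text for text, token_id in VOCAB.items()}


def decode(ids: list[int]) -> str:
    """Concatenate the text of each token ID."""
    return "".join(ID_TO_TEXT[i] for i in ids)


def encode(text: str) -> list[int]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids = []
    pos = 0
    while pos < len(text):
        for length in range(len(text) - pos, 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


# The policy sampled "a" then "b" as two separate tokens.
sampled_ids = [0, 1]

# Text round trip: the re-encoded sequence merges them into one "ab" token.
round_tripped = encode(decode(sampled_ids))

# Tokens-out: keep the sampled IDs and append the observation's token IDs.
observation_ids = encode("b")
trajectory = sampled_ids + observation_ids

print(round_tripped)  # differs from sampled_ids
print(trajectory)
```

If training computed log-probabilities on the round-tripped sequence, it would score tokens the policy never sampled. Passing token IDs through unchanged avoids that mismatch.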
What's Changed
- add piston for multiple code execution by @erenup in #1
- [feature] add multi-tool support in a single server, matching tool us… by @jdf-prog in #2
- init commit of text browser; change extra_data into extra_field by @cogito233 in #3
- [fix] response str length doesn't match with active masks by @jdf-prog in #5
- Fix format of payload by @cogito233 in #6
- feat: added information mask by @EigenTom in #7
- Torl by @jdf-prog in #10
- [feature] add a reliable firejail sandbox by @Zhuofeng-Li in #9
- feat: reimplemented tool-call penalty and python timeout penalty, etc by @EigenTom in #11
- fix reward during validation by @Zhuofeng-Li in #14
- verl-tool math without server is ready by @Zhuofeng-Li in #18
- feat: implemented MVP tool-use evaluation API service by @EigenTom in #19
- Dev/serverdebug by @jdf-prog in #20
- feat: refined acecoder data preprocessing prompts and removed extra rewards by @EigenTom in #21
- code-math without server init by @Zhuofeng-Li in #22
- Dev/serverdebug by @jdf-prog in #24
- Dev/train by @jdf-prog in #27
- fix server ``` ending problem by @Zhuofeng-Li in #28
- eval service ready aligning to openAI `client.chat.completions.create` API by @Zhuofeng-Li in #29
- Browser env dev clean by @cogito233 in #25
- Modify truncate method during rollout by @cogito233 in #16
- add multi node training support by @jdf-prog in #30
- 0425 zhiheng wiki rl debug by @cogito233 in #31
- update browser by @cogito233 in #32
- Dev/train by @jdf-prog in #33
- merge Dev/train by @jdf-prog in #44
- Update README.md by @Zhuofeng-Li in #43
- chore: added READMEs for eval_service, examples and verl_tool by @EigenTom in #46
- merge Dev/train to main update readme by @Zhuofeng-Li in #45
- update doc and move benchmark to the top by @Zhuofeng-Li in #47
- Add deepmath dataset training by @Zhuofeng-Li in #48
- Add math results and ckpt on readme by @Zhuofeng-Li in #49
- update readme docs by @Zhuofeng-Li in #50
- Add math benchmark as submodule by @Zhuofeng-Li in #51
- chore: added coding benchmarks as submodules + fix README by @EigenTom in #52
- Dev/train by @jdf-prog in #53
- Dev/updatedoc by @jdf-prog in #59
- Dev/update verl by @jdf-prog in #67
- feat: implemented and tested Google/SERP search tool by @EigenTom in #68
- Dev/dapo by @jdf-prog in #69
- fix some bugs in dapo by @jdf-prog in #70
- [WIP] Search-R1 Adaptation & Reproduction by @EigenTom in #71
- Dev/haozhe sql by @HaozheH3 in #60
- [trainer] support for fallback to original async server by @rhinosaur0 in #73
- chore: aded one-stop training script for search-r1 reproduction by @EigenTom in #74
- Add MseeP.ai badge by @lwsinclair in #80
- Fix double repeat for val n by @rlorigro in #77
- Update serve.py by @cogito233 in #78
- Multimodal support by @cjakfskvnad in #63
- Dev/entropy by @jdf-prog in #92
- update by @wenhuchen in #93
- Luyi 0807 nl2sql by @jdf-prog in #94
- Luyi 0807 nl2sql by @jdf-prog in #95
- fix links by @bamos in #99
- Refactor file writing to include pre-import libraries by @LinkCut-World in #100
- Add Megatron Support by @jdf-prog in #107
- adding the MCP-Universe evaluation feature to verl-tool by @ShenzheZhu in #112
- multi-node support by @jdf-prog in #119
- fixing small error in mcp_interface.py and change the mcp_call to tool_call by @ShenzheZhu in #118
- Support audio modality for qwen2.5-omni by @Eric2i in #132
- fix typo by @bingshuailiu in #135
- Adding MCP Agent Tutorial by @ShenzheZhu in #138
New Contributors
- @erenup made their first contribution in #1
- @jdf-prog made their first contribution in #2
- @cogito233 made their first contribution in #3
- @HaozheH3 made their first contribution in #60
- @rhinosaur0 made their first contribution in #73
- @lwsinclair made their first contribution in #80
- @rlorigro made their first contribution in #77
- @cjakfskvnad made their first contribution in #63
- @wenhuchen made their fi...
Release of VerlTool v0.1.0
This release archives the existing verl-tool codebase. The new, re-organized verl-tool lite version will come later.