v0.2.1
Overview
- Agentic RL
  - 1.1 The rollout model can now be accessed directly via the OpenAI API, reducing migration costs.
  - 1.2 Supports general multi-step workflows without requiring concatenated experience data.
  - 1.3 Introduced `AddStrategy` to facilitate group-based advantage/return calculations (experimental; will be integrated into the buffer module in future versions).
  - 1.4 Added a ReAct Agent RL example based on the AgentScope framework.
  - 1.5 Enhanced the Alfworld example into a general multi-step workflow.
- Async / Offline RL
  - 2.1 Refactored `RunnerPool` to `Scheduler`, enabling asynchronous scheduling and management of multiple workflow runners.
  - 2.2 Added a priority queue buffer to reduce idling caused by speed differences between `Explorer` and `Trainer` through experience sorting and reuse.
  - 2.3 Introduced `Synchronizer` to manage model weight synchronization between `Explorer` and `Trainer`, supporting dynamic synchronization.
  - 2.4 Added tutorials on using the `Synchronizer`.
- Added a benchmark tool for quick verification.
- Added support for more RL algorithms (e.g., CHORD, DAPO, GSPO, RAFT).
- Updated vllm to `0.10.0` and verl to `0.4.1`.
- Fixed numerous bugs.
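
As a rough illustration of the group-based advantage calculation mentioned in item 1.3, the sketch below normalizes each reward against the mean and standard deviation of the rewards in its own group (a GRPO-style formulation). It is not the project's `AddStrategy` implementation: the function name `group_advantages`, the `eps` parameter, and the input layout are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_advantages(rewards, group_ids, eps=1e-6):
    """Return one advantage per reward, normalized within its group.

    Hypothetical helper (not part of the released API): `rewards[i]` is the
    scalar reward of rollout i, and `group_ids[i]` identifies the task that
    rollout i was sampled for.
    """
    if len(rewards) != len(group_ids):
        raise ValueError("rewards and group_ids must have the same length")

    # Collect the rewards that belong to each group.
    groups = defaultdict(list)
    for reward, gid in zip(rewards, group_ids):
        groups[gid].append(reward)

    # Per-group mean and population standard deviation.
    stats = {gid: (mean(rs), pstdev(rs)) for gid, rs in groups.items()}

    # Advantage = (reward - group mean) / (group std + eps).
    return [
        (reward - stats[gid][0]) / (stats[gid][1] + eps)
        for reward, gid in zip(rewards, group_ids)
    ]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0, 1.0]
    group_ids = ["task-a", "task-a", "task-b", "task-b"]
    print(group_advantages(rewards, group_ids))
```

In this sketch, a group whose rollouts all receive the same reward yields zero advantage for every member, since `eps` only guards against division by zero.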
What's Changed
- Add a switch for progress bar in _HFBatchReader by @yanxi-chen in #126
- Add dapo reward by @hiyuchang in #114
- Add readme_zh by @hiyuchang in #127
- Fix a typo in readme by @hiyuchang in #128
- ModelWrapper automatically record Experience by @pan-x-c in #123
- Add continue_from_checkpoint by @hiyuchang in #129
- Merge verl v0.4.1 by @hiyuchang in #125
- Fix vllm nccl sync error by @pan-x-c in #132
- Add more unittest command by @pan-x-c in #133
- Add Step-wise Workflow by @pan-x-c in #130
- Add workflow and example for toolcall training using ToolAce dataset by @garyzhang99 in #134
- Rename data scripts for examples and refine toolcall example readme by @garyzhang99 in #137
- Add sft example by @hiyuchang in #138
- Fix `buffer.total_epochs` not working in SFT/DPO by @pan-x-c in #140
- Fix priority queue implementation and enhance testing by @pan-x-c in #135
- Update some details in tutorial by @hiyuchang in #144
- [examples] Updated the OPMD config. by @yaochaorui in #145
- Rollout openAI API compatible with vllm 0.8.5 by @pan-x-c in #146
- Standardize Experience and Sample Strategy by @pan-x-c in #141
- Add `fused_kernel_options` by @chenyushuo in #150
- Fix MATH readme by @hiyuchang in #151
- Calculate advantage in Explorer by @pan-x-c in #148
- Add `Synchronizer` by @chenyushuo in #131
- Add run_id for single-turn workflows by @hiyuchang in #152
- Bug fix for `Scheduler` and `torch.tensor` by @chenyushuo in #156
- Add Step-wise GRPO Advantage by @pan-x-c in #153
- Fix a bug in args_pass by @hiyuchang in #155
- Add decoupled evaluation workflow by @lingzhq in #142
- Add some training tricks for RLVR by @hiyuchang in #147
- GSPO-token policy loss function by @nkkarpov in #154
- Add tool call usage from our vllm model by @garyzhang99 in #161
- Refactor `Trainer.train` to async function by @chenyushuo in #164
- Distinguish repeatable/non-repeatable workflows by @hiyuchang in #162
- Add auto release for `synchronizer` by @chenyushuo in #166
- Fix multi-turn logprobs by @pan-x-c in #170
- Bug fix in Synchronizer by @chenyushuo in #171
- Update vLLM to 0.10.0 and add `max_model_len` by @hiyuchang in #172
- Add agentscope react multi-turn toolcalls example by @garyzhang99 in #165
- Add step-wise workflow test by @hiyuchang in #173
- Add MLFlow monitor by @pan-x-c in #179
- [Feat] Allow user to set `train_batch_size` by @hiyuchang in #177
- [example] Alfworld with General Multi-Step Workflow by @hiyuchang in #169
- feat: add RAFT alfworld example with reflection support by @shiweijiezero in #174
- Add general multi-step figure by @hiyuchang in #186
- Add benchmark by @chenyushuo in #178
- Fix `custom_fields` in experiences by @pan-x-c in #191
- Add Document for Synchronizer by @chenyushuo in #190
- Bug fix in load_plugins and explorer by @chenyushuo in #193
- Add CHORD algorithm example by @garyzhang99 in #194
- Fix problem in math_mix config by @garyzhang99 in #196
- Fix plugin_loader by @pan-x-c in #201
- Add unittest for mix by @hiyuchang in #200
- Set truncate_prompt_tokens in SamplingParams, silently truncating very large prompts and preventing vllm from throwing exception by @vadimkantorov in #198
- Support multi-version docs by @pan-x-c in #203
- Fix/fix agentscope tools example docs by @garyzhang99 in #205
- Auto-set pad_token_id when the default is None and not set in the buffer config. by @yaochaorui in #188
- Fix agentscope react example readme by @garyzhang99 in #206
- Add `max_prompt_tokens` by @chenyushuo in #202
- Release v0.2.1 by @pan-x-c in #208
New Contributors
- @yaochaorui made their first contribution in #145
- @nkkarpov made their first contribution in #154
- @vadimkantorov made their first contribution in #198
Full Changelog: v0.2.0...v0.2.1