v0.3.2
Overview
This is a bug-fix release that addresses many bugs present in the 0.3.x versions. We recommend that all users on 0.3.0 or 0.3.1 upgrade to this version.
Buffer
- Support task scheduler and selector in task dataset
- Add BOTS: Online RL task selection for efficient LLM fine-tuning (paper).
- Extract the `PriorityFunction` of the replay buffer as a customizable module.
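
To illustrate what a customizable priority function means for a replay buffer, here is a minimal sketch. All names (`Experience`, `ToyReplayBuffer`, `reward_with_decay`) are hypothetical and chosen for illustration only; they are not Trinity-RFT's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Experience:
    """Hypothetical experience record; not Trinity-RFT's data structure."""
    reward: float
    use_count: int = 0


# A priority function maps an experience to a score; higher is sampled first.
PriorityFn = Callable[[Experience], float]


def reward_with_decay(exp: Experience, decay: float = 0.5) -> float:
    """Example priority: favor high reward, discount already-replayed items."""
    return exp.reward * (decay ** exp.use_count)


class ToyReplayBuffer:
    """Toy buffer whose sampling order is delegated to a pluggable function."""

    def __init__(self, priority_fn: PriorityFn) -> None:
        self.priority_fn = priority_fn
        self.items: List[Experience] = []

    def add(self, exp: Experience) -> None:
        self.items.append(exp)

    def sample(self, k: int) -> List[Experience]:
        # Rank by the injected priority function and take the top k.
        ranked = sorted(self.items, key=self.priority_fn, reverse=True)
        chosen = ranked[:k]
        for exp in chosen:
            exp.use_count += 1  # replayed items lose priority next time
        return chosen


buffer = ToyReplayBuffer(priority_fn=reward_with_decay)
for r in (1.0, 0.8, 0.2):
    buffer.add(Experience(reward=r))

first = buffer.sample(1)[0]
second = buffer.sample(1)[0]
```

Because the buffer only calls `priority_fn`, swapping in a different scoring rule does not require touching the buffer itself.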
Explorer
- Update vLLM to v0.11.0
- Fix `logprobs`, `top_k`, `top_p`, `temperature` mismatch when using vLLM's OpenAI API server
- Fix torch cache conflicts when `enforce_eager` is `False`
- Simplify Workflow Interface
Other Modules
- Optimize monitor metrics organization
- Optimize and simplify Config and config manager
- Add more algorithms and examples
What's Changed
- Simplify trainer config by @hiyuchang in #329
- Simplify Workflow Interface by @pan-x-c in #330
- Update off-policy RFT documentation by @yanxi-chen in #335
- Simplify Buffer and Explorer Config by @pan-x-c in #333
- Update rec example by @yanxi-chen in #337
- Sync the config manager with the latest codebase by @hiyuchang in #332
- Add taskset scheduler by @chenyushuo in #326
- Split Storage Config by @pan-x-c in #338
- Fix some metrics by @hiyuchang in #339
- Refactor Priority Function by @chenyushuo in #344
- Bug fix in unittest by @chenyushuo in #346
- [Example] Mix algorithm with VLM by @hiyuchang in #342
- Add example for experience replay by @yanxi-chen in #345
- Fix Bug: group_rewards Un-assigned value by @shiweijiezero in #343
- Fix openai client logprobs calculation by @pan-x-c in #347
- Fix openai API server setup by @pan-x-c in #348
- Update readme by @yanxi-chen in #349
- Update vLLM to 0.11 by @pan-x-c in #350
- bug fix in plugin_loader by @chenyushuo in #354
- Fix/email search by @chenyushuo in #351
- Enable System Metrics Recording in MLFLOW by @hiyuchang in #362
- Add generation args to ModelConfig by @chenyushuo in #357
- Add `std_threshold` option to `StepWiseGRPOAdvantageFn`, to filter out zero-grad group samples. by @garyzhang99 in #363
- Fix microbatch loss scale when loss_agg_mode is "token-mean" by @yanxi-chen in #336
- Add `weight_decay` in OptimizerConfig by @hiyuchang in #364
- Fix taskset and eval_tasksets path by @hiyuchang in #367
- Fix torch compile cache dir conflicts when `enforce_eager=False` by @pan-x-c in #368
- Examples for BOTS by @ShenQianli in #353
- Release Trinity-RFT 0.3.2 by @pan-x-c in #369
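
For readers wondering why microbatch loss scaling matters under "token-mean" aggregation (the subject of #336), the sketch below shows the general issue with plain Python numbers. It is an illustration under simple assumptions (per-token losses given as lists, helper names invented here), not the code changed in that PR.

```python
def token_mean(sequences):
    """Mean loss over every token in a list of per-sequence token losses."""
    tokens = [loss for seq in sequences for loss in seq]
    return sum(tokens) / len(tokens)


def accumulate(microbatches, scale_by_tokens):
    """Combine per-microbatch token-mean losses into one accumulated loss.

    scale_by_tokens=False weights every microbatch equally (1 / num_microbatches).
    scale_by_tokens=True weights each microbatch by its share of all tokens.
    """
    total_tokens = sum(len(seq) for mb in microbatches for seq in mb)
    loss = 0.0
    for mb in microbatches:
        mb_tokens = sum(len(seq) for seq in mb)
        if scale_by_tokens:
            weight = mb_tokens / total_tokens
        else:
            weight = 1 / len(microbatches)
        loss += weight * token_mean(mb)
    return loss


# One long sequence (6 tokens, loss 1.0 each) and one short (2 tokens, loss 4.0 each).
batch = [[1.0] * 6, [4.0] * 2]
microbatches = [[batch[0]], [batch[1]]]  # one sequence per microbatch

full = token_mean(batch)                   # loss on the unsplit batch
naive = accumulate(microbatches, False)    # equal weight per microbatch
scaled = accumulate(microbatches, True)    # weight by token share

print(full, naive, scaled)  # 1.75 2.5 1.75
```

With equal weights, the short sequence's tokens count three times as much as they do in the unsplit batch; weighting by token share reproduces the full-batch token mean.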
New Contributors
- @ShenQianli made their first contribution in #353
Full Changelog: v0.3.1...v0.3.2