Release v0.3.2 · modelscope/Trinity-RFT

Overview

This is a bug-fix release that addresses a number of bugs in the 0.3 series. We recommend that all users of versions 0.3.0 and 0.3.1 upgrade to this version.

Buffer

Support task scheduler and selector in task dataset
Add BOTS: Online RL task selection for efficient LLM fine-tuning (paper).
Extract the PriorityFunction of the replay buffer as a customizable module.
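
To illustrate what a replay buffer with a pluggable priority function can look like, here is a minimal, self-contained sketch. It does not use Trinity-RFT's actual classes: Experience, ToyReplayBuffer, and reward_with_use_penalty are hypothetical names invented for this example, and the scoring rule is only one possible choice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Experience:
    """Hypothetical stand-in for one stored experience."""
    reward: float
    use_count: int = 0


# A priority function maps an experience to a score; higher is sampled first.
PriorityFn = Callable[[Experience], float]


def reward_with_use_penalty(exp: Experience, penalty: float = 1.5) -> float:
    """Prefer high-reward experiences, but lower the score each time one is reused."""
    return exp.reward - penalty * exp.use_count


class ToyReplayBuffer:
    """Toy buffer whose sampling order is decided entirely by the injected function."""

    def __init__(self, priority_fn: PriorityFn) -> None:
        self._priority_fn = priority_fn
        self._items: List[Experience] = []

    def put(self, exp: Experience) -> None:
        self._items.append(exp)

    def sample(self, k: int) -> List[Experience]:
        # Rank by current priority, take the top k, and record the reuse.
        ranked = sorted(self._items, key=self._priority_fn, reverse=True)
        chosen = ranked[:k]
        for exp in chosen:
            exp.use_count += 1
        return chosen
```

Swapping in a different callable changes the sampling policy without touching the buffer itself, which is the general benefit of keeping the priority logic in its own module.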

Explorer

Update vLLM to v0.11.0
Fix logprobs, top_k, top_p, temperature mismatch when using vLLM's OpenAI API server
Fix torch cache conflicts when enforce_eager is False
Simplify Workflow Interface

Other Modules

Optimize monitor metrics organization
Optimize and simplify Config and config manager
Add more algorithms and examples

What's Changed

Simplify trainer config by @hiyuchang in #329
Simplify Workflow Interface by @pan-x-c in #330
Update off-policy RFT documentation by @yanxi-chen in #335
Simplify Buffer and Explorer Config by @pan-x-c in #333
Update rec example by @yanxi-chen in #337
Sync the config manager with the latest codebase by @hiyuchang in #332
Add taskset scheduler by @chenyushuo in #326
Split Storage Config by @pan-x-c in #338
Fix some metrics by @hiyuchang in #339
Refactor Priority Function by @chenyushuo in #344
Bug fix in unittest by @chenyushuo in #346
[Example] Mix algorithm with VLM by @hiyuchang in #342
Add example for experience replay by @yanxi-chen in #345
Fix Bug: group_rewards Un-assigned value by @shiweijiezero in #343
Fix openai client logprobs calculation by @pan-x-c in #347
Fix openai API server setup by @pan-x-c in #348
Update readme by @yanxi-chen in #349
Update vLLM to 0.11 by @pan-x-c in #350
bug fix in plugin_loader by @chenyushuo in #354
Fix/email search by @chenyushuo in #351
Enable System Metrics Recording in MLFLOW by @hiyuchang in #362
Add generation args to ModelConfig by @chenyushuo in #357
Add std_threshold option to StepWiseGRPOAdvantageFn to filter out zero-grad group samples by @garyzhang99 in #363
Fix microbatch loss scale when loss_agg_mode is "token-mean" by @yanxi-chen in #336
Add weight_decay in OptimizerConfig by @hiyuchang in #364
Fix taskset and eval_tasksets path by @hiyuchang in #367
Fix torch compile cache dir conflicts when enforce_eager=False by @pan-x-c in #368
Examples for BOTS by @ShenQianli in #353
Release Trinity-RFT 0.3.2 by @pan-x-c in #369
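
For readers wondering why the microbatch loss scale matters under "token-mean" aggregation (#336), the sketch below shows one common way such a mismatch can arise. It is a plain-Python illustration, not the code from the PR: the function names are hypothetical, and each microbatch is modeled simply as a list of per-token losses.

```python
from typing import List

Microbatch = List[float]  # per-token losses of one microbatch (assumed layout)


def global_token_mean(microbatches: List[Microbatch]) -> float:
    """Reference value: mean over every token in the full batch."""
    total_loss = sum(sum(mb) for mb in microbatches)
    total_tokens = sum(len(mb) for mb in microbatches)
    return total_loss / total_tokens


def naive_accumulation(microbatches: List[Microbatch]) -> float:
    """Average of per-microbatch token means; biased when token counts differ."""
    return sum(sum(mb) / len(mb) for mb in microbatches) / len(microbatches)


def token_weighted_accumulation(microbatches: List[Microbatch]) -> float:
    """Scale each microbatch mean by its share of the batch's tokens."""
    total_tokens = sum(len(mb) for mb in microbatches)
    return sum((sum(mb) / len(mb)) * (len(mb) / total_tokens) for mb in microbatches)


# Two microbatches with very different token counts.
batches = [[1.0, 1.0, 1.0, 1.0], [4.0]]
reference = global_token_mean(batches)          # 8.0 / 5 tokens
naive = naive_accumulation(batches)             # (1.0 + 4.0) / 2
weighted = token_weighted_accumulation(batches)
```

In this example the naive average over-weights the short microbatch, while the token-weighted version matches the full-batch token mean. When all microbatches contain the same number of tokens, the two approaches coincide.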

New Contributors

@ShenQianli made their first contribution in #353

Full Changelog: v0.3.1...v0.3.2
