Overview
- Enhanced support for multi-modal models (including Qwen2.5 VL, Qwen3 VL and Kimi-VL-A3B-Thinking series).
- Refactored the trinity command line interface using typer.
- Added a log management tool and fixed bugs in the logging system.
- Added Jensen-Shannon Divergence for on-policy distillation.
- Fixed bugs in model weight synchronization and over-rollout.
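For readers unfamiliar with the Jensen-Shannon Divergence mentioned in the list above, here is a minimal sketch of the textbook definition applied to one teacher and one student next-token distribution. It is not taken from the project's code: the function names (`softmax`, `kl_divergence`, `js_divergence`) and the example logits are hypothetical, and the actual implementation in #499 may differ, for example in weighting or in how tokens are aggregated.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical logits over a 4-token vocabulary (illustrative values only).
teacher_probs = softmax([2.0, 1.0, 0.1, -1.0])
student_probs = softmax([1.5, 1.2, 0.3, -0.5])
jsd = js_divergence(teacher_probs, student_probs)
```

Unlike plain KL divergence, this quantity is symmetric in its two arguments and bounded above by ln 2, which is one common reason to consider it as a distillation objective.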
What's Changed
- Update algorithm List in README by @pan-x-c in #498
- Refactor Launcher with Typer by @pan-x-c in #502
- Fix memory resume by @hiyuchang in #505
- Fix logger in debug mode by @pan-x-c in #504
- Fix Logger in Workflow by @pan-x-c in #506
- Fix over_rollout by @luyi256 in #500
- Enhance support for VL models by @chenyushuo in #501
- jsd implement for opd by @kokolerk in #499
- Add log manager to track experiment logs by @pan-x-c in #507
New Contributors
Full Changelog: v0.5.0...v0.5.1