This release expands LightRFT into the video domain, optimizes evaluation pipelines for reward models, and ensures compatibility with the latest upstream inference and training libraries (specifically sglang).
✨ Multimodal & Advanced Training
- Video Support: Expanded multimodal capabilities by adding video reinforcement fine-tuning (#4).
- Enhanced Evaluation: Implemented and optimized evaluation logic for SRM (Step-wise Reward Model) and GRM (Generative Reward Model) trainers.
- Exploration: Added a high entropy token selection mechanism to improve generation diversity during training (#6).
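The high-entropy token selection idea can be illustrated with a minimal sketch: score each sequence position by the Shannon entropy of its next-token distribution and keep the most uncertain fraction. The function name, `top_ratio` parameter, and list-based interface below are illustrative assumptions, not LightRFT's actual API.

```python
import math

def select_high_entropy_positions(logits, top_ratio=0.2):
    """Hypothetical sketch: pick the fraction of sequence positions whose
    next-token distribution has the highest Shannon entropy.

    logits: per-position logit lists (seq_len x vocab_size).
    Returns the selected position indices, highest entropy first.
    """
    def entropy(row):
        m = max(row)
        exps = [math.exp(x - m) for x in row]   # numerically stable softmax
        z = sum(exps)
        probs = [e / z for e in exps]
        return -sum(p * math.log(p) for p in probs if p > 0)

    scores = [(entropy(row), i) for i, row in enumerate(logits)]
    k = max(1, int(top_ratio * len(logits)))
    return [i for _, i in sorted(scores, reverse=True)[:k]]
```

Focusing updates on high-entropy positions concentrates the policy gradient on tokens where the model is genuinely uncertain, which is one plausible way such a mechanism improves exploration diversity.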
📊 Benchmarks & Metrics
- New Eval Datasets: Added support for AIME24/25 and GPQA Diamond evaluations (#27), facilitating stronger reasoning capability assessment.
- Trajectory Metrics: Added comprehensive analysis metrics to saved trajectories for deeper insight into training dynamics (#5).
- T2I Benchmarks: Updated GRM results and analysis on T2I (Text-to-Image) benchmarks in the best-practices documentation.
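As a rough illustration of the kind of per-batch statistics one might attach to saved trajectories, here is a minimal sketch; the function and metric names are hypothetical and not taken from LightRFT's trajectory format.

```python
import statistics

def trajectory_metrics(rewards, response_lengths):
    """Hypothetical sketch of summary statistics for a batch of rollouts.

    rewards: per-trajectory scalar rewards.
    response_lengths: per-trajectory generated token counts.
    """
    return {
        "reward_mean": statistics.mean(rewards),
        "reward_std": statistics.pstdev(rewards),
        "response_len_mean": statistics.mean(response_lengths),
        "response_len_max": max(response_lengths),
    }
```

Logging such aggregates alongside each saved trajectory batch makes reward collapse or length drift visible early in training.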
⚙️ Core Optimization & Compatibility
- Library Updates: Adapted framework to support the latest versions of sglang (#24) and ensured compatibility with upstream ecosystem updates.
- Transformers Compatibility: Standardized configuration by renaming `dtype` to `torch_dtype` for seamless integration with Hugging Face Transformers.
- Code Polish: Removed redundant tuple nesting in `prepare_reward_model` return values and polished return logic.
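The rename can be handled transparently for older configs with a small migration shim. This is a hypothetical helper (the function name and dict-based config are assumptions), shown only to illustrate mapping the legacy key to `torch_dtype`, the keyword that Hugging Face's `from_pretrained` accepts.

```python
def normalize_model_config(cfg: dict) -> dict:
    """Hypothetical shim: migrate the legacy `dtype` config key to
    `torch_dtype` without mutating the caller's dict."""
    cfg = dict(cfg)
    if "dtype" in cfg and "torch_dtype" not in cfg:
        cfg["torch_dtype"] = cfg.pop("dtype")
    return cfg
```

With a shim like this, existing configuration files keep working while new code reads only the standardized key.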
🐛 Bug Fixes
- Video Metadata: Resolved compatibility bugs related to video metadata handling (#25).
- GRM Formatting: Fixed bugs in GRM dataset message formatting and evaluation logic.
- General Fixes: Fixed file path handling bugs and resolved various linting/style issues (flake8, yapf).
📚 Documentation & Community
- Community Health: Added standard Issue and PR templates to streamline contributions (#20).
- Docs Automation: Established automated documentation deployment actions (#18).
- API Docs: Polished API docstrings and Python typing for better developer experience (#21).
Full Changelog: v0.1.0...v0.1.1
Contributors: OpenDILab, the System Platform Center, and the Safe and Trustworthy AI Center at Shanghai AI Laboratory.