This release expands LightRFT into the video domain, optimizes evaluation pipelines for reward models, and ensures compatibility with the latest upstream inference and training libraries (specifically sglang).
✨ Multimodal & Advanced Training
- Video Support: Expanded multimodal capabilities by adding video reinforcement fine-tuning (#4).
- Enhanced Evaluation: Implemented and optimized evaluation logic for SRM (Step-wise Reward Model) and GRM (Generative Reward Model) trainers.
- Exploration: Added a high entropy token selection mechanism to improve generation diversity during training (#6).
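The high-entropy token selection idea can be illustrated with a minimal sketch: score each sequence position by the Shannon entropy of its next-token distribution and keep the most uncertain fraction. The function name, `top_ratio` parameter, and list-based interface below are illustrative assumptions, not LightRFT's actual API.

```python
import math

def select_high_entropy_positions(logits, top_ratio=0.2):
    """Hypothetical sketch: pick the fraction of sequence positions whose
    next-token distribution has the highest Shannon entropy.

    logits: per-position logit lists (seq_len x vocab_size).
    Returns the selected position indices, highest entropy first.
    """
    def entropy(row):
        m = max(row)
        exps = [math.exp(x - m) for x in row]   # numerically stable softmax
        z = sum(exps)
        probs = [e / z for e in exps]
        return -sum(p * math.log(p) for p in probs if p > 0)

    scores = [(entropy(row), i) for i, row in enumerate(logits)]
    k = max(1, int(top_ratio * len(logits)))
    return [i for _, i in sorted(scores, reverse=True)[:k]]
```

Focusing updates on high-entropy positions concentrates the policy gradient on tokens where the model is genuinely uncertain, which is one plausible way such a mechanism improves exploration diversity.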
📊 Benchmarks & Metrics
- New Eval Datasets: Added support for AIME24/25 and GPQA Diamond evaluations (#27), facilitating stronger reasoning capability assessment.
- Trajectory Metrics: Added comprehensive analysis metrics to saved trajectories for deeper insight into training dynamics (#5).
- T2I Benchmarks: Updated GRM results and analysis on T2I (Text-to-Image) benchmarks in the best-practices documentation.
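As a rough illustration of the kind of per-batch statistics one might attach to saved trajectories, here is a minimal sketch; the function and metric names are hypothetical and not taken from LightRFT's trajectory format.

```python
import statistics

def trajectory_metrics(rewards, response_lengths):
    """Hypothetical sketch of summary statistics for a batch of rollouts.

    rewards: per-trajectory scalar rewards.
    response_lengths: per-trajectory generated token counts.
    """
    return {
        "reward_mean": statistics.mean(rewards),
        "reward_std": statistics.pstdev(rewards),
        "response_len_mean": statistics.mean(response_lengths),
        "response_len_max": max(response_lengths),
    }
```

Logging such aggregates alongside each saved trajectory batch makes reward collapse or length drift visible early in training.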
⚙️ Core Optimization & Compatibility
- Library Updates: Adapted framework to support the latest versions of sglang (#24) and ensured compatibility with upstream ecosystem updates.
- Transformers Compatibility: Standardized configuration by renaming `dtype` to `torch_dtype` for seamless integration with Hugging Face Transformers.
- Code Polish: Removed redundant tuple nesting in `prepare_reward_model` return values and polished return logic.
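The rename can be handled transparently for older configs with a small migration shim. This is a hypothetical helper (the function name and dict-based config are assumptions), shown only to illustrate mapping the legacy key to `torch_dtype`, the keyword that Hugging Face's `from_pretrained` accepts.

```python
def normalize_model_config(cfg: dict) -> dict:
    """Hypothetical shim: migrate the legacy `dtype` config key to
    `torch_dtype` without mutating the caller's dict."""
    cfg = dict(cfg)
    if "dtype" in cfg and "torch_dtype" not in cfg:
        cfg["torch_dtype"] = cfg.pop("dtype")
    return cfg
```

With a shim like this, existing configuration files keep working while new code reads only the standardized key.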
🐛 Bug Fixes
- Video Metadata: Resolved compatibility bugs related to video metadata handling (#25).
- GRM Formatting: Fixed bugs in GRM dataset message formatting and evaluation logic.
- General Fixes: Fixed file path handling bugs and resolved various linting/style issues (flake8, yapf).
📚 Documentation & Community
- Community Health: Added standard Issue and PR templates to streamline contributions (#20).
- Docs Automation: Established automated documentation deployment actions (#18).
- API Docs: Polished API docstrings and Python typing for better developer experience (#21).
Full Changelog: v0.1.0...v0.1.1
Contributors: OpenDILab, the System Platform Center, and the Safe and Trustworthy AI Center at Shanghai AI Laboratory.