Releases: opendilab/LightRFT
v0.1.1 Release
This release expands LightRFT into the video domain, optimizes evaluation pipelines for reward models, and ensures compatibility with the latest upstream inference and training libraries (specifically sglang).
✨ Multimodal & Advanced Training
- Video Support: Expanded multimodal capabilities with the addition of Video reinforcement fine-tuning (#4).
- Enhanced Evaluation: Implemented and optimized evaluation logic for SRM (Step-wise Reward Model) and GRM (Generative Reward Model) trainers.
- Exploration: Added a high entropy token selection mechanism to improve generation diversity during training (#6).
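As a rough sketch of the idea behind high-entropy token selection (not LightRFT's actual implementation; function names here are illustrative), one can score each generated token by the Shannon entropy of its predictive distribution and keep only the most uncertain fraction:

```python
import math

def token_entropies(logits):
    """Per-token Shannon entropy (in nats) from a list of logit vectors."""
    ents = []
    for row in logits:
        m = max(row)                      # subtract max for numerical stability
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        probs = [e / z for e in exps]
        ents.append(-sum(p * math.log(p) for p in probs if p > 0))
    return ents

def select_high_entropy(logits, frac=0.5):
    """Indices of the top `frac` fraction of tokens, ranked by entropy."""
    ents = token_entropies(logits)
    k = max(1, int(len(ents) * frac))
    return sorted(range(len(ents)), key=lambda i: ents[i], reverse=True)[:k]
```

Focusing the policy-gradient update on these high-uncertainty positions is one way such a mechanism can encourage exploration without perturbing tokens the model is already confident about.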
📊 Benchmarks & Metrics
- New Eval Datasets: Added support for AIME24/25 and GPQA Diamond evaluations (#27), facilitating stronger reasoning capability assessment.
- Trajectory Metrics: Added comprehensive analysis metrics to saved trajectories for deeper insight into training dynamics (#5).
- T2I Benchmarks: Updated GRM results and analysis on T2I (Text-to-Image) benchmarks in the best-practices guide.
⚙️ Core Optimization & Compatibility
- Library Updates: Adapted framework to support the latest versions of sglang (#24) and ensured compatibility with upstream ecosystem updates.
- Transformers Compatibility: Standardized configuration by renaming `dtype` to `torch_dtype` for seamless integration with Hugging Face Transformers.
- Code Polish: Removed redundant tuple nesting in `prepare_reward_model` return values and polished return logic.
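For users carrying configs forward from v0.1.0, the rename amounts to a one-key migration (a minimal sketch; `migrate_config` is a hypothetical helper, not a LightRFT API):

```python
def migrate_config(cfg):
    """Return a copy of a config dict with the legacy `dtype` key
    renamed to `torch_dtype`, the name Hugging Face Transformers
    expects (e.g. as a keyword to `from_pretrained`)."""
    cfg = dict(cfg)  # shallow copy; leave the caller's dict untouched
    if "dtype" in cfg and "torch_dtype" not in cfg:
        cfg["torch_dtype"] = cfg.pop("dtype")
    return cfg
```

The guard against an existing `torch_dtype` key keeps the migration idempotent, so running it on an already-updated config is a no-op.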
🐛 Bug Fixes
- Video Metadata: Resolved compatibility bugs related to video metadata handling (#25).
- GRM Formatting: Fixed bugs in GRM dataset message formatting and evaluation logic.
- General Fixes: Fixed file path handling bugs and resolved various linting/style issues (flake8, yapf).
📚 Documentation & Community
- Community Health: Added standard Issue and PR templates to streamline contributions (#20).
- Docs Automation: Established automated documentation deployment actions (#18).
- API Docs: Polished API docstrings and Python typing for better developer experience (#21).
Full Changelog: v0.1.0...v0.1.1
Contributors: OpenDILab, System Platform Center and Safe and Trustworthy AI Center at Shanghai AI Laboratory.
v0.1.0: Initial Release
This is the initial release of LightRFT, a light, efficient, omni-modal & reward-model driven reinforcement fine-tuning framework.
🧠 Rich Algorithm Ecosystem
- Implemented PPO and GRPO algorithms for Large Language Models.
- Added comprehensive interfaces for Reward Model training and inference.
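To illustrate the core idea behind GRPO (a minimal sketch of the standard group-relative advantage, not LightRFT's actual code): rewards for a group of responses sampled from the same prompt are standardized within the group, so no learned critic is needed to form a baseline.

```python
def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize the rewards of a group of
    responses sampled for one prompt (mean-center, divide by std)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Each advantage then weights the log-probability of its response in the policy-gradient update, in place of the critic-based advantage that PPO would use.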
🎯 Innovative Resource Collaboration
- Introduced "Colocate Anything" strategy to maximize GPU memory efficiency by colocating Actor, Critic, and Reward models.
🔧 Flexible Training Strategies
- Integrated DeepSpeed ZeRO and FSDP for scalable distributed training.
- Added PEFT (LoRA) integration for lightweight fine-tuning.
🌐 Environments & Models
- Added support for GSM8K (Math reasoning) and Geo3K (Multimodal) environments and datasets.
- Enabled support for Qwen and DeepSeek model families.
📚 Documentation & Toolkit
- Integrated Weights & Biases (W&B) for training metric logging.
- Released initial Quick Start guide, architecture overview, and reproduction scripts.
Full Changelog: https://github.com/opendilab/LightRFT/commits/v0.1.0
Contributors: OpenDILab, System Platform Center and Safe and Trustworthy AI Center at Shanghai AI Laboratory.