Releases: opendilab/LightRFT
v0.1.1 Release
This release expands LightRFT into the video domain, optimizes evaluation pipelines for reward models, and ensures compatibility with the latest upstream inference and training libraries (specifically sglang).
✨ Multimodal & Advanced Training
- Video Support: Expanded multimodal capabilities with the addition of Video reinforcement fine-tuning (#4).
- Enhanced Evaluation: Implemented and optimized evaluation logic for SRM (Step-wise Reward Model) and GRM (Generative Reward Model) trainers.
- Exploration: Added a high entropy token selection mechanism to improve generation diversity during training (#6).
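As a rough sketch of the idea behind high-entropy token selection (not LightRFT's actual implementation; function names here are illustrative), one can score each generated token by the Shannon entropy of its predictive distribution and keep only the most uncertain fraction:

```python
import math

def token_entropies(logits):
    """Per-token Shannon entropy (in nats) from a list of logit vectors."""
    ents = []
    for row in logits:
        m = max(row)                      # subtract max for numerical stability
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        probs = [e / z for e in exps]
        ents.append(-sum(p * math.log(p) for p in probs if p > 0))
    return ents

def select_high_entropy(logits, frac=0.5):
    """Indices of the top `frac` fraction of tokens, ranked by entropy."""
    ents = token_entropies(logits)
    k = max(1, int(len(ents) * frac))
    return sorted(range(len(ents)), key=lambda i: ents[i], reverse=True)[:k]
```

Focusing the policy-gradient update on these high-uncertainty positions is one way such a mechanism can encourage exploration without perturbing tokens the model is already confident about.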
📊 Benchmarks & Metrics
- New Eval Datasets: Added support for AIME24/25 and GPQA Diamond evaluations (#27), facilitating stronger reasoning capability assessment.
- Trajectory Metrics: Added comprehensive analysis metrics to saved trajectories for deeper insight into training dynamics (#5).
- T2I Benchmarks: Updated GRM results and analysis on T2I (Text-to-Image) benchmarks in the best-practices guide.
⚙️ Core Optimization & Compatibility
- Library Updates: Adapted framework to support the latest versions of sglang (#24) and ensured compatibility with upstream ecosystem updates.
- Transformers Compatibility: Standardized configuration by renaming `dtype` to `torch_dtype` for seamless integration with Hugging Face Transformers.
- Code Polish: Removed redundant tuple nesting in `prepare_reward_model` return values and polished return logic.
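For users carrying configs forward from v0.1.0, the rename amounts to a one-key migration (a minimal sketch; `migrate_config` is a hypothetical helper, not a LightRFT API):

```python
def migrate_config(cfg):
    """Return a copy of a config dict with the legacy `dtype` key
    renamed to `torch_dtype`, the name Hugging Face Transformers
    expects (e.g. as a keyword to `from_pretrained`)."""
    cfg = dict(cfg)  # shallow copy; leave the caller's dict untouched
    if "dtype" in cfg and "torch_dtype" not in cfg:
        cfg["torch_dtype"] = cfg.pop("dtype")
    return cfg
```

The guard against an existing `torch_dtype` key keeps the migration idempotent, so running it on an already-updated config is a no-op.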
🐛 Bug Fixes
- Video Metadata: Resolved compatibility bugs related to video metadata handling (#25).
- GRM Formatting: Fixed bugs in GRM dataset message formatting and evaluation logic.
- General Fixes: Fixed file path handling bugs and resolved various linting/style issues (flake8, yapf).
📚 Documentation & Community
- Community Health: Added standard Issue and PR templates to streamline contributions (#20).
- Docs Automation: Established automated documentation deployment actions (#18).
- API Docs: Polished API docstrings and Python typing for better developer experience (#21).
Full Changelog: v0.1.0...v0.1.1
Contributors: OpenDILab, System Platform Center and Safe and Trustworthy AI Center at Shanghai AI Laboratory.
v0.1.0: Initial Release
This is the initial release of LightRFT, a light, efficient, omni-modal & reward-model driven reinforcement fine-tuning framework.
🧠 Rich Algorithm Ecosystem
- Implemented PPO and GRPO algorithms for Large Language Models.
- Added comprehensive interfaces for Reward Model training and inference.
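To illustrate the core idea behind GRPO (a minimal sketch of the standard group-relative advantage, not LightRFT's actual code): rewards for a group of responses sampled from the same prompt are standardized within the group, so no learned critic is needed to form a baseline.

```python
def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize the rewards of a group of
    responses sampled for one prompt (mean-center, divide by std)."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Each advantage then weights the log-probability of its response in the policy-gradient update, in place of the critic-based advantage that PPO would use.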
🎯 Innovative Resource Collaboration
- Introduced "Colocate Anything" strategy to maximize GPU memory efficiency by colocating Actor, Critic, and Reward models.
🔧 Flexible Training Strategies
- Integrated DeepSpeed ZeRO and FSDP for scalable distributed training.
- Added PEFT (LoRA) integration for lightweight fine-tuning.
🌐 Environments & Models
- Added support for GSM8K (Math reasoning) and Geo3K (Multimodal) environments and datasets.
- Enabled support for Qwen and DeepSeek model families.
📚 Documentation & Toolkit
- Integrated Weights & Biases (W&B) for training metric logging.
- Released initial Quick Start guide, architecture overview, and reproduction scripts.
Full Changelog: https://github.com/opendilab/LightRFT/commits/v0.1.0
Contributors: OpenDILab, System Platform Center and Safe and Trustworthy AI Center at Shanghai AI Laboratory.