Releases: opendilab/LightRFT

v0.1.1 Release

23 Jan 07:02

This release expands LightRFT into the video domain, optimizes evaluation pipelines for reward models, and ensures compatibility with the latest upstream inference and training libraries (specifically sglang).

✨ Multimodal & Advanced Training

  • Video Support: Expanded multimodal capabilities with the addition of Video reinforcement fine-tuning (#4).
  • Enhanced Evaluation: Implemented and optimized evaluation logic for SRM (Step-wise Reward Model) and GRM (Generative Reward Model) trainers.
  • Exploration: Added a high entropy token selection mechanism to improve generation diversity during training (#6).
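
The idea behind high-entropy token selection can be sketched as follows. This is an illustrative toy, not LightRFT's actual implementation: the function names and the top-k selection criterion are assumptions, and real usage would operate on model logits rather than hand-written distributions.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_high_entropy_positions(distributions, top_k=2):
    """Return the top_k positions whose next-token distributions have
    the highest entropy, i.e. where the model is most uncertain and
    exploration is most informative."""
    entropies = [token_entropy(d) for d in distributions]
    ranked = sorted(range(len(entropies)),
                    key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:top_k])

# Toy distributions over a 4-token vocabulary:
dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident  -> low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform    -> maximal entropy
    [0.70, 0.10, 0.10, 0.10],  # moderately uncertain
]
print(select_high_entropy_positions(dists, top_k=2))  # → [1, 2]
```

Focusing updates or sampling on such positions is a common way to increase generation diversity without perturbing tokens the model is already confident about.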

📊 Benchmarks & Metrics

  • New Eval Datasets: Added support for AIME24/25 and GPQA Diamond evaluations (#27), facilitating stronger reasoning capability assessment.
  • Trajectory Metrics: Added comprehensive analysis metrics to saved trajectories for deeper insight into training dynamics (#5).
  • T2I Benchmarks: Updated GRM results and analysis on T2I (Text-to-Image) benchmarks in the best-practices guide.

⚙️ Core Optimization & Compatibility

  • Library Updates: Adapted the framework to support the latest versions of sglang (#24) and ensured compatibility with upstream ecosystem updates.
  • Transformers Compatibility: Standardized configuration by renaming dtype to torch_dtype for seamless integration with Hugging Face Transformers.
  • Code Polish: Removed redundant tuple nesting in prepare_reward_model return values and polished return logic.
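
If you carry configuration from an older version, the rename amounts to moving the value under the key Hugging Face Transformers expects (`torch_dtype`, as accepted by `from_pretrained`). A minimal sketch, where `normalize_model_config` is a hypothetical helper, not a LightRFT API:

```python
def normalize_model_config(cfg: dict) -> dict:
    """Rename the legacy 'dtype' key to 'torch_dtype', the keyword
    Hugging Face Transformers uses (e.g. in from_pretrained)."""
    cfg = dict(cfg)  # copy; don't mutate the caller's dict
    if "dtype" in cfg and "torch_dtype" not in cfg:
        cfg["torch_dtype"] = cfg.pop("dtype")
    return cfg

print(normalize_model_config({"model": "Qwen2-7B", "dtype": "bfloat16"}))
# → {'model': 'Qwen2-7B', 'torch_dtype': 'bfloat16'}
```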

🐛 Bug Fixes

  • Video Metadata: Resolved compatibility bugs related to video metadata handling (#25).
  • GRM Formatting: Fixed bugs in GRM dataset message formatting and evaluation logic.
  • General Fixes: Fixed file path handling bugs and resolved various linting/style issues (flake8, yapf).

📚 Documentation & Community

  • Community Health: Added standard Issue and PR templates to streamline contributions (#20).
  • Docs Automation: Established automated documentation deployment actions (#18).
  • API Docs: Polished API docstrings and Python typing for better developer experience (#21).

Full Changelog: v0.1.0...v0.1.1

Contributors: OpenDILab, System Platform Center and Safe and Trustworthy AI Center at Shanghai AI Laboratory.

v0.1.0: Initial Release

31 Dec 04:50

This is the initial release of LightRFT, a light, efficient, omni-modal & reward-model driven reinforcement fine-tuning framework.

🧠 Rich Algorithm Ecosystem

  • Implemented PPO and GRPO algorithms for Large Language Models.
  • Added comprehensive interfaces for Reward Model training and inference.
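
The core of GRPO is computing advantages relative to a group of sampled responses instead of a learned critic. A minimal sketch of that normalization step (using population std with a hypothetical zero-std fallback; actual implementations differ in the std estimator and epsilon handling):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled response's
    reward by the group's mean and standard deviation, removing the
    need for a separate value (critic) model."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # fallback when all rewards tie
    return [(r - mean) / std for r in rewards]

# Four responses to one prompt, scored 1.0 (correct) or 0.0 (wrong):
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

Correct responses get positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward the better samples in each group.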

🎯 Innovative Resource Collaboration

  • Introduced "Colocate Anything" strategy to maximize GPU memory efficiency by colocating Actor, Critic, and Reward models.

🔧 Flexible Training Strategies

  • Integrated DeepSpeed ZeRO and FSDP for scalable distributed training.
  • Added PEFT (LoRA) integration for lightweight fine-tuning.
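
The math behind a LoRA adapter can be sketched in a few lines: the frozen weight `W` is augmented with a low-rank update `B @ A` scaled by `alpha / r`. This toy uses plain lists instead of tensors and hypothetical function names; the real integration goes through the PEFT library:

```python
def matvec(M, v):
    """Matrix-vector product over nested lists."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=16, r=2):
    """y = W x + (alpha / r) * B (A x): frozen base weight plus a
    trainable low-rank correction, the core idea of LoRA."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# With A, B initialized to zero, the adapter starts as a no-op,
# so fine-tuning begins exactly at the base model's behavior:
y = lora_forward([1.0, 2.0], W=[[1, 0], [0, 1]],
                 A=[[0, 0]], B=[[0], [0]], alpha=16, r=1)
print(y)  # → [1.0, 2.0]
```

Only `A` and `B` are trained, which is why LoRA fine-tuning needs a small fraction of the memory of full-parameter updates.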

🌐 Environments & Models

  • Added support for GSM8K (Math reasoning) and Geo3K (Multimodal) environments and datasets.
  • Enabled support for Qwen and DeepSeek model families.

📚 Documentation & Toolkit

  • Integrated Weights & Biases (W&B) for training metric logging.
  • Released initial Quick Start guide, architecture overview, and reproduction scripts.

Full Changelog: https://github.com/opendilab/LightRFT/commits/v0.1.0

Contributors: OpenDILab, System Platform Center and Safe and Trustworthy AI Center at Shanghai AI Laboratory.