v0.4.0

@pan-x-c released this on 30 Dec 08:32 · 8 commits to main since this release · 4b27dff

Overview

⭐️ Highlights

Add a Tinker backend so that users without GPUs can run Trinity-RFT. See the Tinker example for more details.

Explorer

  1. Add Tinker SamplingClient backend for users without GPUs.
  2. Support vLLM v0.12.0 (v0.10.2 through v0.11.0 are still supported).
  3. Add a tinker-compatible sample API to the vLLM backend.
  4. Enhance serve mode for online RL.
  5. Fix several bugs in the vLLM OpenAI API.

Trainer

  1. Add Tinker TrainingClient backend for users without GPUs.
  2. Add a switch in PPOPolicyLossFn to ignore explorer-generated logprobs.

Buffer

  1. Support staleness control, which mitigates the negative effects of excessively off-policy data.
  2. Add a Streamlit viewer to visualize the experience data.
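
The idea behind staleness control can be sketched as a filter on the policy version that produced each sample. This is a minimal illustration only: `Sample`, `model_version`, `drop_stale`, and `max_staleness` are hypothetical names chosen for the sketch, not the Trinity-RFT buffer API or its configuration keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical stand-in for an experience record (not the Trinity-RFT class)."""

    payload: str
    model_version: int  # version of the policy that generated this sample


def drop_stale(samples: List[Sample], current_version: int, max_staleness: int) -> List[Sample]:
    """Keep only samples generated at most `max_staleness` policy versions ago."""
    return [s for s in samples if current_version - s.model_version <= max_staleness]


samples = [Sample("a", 7), Sample("b", 9), Sample("c", 10)]
fresh = drop_stale(samples, current_version=10, max_staleness=1)
print([s.payload for s in fresh])  # ['b', 'c']
```

A smaller `max_staleness` keeps training closer to on-policy at the cost of discarding more generated data.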

Others

  1. Add benchmark comparisons with veRL and rLLM.
  2. Refactor registration system to avoid loading all modules during initialization.
  3. Add algorithms: SAPO, on-policy distillation.
  4. Enhance debug mode; add --module viewer to visualize experience data generated during debugging.
  5. Add SwanLab monitor.
  6. Add tutorial on aligning configuration with veRL.
  7. Add tutorial on choosing model context length based on GPU and model size.
  8. Optimize README and Sphinx docs.
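
One common way to avoid loading every module at initialization is to register import paths as strings and resolve them on first lookup. The sketch below illustrates that general pattern with standard-library modules; `LazyRegistry`, `register_path`, and `get` are hypothetical names, and the actual Trinity-RFT registry may work differently (see the Developer Guide).

```python
import importlib
from typing import Any, Dict


class LazyRegistry:
    """Hypothetical registry: stores 'module:attr' strings and imports on demand."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}
        self._cache: Dict[str, Any] = {}

    def register_path(self, name: str, path: str) -> None:
        # Only the string is stored here; nothing is imported yet.
        self._paths[name] = path

    def get(self, name: str) -> Any:
        # Import the target module the first time the name is requested.
        if name not in self._cache:
            module_name, attr = self._paths[name].split(":")
            module = importlib.import_module(module_name)
            self._cache[name] = getattr(module, attr)
        return self._cache[name]


registry = LazyRegistry()
registry.register_path("ordered_dict", "collections:OrderedDict")
registry.register_path("json_dumps", "json:dumps")

dumps = registry.get("json_dumps")  # resolves only the json entry
print(dumps({"a": 1}))  # {"a": 1}
```

With this pattern, startup cost depends on the modules actually requested rather than on everything that is registered.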

🚨 Breaking Changes

  1. The schema of the SQL experience buffer has changed. Experience data saved by previous versions cannot be used.
  2. The registration system has been refactored. Developers no longer need to use @REGISTRY.register_module to register modules. See the Developer Guide for details.
  3. Tinker requires Python >= 3.11. (Python 3.10 is still supported for users who do not use Tinker.)
  4. vLLM 0.12.0 requires CUDA >= 12.9. (CUDA 12.8 is still supported for users on vLLM 0.11.0 or lower.)
  5. SampleStrategy has been refactored: its inputs now accept kwargs, and its output type has changed from Experiences to List[Experience].
  6. Experiences (not Experience) will be deprecated.
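
The version requirements in items 3 and 4 can be expressed as a small pre-flight check. This is a sketch under stated assumptions: the helper names are hypothetical, version strings are passed in by the caller rather than detected, and only plain numeric versions such as `0.12.0` are handled (no pre-release suffixes).

```python
import sys
from typing import Sequence, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '12.9' or '0.12.0' into a 3-part tuple of ints for comparison."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '0.12' compares equal to '0.12.0'
    return tuple(parts)


def needs_cuda_12_9(vllm_version: str) -> bool:
    """Per the notes above, vLLM 0.12.0 requires CUDA >= 12.9."""
    return parse_version(vllm_version) >= (0, 12, 0)


def python_ok_for_tinker(version_info: Sequence[int] = sys.version_info) -> bool:
    """Per the notes above, the Tinker backend requires Python >= 3.11."""
    return tuple(version_info[:2]) >= (3, 11)


print(needs_cuda_12_9("0.12.0"))       # True
print(needs_cuda_12_9("0.11.0"))       # False
print(python_ok_for_tinker((3, 10)))   # False
```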

Full Changelog: v0.3.3...v0.4.0