From fade3665283928917b4d6d868a33e83d7b7c57ce Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Sun, 11 Jan 2026 01:43:03 +0530 Subject: [PATCH 01/23] docs: add CPU-only quick start and improve beginner documentation --- README.md | 235 +++++++++++++++++++++--------------------------------- 1 file changed, 89 insertions(+), 146 deletions(-) diff --git a/README.md b/README.md index 5a7255ebb5..80f28c622f 100644 --- a/README.md +++ b/README.md @@ -4,10 +4,8 @@ Trinity-RFT -

Trinity-RFT: A General-Purpose and Unified Framework for
Reinforcement Fine-Tuning of Large Language Models

-
[![paper](http://img.shields.io/badge/cs.LG-2505.17826-B31B1B?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2505.17826) @@ -19,27 +17,19 @@ ## 💡 What is Trinity-RFT? - -Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT). +Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT). It decouples RFT into three components that work in coordination: * **Explorer** generates experience data via agent-environment interaction; - * **Trainer** updates model weights by minimizing losses on the data; - * **Buffer** pipelines data processing throughout the RFT lifecycle. - Trinity-RFT provides functionalities for users with different backgrounds and objectives: * 🤖 **Agent application developers:** Train LLM-powered agents and improve their capabilities in specific domains [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html) - * 🧠 **Reinforcement learning researchers:** Design, implement and validate new RL algorithms using compact, plug-and-play modules that allow non-invasive customization [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html) - * 📊 **Data engineers:** Create RFT datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html) - - ## 🚀 News * [2025-12] [[Release Notes]](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, add more benchmarks, enhance online RL and more. @@ -64,82 +54,68 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob - ## 🔨 Tutorials and Guidelines - -| Category | Tutorial / Guideline | -| --- | ----| -| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)
+ [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html)
+ [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html)
+ [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html)
+ [RFT without local GPU (Tinker Backend)](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_tinker_backend.html) | -| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)
+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | -| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)
+ [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) | -| *Algorithm development* | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | -| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | - +| Category | Tutorial / Guideline | +|-----------------------------------|------------------------------------------------------------------------------------------------------------------| +| *Run diverse RFT modes* | • [Quick start: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)
• [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html)
• [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html)
• [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html)
• [RFT without local GPU (Tinker Backend)](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_tinker_backend.html) | +| *Multi-step agentic RL* | • [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
• [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)
• [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
• [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | +| *Full-lifecycle data pipelines* | • [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)
• [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
• [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
• [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
• [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) | +| *Algorithm development* | • [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))
• [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
• Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | +| *Benchmarks* | • [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
• [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
• [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
• [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *Going deeper into Trinity-RFT* | • [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
• [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
• [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
• [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | > [!NOTE] > For more tutorials, please refer to the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/). - - ## 🌟 Key Features * **Flexible RFT Modes:** - Supports synchronous/asynchronous, on-policy/off-policy, and online/offline RL. - Rollout and training can run separately and scale independently across devices. - Boost sample and time efficiency by experience replay. - RFT modes supported by Trinity-RFT * **Agentic RL Support:** - Supports both concatenated and general multi-step agentic workflows. - Able to directly train agent applications developed using agent frameworks like [AgentScope](https://github.com/agentscope-ai/agentscope). - Agentic workflows * **Full-Lifecycle Data Pipelines:** - Enables pipeline processing of rollout tasks and experience samples. - Active data management (prioritization, cleaning, augmentation, etc.) throughout the RFT lifecycle. - Native support for multi-task joint learning and online task curriculum construction. - Data pipeline design * **User-Friendly Design:** - Plug-and-play modules and decoupled architecture, facilitating easy adoption and development. - Rich graphical user interfaces enable low-code usage. - System architecture - - ## 🔧 Supported Algorithms -We list some algorithms supported by Trinity-RFT in the following table. For more details, the concrete configurations are shown in the [Algorithm module](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/algorithm/algorithm.py). You can also set up new algorithms by customizing different components, see [tutorial](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html). - -| Algorithm | Doc / Example | Source Code | Key Configurations | -|:-----------|:-----------|:---------------|:-----------| -| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` | -| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)]| [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` | -| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` | -| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` | -| RLOO [[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | 
`algorithm_type: rloo` | -| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` | -| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` | -| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` | -| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` | -| AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` | -| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` | -| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` | -| On-Policy Distillation [[Blog](https://thinkingmachines.ai/blog/on-policy-distillation/)] [[Paper](https://arxiv.org/pdf/2306.13649)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` | - - +| Algorithm | Doc / Example | Source Code | Key Configurations | +|------------------------|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|--------------------------------| +| PPO | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` | +| GRPO | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` | +| CHORD 💡 | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` | +| REC Series 💡 | [[GSM8K 
Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` | +| RLOO | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | `algorithm_type: rloo` | +| REINFORCE++ | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` | +| GSPO | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` | +| TOPR | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` | +| sPPO | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` | +| AsymRE | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` | +| CISPO | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` | +| SAPO | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` | +| On-Policy Distillation | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` | --- ## Table of Contents - [Quick Start](#quick-start) + - [Minimal CPU-Only Quick Start](#minimal-cpu-only-quick-start) - [Step 1: installation](#step-1-installation) - [Step 2: prepare dataset and model](#step-2-prepare-dataset-and-model) - [Step 3: configurations](#step-3-configurations) @@ -148,51 +124,72 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor - [Acknowledgements](#acknowledgements) - [Citation](#citation) - +--- ## Quick Start - > [!NOTE] > This project is currently under active development. Comments and suggestions are welcome! -> -> **No GPU? No problem!** You can still try it out: -> 1. Follow the installation steps (feel free to skip GPU-specific packages like `flash-attn`) -> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed to work on CPU-only systems. +> **No GPU? No problem!** You can still try it out using the Tinker backend: +> 1. Follow the installation steps (skip GPU-specific packages like `flash-attn`) +> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems. + +### Minimal CPU-Only Quick Start + +If you do not have access to a GPU, you can still try Trinity-RFT using the Tinker backend. 
+ +```bash +# Create and activate environment +python3 -m venv .venv +source .venv/bin/activate + +# Install Trinity-RFT with CPU-only backend +pip install -e ".[tinker]" +``` + +Run a simple example: + +```bash +trinity run --config examples/tinker/tinker_example.yaml +``` + +This example is designed to run on CPU-only machines and is recommended for first-time users. ### Step 1: installation Before installing, make sure your system meets the following requirements: -- **Python**: version 3.10 to 3.12 (inclusive) -- **CUDA**: version >= 12.8 -- **GPUs**: at least 2 GPUs +* Python: version 3.10 to 3.12 (inclusive) +* CUDA: version >= 12.8 (required for GPU training) +* GPUs: at least 2 GPUs (for standard distributed training) +* CPU-only: Supported via the Tinker backend (see Minimal CPU-Only Quick Start) + +**Recommended for first-time users:** +* If you have no GPU → Use Tinker backend +* If you want simple setup → Use Docker +* If you want development & contribution → Use Conda / venv #### From Source (Recommended) If you plan to customize or contribute to Trinity-RFT, this is the best option. -##### 1. Clone the Repository +1. Clone the Repository ```bash git clone https://github.com/modelscope/Trinity-RFT cd Trinity-RFT ``` -##### 2. Set Up Environment +2. Set Up Environment + Choose one of the following options: -Choose one of the following options: - -##### Using Pre-built Docker Image (Recommended for Beginners) - -We provide a pre-built Docker image with GPU-related dependencies installed. +**Using Pre-built Docker Image (Recommended for Beginners)** ```bash docker pull ghcr.io/modelscope/trinity-rft:latest -# Run the container, replacing with your actual path docker run -it \ --gpus all \ --shm-size="64g" \ @@ -202,9 +199,7 @@ docker run -it \ ghcr.io/modelscope/trinity-rft:latest ``` -> This image has used `uv` to install all GPU-related dependencies of Trinity-RFT. The virtual environment will be automatically activated upon entering the container (you can also manually activate it via `source /opt/venv/bin/activate` if needed). You can use `uv pip install` to add extra packages as necessary. - -###### Using Conda +**Using Conda** ```bash conda create -n trinity python=3.12 @@ -212,16 +207,13 @@ conda activate trinity pip install -e ".[vllm,flash_attn]" -# If you have no GPU, comment out the line above and uncomment this instead: +# If you have no GPU: # pip install -e ".[tinker]" -# If you encounter issues when installing flash-attn, try: -# pip install flash-attn==2.8.1 --no-build-isolation - pip install -e ".[dev]" # for development like linting and debugging ``` -###### Using venv +**Using venv** ```bash python3.10 -m venv .venv @@ -229,134 +221,82 @@ source .venv/bin/activate pip install -e ".[vllm,flash_attn]" -# If you have no GPU, comment out the line above and uncomment this instead: +# If you have no GPU: # pip install -e ".[tinker]" -# If you encounter issues when installing flash-attn, try: -# pip install flash-attn==2.8.1 --no-build-isolation - -pip install -e ".[dev]" # for development like linting and debugging +pip install -e ".[dev]" ``` -###### Using `uv` - -[`uv`](https://github.com/astral-sh/uv) is a modern Python package installer. 
+**Using uv** ```bash uv sync --extra vllm --extra dev --extra flash_attn -# If you have no GPU, try to use Tinker instead: +# If you have no GPU: # uv sync --extra tinker --extra dev ``` #### Via PyPI -If you just want to use the package without modifying the code: - ```bash pip install trinity-rft pip install flash-attn==2.8.1 ``` -Or with `uv`: - -```bash -uv pip install trinity-rft -uv pip install flash-attn==2.8.1 -``` - -> For training with **Megatron-LM**, please refer to [Megatron-LM Backend](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_megatron.html). - ### Step 2: prepare dataset and model - Trinity-RFT supports most datasets and models from Huggingface and ModelScope. - **Prepare the model** in the local directory `$MODEL_PATH/{model_name}`: ```bash # Using Huggingface huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name} - # Using Modelscope modelscope download {model_name} --local_dir $MODEL_PATH/{model_name} ``` -For more details about model downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download). - - - **Prepare the dataset** in the local directory `$DATASET_PATH/{dataset_name}`: ```bash # Using Huggingface huggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET_PATH/{dataset_name} - # Using Modelscope modelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name} ``` -For more details about dataset downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download). - - - ### Step 3: configurations - Trinity-RFT provides a web interface for configuring your RFT process. > [!NOTE] > This is an experimental feature, and we will continue to improve it. - To launch the web interface for minimal configurations, you can run ```bash trinity studio --port 8080 ``` -Then you can configure your RFT process in the web page and generate a config file. You can save the config file for later use or run it directly as described in the following section. +Then you can configure your RFT process in the web page and generate a config file. -Advanced users can also edit the config file directly. +Advanced users can also edit the config file directly. We provide example config files in [`examples`](examples/). For complete GUI features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio). - -
- - Example: config manager GUI - -![config-manager](https://img.alicdn.com/imgextra/i1/O1CN01yhYrV01lGKchtywSH_!!6000000004791-2-tps-1480-844.png) - - -
- - - - ### Step 4: run the RFT process - Start a ray cluster: ```shell # On master node ray start --head - # On worker nodes ray start --address= ``` (Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring. -Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations. -For example, to log in to Wandb: - -```shell -export WANDB_API_KEY= -wandb login -``` For command-line users, run the RFT process: @@ -364,35 +304,38 @@ For command-line users, run the RFT process: trinity run --config ``` -For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k with GRPO: +Example — fine-tuning Qwen2.5-1.5B-Instruct on GSM8k with GRPO: -```shell +```bash trinity run --config examples/grpo_gsm8k/gsm8k.yaml ``` -For studio users, click "Run" in the web interface. - - +--- ## Contribution Guide This project is currently under active development, and we welcome contributions from the community! -See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines. +We welcome contributions of all kinds, including: + +* Documentation improvements +* Example workflows +* Bug fixes and performance optimizations +If you're new to the project, documentation and example updates are a great place to start. + +See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines. ## Acknowledgements This project is built upon many excellent open-source projects, including: -+ [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training; -+ [vLLM](https://github.com/vllm-project/vllm) for LLM inference; -+ [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines; -+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow; -+ [Ray](https://github.com/ray-project/ray) for distributed systems; -+ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm); -+ ...... 
- +* verl, FSDP, Megatron-LM for LLM training +* vLLM for LLM inference +* Data-Juicer for data processing pipelines +* AgentScope for agentic workflow +* Ray for distributed systems +* RL frameworks like OpenRLHF, TRL, ChatLearn and rLLM ## Citation From 157eccd2add32240c40477412b6018fce5c03d45 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Sun, 11 Jan 2026 01:57:34 +0530 Subject: [PATCH 02/23] docs: restore references and helpful links in README Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 80f28c622f..c3fb2a5ce8 100644 --- a/README.md +++ b/README.md @@ -96,19 +96,19 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob | Algorithm | Doc / Example | Source Code | Key Configurations | |------------------------|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|--------------------------------| -| PPO | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` | -| GRPO | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` | -| CHORD 💡 | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` | -| REC Series 💡 | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` | -| RLOO | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | `algorithm_type: rloo` | -| REINFORCE++ | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` | -| GSPO | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` | -| TOPR | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` | -| sPPO | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` | -| AsymRE | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | 
[[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` | -| CISPO | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` | -| SAPO | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` | -| On-Policy Distillation | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` | +| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` | +| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` | +| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` | +| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` | +| RLOO [[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | `algorithm_type: rloo` | +| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` | +| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` | +| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` | +| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` | +| AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K 
Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` | +| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` | +| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` | +| On-Policy Distillation [[Blog](https://thinkingmachines.ai/blog/on-policy-distillation/)] [[Paper](https://arxiv.org/pdf/2306.13649)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` | --- From 695e2a4abb647baef603c01c711c5b104480dada Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Sun, 11 Jan 2026 01:57:50 +0530 Subject: [PATCH 03/23] docs: restore references and helpful links in README Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index c3fb2a5ce8..27f49f1ecb 100644 --- a/README.md +++ b/README.md @@ -330,12 +330,12 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines. This project is built upon many excellent open-source projects, including: -* verl, FSDP, Megatron-LM for LLM training -* vLLM for LLM inference -* Data-Juicer for data processing pipelines -* AgentScope for agentic workflow -* Ray for distributed systems -* RL frameworks like OpenRLHF, TRL, ChatLearn and rLLM +* [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training +* [vLLM](https://github.com/vllm-project/vllm) for LLM inference +* [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines +* [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow +* [Ray](https://github.com/ray-project/ray) for distributed systems +* RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm) ## Citation From d5d2e8f6d8e3fe7f41400924b71193e23500ed1b Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Sun, 11 Jan 2026 01:58:01 +0530 Subject: [PATCH 04/23] Update README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 27f49f1ecb..250c2097ac 100644 --- a/README.md +++ b/README.md @@ -198,7 +198,7 @@ docker run -it \ -v :/data \ ghcr.io/modelscope/trinity-rft:latest ``` - +> This image has used `uv` to install all GPU-related dependencies of Trinity-RFT. The virtual environment will be automatically activated upon entering the container (you can also manually activate it via `source /opt/venv/bin/activate` if needed). You can use `uv pip install` to add extra packages as necessary. 
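+
+As a quick sanity check inside the container (assuming the pre-built virtual environment is active, as noted above), you can verify that the GPUs are visible to PyTorch before starting a run:
+
+```bash
+# Should print "True" when the container was started with --gpus all
+python -c "import torch; print(torch.cuda.is_available())"
+
+# Extra packages can be added to the same environment if needed,
+# e.g. (placeholder package name):
+# uv pip install <extra-package>
+```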
**Using Conda** ```bash From 266113cc185c7f85c5fb4184d28a41f20078d5b4 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Sun, 11 Jan 2026 01:58:06 +0530 Subject: [PATCH 05/23] Update README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 250c2097ac..e244b1432b 100644 --- a/README.md +++ b/README.md @@ -210,6 +210,8 @@ pip install -e ".[vllm,flash_attn]" # If you have no GPU: # pip install -e ".[tinker]" +# If you encounter issues when installing flash-attn, try: +# pip install flash-attn==2.8.1 --no-build-isolation pip install -e ".[dev]" # for development like linting and debugging ``` From 418b8081436670495127d2002fbb8e78a61ac308 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Sun, 11 Jan 2026 01:58:22 +0530 Subject: [PATCH 06/23] docs: restore references and helpful links in README Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e244b1432b..73aefed6bd 100644 --- a/README.md +++ b/README.md @@ -244,7 +244,7 @@ uv sync --extra vllm --extra dev --extra flash_attn pip install trinity-rft pip install flash-attn==2.8.1 ``` - +> For training with **Megatron-LM**, please refer to [Megatron-LM Backend](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_megatron.html). ### Step 2: prepare dataset and model Trinity-RFT supports most datasets and models from Huggingface and ModelScope. From 52e8dc22a60004a7909b2bf294acd2b461b2a4a7 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Sun, 11 Jan 2026 01:58:33 +0530 Subject: [PATCH 07/23] docs: restore references and helpful links in README Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 73aefed6bd..ac87102781 100644 --- a/README.md +++ b/README.md @@ -298,7 +298,7 @@ ray start --head ray start --address= ``` -(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring. +(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring. Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations. 
For command-line users, run the RFT process: From 438e2ab387571b5fe55c8dab1e1d5d25eabccf91 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Sun, 11 Jan 2026 23:49:41 +0530 Subject: [PATCH 08/23] python version updated Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ac87102781..a335d2d6da 100644 --- a/README.md +++ b/README.md @@ -141,7 +141,7 @@ If you do not have access to a GPU, you can still try Trinity-RFT using the Tink ```bash # Create and activate environment -python3 -m venv .venv +python3.10 -m venv .venv source .venv/bin/activate # Install Trinity-RFT with CPU-only backend From 2e0de4e6fcb2bdc851e1fc51da57d76b9b0ef7c2 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Sun, 11 Jan 2026 23:50:12 +0530 Subject: [PATCH 09/23] note added Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index a335d2d6da..1ab8fc41ec 100644 --- a/README.md +++ b/README.md @@ -226,6 +226,8 @@ pip install -e ".[vllm,flash_attn]" # If you have no GPU: # pip install -e ".[tinker]" +# If you encounter issues when installing flash-attn, try: +# pip install flash-attn==2.8.1 --no-build-isolation pip install -e ".[dev]" ``` From d18189771e274742f2337b56cfd9de385b18eadd Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 10:31:35 +0530 Subject: [PATCH 10/23] blank line added Co-authored-by: Xuchen Pan <32844285+pan-x-c@users.noreply.github.com> --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 1ab8fc41ec..0cb54d7cc1 100644 --- a/README.md +++ b/README.md @@ -199,6 +199,7 @@ docker run -it \ ghcr.io/modelscope/trinity-rft:latest ``` > This image has used `uv` to install all GPU-related dependencies of Trinity-RFT. The virtual environment will be automatically activated upon entering the container (you can also manually activate it via `source /opt/venv/bin/activate` if needed). You can use `uv pip install` to add extra packages as necessary. + **Using Conda** ```bash From 0997606dc9d68de2a299f8af1b0a52d708843d5d Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 10:32:17 +0530 Subject: [PATCH 11/23] Update README.md Co-authored-by: Xuchen Pan <32844285+pan-x-c@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0cb54d7cc1..3231925e76 100644 --- a/README.md +++ b/README.md @@ -285,7 +285,7 @@ trinity studio --port 8080 Then you can configure your RFT process in the web page and generate a config file. -Advanced users can also edit the config file directly. +Advanced users can also edit the config file directly. We provide example config files in [`examples`](examples/). For complete GUI features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio). From 3b77fa4eccd4dd08de72ac8d03ab1fdec75e1dd9 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 10:50:36 +0530 Subject: [PATCH 12/23] Revise installation instructions and add requirements Updated installation section to improve clarity and added GPU and CPU-only support details. 
--- README.md | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 3231925e76..aea2726674 100644 --- a/README.md +++ b/README.md @@ -156,14 +156,21 @@ trinity run --config examples/tinker/tinker_example.yaml This example is designed to run on CPU-only machines and is recommended for first-time users. -### Step 1: installation +## Step 1: Installation Before installing, make sure your system meets the following requirements: -* Python: version 3.10 to 3.12 (inclusive) -* CUDA: version >= 12.8 (required for GPU training) -* GPUs: at least 2 GPUs (for standard distributed training) -* CPU-only: Supported via the Tinker backend (see Minimal CPU-Only Quick Start) +### GPU Requirements + +- Python: version 3.10 to 3.12 (inclusive) +- CUDA: version >= 12.8 (required for GPU training) +- GPUs: at least 2 GPUs (for standard distributed training) + +### CPU-Only Support + +- CPU-only execution is supported via the Tinker backend. +- This mode is intended for testing, development, and experimentation. + (see: Minimal CPU-Only Quick Start) **Recommended for first-time users:** From b238da3a4e9c42c2db2656d03c1eb7145da3474d Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 10:54:42 +0530 Subject: [PATCH 13/23] Update headings in README for clarity --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index aea2726674..4dbee6d3c9 100644 --- a/README.md +++ b/README.md @@ -156,17 +156,17 @@ trinity run --config examples/tinker/tinker_example.yaml This example is designed to run on CPU-only machines and is recommended for first-time users. -## Step 1: Installation +### Step 1: Installation Before installing, make sure your system meets the following requirements: -### GPU Requirements +#### GPU Requirements - Python: version 3.10 to 3.12 (inclusive) - CUDA: version >= 12.8 (required for GPU training) - GPUs: at least 2 GPUs (for standard distributed training) -### CPU-Only Support +#### CPU-Only Support - CPU-only execution is supported via the Tinker backend. - This mode is intended for testing, development, and experimentation. From b32328b765702b2c1d8015eb25ee5fc0dd368210 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 11:09:39 +0530 Subject: [PATCH 14/23] Clean up formatting in README.md Removed unnecessary line breaks in the README for better readability. --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 4dbee6d3c9..565635f071 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,7 @@ ## 💡 What is Trinity-RFT? -Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT). +Trinity-RFT is a general-purpose, flexible and user-friendly framework for LLM reinforcement fine-tuning (RFT). It decouples RFT into three components that work in coordination: * **Explorer** generates experience data via agent-environment interaction; @@ -169,7 +169,7 @@ Before installing, make sure your system meets the following requirements: #### CPU-Only Support - CPU-only execution is supported via the Tinker backend. -- This mode is intended for testing, development, and experimentation. +- This mode is intended for testing, development, and experimentation. (see: Minimal CPU-Only Quick Start) **Recommended for first-time users:** @@ -189,7 +189,7 @@ git clone https://github.com/modelscope/Trinity-RFT cd Trinity-RFT ``` -2. Set Up Environment +2. 
Set Up Environment Choose one of the following options: **Using Pre-built Docker Image (Recommended for Beginners)** From 196aadcf5051d81076953c60d31b0af0f6b78d62 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 13:21:48 +0530 Subject: [PATCH 15/23] Update README.md Co-authored-by: Yanxi Chen <153061753+yanxi-chen@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 565635f071..0fb1364a7e 100644 --- a/README.md +++ b/README.md @@ -151,7 +151,7 @@ pip install -e ".[tinker]" Run a simple example: ```bash -trinity run --config examples/tinker/tinker_example.yaml +trinity run --config examples/tinker/tinker.yaml ``` This example is designed to run on CPU-only machines and is recommended for first-time users. From ffc5a97e024f2a5986a01240286f2c4617383757 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 13:22:10 +0530 Subject: [PATCH 16/23] Update README.md Co-authored-by: Yanxi Chen <153061753+yanxi-chen@users.noreply.github.com> --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 0fb1364a7e..6edf005ede 100644 --- a/README.md +++ b/README.md @@ -154,7 +154,9 @@ Run a simple example: trinity run --config examples/tinker/tinker.yaml ``` -This example is designed to run on CPU-only machines and is recommended for first-time users. +This example is designed to run on CPU-only machines. See the complete [Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker) for more details. + +To run Trinity-RFT on GPU machines instead, please follow the steps below. ### Step 1: Installation From 68997f23fc22443ada6d387fa2e98208b8251c7a Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 13:22:27 +0530 Subject: [PATCH 17/23] Update README.md Co-authored-by: Yanxi Chen <153061753+yanxi-chen@users.noreply.github.com> --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 6edf005ede..164aff5860 100644 --- a/README.md +++ b/README.md @@ -165,8 +165,8 @@ Before installing, make sure your system meets the following requirements: #### GPU Requirements - Python: version 3.10 to 3.12 (inclusive) -- CUDA: version >= 12.8 (required for GPU training) -- GPUs: at least 2 GPUs (for standard distributed training) +- CUDA: version >= 12.8 +- GPUs: at least 2 GPUs #### CPU-Only Support From 3cfb821cd81874d17d27b0aa0dd353a37e65fb1c Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 13:22:56 +0530 Subject: [PATCH 18/23] Update README.md Co-authored-by: Yanxi Chen <153061753+yanxi-chen@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 164aff5860..259f6a39e6 100644 --- a/README.md +++ b/README.md @@ -184,7 +184,7 @@ Before installing, make sure your system meets the following requirements: If you plan to customize or contribute to Trinity-RFT, this is the best option. -1. 
Clone the Repository +First, clone the repository: ```bash git clone https://github.com/modelscope/Trinity-RFT From 6612cd019f722a28d74c69e9f9ec970d47a385fc Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 13:23:20 +0530 Subject: [PATCH 19/23] Update README.md Co-authored-by: Yanxi Chen <153061753+yanxi-chen@users.noreply.github.com> --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index 259f6a39e6..0789c065e6 100644 --- a/README.md +++ b/README.md @@ -191,8 +191,7 @@ git clone https://github.com/modelscope/Trinity-RFT cd Trinity-RFT ``` -2. Set Up Environment - Choose one of the following options: +Then, set up environment via one of the following options: **Using Pre-built Docker Image (Recommended for Beginners)** From 1e4dae8ca434152833ca312c92550f9683a24ebf Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 13:25:08 +0530 Subject: [PATCH 20/23] Update README.md Co-authored-by: Yanxi Chen <153061753+yanxi-chen@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0789c065e6..e03712ecbd 100644 --- a/README.md +++ b/README.md @@ -337,7 +337,7 @@ We welcome contributions of all kinds, including: If you're new to the project, documentation and example updates are a great place to start. -See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines. +See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines, as well as our [good-first-issue list](https://github.com/modelscope/Trinity-RFT/issues/470). ## Acknowledgements From 46ffe4587f12340491380b5cedca1ea1e61c81ff Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 14:22:32 +0530 Subject: [PATCH 21/23] Refactor README for clarity on GPU and Tinker usage Removed redundant CPU-only support section and streamlined instructions for users without a GPU. Updated installation instructions and clarified usage of Tinker backend. --- README.md | 73 +++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 52 insertions(+), 21 deletions(-) diff --git a/README.md b/README.md index e03712ecbd..9dbe055c6d 100644 --- a/README.md +++ b/README.md @@ -131,10 +131,6 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob > [!NOTE] > This project is currently under active development. Comments and suggestions are welcome! -> **No GPU? No problem!** You can still try it out using the Tinker backend: -> 1. Follow the installation steps (skip GPU-specific packages like `flash-attn`) -> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems. - ### Minimal CPU-Only Quick Start If you do not have access to a GPU, you can still try Trinity-RFT using the Tinker backend. @@ -168,12 +164,6 @@ Before installing, make sure your system meets the following requirements: - CUDA: version >= 12.8 - GPUs: at least 2 GPUs -#### CPU-Only Support - -- CPU-only execution is supported via the Tinker backend. -- This mode is intended for testing, development, and experimentation. 
- (see: Minimal CPU-Only Quick Start) - **Recommended for first-time users:** * If you have no GPU → Use Tinker backend @@ -198,6 +188,7 @@ Then, set up environment via one of the following options: ```bash docker pull ghcr.io/modelscope/trinity-rft:latest +# Run the container, replacing with your actual path docker run -it \ --gpus all \ --shm-size="64g" \ @@ -216,7 +207,7 @@ conda activate trinity pip install -e ".[vllm,flash_attn]" -# If you have no GPU: +# If you have no GPU, comment out the line above and uncomment this instead: # pip install -e ".[tinker]" # If you encounter issues when installing flash-attn, try: @@ -232,12 +223,12 @@ source .venv/bin/activate pip install -e ".[vllm,flash_attn]" -# If you have no GPU: +# If you have no GPU, comment out the line above and uncomment this instead: # pip install -e ".[tinker]" # If you encounter issues when installing flash-attn, try: # pip install flash-attn==2.8.1 --no-build-isolation -pip install -e ".[dev]" +pip install -e ".[dev]" # for development like linting and debugging ``` **Using uv** @@ -245,16 +236,26 @@ pip install -e ".[dev]" ```bash uv sync --extra vllm --extra dev --extra flash_attn -# If you have no GPU: +# If you have no GPU, try to use Tinker instead: # uv sync --extra tinker --extra dev ``` #### Via PyPI +If you just want to use the package without modifying the code: + ```bash pip install trinity-rft pip install flash-attn==2.8.1 ``` + +Or with `uv`: + +```bash +uv pip install trinity-rft +uv pip install flash-attn==2.8.1 +``` + > For training with **Megatron-LM**, please refer to [Megatron-LM Backend](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_megatron.html). ### Step 2: prepare dataset and model @@ -269,6 +270,10 @@ huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name} modelscope download {model_name} --local_dir $MODEL_PATH/{model_name} ``` +For more details about model downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download). + + + **Prepare the dataset** in the local directory `$DATASET_PATH/{dataset_name}`: ```bash @@ -277,6 +282,9 @@ huggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET # Using Modelscope modelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name} ``` +For more details about dataset downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download). + + ### Step 3: configurations @@ -291,13 +299,26 @@ To launch the web interface for minimal configurations, you can run trinity studio --port 8080 ``` -Then you can configure your RFT process in the web page and generate a config file. +Then you can configure your RFT process in the web page and generate a config file. You can save the config file for later use or run it directly as described in the following section. Advanced users can also edit the config file directly. We provide example config files in [`examples`](examples/). For complete GUI features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio). + +
+ + Example: config manager GUI + +![config-manager](https://img.alicdn.com/imgextra/i1/O1CN01yhYrV01lGKchtywSH_!!6000000004791-2-tps-1480-844.png) + + +
+ + + + ### Step 4: run the RFT process Start a ray cluster: @@ -310,6 +331,12 @@ ray start --address= ``` (Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring. Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations. +For example, to log in to Wandb: + +```shell +export WANDB_API_KEY= +wandb login +``` For command-line users, run the RFT process: @@ -323,6 +350,8 @@ Example — fine-tuning Qwen2.5-1.5B-Instruct on GSM8k with GRPO: trinity run --config examples/grpo_gsm8k/gsm8k.yaml ``` +For studio users, click "Run" in the web interface. + --- ## Contribution Guide @@ -343,12 +372,14 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines, a This project is built upon many excellent open-source projects, including: -* [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training -* [vLLM](https://github.com/vllm-project/vllm) for LLM inference -* [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines -* [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow -* [Ray](https://github.com/ray-project/ray) for distributed systems -* RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm) +* [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training; +* [vLLM](https://github.com/vllm-project/vllm) for LLM inference; +* [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines; +* [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow; +* [Ray](https://github.com/ray-project/ray) for distributed systems; ++ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm); ++ ...... + ## Citation From ba25c87e2ceb6e95ed3a5cf205b336c264b78935 Mon Sep 17 00:00:00 2001 From: Parag Sharma Date: Mon, 12 Jan 2026 14:29:52 +0530 Subject: [PATCH 22/23] Update README with development installation command Added installation command for development dependencies. 
--- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 9dbe055c6d..dc86ff459a 100644 --- a/README.md +++ b/README.md @@ -228,6 +228,7 @@ pip install -e ".[vllm,flash_attn]" # If you encounter issues when installing flash-attn, try: # pip install flash-attn==2.8.1 --no-build-isolation + pip install -e ".[dev]" # for development like linting and debugging ``` From 21a052f80724b5f7d0f4121cfa97789510e48c92 Mon Sep 17 00:00:00 2001 From: Yanxi Chen <153061753+yanxi-chen@users.noreply.github.com> Date: Mon, 12 Jan 2026 21:11:49 +0800 Subject: [PATCH 23/23] Update README.md Fix inconsistency of bullet points --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index dc86ff459a..a981df9b28 100644 --- a/README.md +++ b/README.md @@ -373,11 +373,11 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines, a This project is built upon many excellent open-source projects, including: -* [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training; -* [vLLM](https://github.com/vllm-project/vllm) for LLM inference; -* [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines; -* [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow; -* [Ray](https://github.com/ray-project/ray) for distributed systems; ++ [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training; ++ [vLLM](https://github.com/vllm-project/vllm) for LLM inference; ++ [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines; ++ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow; ++ [Ray](https://github.com/ray-project/ray) for distributed systems; + we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm); + ......