diff --git a/README.md b/README.md
index 5a7255ebb5..a981df9b28 100644
--- a/README.md
+++ b/README.md
@@ -4,10 +4,8 @@ Trinity-RFT
-

Trinity-RFT: A General-Purpose and Unified Framework for
Reinforcement Fine-Tuning of Large Language Models

-
[![paper](http://img.shields.io/badge/cs.LG-2505.17826-B31B1B?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2505.17826)
@@ -19,27 +17,19 @@
## 💡 What is Trinity-RFT?
-
Trinity-RFT is a general-purpose, flexible, and user-friendly framework for LLM reinforcement fine-tuning (RFT). It decouples RFT into three components that work in coordination:
* **Explorer** generates experience data via agent-environment interaction;
-
* **Trainer** updates model weights by minimizing losses on the data;
-
* **Buffer** pipelines data processing throughout the RFT lifecycle.
-
Trinity-RFT provides functionalities for users with different backgrounds and objectives:
* 🤖 **Agent application developers:** Train LLM-powered agents and improve their capabilities in specific domains [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html)
-
* 🧠 **Reinforcement learning researchers:** Design, implement, and validate new RL algorithms using compact, plug-and-play modules that allow non-invasive customization [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html)
-
* 📊 **Data engineers:** Create RFT datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
-
-
## 🚀 News

* [2025-12] [[Release Notes]](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added a [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, added more benchmarks, enhanced online RL, and more.

@@ -64,82 +54,68 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
-
## 🔨 Tutorials and Guidelines
-
-| Category | Tutorial / Guideline |
-| --- | ----|
-| *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)
+ [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html)
+ [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html)
+ [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html)
+ [RFT without local GPU (Tinker Backend)](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_tinker_backend.html) | -| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)
+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | -| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)
+ [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) | -| *Algorithm development* | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | -| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | - +| Category | Tutorial / Guideline | +|-----------------------------------|------------------------------------------------------------------------------------------------------------------| +| *Run diverse RFT modes* | • [Quick start: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)
• [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html)
• [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html)
• [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html)
• [RFT without local GPU (Tinker Backend)](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_tinker_backend.html) | +| *Multi-step agentic RL* | • [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
• [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)
• [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
• [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | +| *Full-lifecycle data pipelines* | • [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)
• [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
• [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
• [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
• [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) | +| *Algorithm development* | • [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))
• [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
• Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | +| *Benchmarks* | • [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
• [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
• [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
• [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *Going deeper into Trinity-RFT* | • [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
• [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
• [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
• [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |

> [!NOTE]
> For more tutorials, please refer to the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/).
-
-
## 🌟 Key Features

* **Flexible RFT Modes:**
  - Supports synchronous/asynchronous, on-policy/off-policy, and online/offline RL.
  - Rollout and training can run separately and scale independently across devices.
  - Boosts sample and time efficiency via experience replay.

  *Figure: RFT modes supported by Trinity-RFT*

* **Agentic RL Support:**
  - Supports both concatenated and general multi-step agentic workflows.
  - Can directly train agent applications developed with agent frameworks like [AgentScope](https://github.com/agentscope-ai/agentscope).

  *Figure: Agentic workflows*

* **Full-Lifecycle Data Pipelines:**
  - Enables pipeline processing of rollout tasks and experience samples.
  - Active data management (prioritization, cleaning, augmentation, etc.) throughout the RFT lifecycle.
  - Native support for multi-task joint learning and online task curriculum construction.

  *Figure: Data pipeline design*

* **User-Friendly Design:**
  - Plug-and-play modules and a decoupled architecture, facilitating easy adoption and development.
  - Rich graphical user interfaces enable low-code usage.

  *Figure: System architecture*
-
-
## 🔧 Supported Algorithms

-We list some algorithms supported by Trinity-RFT in the following table. For more details, the concrete configurations are shown in the [Algorithm module](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/algorithm/algorithm.py). You can also set up new algorithms by customizing different components, see [tutorial](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html).
-
-| Algorithm | Doc / Example | Source Code | Key Configurations |
-|:-----------|:-----------|:---------------|:-----------|
-| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` |
-| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)]| [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` |
-| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` |
-| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` |
-| RLOO [[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] |
`algorithm_type: rloo` | -| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` | -| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` | -| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` | -| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` | -| AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` | -| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` | -| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` | +| Algorithm | Doc / Example | Source Code | Key Configurations | +|------------------------|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|--------------------------------| +| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` | +| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` | +| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Doc](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` | +| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` | +| RLOO 
[[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | `algorithm_type: rloo` |
+| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` |
+| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | `algorithm_type: gspo` |
+| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | `algorithm_type: topr` |
+| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | `algorithm_type: sppo` |
+| AsymRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | `algorithm_type: asymre` |
+| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | `algorithm_type: cispo` |
+| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | `algorithm_type: sapo` |
| On-Policy Distillation [[Blog](https://thinkingmachines.ai/blog/on-policy-distillation/)] [[Paper](https://arxiv.org/pdf/2306.13649)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/on_policy_distill)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/common/workflows/on_policy_distill_workflow.py)] | `algorithm_type: on_policy_distill` |
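In practice, choosing an algorithm amounts to setting the `algorithm_type` key from the table above in your experiment config and launching a run. A minimal sketch (the copied file name is hypothetical, and the exact YAML layout of the example config may differ; see the [Algorithm module](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/algorithm/algorithm.py) for the concrete configurations):

```bash
# Sketch: switch the GSM8k example from GRPO to RLOO.
# Assumes the config layout of examples/grpo_gsm8k; adjust paths and keys as needed.
cp examples/grpo_gsm8k/gsm8k.yaml gsm8k_rloo.yaml
# In gsm8k_rloo.yaml, change the algorithm setting to:
#   algorithm_type: rloo
trinity run --config gsm8k_rloo.yaml
```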
-
-
---

## Table of Contents

- [Quick Start](#quick-start)
+  - [Minimal CPU-Only Quick Start](#minimal-cpu-only-quick-start)
  - [Step 1: installation](#step-1-installation)
  - [Step 2: prepare dataset and model](#step-2-prepare-dataset-and-model)
  - [Step 3: configurations](#step-3-configurations)
@@ -148,46 +124,66 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
- [Acknowledgements](#acknowledgements)
- [Citation](#citation)
-
+---

## Quick Start
-
> [!NOTE]
> This project is currently under active development. Comments and suggestions are welcome!
->
-> **No GPU? No problem!** You can still try it out:
-> 1. Follow the installation steps (feel free to skip GPU-specific packages like `flash-attn`)
-> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed to work on CPU-only systems.

+### Minimal CPU-Only Quick Start
+
+If you do not have access to a GPU, you can still try Trinity-RFT using the Tinker backend.
+
+```bash
+# Create and activate environment
+python3.10 -m venv .venv
+source .venv/bin/activate
+
+# Install Trinity-RFT with CPU-only backend
+pip install -e ".[tinker]"
+```
+
+Run a simple example:
+
+```bash
+trinity run --config examples/tinker/tinker.yaml
+```
+
+This example is designed to run on CPU-only machines. See the complete [Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker) for more details.
+
+To run Trinity-RFT on GPU machines instead, please follow the steps below.

-### Step 1: installation
+### Step 1: Installation

Before installing, make sure your system meets the following requirements:

-- **Python**: version 3.10 to 3.12 (inclusive)
-- **CUDA**: version >= 12.8
-- **GPUs**: at least 2 GPUs
+#### GPU Requirements
+- Python: version 3.10 to 3.12 (inclusive)
+- CUDA: version >= 12.8
+- GPUs: at least 2 GPUs
+
+**Recommended for first-time users:**
+
+* If you have no GPU → use the Tinker backend
+* If you want a simple setup → use Docker
+* If you plan to develop or contribute → use Conda / venv

#### From Source (Recommended)

If you plan to customize or contribute to Trinity-RFT, this is the best option.

-##### 1. Clone the Repository
+First, clone the repository:

```bash
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT
```

-##### 2. Set Up Environment
-
-Choose one of the following options:
-
-##### Using Pre-built Docker Image (Recommended for Beginners)
+Then, set up the environment via one of the following options:
-We provide a pre-built Docker image with GPU-related dependencies installed.
+**Using Pre-built Docker Image (Recommended for Beginners)**

```bash
docker pull ghcr.io/modelscope/trinity-rft:latest
@@ -201,10 +197,9 @@ docker run -it \
 -v :/data \
 ghcr.io/modelscope/trinity-rft:latest
```
-
> This image uses `uv` to install all GPU-related dependencies of Trinity-RFT. The virtual environment will be automatically activated upon entering the container (you can also manually activate it via `source /opt/venv/bin/activate` if needed). You can use `uv pip install` to add extra packages as necessary.

-###### Using Conda
+**Using Conda**

```bash
conda create -n trinity python=3.12
conda activate trinity
@@ -217,11 +212,10 @@ pip install -e ".[vllm,flash_attn]"
# If you encounter issues when installing flash-attn, try:
# pip install flash-attn==2.8.1 --no-build-isolation
-
pip install -e ".[dev]" # for development like linting and debugging
```

-###### Using venv
+**Using venv**

```bash
python3.10 -m venv .venv
@@ -238,9 +232,7 @@ pip install -e ".[vllm,flash_attn]"
pip install -e ".[dev]" # for development like linting and debugging
```

-###### Using `uv`
-
-[`uv`](https://github.com/astral-sh/uv) is a modern Python package installer.
+**Using uv**

```bash
uv sync --extra vllm --extra dev --extra flash_attn
@@ -266,19 +258,15 @@ uv pip install flash-attn==2.8.1
```

> For training with **Megatron-LM**, please refer to [Megatron-LM Backend](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_megatron.html).
-
### Step 2: prepare dataset and model
-
Trinity-RFT supports most datasets and models from Huggingface and ModelScope.
-
**Prepare the model** in the local directory `$MODEL_PATH/{model_name}`:

```bash
# Using Huggingface
huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name}
-
# Using Modelscope
modelscope download {model_name} --local_dir $MODEL_PATH/{model_name}
```
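For instance, to fetch the model used by the GSM8k example later in this README (an illustrative choice; substitute any model you prefer):

```bash
# Example: download Qwen2.5-1.5B-Instruct from Huggingface into $MODEL_PATH
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir $MODEL_PATH/Qwen2.5-1.5B-Instruct
```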
@@ -292,24 +280,20 @@ For more details about model downloading, see [Huggingface](https://huggingface.

```bash
# Using Huggingface
huggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET_PATH/{dataset_name}
-
# Using Modelscope
modelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name}
```
-
For more details about dataset downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download).

### Step 3: configurations
-
Trinity-RFT provides a web interface for configuring your RFT process.

> [!NOTE]
> This is an experimental feature, and we will continue to improve it.
-
To launch the web interface for minimal configurations, you can run

```bash
@@ -338,19 +322,16 @@ For complete GUI features, please refer to the monorepo for [Trinity-Studio](htt

### Step 4: run the RFT process
-
Start a Ray cluster:

```shell
# On master node
ray start --head
-
# On worker nodes
ray start --address=
```

-(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring.
-Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations.
+(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring. Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations.

For example, to log in to Wandb:

```shell
@@ -364,22 +345,29 @@ trinity run --config
```

-For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k with GRPO:
+For example, to fine-tune Qwen2.5-1.5B-Instruct on GSM8k with GRPO:

-```shell
+```bash
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```

For studio users, click "Run" in the web interface.
-
+---

## Contribution Guide

This project is currently under active development, and we welcome contributions from the community!

-See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines.
+Contributions of all kinds are appreciated, including:
+
+* Documentation improvements
+* Example workflows
+* Bug fixes and performance optimizations
+
+If you're new to the project, documentation and example updates are a great place to start.
+See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines, as well as our [good-first-issue list](https://github.com/modelscope/Trinity-RFT/issues/470).

## Acknowledgements