chenyushuo
diff --git a/README.md b/README.md
Lines changed: 3 additions & 2 deletions
diff --git a/README_zh.md b/README_zh.md
Lines changed: 3 additions & 2 deletions
diff --git a/benchmark/config/countdown-template.yaml b/benchmark/config/countdown-template.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/docs/sphinx_doc/assets/agentscope_gsm8k_reward.png b/docs/sphinx_doc/assets/agentscope_gsm8k_reward.png
-147 KB
diff --git a/docs/sphinx_doc/assets/email_eval_accuracy.png b/docs/sphinx_doc/assets/email_eval_accuracy.png
-15.8 KB
diff --git a/docs/sphinx_doc/assets/email_reward_mean.png b/docs/sphinx_doc/assets/email_reward_mean.png
464 KB
diff --git a/docs/sphinx_doc/assets/email_rollout_accuracy.png b/docs/sphinx_doc/assets/email_rollout_accuracy.png
-50.4 KB
diff --git a/docs/sphinx_doc/source/tutorial/example_async_mode.md b/docs/sphinx_doc/source/tutorial/example_async_mode.md
Lines changed: 3 additions & 3 deletions
diff --git a/docs/sphinx_doc/source/tutorial/example_reasoning_basic.md b/docs/sphinx_doc/source/tutorial/example_reasoning_basic.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/sphinx_doc/source/tutorial/example_search_email.md b/docs/sphinx_doc/source/tutorial/example_search_email.md
Lines changed: 1 addition & 0 deletions
@@ -82,6 +82,7 @@ Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuni
 
 ## 🚀 News
 
+* [2025-10] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.2)] Trinity-RFT v0.3.2 released: bug fixes and advanced task selection & scheduling.
 * [2025-10] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.1)] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode and new RL algorithms.
 * [2025-09] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.0)] Trinity-RFT v0.3.0 released: enhanced Buffer, FSDP2 & Megatron support, multi-modal models, and new RL algorithms/examples.
 * [2025-08] Introducing [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
@@ -177,14 +178,14 @@ uv sync --extra dev --extra flash_attn
 If you just want to use the package without modifying the code:
 
 ```bash
-pip install trinity-rft==0.3.1
+pip install trinity-rft
 pip install flash-attn==2.8.1
 ```
 
 Or with `uv`:
 
 ```bash
-uv pip install trinity-rft==0.3.1
+uv pip install trinity-rft
 uv pip install flash-attn==2.8.1
 ```
 
 
@@ -83,6 +83,7 @@ Trinity-RFT 是一个灵活、通用的大语言模型（LLM）强化微调（RF
 
 ## 🚀 新闻
 
+* [2025-10] [[发布说明](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.2)] Trinity-RFT v0.3.2 发布：修复若干 Bug 并支持进阶的任务选择和调度。
 * [2025-10] [[发布说明](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.1)] Trinity-RFT v0.3.1 发布：多阶段训练支持、改进的智能体 RL 示例、LoRA 支持、调试模式和全新 RL 算法。
 * [2025-09] [[发布说明](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.0)] Trinity-RFT v0.3.0 发布：增强的 Buffer、FSDP2 & Megatron 支持，多模态模型，以及全新 RL 算法/示例。
 * [2025-08] 推出 [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord)：动态 SFT + RL 集成，实现进阶 LLM 微调（[论文](https://arxiv.org/pdf/2508.11408)）。
@@ -176,14 +177,14 @@ uv sync --extra dev --extra flash_attn
 如果您只需使用 Trinity-RFT 而不打算修改代码：
 
 ```bash
-pip install trinity-rft==0.3.1
+pip install trinity-rft
 pip install flash-attn==2.8.1
 ```
 
 或使用 `uv`：
 
 ```bash
-uv pip install trinity-rft==0.3.1
+uv pip install trinity-rft
 uv pip install flash-attn==2.8.1
 ```
 
 
@@ -54,7 +54,7 @@ explorer:
   rollout_model:
     engine_num: 2
     tensor_parallel_size: 1
-    enforce_eager: true
+    enforce_eager: false
     enable_prefix_caching: false
     enable_chunked_prefill: false
     gpu_memory_utilization: 0.9
 
@@ -31,7 +31,7 @@ buffer:
     taskset:
       name: gsm8k
       storage_type: file
-      path: 'openai/gsm8k'
+      path: ${oc.env:TRINITY_TASKSET_PATH,openai/gsm8k}
       subset_name: 'main'
       split: train
       format:
@@ -79,7 +79,7 @@ buffer:
     taskset:
       name: gsm8k
       storage_type: file
-      path: 'openai/gsm8k'
+      path: ${oc.env:TRINITY_TASKSET_PATH,openai/gsm8k}
       subset_name: 'main'
       format:
         prompt_key: 'question'
@@ -143,7 +143,7 @@ buffer:
     taskset:  # important
       name: gsm8k
       storage_type: file
-      path: 'openai/gsm8k'
+      path: ${oc.env:TRINITY_TASKSET_PATH,openai/gsm8k}
       subset_name: 'main'
       format:
         prompt_key: 'question'
 
@@ -69,7 +69,7 @@ buffer:
     taskset:
       name: gsm8k
       storage_type: file
-      path: 'openai/gsm8k'
+      path: ${oc.env:TRINITY_TASKSET_PATH,openai/gsm8k}
       subset_name: 'main'
       split: 'train'
       format:
@@ -81,7 +81,7 @@ buffer:
     eval_tasksets:
     - name: gsm8k-eval
       storage_type: file
-      path: 'openai/gsm8k'
+      path: ${oc.env:TRINITY_TASKSET_PATH,openai/gsm8k}
       subset_name: 'main'
       split: 'test'
       format:
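
Note on the `path` changes in the two tutorial configs above: the new value uses the `${oc.env:NAME,default}` interpolation form, which reads as "use the environment variable `TRINITY_TASKSET_PATH` if it is set, otherwise fall back to `openai/gsm8k`". The syntax matches OmegaConf's built-in `oc.env` resolver, though this diff does not show how the config is loaded, so treat that as an assumption. The sketch below only emulates the lookup with the standard library to illustrate the expected behavior; `resolve_env_default` is a hypothetical helper, not part of Trinity-RFT or OmegaConf.

```python
import os
import re

# Matches the form used in the configs: ${oc.env:NAME,default}
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*),([^}]*)\}")


def resolve_env_default(value, environ=None):
    """Hypothetical helper: emulate a `${oc.env:NAME,default}` lookup.

    Replaces each occurrence with the environment variable's value if it
    is set, and with the default text after the comma otherwise.
    """
    env = os.environ if environ is None else environ
    return _ENV_PATTERN.sub(lambda m: env.get(m.group(1), m.group(2)), value)


raw = "${oc.env:TRINITY_TASKSET_PATH,openai/gsm8k}"

# Variable unset: the default dataset name is used.
print(resolve_env_default(raw, environ={}))

# Variable set: the override wins (the path here is only an example).
print(resolve_env_default(raw, environ={"TRINITY_TASKSET_PATH": "/data/gsm8k"}))
```

In practice this would mean that exporting `TRINITY_TASKSET_PATH` before launching lets a user point the tutorials at a local copy of the dataset without editing the YAML.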
 
@@ -48,5 +48,6 @@ The results are shown in the following figure (the accuracy ranges from -0.1 to
 
 ![](../../assets/email_rollout_accuracy.png)
 
+![](../../assets/email_reward_mean.png)
 
 ![](../../assets/email_eval_accuracy.png)