Improve Chinese doc

hiyuchang · hiyuchang · commit c3326b36f4e8 · 2026-01-23T10:59:00.000+08:00
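
The documentation touched by this diff describes a two-stage calculation: run-level values are averaged per task, and the task-level values are then summarized per step. The Python sketch below illustrates that flow for the `eval/dummy/accuracy/mean@2` example quoted in the diff. It is not the Trinity-RFT implementation: `task_level` and `step_level` are hypothetical stand-ins for `calculate_task_level_metrics` and `gather_eval_metrics`, the per-run accuracies are invented so that the results line up with the figures quoted in the removed line (0.83, 0.062, 0.9, 0.75), and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def task_level(run_values):
    """Run level -> task level: average one task's repeated runs (mean@k)."""
    return mean(run_values)


def step_level(task_values, prefix):
    """Task level -> step level: summarize the task-level values of one step."""
    return {
        f"{prefix}/mean": mean(task_values),
        f"{prefix}/std": pstdev(task_values),  # population std is an assumption
        f"{prefix}/max": max(task_values),
        f"{prefix}/min": min(task_values),
    }


# Hypothetical data: three tasks in a dummy taskset, repeat_times = 2.
runs_per_task = {
    "task1": [0.8, 0.7],
    "task2": [0.9, 0.8],
    "task3": [1.0, 0.8],
}

task_metrics = [task_level(runs) for runs in runs_per_task.values()]
step_metrics = step_level(task_metrics, "eval/dummy/accuracy/mean@2")

for name, value in step_metrics.items():
    print(f"{name}={value:.3f}")
```

With `monitor.detailed_stats` left at its default, the docs state that only the mean is reported; the extra `std`, `max`, and `min` entries correspond to the detailed statistics.
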
diff --git a/docs/sphinx_doc/source/tutorial/metrics_reference.md b/docs/sphinx_doc/source/tutorial/metrics_reference.md
@@ -1,4 +1,4 @@
-# Metrics Reference
+# Monitor Metrics Reference
 
 This document provides an overview of the metric categories used in Trinity-RFT for tracking exploration, evaluation, and training progress.
 
@@ -122,7 +122,7 @@ graph TD
     end
 ```
 
-When you set `monitor.detailed_stats` to `True`, you will get detailed statistics including mean, std, min, max, e.g., `eval/dummy/accuracy/mean@2/mean=0.83`, `eval/dummy/accuracy/mean@2/std=0.062`, `eval/dummy/accuracy/mean@2/max=0.9`, and `eval/dummy/accuracy/mean@2/min=0.75`:
+When you set `monitor.detailed_stats` to `True`, you will get detailed statistics including mean, std, min, max, as shown in the following diagram:
 
 ```mermaid
 graph TD
@@ -137,20 +137,20 @@ graph TD
 
 #### Time Metrics (`time/`)
 
-Time metrics measure execution duration for various operations throughout the training pipeline.
+Time metrics measure execution duration for various operations throughout the rollout process.
 
 - **Format**: `time/{operation_name}`
 - **Examples**:
   - `time/eval`: Time from the start of submitting evaluation tasks to the end of the evaluation phase; this duration includes both evaluation tasks and some rollout tasks.
-  - `time/train_step`: Total time for one training step
+  - `time/wait_explore_step`: Time to wait for one rollout step to complete.
 
 **Note**:
-  - Time measuring can be inaccurate due to the asynchronous nature of the exploration pipeline, but it is still useful for monitoring the overall training progress.
-  - Above metrics are reported in seconds unless otherwise specified.
+  - Time measuring can be inaccurate due to the asynchronous nature of the rollout process, but it is still useful for monitoring the overall training progress.
+  - Time metrics are reported in seconds unless otherwise specified.
   - Some training operations also report per-token timing metrics with the prefix `timing_per_token_ms/` (e.g., `timing_per_token_ms/update_actor`, `timing_per_token_ms/update_critic`, `timing_per_token_ms/adv`, `timing_per_token_ms/values`). These metrics normalize execution time by the number of tokens processed, providing efficiency measurements independent of batch size.
 
 
-### Training Metrics
+### Trainer Metrics
 
 This category includes metrics that track the training dynamics of the policy (actor) model (`actor/`) and the value function (critic) model (`critic/`), as well as some performance metrics (`perf/`, `global_seqlen/`, `response_length/`, `prompt_length/`, `time/`). These metrics are adapted from [veRL](https://github.com/volcengine/verl). Interested users can refer to the [veRL documentation](https://verl.readthedocs.io/en/latest/index.html) for more details.
 
diff --git a/docs/sphinx_doc/source_zh/tutorial/metrics_reference.md b/docs/sphinx_doc/source_zh/tutorial/metrics_reference.md
@@ -1,4 +1,4 @@
-# 指标解释
+# 监控指标解释
 
 本文档解释了 Trinity-RFT 中用于跟踪探索、评估和训练进度的指标类别。
 
@@ -7,7 +7,7 @@
 大多数指标遵循分层命名规范：`{category}/{taskset_name}/{metric_name}/{statistic}`
 
 - **Category（类别）**：广泛的功能领域（rollout、eval、time、actor、critic 等）
-- **Taskset name（任务集名称）**：使用的任务集名称，仅适用于评估指标
+- **Taskset name（任务集名称）**：使用的任务集名称，仅适用于评估阶段的指标
 - **Metric name（指标名称）**：正在测量的具体指标
 - **Statistic（统计量）**：统计指标（mean、max、min、std 等，如适用）
 
@@ -18,7 +18,7 @@
 
 ### Explorer 相关指标
 
-探索器指标跟踪模型生成响应的 rollout 阶段的性能，包括 rollout 指标（`rollout/`）、评估指标（`eval/`）和一些时间指标（`time/`）。
+Explorer 相关的指标发生在模型生成响应的 rollout 阶段，包括 rollout 指标（`rollout/`）、评估指标（`eval/`）和一些时间指标（`time/`）。
 
 
 #### Rollout 指标（`rollout/`）
@@ -30,15 +30,15 @@ Rollout 指标跟踪模型生成响应的 rollout 阶段的性能。
   - `rollout/accuracy/mean`：生成响应的平均准确率
   - `rollout/format_score/mean`：平均格式正确性分数
 
-**指标聚合过程**：
+**指标计算过程**：
 
-考虑一个包含 `batch_size` 个任务的探索步骤，其中每个任务有 `repeat_times` 次运行。Rollout 指标（例如，`rollout/`）在不同级别计算和聚合：
+考虑一个包含 `batch_size` 个任务的探索步骤，其中每个任务有 `repeat_times` 次运行。Rollout 指标（例如，`rollout/`）在不同级别的计算过程如下：
 
 - 从*Run 级别*到*Task 级别*：在 `calculate_task_level_metrics` 函数中，指标跨同一任务的 `repeat_times` 次运行聚合。例如，`rollout/accuracy` 是该任务所有运行的平均准确率。
 
 - 从*Task 级别*到*Step 级别*：在 `gather_metrics` 函数中，指标跨步骤中所有任务聚合。例如，`rollout/accuracy/mean`、`rollout/accuracy/max`、`rollout/accuracy/min` 分别是步骤中所有任务的准确率（`rollout/accuracy`）的平均值、最大值和最小值。
 
-以下图表说明了 rollout 指标的聚合过程：
+以下图表说明了 rollout 指标的计算过程：
 
 ```mermaid
 graph TD
@@ -72,7 +72,7 @@ graph TD
 
 #### 评估指标（`eval/`）和基准测试指标（`bench/`）
 
-评估指标衡量模型在保留的评估任务上的性能。这些指标在定期评估运行期间计算。
+评估指标衡量模型在保留的评估任务上的性能。这些指标在评估阶段的每次运行期间计算。
 
 - **格式**：`eval/{task_name}/{metric_name}/{statistic}` 或 `bench/{task_name}/{metric_name}/{statistic}`
 - **示例**：
@@ -83,15 +83,15 @@ graph TD
   - Eval 和 bench 指标的计算方式相同，唯一的区别是指标名称的前缀。
   - 默认情况下，只返回指标的*平均值*。如果你想返回详细统计信息，可以在配置中将 `monitor.detailed_stats` 设置为 `True`。
 
-**指标聚合过程**：
+**指标计算过程**：
 
 考虑一个包含 `len(eval_taskset)` 个任务的评估步骤，其中每个任务有 `repeat_times` 次运行。评估指标（例如，`eval/`、`bench/`）在不同级别计算和聚合：
 
 - 从*Run 级别*到*Task 级别*：在 `calculate_task_level_metrics` 函数中，指标跨同一任务的 `repeat_times` 次运行聚合。例如，`eval/dummy/accuracy/mean@2` 是该任务所有运行的平均准确率。
 
 - 从*Task 级别*到*Step 级别*：在 `gather_eval_metrics` 函数中，指标跨步骤中所有任务聚合。例如，`eval/dummy/accuracy/mean@2`、`eval/dummy/accuracy/std@2`、`eval/dummy/accuracy/best@2`、`eval/dummy/accuracy/worst@2` 分别是步骤中所有任务的准确率（`eval/dummy/accuracy`）的平均值、标准差、最佳值和最差值。
 
-以下图表说明了在包含三个任务的虚拟数据集上评估指标的聚合过程。默认情况下，报告所有评估任务中指标的 `mean@k`、`std@k`、`best@k`、`worst@k`。你可以在配置中将 `monitor.detailed_stats` 设置为 `True` 以返回详细统计信息。
+以下图表说明了在包含三个任务的虚拟数据集上评估指标的计算过程。默认情况下，报告所有评估任务中指标的 `mean@k`、`std@k`、`best@k`、`worst@k`。你可以在配置中将 `monitor.detailed_stats` 设置为 `True` 以返回详细统计信息。
 
 ```mermaid
 graph TD
@@ -122,7 +122,8 @@ graph TD
         Task3_Metric --> Step_Metrics
     end
 ```
-当你将 `monitor.detailed_stats` 设置为 `True` 时，你会得到详细统计信息，包括 mean、std、min、max，例如 `eval/dummy/accuracy/mean@2/mean=0.83`、`eval/dummy/accuracy/mean@2/std=0.062`、`eval/dummy/accuracy/mean@2/max=0.9` 和 `eval/dummy/accuracy/mean@2/min=0.75`：
+
+当你将 `monitor.detailed_stats` 设置为 `True` 时，你会得到详细的统计信息，包括 mean、std、min、max，例如下面图表中给出的一些实例：
 
 ```mermaid
 graph TD
@@ -137,39 +138,39 @@ graph TD
 
 #### 时间指标（`time/`）
 
-时间指标测量整个训练管道中各种操作的执行持续时间。
+时间指标测量整个过程中各种操作的执行持续时间。
 
 - **格式**：`time/{operation_name}`
 - **示例**：
   - `time/eval`：从提交评估任务开始到评估阶段结束的时间；此持续时间包括评估任务和一些 rollout 任务。
-  - `time/train_step`：一个训练步骤的总时间
+  - `time/wait_explore_step`：等待一次 rollout 步骤完成的时间。
 
 **注意**：
-  - 由于探索管道的异步性质，时间测量可能不准确，但对于监控整体训练进度仍然有用。
-  - 除非另有说明，上述指标以秒为单位报告。
-  - 一些训练操作还报告每 token 的时间指标，前缀为 `timing_per_token_ms/`（例如，`timing_per_token_ms/update_actor`、`timing_per_token_ms/update_critic`、`timing_per_token_ms/adv`、`timing_per_token_ms/values`）。这些指标通过处理的 token 数量对执行时间进行归一化，提供独立于批次大小的效率测量。
+  - 由于 rollout 阶段的异步性质，时间测量的结果可能不太准确，但仍然可以用于监控整体训练进度。
+  - 除非特殊说明，时间指标以秒为单位。
+  - 一些训练操作还报告每个 token 的时间指标，前缀为 `timing_per_token_ms/`（例如，`timing_per_token_ms/update_actor`、`timing_per_token_ms/update_critic`、`timing_per_token_ms/adv`、`timing_per_token_ms/values`）。这些指标通过处理的 token 数量对执行时间进行归一化，提供独立于批次大小的效率测量。
 
 
-### 训练指标
+### Trainer 相关指标
 
-此类别包括跟踪策略（actor）模型（`actor/`）和价值函数（critic）模型（`critic/`）的训练动态的指标，以及一些性能指标（`perf/`、`global_seqlen/`、`response_length/`、`prompt_length/`、`time/`）。这些指标改编自 [veRL](https://github.com/volcengine/verl)。感兴趣的用户可以参考 [veRL 文档](https://verl.readthedocs.io/en/latest/index.html) 了解更多详细信息。
+此类别包括跟踪 actor 模型（`actor/`）和 critic 模型（`critic/`）的训练动态的指标，以及一些性能指标（`perf/`、`global_seqlen/`、`response_length/`、`prompt_length/`、`time/`）。这些指标和 [veRL](https://github.com/volcengine/verl) 中的指标一致，感兴趣的用户可以参考 [veRL 文档](https://verl.readthedocs.io/en/latest/index.html) 了解更多详细信息。
 
 
 ### 数据处理指标
 
-此类别包括跟踪通过各种管道操作符处理经验（`experience_pipeline/`）和数据采样统计（`sample/`）的指标。这些指标在步骤级别聚合，因为经验管道和数据采样在每个步骤中执行。
+此类别包括跟踪通过各种数据处理操作（`experience_pipeline/`）和数据采样统计（`sample/`）的指标。这些指标在步骤（step）级别计算，因为 experience 处理和数据采样会在每个步骤中执行一次。
 
 
 #### Experience Pipeline 相关指标（`experience_pipeline/` 和 `time/experience_pipeline/`）
 
-经验管道指标跟踪通过各种管道操作符处理经验。每个指标表示一个步骤中特定操作符的计数。
+Experience Pipeline 相关的指标统计了和数据处理相关的值，每个指标表示一个步骤中特定操作的统计值。
 
 - **格式**：`experience_pipeline/{metric_name}`
 - **示例**：
   - `experience_pipeline/experience_count`：处理的经验数量
   - `experience_pipeline/group_advantages/reward_mean/mean`：这里 `reward_mean` 是每个任务的平均奖励，然后我们计算步骤中所有任务的平均奖励的平均值。
 
-以下图表说明了数据处理指标的聚合过程：
+以下图表说明了数据处理指标的计算过程：
 ```mermaid
 graph TD
     subgraph Step["4 Experiences in one step"]
@@ -188,7 +189,7 @@ graph TD
 
 #### 采样相关指标（`sample/`）
 
-采样指标跟踪训练期间的数据采样统计。
+采样相关的指标发生在训练阶段，用于训练期间的数据采样统计。
 
 - **格式**：`sample/{metric_name}`
 - **示例**：