update customization docs (#1233)

nanjiangwill · web-flow · commit eaa6530520c2 · 2025-12-27T00:39:04.000+08:00
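
This change documents a new hook, `--custom-convert-samples-to-train-data-path`, whose function takes `args` and either `list[Sample]` or `list[list[Sample]]` and returns a dict of per-sample lists. As a rough illustration of that documented signature and return type, here is a minimal sketch. The `Sample` dataclass below is a hypothetical stand-in, not slime's real class, and its field names (`response_length`, `reward`, `index`, `loss_mask`) are assumptions. The sketch also passes rewards through without normalization and assumes a loss mask covers only the response tokens; neither is confirmed by the docs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # Hypothetical stand-in for slime's Sample; field names are assumptions.
    tokens: list[int]
    response_length: int
    reward: float
    index: int
    truncated: bool = False
    loss_mask: Optional[list[int]] = None


def convert_samples_to_train_data(args, samples):
    """Sketch of a custom converter matching the documented signature."""
    # Accept both list[Sample] and list[list[Sample]] by flattening one level.
    flat = []
    for item in samples:
        if isinstance(item, list):
            flat.extend(item)
        else:
            flat.append(item)

    # Assumption: rewards are copied as-is; a real converter may normalize them.
    raw_rewards = [float(s.reward) for s in flat]

    return {
        "tokens": [s.tokens for s in flat],
        "response_lengths": [s.response_length for s in flat],
        "rewards": list(raw_rewards),
        "raw_reward": raw_rewards,
        "truncated": [1 if s.truncated else 0 for s in flat],
        "sample_indices": [s.index for s in flat],
        # Assumption: a missing mask defaults to all ones over the response.
        "loss_masks": [
            s.loss_mask if s.loss_mask is not None else [1] * s.response_length
            for s in flat
        ],
    }


# Grouped input, as in the `list[list[Sample]]` use case.
groups = [
    [Sample([1, 2, 3], 2, 1.0, 0), Sample([4, 5], 1, 0.0, 1, truncated=True)],
    [Sample([6, 7, 8, 9], 3, 0.5, 2, loss_mask=[1, 0, 1])],
]
train_data = convert_samples_to_train_data(None, groups)
```

If the flag follows the dotted-path style shown for the `--data-source-path` default, such a function would presumably be referenced as something like `my_module.convert_samples_to_train_data`, where `my_module` is a placeholder name. The optional keys listed in the docs (`round_number`, `rollout_log_probs`, and so on) are omitted here.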
diff --git a/docs/en/get_started/customization.md b/docs/en/get_started/customization.md
@@ -19,14 +19,15 @@ Below is a summary of all available customization interfaces and their purposes.
 | [`--custom-loss-function-path`](#9-custom-loss-function---custom-loss-function-path) | Implement custom training loss computation. |
 | [`--custom-tis-function-path`](#10-custom-tisrs-function---custom-tis-function-path) | Implement custom importance sampling for off-policy correction. |
 | [`--custom-reward-post-process-path`](#11-reward-post-processing---custom-reward-post-process-path) | Custom post-processing of rewards before advantage computation. |
-| [`--custom-rollout-log-function-path`](#12-logging-functions) | Custom logging for training rollouts. |
-| [`--custom-eval-rollout-log-function-path`](#12-logging-functions) | Custom logging for evaluation rollouts. |
-| [`--data-source-path`](#13-data-source---data-source-path) | Override the data source for rollout prompts. |
-| [`--eval-function-path`](#14-evaluation-function---eval-function-path) | Override the rollout function specifically for evaluation. |
-| [`--custom-megatron-init-path`](#15-megatron-hooks) | Custom initialization after Megatron setup. |
-| [`--custom-megatron-before-log-prob-hook-path`](#15-megatron-hooks) | Custom logic before log probability computation. |
-| [`--custom-megatron-before-train-step-hook-path`](#15-megatron-hooks) | Custom logic before each training step. |
-| [`--slime-router-middleware-paths`](#16-slime-router-middleware---slime-router-middleware-paths) | Add custom middleware to slime router. |
+| [`--custom-convert-samples-to-train-data-path`](#12-samples-to-train-data-conversion---custom-convert-samples-to-train-data-path) | Override the conversion of samples to training data format. |
+| [`--custom-rollout-log-function-path`](#13-logging-functions) | Custom logging for training rollouts. |
+| [`--custom-eval-rollout-log-function-path`](#13-logging-functions) | Custom logging for evaluation rollouts. |
+| [`--data-source-path`](#14-data-source---data-source-path) | Override the data source for rollout prompts. |
+| [`--eval-function-path`](#15-evaluation-function---eval-function-path) | Override the rollout function specifically for evaluation. |
+| [`--custom-megatron-init-path`](#16-megatron-hooks) | Custom initialization after Megatron setup. |
+| [`--custom-megatron-before-log-prob-hook-path`](#16-megatron-hooks) | Custom logic before log probability computation. |
+| [`--custom-megatron-before-train-step-hook-path`](#16-megatron-hooks) | Custom logic before each training step. |
+| [`--slime-router-middleware-paths`](#17-slime-router-middleware---slime-router-middleware-paths) | Add custom middleware to slime router. |
 
 ## Detailed Interface Reference
 
@@ -240,7 +241,47 @@ def postprocess_function(args, samples: list[list[Sample]]) -> None
 
 ---
 
-### 12. Logging Functions
+### 12. Samples to Train Data Conversion (`--custom-convert-samples-to-train-data-path`)
+
+**Default**: `None` (uses built-in conversion logic)
+
+**Purpose**: Override how samples are converted into the training data format (the `dict` described under **Return Type**).
+
+**Signature**:
+```python
+def convert_samples_to_train_data(
+    args,
+    samples: list[Sample] | list[list[Sample]],
+) -> dict
+```
+
+**Return Type**:
+```python
+dict: {
+    "tokens": list[list[int]],           # Token IDs for each sample
+    "response_lengths": list[int],        # Response lengths
+    "rewards": list[float],               # Normalized rewards
+    "raw_reward": list[float],            # Raw rewards
+    "truncated": list[int],               # Truncation flags (0 or 1)
+    "sample_indices": list[int],          # Sample indices
+    "loss_masks": list[list[int]],        # Loss masks for each sample
+    # Optional fields:
+    "round_number": list[int],            # Round numbers (for rollout buffer)
+    "rollout_log_probs": list,            # Log probs (for off-policy correction)
+    "rollout_routed_experts": list,       # Routed experts (for MoE)
+    "metadata": list,                     # Train metadata
+    "multimodal_inputs": list,            # Multimodal inputs (for VLM)
+    "teacher_log_probs": list,            # Teacher log probs (for distillation)
+}
+```
+
+**Use Cases**:
+- Handling `list[list[Sample]]` inputs
+- Custom data format requirements for training
+
+---
+
+### 13. Logging Functions
 
 #### Training Rollout Logging (`--custom-rollout-log-function-path`)
 
@@ -262,7 +303,7 @@ def log_eval_rollout_data(rollout_id, args, data, extra_metrics) -> bool
 
 ---
 
-### 13. Data Source (`--data-source-path`)
+### 14. Data Source (`--data-source-path`)
 
 **Default**: `slime.rollout.data_source.RolloutDataSourceWithBuffer`
 
@@ -288,7 +329,7 @@ class CustomDataSource(DataSource):
 
 ---
 
-### 14. Evaluation Function (`--eval-function-path`)
+### 15. Evaluation Function (`--eval-function-path`)
 
 **Default**: Same as `--rollout-function-path`
 
@@ -300,7 +341,7 @@ class CustomDataSource(DataSource):
 
 ---
 
-### 15. Megatron Hooks
+### 16. Megatron Hooks
 
 #### Megatron Initialization (`--custom-megatron-init-path`)
 
@@ -331,7 +372,7 @@ def custom_hook(args, rollout_id, step_id, model, optimizer, opt_param_scheduler
 
 ---
 
-### 16. slime Router Middleware (`--slime-router-middleware-paths`)
+### 17. slime Router Middleware (`--slime-router-middleware-paths`)
 
 **Purpose**: Add custom middleware to the slime router for request processing.
 
diff --git a/docs/zh/get_started/customization.md b/docs/zh/get_started/customization.md
@@ -19,14 +19,15 @@ slime 通过函数路径参数提供了广泛的自定义能力。这些参数
 | [`--custom-loss-function-path`](#9-自定义损失函数---custom-loss-function-path) | 实现自定义训练损失计算。 |
 | [`--custom-tis-function-path`](#10-自定义-tisrs-函数---custom-tis-function-path) | 实现用于离策略（off-policy）校正的自定义重要性采样。 |
 | [`--custom-reward-post-process-path`](#11-奖励后处理---custom-reward-post-process-path) | 在优势计算前对奖励进行自定义后处理。 |
-| [`--custom-rollout-log-function-path`](#12-日志函数) | 训练 rollout 的自定义日志记录。 |
-| [`--custom-eval-rollout-log-function-path`](#12-日志函数) | 评估 rollout 的自定义日志记录。 |
-| [`--data-source-path`](#13-数据源---data-source-path) | 覆盖 rollout 提示词的数据源。 |
-| [`--eval-function-path`](#14-评估函数---eval-function-path) | 专门为评估覆盖 rollout 函数。 |
-| [`--custom-megatron-init-path`](#15-megatron-Hook) | Megatron 设置后的自定义初始化。 |
-| [`--custom-megatron-before-log-prob-hook-path`](#15-megatron-Hook) | log probability 计算前的自定义逻辑。 |
-| [`--custom-megatron-before-train-step-hook-path`](#15-megatron-Hook) | 每个训练步骤前的自定义逻辑。 |
-| [`--slime-router-middleware-paths`](#16-slime-router-中间件---slime-router-middleware-paths) | 向 slime router 添加自定义中间件。 |
+| [`--custom-convert-samples-to-train-data-path`](#12-样本转训练数据---custom-convert-samples-to-train-data-path) | 覆盖样本到训练数据格式的转换逻辑。 |
+| [`--custom-rollout-log-function-path`](#13-日志函数) | 训练 rollout 的自定义日志记录。 |
+| [`--custom-eval-rollout-log-function-path`](#13-日志函数) | 评估 rollout 的自定义日志记录。 |
+| [`--data-source-path`](#14-数据源---data-source-path) | 覆盖 rollout 提示词的数据源。 |
+| [`--eval-function-path`](#15-评估函数---eval-function-path) | 专门为评估覆盖 rollout 函数。 |
+| [`--custom-megatron-init-path`](#16-megatron-hook) | Megatron 设置后的自定义初始化。 |
+| [`--custom-megatron-before-log-prob-hook-path`](#16-megatron-hook) | log probability 计算前的自定义逻辑。 |
+| [`--custom-megatron-before-train-step-hook-path`](#16-megatron-hook) | 每个训练步骤前的自定义逻辑。 |
+| [`--slime-router-middleware-paths`](#17-slime-router-中间件---slime-router-middleware-paths) | 向 slime router 添加自定义中间件。 |
 
 ## 详细接口参考
 
@@ -240,7 +241,47 @@ def postprocess_function(args, samples: list[list[Sample]]) -> None
 
 ---
 
-### 12. 日志函数
+### 12. 样本转训练数据 (`--custom-convert-samples-to-train-data-path`)
+
+**默认值**: `None`（使用内置转换逻辑）
+
+**用途**: 覆盖样本到训练数据格式的转换逻辑。
+
+**函数签名**:
+```python
+def convert_samples_to_train_data(
+    args,
+    samples: list[Sample] | list[list[Sample]],
+) -> dict
+```
+
+**返回类型**:
+```python
+dict: {
+    "tokens": list[list[int]],           # 每个样本的 token ID
+    "response_lengths": list[int],        # 响应长度
+    "rewards": list[float],               # 归一化后的奖励
+    "raw_reward": list[float],            # 原始奖励
+    "truncated": list[int],               # 截断标志（0 或 1）
+    "sample_indices": list[int],          # 样本索引
+    "loss_masks": list[list[int]],        # 每个样本的损失掩码
+    # 可选字段：
+    "round_number": list[int],            # 轮次编号（用于 rollout buffer）
+    "rollout_log_probs": list,            # log 概率（用于离策略校正）
+    "rollout_routed_experts": list,       # 路由专家（用于 MoE）
+    "metadata": list,                     # 训练元数据
+    "multimodal_inputs": list,            # 多模态输入（用于 VLM）
+    "teacher_log_probs": list,            # 教师 log 概率（用于蒸馏）
+}
+```
+
+**使用场景**:
+- 处理 `list[list[Sample]]` 输入
+- 自定义训练数据格式需求
+
+---
+
+### 13. 日志函数
 
 #### 训练 Rollout 日志 (`--custom-rollout-log-function-path`)
 
@@ -262,7 +303,7 @@ def log_eval_rollout_data(rollout_id, args, data, extra_metrics) -> bool
 
 ---
 
-### 13. 数据源 (`--data-source-path`)
+### 14. 数据源 (`--data-source-path`)
 
 **默认值**: `slime.rollout.data_source.RolloutDataSourceWithBuffer`
 
@@ -288,7 +329,7 @@ class CustomDataSource(DataSource):
 
 ---
 
-### 14. 评估函数 (`--eval-function-path`)
+### 15. 评估函数 (`--eval-function-path`)
 
 **默认值**: 与 `--rollout-function-path` 相同
 
@@ -300,7 +341,7 @@ class CustomDataSource(DataSource):
 
 ---
 
-### 15. Megatron Hook
+### 16. Megatron Hook
 
 #### Megatron 初始化 (`--custom-megatron-init-path`)
 
@@ -331,7 +372,7 @@ def custom_hook(args, rollout_id, step_id, model, optimizer, opt_param_scheduler
 
 ---
 
-### 16. slime Router 中间件 (`--slime-router-middleware-paths`)
+### 17. slime Router 中间件 (`--slime-router-middleware-paths`)
 
 **用途**: 向 slime router 添加自定义中间件用于请求处理。