Commit eaa6530

update customization docs (#1233)
1 parent 7cfed64 commit eaa6530

2 files changed: +108 -26 lines changed

docs/en/get_started/customization.md

Lines changed: 54 additions & 13 deletions
@@ -19,14 +19,15 @@ Below is a summary of all available customization interfaces and their purposes.
 | [`--custom-loss-function-path`](#9-custom-loss-function---custom-loss-function-path) | Implement custom training loss computation. |
 | [`--custom-tis-function-path`](#10-custom-tisrs-function---custom-tis-function-path) | Implement custom importance sampling for off-policy correction. |
 | [`--custom-reward-post-process-path`](#11-reward-post-processing---custom-reward-post-process-path) | Custom post-processing of rewards before advantage computation. |
-| [`--custom-rollout-log-function-path`](#12-logging-functions) | Custom logging for training rollouts. |
-| [`--custom-eval-rollout-log-function-path`](#12-logging-functions) | Custom logging for evaluation rollouts. |
-| [`--data-source-path`](#13-data-source---data-source-path) | Override the data source for rollout prompts. |
-| [`--eval-function-path`](#14-evaluation-function---eval-function-path) | Override the rollout function specifically for evaluation. |
-| [`--custom-megatron-init-path`](#15-megatron-hooks) | Custom initialization after Megatron setup. |
-| [`--custom-megatron-before-log-prob-hook-path`](#15-megatron-hooks) | Custom logic before log probability computation. |
-| [`--custom-megatron-before-train-step-hook-path`](#15-megatron-hooks) | Custom logic before each training step. |
-| [`--slime-router-middleware-paths`](#16-slime-router-middleware---slime-router-middleware-paths) | Add custom middleware to slime router. |
+| [`--custom-convert-samples-to-train-data-path`](#12-samples-to-train-data-conversion---custom-convert-samples-to-train-data-path) | Override the conversion of samples to training data format. |
+| [`--custom-rollout-log-function-path`](#13-logging-functions) | Custom logging for training rollouts. |
+| [`--custom-eval-rollout-log-function-path`](#13-logging-functions) | Custom logging for evaluation rollouts. |
+| [`--data-source-path`](#14-data-source---data-source-path) | Override the data source for rollout prompts. |
+| [`--eval-function-path`](#15-evaluation-function---eval-function-path) | Override the rollout function specifically for evaluation. |
+| [`--custom-megatron-init-path`](#16-megatron-hooks) | Custom initialization after Megatron setup. |
+| [`--custom-megatron-before-log-prob-hook-path`](#16-megatron-hooks) | Custom logic before log probability computation. |
+| [`--custom-megatron-before-train-step-hook-path`](#16-megatron-hooks) | Custom logic before each training step. |
+| [`--slime-router-middleware-paths`](#17-slime-router-middleware---slime-router-middleware-paths) | Add custom middleware to slime router. |
 
 ## Detailed Interface Reference
 
@@ -240,7 +241,47 @@ def postprocess_function(args, samples: list[list[Sample]]) -> None
 
 ---
 
-### 12. Logging Functions
+### 12. Samples to Train Data Conversion (`--custom-convert-samples-to-train-data-path`)
+
+**Default**: `None` (uses built-in conversion logic)
+
+**Purpose**: Override the conversion of samples to training data format.
+
+**Signature**:
+```python
+def convert_samples_to_train_data(
+    args,
+    samples: list[Sample] | list[list[Sample]],
+) -> dict
+```
+
+**Return Type**:
+```python
+dict: {
+    "tokens": list[list[int]],       # Token IDs for each sample
+    "response_lengths": list[int],   # Response lengths
+    "rewards": list[float],          # Normalized rewards
+    "raw_reward": list[float],       # Raw rewards
+    "truncated": list[int],          # Truncation flags (0 or 1)
+    "sample_indices": list[int],     # Sample indices
+    "loss_masks": list[list[int]],   # Loss masks for each sample
+    # Optional fields:
+    "round_number": list[int],       # Round numbers (for rollout buffer)
+    "rollout_log_probs": list,       # Log probs (for off-policy correction)
+    "rollout_routed_experts": list,  # Routed experts (for MoE)
+    "metadata": list,                # Train metadata
+    "multimodal_inputs": list,       # Multimodal inputs (for VLM)
+    "teacher_log_probs": list,       # Teacher log probs (for distillation)
+}
+```
+
+**Use Cases**:
+- Handling `list[list[Sample]]` inputs
+- Custom data format requirements for training
+
+---
+
+### 13. Logging Functions
 
 #### Training Rollout Logging (`--custom-rollout-log-function-path`)
 
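As an illustration of the `convert_samples_to_train_data` interface documented in the new section 12, the following is a minimal sketch of a custom converter, not slime's built-in logic. The `Sample` dataclass here is a hypothetical stand-in for slime's real `Sample` class, and its field names (`tokens`, `response_length`, `reward`, `index`, `truncated`, `loss_mask`) are assumptions made for the example. The sketch fills only the required keys, skips reward normalization, and omits the optional fields.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical stand-in for slime's Sample; field names are assumptions."""
    tokens: list[int]
    response_length: int
    reward: float
    index: int = 0
    truncated: bool = False
    loss_mask: list[int] = field(default_factory=list)  # empty means "no mask given"


def convert_samples_to_train_data(args, samples):
    """Build the required keys of the train-data dict from flat or grouped samples."""
    # Accept both list[Sample] and list[list[Sample]] by flattening one level.
    if samples and isinstance(samples[0], list):
        samples = [s for group in samples for s in group]

    raw_rewards = [float(s.reward) for s in samples]
    return {
        "tokens": [s.tokens for s in samples],
        "response_lengths": [s.response_length for s in samples],
        # Assumption: rewards are passed through unchanged (no normalization here).
        "rewards": list(raw_rewards),
        "raw_reward": raw_rewards,
        "truncated": [int(s.truncated) for s in samples],
        "sample_indices": [s.index for s in samples],
        # Fall back to training on every response token when no mask is given.
        "loss_masks": [
            s.loss_mask if s.loss_mask else [1] * s.response_length
            for s in samples
        ],
    }
```

A function like this would presumably be selected by passing its import path to `--custom-convert-samples-to-train-data-path`; the exact path format is not shown in this diff, so treat that as an assumption.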

@@ -262,7 +303,7 @@ def log_eval_rollout_data(rollout_id, args, data, extra_metrics) -> bool
 
 ---
 
-### 13. Data Source (`--data-source-path`)
+### 14. Data Source (`--data-source-path`)
 
 **Default**: `slime.rollout.data_source.RolloutDataSourceWithBuffer`
 
@@ -288,7 +329,7 @@ class CustomDataSource(DataSource):
 
 ---
 
-### 14. Evaluation Function (`--eval-function-path`)
+### 15. Evaluation Function (`--eval-function-path`)
 
 **Default**: Same as `--rollout-function-path`
 
@@ -300,7 +341,7 @@ class CustomDataSource(DataSource):
 
 ---
 
-### 15. Megatron Hooks
+### 16. Megatron Hooks
 
 #### Megatron Initialization (`--custom-megatron-init-path`)
 
@@ -331,7 +372,7 @@ def custom_hook(args, rollout_id, step_id, model, optimizer, opt_param_scheduler
 
 ---
 
-### 16. slime Router Middleware (`--slime-router-middleware-paths`)
+### 17. slime Router Middleware (`--slime-router-middleware-paths`)
 
 **Purpose**: Add custom middleware to the slime router for request processing.
 

docs/zh/get_started/customization.md

Lines changed: 54 additions & 13 deletions
@@ -19,14 +19,15 @@ slime provides extensive customization capabilities through function path arguments. These arguments
 | [`--custom-loss-function-path`](#9-自定义损失函数---custom-loss-function-path) | Implement custom training loss computation. |
 | [`--custom-tis-function-path`](#10-自定义-tisrs-函数---custom-tis-function-path) | Implement custom importance sampling for off-policy correction. |
 | [`--custom-reward-post-process-path`](#11-奖励后处理---custom-reward-post-process-path) | Custom post-processing of rewards before advantage computation. |
-| [`--custom-rollout-log-function-path`](#12-日志函数) | Custom logging for training rollouts. |
-| [`--custom-eval-rollout-log-function-path`](#12-日志函数) | Custom logging for evaluation rollouts. |
-| [`--data-source-path`](#13-数据源---data-source-path) | Override the data source for rollout prompts. |
-| [`--eval-function-path`](#14-评估函数---eval-function-path) | Override the rollout function specifically for evaluation. |
-| [`--custom-megatron-init-path`](#15-megatron-Hook) | Custom initialization after Megatron setup. |
-| [`--custom-megatron-before-log-prob-hook-path`](#15-megatron-Hook) | Custom logic before log probability computation. |
-| [`--custom-megatron-before-train-step-hook-path`](#15-megatron-Hook) | Custom logic before each training step. |
-| [`--slime-router-middleware-paths`](#16-slime-router-中间件---slime-router-middleware-paths) | Add custom middleware to slime router. |
+| [`--custom-convert-samples-to-train-data-path`](#12-样本转训练数据---custom-convert-samples-to-train-data-path) | Override the logic that converts samples to the training data format. |
+| [`--custom-rollout-log-function-path`](#13-日志函数) | Custom logging for training rollouts. |
+| [`--custom-eval-rollout-log-function-path`](#13-日志函数) | Custom logging for evaluation rollouts. |
+| [`--data-source-path`](#14-数据源---data-source-path) | Override the data source for rollout prompts. |
+| [`--eval-function-path`](#15-评估函数---eval-function-path) | Override the rollout function specifically for evaluation. |
+| [`--custom-megatron-init-path`](#16-megatron-hook) | Custom initialization after Megatron setup. |
+| [`--custom-megatron-before-log-prob-hook-path`](#16-megatron-hook) | Custom logic before log probability computation. |
+| [`--custom-megatron-before-train-step-hook-path`](#16-megatron-hook) | Custom logic before each training step. |
+| [`--slime-router-middleware-paths`](#17-slime-router-中间件---slime-router-middleware-paths) | Add custom middleware to slime router. |
 
 ## Detailed Interface Reference
 
@@ -240,7 +241,47 @@ def postprocess_function(args, samples: list[list[Sample]]) -> None
 
 ---
 
-### 12. Logging Functions
+### 12. Samples to Train Data Conversion (`--custom-convert-samples-to-train-data-path`)
+
+**Default**: `None` (uses built-in conversion logic)
+
+**Purpose**: Override the logic that converts samples to the training data format.
+
+**Signature**:
+```python
+def convert_samples_to_train_data(
+    args,
+    samples: list[Sample] | list[list[Sample]],
+) -> dict
+```
+
+**Return Type**:
+```python
+dict: {
+    "tokens": list[list[int]],       # Token IDs for each sample
+    "response_lengths": list[int],   # Response lengths
+    "rewards": list[float],          # Normalized rewards
+    "raw_reward": list[float],       # Raw rewards
+    "truncated": list[int],          # Truncation flags (0 or 1)
+    "sample_indices": list[int],     # Sample indices
+    "loss_masks": list[list[int]],   # Loss masks for each sample
+    # Optional fields:
+    "round_number": list[int],       # Round numbers (for rollout buffer)
+    "rollout_log_probs": list,       # Log probs (for off-policy correction)
+    "rollout_routed_experts": list,  # Routed experts (for MoE)
+    "metadata": list,                # Train metadata
+    "multimodal_inputs": list,       # Multimodal inputs (for VLM)
+    "teacher_log_probs": list,       # Teacher log probs (for distillation)
+}
+```
+
+**Use Cases**:
+- Handling `list[list[Sample]]` inputs
+- Custom data format requirements for training
+
+---
+
+### 13. Logging Functions
 
 #### Training Rollout Logging (`--custom-rollout-log-function-path`)
 
@@ -262,7 +303,7 @@ def log_eval_rollout_data(rollout_id, args, data, extra_metrics) -> bool
 
 ---
 
-### 13. Data Source (`--data-source-path`)
+### 14. Data Source (`--data-source-path`)
 
 **Default**: `slime.rollout.data_source.RolloutDataSourceWithBuffer`
 
@@ -288,7 +329,7 @@ class CustomDataSource(DataSource):
 
 ---
 
-### 14. Evaluation Function (`--eval-function-path`)
+### 15. Evaluation Function (`--eval-function-path`)
 
 **Default**: Same as `--rollout-function-path`
 
@@ -300,7 +341,7 @@ class CustomDataSource(DataSource):
 
 ---
 
-### 15. Megatron Hook
+### 16. Megatron Hook
 
 #### Megatron Initialization (`--custom-megatron-init-path`)
 
@@ -331,7 +372,7 @@ def custom_hook(args, rollout_id, step_id, model, optimizer, opt_param_scheduler
 
 ---
 
-### 16. slime Router Middleware (`--slime-router-middleware-paths`)
+### 17. slime Router Middleware (`--slime-router-middleware-paths`)
 
 **Purpose**: Add custom middleware to the slime router for request processing.
 
