---
name: add-rollout-function
description: Guide for adding a new rollout function in slime and wiring it through --rollout-function-path. Use when the user wants to implement custom rollout data generation logic, customize train/eval rollout outputs, or migrate away from the default sglang rollout path.
---

# Add Rollout Function

Implement a custom rollout function and integrate it safely with the slime training and evaluation flow.

## When to Use

Use this skill when:

- The user asks to add a new rollout task or rollout generation function
- The user asks to replace the default `slime.rollout.sglang_rollout.generate_rollout`
- The user asks to customize train/eval data generation behavior

## Step-by-Step Guide

### Step 1: Choose the Right Starting Point

Start from one of these references:

- Async RL-style rollout: `slime/rollout/sglang_rollout.py`
- Simple SFT-style rollout: `slime/rollout/sft_rollout.py`

If the task needs engine-based async generation and reward computation, use the sglang rollout as the base.
If the task is file- or buffer-driven and simple, use the SFT rollout as the base.

### Step 2: Create the New Rollout Module

Create a new file, for example `slime/rollout/<your_rollout>.py`.

The module must expose a callable with this signature:

```python
from slime.rollout.base_types import RolloutFnEvalOutput, RolloutFnTrainOutput

def generate_rollout(args, rollout_id, data_source, evaluation=False) -> RolloutFnTrainOutput | RolloutFnEvalOutput:
    ...
```

Return types are defined in `slime/rollout/base_types.py`.
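
As a rough mental model, the two return types are small containers. The stand-ins below are not copies of the real code. They mirror only the field names this guide uses: `samples` and `metrics` for training, and `data` and `metrics` for evaluation. The `Sample` class, the `None` defaults for `metrics`, and the string `status` are assumptions made for illustration. In your module, import the real classes from `slime.rollout.base_types`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Sample:  # hypothetical stand-in with only the fields this guide names
    tokens: list[int] = field(default_factory=list)
    response_length: int = 0
    reward: float | dict[str, float] | None = None
    status: str = "pending"  # assumption: the real type likely uses an enum
    loss_mask: list[int] | None = None


@dataclass
class RolloutFnTrainOutput:  # stand-in mirroring the field names in this guide
    samples: list[list[Sample]]  # one inner list (group) per prompt
    metrics: dict[str, Any] | None = None  # assumed default


@dataclass
class RolloutFnEvalOutput:  # stand-in mirroring the field names in this guide
    data: dict[str, dict[str, Any]]  # eval dataset name -> per-sample results
    metrics: dict[str, Any] | None = None  # assumed default


train_out = RolloutFnTrainOutput(samples=[[Sample(tokens=[1, 2, 3], response_length=2, reward=1.0)]])
eval_out = RolloutFnEvalOutput(data={"custom_eval": {"rewards": [1.0], "truncated": [False], "samples": []}})
print(len(train_out.samples), list(eval_out.data))  # 1 ['custom_eval']
```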

### Step 3: Implement Train and Eval Branches Explicitly

- For training (`evaluation=False`), return `RolloutFnTrainOutput(samples=..., metrics=...)`
- For evaluation (`evaluation=True`), return `RolloutFnEvalOutput(data=..., metrics=...)`

Minimal skeleton:

```python
from slime.rollout.base_types import RolloutFnEvalOutput, RolloutFnTrainOutput


def generate_rollout(args, rollout_id, data_source, evaluation=False):
    if evaluation:
        # Eval output maps each eval dataset name to its per-sample results.
        result = {
            "custom_eval": {
                "rewards": [],
                "truncated": [],
                "samples": [],
            }
        }
        return RolloutFnEvalOutput(data=result)

    # Training: get_samples returns groups (lists) of Sample objects.
    groups = data_source.get_samples(args.rollout_batch_size)
    # Fill the Sample fields training needs before returning:
    # tokens, response_length, reward, status (plus loss_mask when needed).
    return RolloutFnTrainOutput(samples=groups)
```
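
The sketch below exercises both branches of the skeleton without a cluster. It fills the skeleton in with hypothetical stand-ins: local copies of the output types, `ToyDataSource` (a grouped `get_samples`), a word-length "tokenizer", a fixed fake response, a toy reward, and an `args.eval_prompts` field. None of these are slime APIs. Setting `loss_mask` to cover only the response tokens is also an assumption, so check it against the reference rollouts in the version you run.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


@dataclass
class Sample:  # hypothetical stand-in for slime's Sample type
    prompt: str = ""
    tokens: list[int] = field(default_factory=list)
    response_length: int = 0
    reward: float | None = None
    status: str = "pending"  # assumption: the real type likely uses an enum
    loss_mask: list[int] | None = None


@dataclass
class RolloutFnTrainOutput:  # stand-in; import the real one from slime.rollout.base_types
    samples: list[list[Sample]]
    metrics: dict[str, Any] | None = None


@dataclass
class RolloutFnEvalOutput:  # stand-in; import the real one from slime.rollout.base_types
    data: dict[str, dict[str, Any]]
    metrics: dict[str, Any] | None = None


class ToyDataSource:
    """Hypothetical data source: each prompt becomes a group of n fresh samples."""

    def __init__(self, prompts: list[str], n_samples_per_prompt: int = 2):
        self.prompts = prompts
        self.n = n_samples_per_prompt

    def get_samples(self, num_groups: int) -> list[list[Sample]]:
        return [[Sample(prompt=p) for _ in range(self.n)] for p in self.prompts[:num_groups]]


def _fill(sample: Sample) -> Sample:
    """Populate the training fields; stands in for generation plus reward scoring."""
    prompt_ids = [len(word) for word in sample.prompt.split()]  # toy "tokenizer"
    response_ids = [7, 8, 9]  # toy "engine output"
    sample.tokens = prompt_ids + response_ids
    sample.response_length = len(response_ids)
    sample.loss_mask = [1] * sample.response_length  # assumption: mask spans the response only
    sample.reward = 1.0 if len(prompt_ids) > 1 else 0.0  # toy reward
    sample.status = "completed"
    return sample


def generate_rollout(args, rollout_id, data_source, evaluation=False):
    if evaluation:
        samples = [_fill(Sample(prompt=p)) for p in args.eval_prompts]  # hypothetical args field
        result = {
            "custom_eval": {
                "rewards": [s.reward for s in samples],
                "truncated": [s.status == "truncated" for s in samples],
                "samples": samples,
            }
        }
        return RolloutFnEvalOutput(data=result)

    groups = data_source.get_samples(args.rollout_batch_size)
    for group in groups:
        for sample in group:
            _fill(sample)
    return RolloutFnTrainOutput(samples=groups, metrics={"num_groups": len(groups)})


args = SimpleNamespace(rollout_batch_size=2, eval_prompts=["hi", "hello there"])
source = ToyDataSource(["a b c", "d e", "f"], n_samples_per_prompt=2)
train = generate_rollout(args, rollout_id=0, data_source=source)
ev_out = generate_rollout(args, rollout_id=0, data_source=source, evaluation=True)
print(train.samples[0][0].tokens)  # [1, 1, 1, 7, 8, 9]
print(ev_out.data["custom_eval"]["rewards"])  # [0.0, 1.0]
```

In a real module, you would replace the stand-in classes with imports from `slime.rollout.base_types`. You would also replace `_fill` with calls to your inference engine and reward function.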

### Step 4: Keep the Data Contract Compatible

For each generated sample, make sure the fields that training consumes are populated and consistent with your objective:

- `tokens`
- `response_length`
- `reward` (or a reward dict if your setup uses keyed rewards)
- `status`

If partial rollouts or masking logic are involved, keep `loss_mask` semantics consistent with the existing rollout implementations.
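
A cheap guard is to validate every sample before returning it. The checker below is a hypothetical helper (not part of slime) written against a stand-in `Sample`. The rules it enforces are assumptions modeled on the fields listed above, so adapt them to your objective:

- `response_length` fits inside `tokens`.
- `reward` and `status` are set.
- When `loss_mask` is present, it has one 0/1 entry per response token.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sample:  # hypothetical stand-in for slime's Sample type
    tokens: list[int] = field(default_factory=list)
    response_length: int = 0
    reward: float | dict[str, float] | None = None
    status: str | None = None
    loss_mask: list[int] | None = None


def contract_errors(sample: Sample) -> list[str]:
    """List contract problems for one sample; an empty list means it looks usable."""
    errors = []
    if not sample.tokens:
        errors.append("tokens is empty")
    if not 0 <= sample.response_length <= len(sample.tokens):
        errors.append("response_length must be between 0 and len(tokens)")
    if sample.reward is None:
        errors.append("reward is missing")
    if sample.status is None:
        errors.append("status is missing")
    if sample.loss_mask is not None:
        # Assumption: loss_mask has one 0/1 entry per response token.
        if len(sample.loss_mask) != sample.response_length:
            errors.append("loss_mask length differs from response_length")
        if any(m not in (0, 1) for m in sample.loss_mask):
            errors.append("loss_mask must contain only 0 and 1")
    return errors


def validate_groups(groups: list[list[Sample]]) -> None:
    """Raise on the first sample that violates the contract."""
    for i, group in enumerate(groups):
        for j, sample in enumerate(group):
            problems = contract_errors(sample)
            if problems:
                raise ValueError(f"group {i}, sample {j}: " + "; ".join(problems))


good = Sample(tokens=[1, 2, 3, 4], response_length=2, reward=1.0, status="completed", loss_mask=[1, 1])
bad = Sample(tokens=[1, 2], response_length=3, status="completed")
print(contract_errors(good))  # []
print(contract_errors(bad))  # ['response_length must be between 0 and len(tokens)', 'reward is missing']
```

Call `validate_groups(groups)` right before returning the training output. A silent contract violation then becomes an immediate error that names the offending group and sample index.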

### Step 5: Wire Through Arguments

Point slime at your function via the CLI:

```bash
--rollout-function-path slime.rollout.<your_rollout>.generate_rollout
```

The default value and the expected signature are documented in:

- `slime/utils/arguments.py`
- `docs/en/get_started/customization.md`
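
Before launching a job, it can help to confirm locally that the dotted path imports and that the function accepts the documented call shape. The sketch uses only `importlib` and `inspect`. `load_function` and `accepts_rollout_call` are hypothetical helper names; slime's actual loading code lives in the files listed above. The check binds three positional arguments plus `evaluation=` as a keyword, which assumes slime calls the function that way.

```python
import importlib
import inspect
from typing import Callable


def load_function(path: str) -> Callable:
    """Resolve a dotted 'package.module.attr' path to the callable it names."""
    module_path, _, attr = path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path like 'pkg.module.fn', got {path!r}")
    module = importlib.import_module(module_path)
    fn = getattr(module, attr)  # raises AttributeError if the name is missing
    if not callable(fn):
        raise TypeError(f"{path} resolved to a non-callable {type(fn).__name__}")
    return fn


def accepts_rollout_call(fn: Callable) -> bool:
    """True if fn(args, rollout_id, data_source, evaluation=...) would bind."""
    try:
        inspect.signature(fn).bind(object(), 0, object(), evaluation=False)
    except TypeError:
        return False
    return True


def generate_rollout(args, rollout_id, data_source, evaluation=False):
    return None


def missing_eval_flag(args, rollout_id, data_source):
    return None


print(load_function("json.dumps")([1, 2]))  # [1, 2]
print(accepts_rollout_call(generate_rollout), accepts_rollout_call(missing_eval_flag))  # True False
```

For a real check, pass `"slime.rollout.<your_rollout>.generate_rollout"` with your module name filled in.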

## Common Mistakes

- Returning raw Python lists or dicts whose schema does not match the expected output types
- Implementing only the training branch and forgetting the evaluation branch
- Generating samples without the required fields (`tokens`, `response_length`, `reward`, `status`)
- Running blocking-heavy logic in high-frequency rollout paths without batching or concurrency control
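
A common way to avoid the last mistake is to issue requests concurrently while capping how many are in flight at once. The sketch below does this with `asyncio.Semaphore`. The `fake_generate` coroutine is a hypothetical stand-in for an engine call. The `max_in_flight` value and the function names are illustrative, not slime settings.

```python
import asyncio


async def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a request to an inference engine.
    await asyncio.sleep(0.01)
    return prompt.upper()


async def generate_all(prompts: list[str], max_in_flight: int = 4) -> tuple[list[str], int]:
    """Run one request per prompt with at most `max_in_flight` running at once.

    Returns the outputs (in input order) and the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def one(prompt: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while max_in_flight requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_generate(prompt)
            finally:
                in_flight -= 1

    outputs = await asyncio.gather(*(one(p) for p in prompts))
    return list(outputs), peak


outputs, peak = asyncio.run(generate_all([f"p{i}" for i in range(10)], max_in_flight=3))
print(outputs[:3])  # ['P0', 'P1', 'P2']
```

`asyncio.gather` returns results in input order even when requests complete in a different order, so outputs stay aligned with their prompts.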

## Reference Locations

- Default rollout: `slime/rollout/sglang_rollout.py`
- Simple custom example: `slime/rollout/sft_rollout.py`
- Output dataclasses: `slime/rollout/base_types.py`
- Wiring/loading: `slime/ray/rollout.py`
- Argument definition: `slime/utils/arguments.py`
- Customization docs: `docs/en/get_started/customization.md`