Commit 52fc971

Add slime skills for rollout, reward, filter, eval config, and CI (#1646)
1 parent 6a3bb90 commit 52fc971

6 files changed

+462
-0
lines changed

.agents/skills

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
../.claude/skills
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
1+
---
2+
name: add-dynamic-filter
3+
description: Guide for adding dynamic/filter hooks in slime rollout pipeline. Use when user wants sample-group selection during rollout, buffer filtering before training, or per-sample masking/processing hooks.
4+
---
5+
6+
# Add Dynamic Filter
7+
8+
Add filtering hooks in rollout and buffer stages while preserving sample-group contracts.
9+
10+
## When to Use
11+
12+
Use this skill when:
13+
14+
- User asks to filter sample groups during dynamic sampling
15+
- User asks to customize buffer extraction strategy
16+
- User asks to mask/remove some rollout samples before training
17+
- User asks to process all generated samples for logging/analysis
18+
19+
## Step-by-Step Guide
20+
21+
### Step 1: Pick the Correct Hook
22+
23+
- Dynamic sampling filter: `--dynamic-sampling-filter-path`
24+
- Buffer filter: `--buffer-filter-path`
25+
- Per-sample rollout filter: `--rollout-sample-filter-path`
26+
- All-samples post process: `--rollout-all-samples-process-path`
27+
28+
### Step 2: Implement the Function Signature
29+
30+
Dynamic sampling filter (called in `slime/rollout/sglang_rollout.py`):
31+
32+
```python
def filter_function(args, samples, **kwargs):
    # return DynamicFilterOutput or bool
    ...
```
36+
37+
Preferred return type:
38+
39+
```python
from slime.rollout.filter_hub.base_types import DynamicFilterOutput


def filter_function(args, samples, **kwargs):
    return DynamicFilterOutput(keep=True, reason=None)
```
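As an illustration of the dynamic sampling filter signature above, the sketch below keeps a sample group only when its rewards are not all identical. To stay runnable outside slime it uses stand-in `Sample` and `DynamicFilterOutput` dataclasses; these are assumptions that model only a scalar `reward` and the `keep`/`reason` fields shown above. The function name `keep_if_rewards_differ` and the reason string are hypothetical. In a real hook you would import `DynamicFilterOutput` from `slime.rollout.filter_hub.base_types` instead.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Optional


@dataclass
class Sample:
    """Stand-in for slime's Sample; only a scalar reward is modeled."""
    reward: float


@dataclass
class DynamicFilterOutput:
    """Stand-in mirroring the keep/reason fields used in this guide."""
    keep: bool
    reason: Optional[str] = None


def keep_if_rewards_differ(args, samples, **kwargs):
    """Keep a sample group only if its rewards are not all the same."""
    rewards = [float(s.reward) for s in samples]
    if len(rewards) < 2 or pstdev(rewards) == 0.0:
        # The reason string is illustrative, not a slime constant.
        return DynamicFilterOutput(keep=False, reason="zero_reward_std")
    return DynamicFilterOutput(keep=True)


mixed = [Sample(reward=0.0), Sample(reward=1.0)]
uniform = [Sample(reward=1.0), Sample(reward=1.0)]
kept = keep_if_rewards_differ(None, mixed)
dropped = keep_if_rewards_differ(None, uniform)
```

Returning a reason alongside `keep=False` is what lets dropped groups be counted by cause later on.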
44+
45+
Buffer filter (called in `slime/rollout/data_source.py`):
46+
47+
```python
def buffer_filter(args, rollout_id, buffer, num_samples):
    selected_groups = ...  # placeholder: choose the groups to return from buffer
    return selected_groups
```
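A minimal sketch of a buffer filter with that signature, taking the oldest groups first. It assumes `buffer` is a list of sample groups and that the caller expects the selected groups to be removed from the buffer in place; check that assumption against `slime/rollout/data_source.py` before relying on it. Plain strings stand in for `Sample` objects.

```python
def buffer_filter(args, rollout_id, buffer, num_samples):
    """Take up to num_samples groups from the front of the buffer."""
    num_to_take = min(len(buffer), num_samples)
    selected_groups = buffer[:num_to_take]
    # Assumption: taken groups should no longer remain in the buffer.
    del buffer[:num_to_take]
    return selected_groups


# Strings stand in for Sample objects; each inner list is one group.
buffer = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
selected = buffer_filter(args=None, rollout_id=0, buffer=buffer, num_samples=2)
```

Note that the return value stays a list of groups rather than a flat list of samples.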
51+
52+
Rollout sample filter:
53+
54+
```python
def rollout_sample_filter(args, samples):
    # modify sample.remove_sample in-place where needed
    ...
```
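The sketch below shows one way such a filter might flag samples in place. It assumes `samples` arrives as a list of groups (`list[list[Sample]]`) and uses a stand-in `Sample` dataclass with only the two fields it touches; the length threshold is a hypothetical value, not a slime argument.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Stand-in for slime's Sample with only the fields this sketch touches."""
    response_length: int
    remove_sample: bool = False


def rollout_sample_filter(args, samples):
    """Flag very short responses for removal without changing group shape."""
    min_len = 4  # hypothetical threshold chosen for illustration
    for group in samples:
        for sample in group:
            if sample.response_length < min_len:
                sample.remove_sample = True


groups = [[Sample(2), Sample(10)], [Sample(8), Sample(1)]]
rollout_sample_filter(None, groups)
flags = [[s.remove_sample for s in g] for g in groups]
```

Because samples are flagged rather than deleted, every group keeps its original size.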
58+
59+
All-samples process:
60+
61+
```python
def process_all_samples(args, all_samples, data_source):
    ...
```
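For logging or analysis, an all-samples hook could compute a small summary, as sketched here. The sketch assumes `all_samples` is a list of groups and that nothing depends on the return value; neither is confirmed by this guide. `Sample` is a stand-in dataclass and `data_source` is left unused.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Stand-in for slime's Sample; only reward and remove_sample are modeled."""
    reward: float
    remove_sample: bool = False


def process_all_samples(args, all_samples, data_source):
    """Summarize every generated sample, including ones flagged for removal."""
    flat = [s for group in all_samples for s in group]
    removed = sum(1 for s in flat if s.remove_sample)
    mean_reward = sum(s.reward for s in flat) / len(flat) if flat else 0.0
    summary = {"total": len(flat), "removed": removed, "mean_reward": mean_reward}
    print(f"[rollout] all-samples summary: {summary}")
    return summary


groups = [[Sample(1.0), Sample(0.0, remove_sample=True)], [Sample(0.5)]]
summary = process_all_samples(None, groups, data_source=None)
```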
65+
66+
### Step 3: Preserve Group Structure
67+
68+
- Keep `list[list[Sample]]` structure intact where required.
69+
- Do not flatten sample groups unless the downstream path expects flattened samples.
70+
- For sample removal, prefer `sample.remove_sample=True` over deleting objects.
71+
72+
### Step 4: Wire and Validate
73+
74+
Example wiring:
75+
76+
```bash
--dynamic-sampling-filter-path slime.rollout.filter_hub.dynamic_sampling_filters.check_reward_nonzero_std
--buffer-filter-path <module>.buffer_filter
--rollout-sample-filter-path <module>.rollout_sample_filter
--rollout-all-samples-process-path <module>.process_all_samples
```
82+
83+
### Step 5: Measure Side Effects
84+
85+
- Check final sample count remains aligned with `rollout_batch_size` expectations.
86+
- Verify drop reasons are surfaced in rollout metrics when dynamic filter is used.
87+
- Validate training still receives valid loss masks/rewards after filtering.
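The first two checks above can be approximated offline with a small helper like the one below. It is a standalone sketch, not slime's metrics code: it takes a list of `(keep, reason)` pairs, one per sample group, and reports how many groups survive and why the rest were dropped. The helper name and the report keys are hypothetical.

```python
from collections import Counter


def summarize_filter_results(results, rollout_batch_size):
    """results: list of (keep, reason) pairs, one per sample group."""
    kept = sum(1 for keep, _ in results if keep)
    drop_reasons = Counter(
        reason or "unspecified" for keep, reason in results if not keep
    )
    return {
        "kept_groups": kept,
        # How many more groups would be needed to fill the batch.
        "short_by": max(rollout_batch_size - kept, 0),
        "drop_reasons": dict(drop_reasons),
    }


results = [
    (True, None),
    (False, "zero_reward_std"),
    (False, "zero_reward_std"),
    (True, None),
]
report = summarize_filter_results(results, rollout_batch_size=4)
```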
88+
89+
## Common Mistakes
90+
91+
- Returning wrong container type for buffer filter
92+
- Dropping samples by deletion instead of mask flagging
93+
- Losing sample-group alignment in group-RM setup
94+
- Adding expensive logic in hot filtering paths
95+
96+
## Reference Locations
97+
98+
- Dynamic filter types: `slime/rollout/filter_hub/base_types.py`
99+
- Dynamic filter example: `slime/rollout/filter_hub/dynamic_sampling_filters.py`
100+
- Rollout generation hook points: `slime/rollout/sglang_rollout.py`
101+
- Buffer filter hook point: `slime/rollout/data_source.py`
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
1+
---
2+
name: add-eval-dataset-config
3+
description: Guide for adding and validating evaluation dataset configuration in slime. Use when user wants to configure eval datasets via --eval-config or --eval-prompt-data, add per-dataset overrides, or customize evaluation rollout behavior.
4+
---
5+
6+
# Add Eval Dataset Config
7+
8+
Configure evaluation datasets in slime with explicit dataset-level overrides and predictable runtime behavior.
9+
10+
## When to Use
11+
12+
Use this skill when:
13+
14+
- User asks to add evaluation datasets for periodic eval
15+
- User asks to migrate from `--eval-prompt-data` to structured `--eval-config`
16+
- User asks for per-dataset eval overrides (sampling params, keys, rm_type, metadata)
17+
18+
## Step-by-Step Guide
19+
20+
### Step 1: Choose Config Entry Method
21+
22+
Supported inputs:
23+
24+
- Structured config file: `--eval-config <yaml>`
25+
- Legacy CLI pairs: `--eval-prompt-data <name1> <path1> <name2> <path2> ...`
26+
27+
If `--eval-interval` is set, eval datasets must be configured.
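The legacy form is a flat list of alternating names and paths, so an odd number of values means a name or path is missing. The helper below is a hypothetical sketch of that pairing and check; it is not slime's own parser, and the `gsm8k` entry is only an illustrative second dataset.

```python
def pair_eval_prompt_data(values):
    """Turn a flat [name1, path1, name2, path2, ...] list into dataset entries."""
    if len(values) % 2 != 0:
        raise ValueError(
            f"expected name/path pairs, got {len(values)} values: {values!r}"
        )
    return [
        {"name": name, "path": path}
        for name, path in zip(values[0::2], values[1::2])
    ]


datasets = pair_eval_prompt_data(
    ["aime", "/path/to/aime.jsonl", "gsm8k", "/path/to/gsm8k.jsonl"]
)
```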
28+
29+
### Step 2: Build YAML with Required Fields
30+
31+
Each dataset needs at least:
32+
33+
- `name`
34+
- `path`
35+
36+
Example:
37+
38+
```yaml
eval:
  defaults:
    n_samples_per_eval_prompt: 1
    temperature: 0.7
    top_p: 1.0
  datasets:
    - name: aime
      path: /path/to/aime.jsonl
      rm_type: math
      input_key: prompt
      label_key: answer
      metadata_overrides:
        split: test
```
53+
54+
### Step 3: Understand Override Priority
55+
56+
`slime/utils/eval_config.py` resolves fields in this order:
57+
58+
1. Dataset-level values in `eval.datasets[*]`
59+
2. `eval.defaults`
60+
3. CLI args fallback (for example `eval_*` or `rollout_*` fields)
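The resolution order can be pictured with the standalone sketch below. It is not the code in `slime/utils/eval_config.py`; the helper name, the choice to treat `None` as "unset", and the CLI attribute names on the stand-in `cli_args` object are all assumptions made for illustration.

```python
from types import SimpleNamespace


def resolve_field(field, dataset_cfg, defaults, cli_args, cli_names):
    """Resolve one eval field: dataset entry, then defaults, then CLI args."""
    for source in (dataset_cfg, defaults):
        if source.get(field) is not None:
            return source[field]
    for name in cli_names:
        value = getattr(cli_args, name, None)
        if value is not None:
            return value
    return None


defaults = {"temperature": 0.7, "top_p": 1.0}
dataset = {"name": "aime", "path": "/path/to/aime.jsonl", "top_p": 0.95}
# Attribute names are illustrative stand-ins for parsed CLI arguments.
cli_args = SimpleNamespace(
    eval_temperature=None, rollout_temperature=1.0,
    eval_top_p=None, rollout_top_p=1.0,
    eval_top_k=None, rollout_top_k=-1,
)

resolved = {
    field: resolve_field(
        field, dataset, defaults, cli_args, (f"eval_{field}", f"rollout_{field}")
    )
    for field in ("temperature", "top_p", "top_k")
}
```

Here `top_p` comes from the dataset entry, `temperature` from the defaults, and `top_k` falls through to the CLI values.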
61+
62+
Common overridable fields include:
63+
64+
- Runtime: `n_samples_per_eval_prompt`, `temperature`, `top_p`, `top_k`, `max_response_len`
65+
- Sample keys: `input_key`, `label_key`, `tool_key`, `metadata_key`
66+
- Extra: `rm_type`, `custom_generate_function_path`, `metadata_overrides`
67+
68+
### Step 4: Wire Eval Function if Needed
69+
70+
Eval uses the function given by `--eval-function-path`, which defaults to the rollout function path.
71+
Use a separate eval function when inference/eval behavior must differ from training rollout.
72+
73+
### Step 5: Validate Parsing and Runtime
74+
75+
- Start by sanity-checking config parsing with a short launch command.
76+
- Confirm dataset entries are loaded into `args.eval_datasets`.
77+
- Verify output keys match eval logging/metrics expectations.
78+
79+
## Common Mistakes
80+
81+
- Missing `name` in dataset entries
82+
- Odd-length `--eval-prompt-data` pairs
83+
- Setting `--eval-interval` without any eval dataset
84+
- Mixing reward dict outputs without `eval_reward_key` configuration
85+
86+
## Reference Locations
87+
88+
- Eval config model: `slime/utils/eval_config.py`
89+
- Eval config resolution: `slime/utils/arguments.py`
90+
- Eval rollout path: `slime/rollout/sglang_rollout.py`
91+
- Customization docs: `docs/en/get_started/customization.md`
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
---
2+
name: add-reward-function
3+
description: Guide for adding a custom reward function in slime and wiring it through --custom-rm-path (and optional reward post-processing). Use when user wants new reward logic, remote/service reward integration, or task-specific reward shaping.
4+
---
5+
6+
# Add Reward Function
7+
8+
Implement custom reward logic and connect it to slime rollout/training safely.
9+
10+
## When to Use
11+
12+
Use this skill when:
13+
14+
- User asks to add new reward computation logic
15+
- User asks to integrate an external reward service
16+
- User asks to customize reward normalization/post-processing
17+
18+
## Step-by-Step Guide
19+
20+
### Step 1: Choose Reward Mode
21+
22+
Pick one of these:
23+
24+
- Single-sample mode (`--group-rm` disabled): custom function gets one `Sample`
25+
- Group/batch mode (`--group-rm` enabled): custom function gets `list[Sample]`
26+
27+
`slime/rollout/rm_hub/__init__.py` calls the function you pass via `--custom-rm-path`.
28+
29+
### Step 2: Create Reward Module
30+
31+
Create `slime/rollout/rm_hub/<your_rm>.py`.
32+
33+
Supported signatures:
34+
35+
```python
# Single-sample mode (`--group-rm` disabled)
async def custom_rm(args, sample):
    return float_reward_or_reward_dict  # placeholder for the computed reward
```
39+
40+
```python
# Group/batch mode (`--group-rm` enabled)
async def custom_rm(args, samples):
    return list_of_rewards  # placeholder: one reward per sample
```
44+
45+
If using group mode, return one reward per sample in input order.
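To make the two signatures concrete, here is a sketch of an exact-match reward in both modes. The stand-in `Sample` dataclass models only `response` and `label`, and the group variant is named `custom_group_rm` purely so both fit in one file; in practice you would expose whichever one matches your `--group-rm` setting.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Sample:
    """Stand-in for slime's Sample; only response and label are modeled."""
    response: str
    label: str


async def custom_rm(args, sample):
    """Single-sample mode: exact-match reward as a float."""
    return 1.0 if sample.response.strip() == sample.label.strip() else 0.0


async def custom_group_rm(args, samples):
    """Group mode: one reward per sample, in input order."""
    # asyncio.gather returns results in the order the awaitables were passed.
    return list(await asyncio.gather(*(custom_rm(args, s) for s in samples)))


group = [Sample("42", "42"), Sample("41", "42"), Sample(" 42 ", "42")]
single = asyncio.run(custom_rm(None, group[0]))
rewards = asyncio.run(custom_group_rm(None, group))
```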
46+
47+
### Step 3: Keep Reward Type Consistent
48+
49+
- Return scalar numeric rewards unless your pipeline explicitly uses keyed rewards.
50+
- If using reward dicts, ensure downstream `reward_key` / `eval_reward_key` is configured.
51+
- Keep exceptions explicit for invalid metadata instead of silently returning zeros.
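One way to keep these rules honest while developing is a small guard that normalizes a reward to a float and fails loudly on ambiguous input. The helper below is hypothetical and is not how slime itself resolves keyed rewards; the key names `accuracy` and `format` are illustrative.

```python
def select_reward(reward, reward_key=None):
    """Return a float reward, raising instead of guessing on bad input."""
    if isinstance(reward, dict):
        if reward_key is None:
            raise ValueError("reward is a dict but no reward key is configured")
        if reward_key not in reward:
            raise KeyError(f"reward key {reward_key!r} not in {sorted(reward)}")
        return float(reward[reward_key])
    return float(reward)


scalar = select_reward(0.5)
keyed = select_reward({"accuracy": 1.0, "format": 0.0}, reward_key="accuracy")
```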
52+
53+
### Step 4: Optional Reward Post-Processing
54+
55+
To customize normalization/shaping before advantage computation, add:
56+
57+
```python
def post_process_rewards(args, samples):
    # return (raw_rewards, processed_rewards)
    ...
```
62+
63+
Wire with:
64+
65+
```bash
--custom-reward-post-process-path <module>.post_process_rewards
```
68+
69+
This hook is consumed in `slime/ray/rollout.py`.
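As a sketch of what such a hook might do, the example below centers rewards on their per-prompt group mean and returns the `(raw_rewards, processed_rewards)` pair. It assumes `samples` is a flat list ordered so that each prompt's samples are contiguous, and that `args.n_samples_per_prompt` gives the group size; both are assumptions to verify against `slime/ray/rollout.py`. Plain Python lists are used here, and the container type slime expects is not confirmed by this guide.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Sample:
    """Stand-in for slime's Sample; only a scalar reward is modeled."""
    reward: float


def post_process_rewards(args, samples):
    """Subtract each prompt group's mean reward from its members."""
    raw_rewards = [float(s.reward) for s in samples]
    group_size = args.n_samples_per_prompt
    processed_rewards = []
    for start in range(0, len(raw_rewards), group_size):
        group = raw_rewards[start:start + group_size]
        mean = sum(group) / len(group)
        processed_rewards.extend(r - mean for r in group)
    return raw_rewards, processed_rewards


args = SimpleNamespace(n_samples_per_prompt=2)
samples = [Sample(1.0), Sample(0.0), Sample(1.0), Sample(1.0)]
raw, processed = post_process_rewards(args, samples)
```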
70+
71+
### Step 5: Wire and Validate
72+
73+
Use:
74+
75+
```bash
--custom-rm-path slime.rollout.rm_hub.<your_rm>.custom_rm
```
78+
79+
## Common Mistakes
80+
81+
- Returning wrong output shape in group mode
82+
- Mixing scalar rewards and reward dicts without `reward_key` config
83+
- Doing blocking network calls without async handling
84+
- Forgetting to validate reward behavior on truncated/failed samples
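For the blocking-call mistake in particular, one common remedy is to push the blocking work onto a worker thread so the event loop can keep serving other samples. The sketch below uses the standard library's `asyncio.to_thread` (Python 3.9+); `score_blocking` is a hypothetical stand-in for a synchronous client call to a reward service, and `Sample` is a stand-in dataclass.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class Sample:
    """Stand-in for slime's Sample; only response is modeled."""
    response: str


def score_blocking(text):
    """Hypothetical stand-in for a blocking call to a reward service."""
    time.sleep(0.01)  # simulates network latency
    return float(len(text) % 2)


async def custom_rm(args, sample):
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(score_blocking, sample.response)


async def score_many(samples):
    return await asyncio.gather(*(custom_rm(None, s) for s in samples))


rewards = asyncio.run(score_many([Sample("ab"), Sample("abc")]))
```

If the service offers a native async client, awaiting it directly is usually the simpler choice.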
85+
86+
## Reference Locations
87+
88+
- Reward dispatch: `slime/rollout/rm_hub/__init__.py`
89+
- Reward post-process hook: `slime/ray/rollout.py`
90+
- Customization docs: `docs/en/get_started/customization.md`
Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
1+
---
2+
name: add-rollout-function
3+
description: Guide for adding a new rollout function in slime and wiring it through --rollout-function-path. Use when user wants to implement custom rollout data generation logic, custom train/eval rollout outputs, or migrate from the default sglang rollout path.
4+
---
5+
6+
# Add Rollout Function
7+
8+
Implement a custom rollout function and integrate it safely with slime training/eval flow.
9+
10+
## When to Use
11+
12+
Use this skill when:
13+
14+
- User asks to add a new rollout task or rollout generation function
15+
- User asks to replace default `slime.rollout.sglang_rollout.generate_rollout`
16+
- User asks to customize train/eval data generation behavior
17+
18+
## Step-by-Step Guide
19+
20+
### Step 1: Choose the Right Starting Point
21+
22+
Start from one of these references:
23+
24+
- Async RL-style rollout: `slime/rollout/sglang_rollout.py`
25+
- Simple SFT-style rollout: `slime/rollout/sft_rollout.py`
26+
27+
If the task needs engine-based async generation and rewards, use the sglang path as base.
28+
If the task is file/buffer-driven and simple, use the SFT path as the base.
29+
30+
### Step 2: Create the New Rollout Module
31+
32+
Create a new file, for example: `slime/rollout/<your_rollout>.py`
33+
34+
Required callable signature:
35+
36+
```python
def generate_rollout(args, rollout_id, data_source, evaluation=False) -> RolloutFnTrainOutput | RolloutFnEvalOutput:
    ...
```
40+
41+
Return types are defined in `slime/rollout/base_types.py`.
42+
43+
### Step 3: Implement Train and Eval Branches Explicitly
44+
45+
- For training (`evaluation=False`), return `RolloutFnTrainOutput(samples=..., metrics=...)`
46+
- For evaluation (`evaluation=True`), return `RolloutFnEvalOutput(data=..., metrics=...)`
47+
48+
Minimal skeleton:
49+
50+
```python
from slime.rollout.base_types import RolloutFnTrainOutput, RolloutFnEvalOutput


def generate_rollout(args, rollout_id, data_source, evaluation=False):
    if evaluation:
        result = {
            "custom_eval": {
                "rewards": [],
                "truncated": [],
                "samples": [],
            }
        }
        return RolloutFnEvalOutput(data=result)

    groups = data_source.get_samples(args.rollout_batch_size)
    # fill Sample fields needed by training: tokens/response_length/reward/status (+ loss_mask when needed)
    return RolloutFnTrainOutput(samples=groups)
```
69+
70+
### Step 4: Keep Data Contract Compatible
71+
72+
For each generated sample, ensure required training fields are populated consistently with your objective:
73+
74+
- `tokens`
75+
- `response_length`
76+
- `reward` (or reward dict if your setup uses keyed rewards)
77+
- `status`
78+
79+
If partial rollout or masking logic is involved, keep `loss_mask` semantics consistent with existing behavior.
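A quick pre-return check can catch samples that are missing one of the fields listed above. The helper below is a hypothetical sketch: it assumes grouped samples and treats a field whose value is `None` (or absent) as unpopulated. `SimpleNamespace` objects stand in for `Sample`, and the string status is only a placeholder for whatever status type slime uses.

```python
from types import SimpleNamespace

REQUIRED_FIELDS = ("tokens", "response_length", "reward", "status")


def find_incomplete_samples(groups):
    """Return (group_index, sample_index, missing_fields) for each bad sample."""
    problems = []
    for g_idx, group in enumerate(groups):
        for s_idx, sample in enumerate(group):
            missing = [
                name for name in REQUIRED_FIELDS
                if getattr(sample, name, None) is None
            ]
            if missing:
                problems.append((g_idx, s_idx, missing))
    return problems


# Stand-in samples; the status values are placeholders.
ok = SimpleNamespace(tokens=[1, 2, 3], response_length=2, reward=1.0, status="completed")
bad = SimpleNamespace(tokens=[1, 2], response_length=1, reward=None, status=None)
problems = find_incomplete_samples([[ok, bad]])
```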
80+
81+
### Step 5: Wire Through Arguments
82+
83+
Set your function path via CLI:
84+
85+
```bash
--rollout-function-path slime.rollout.<your_rollout>.generate_rollout
```
88+
89+
The default and signature expectation are documented in:
90+
91+
- `slime/utils/arguments.py`
92+
- `docs/en/get_started/customization.md`
93+
94+
## Common Mistakes
95+
96+
- Returning raw Python lists/dicts with mismatched schema in custom path
97+
- Implementing only training branch and forgetting evaluation branch
98+
- Generating samples without required fields (`tokens`, `response_length`, `reward`, `status`)
99+
- Using blocking-heavy logic in high-frequency rollout paths without batching/concurrency control
100+
101+
## Reference Locations
102+
103+
- Default rollout: `slime/rollout/sglang_rollout.py`
104+
- Simple custom example: `slime/rollout/sft_rollout.py`
105+
- Output dataclasses: `slime/rollout/base_types.py`
106+
- Wiring/loading: `slime/ray/rollout.py`
107+
- Argument definition: `slime/utils/arguments.py`
108+
- Customization docs: `docs/en/get_started/customization.md`
