Commit 1c81e5d

miles-code-angel, zhaochenyang20, gemini-code-assist[bot], and zijiexia authored
code sync (#1329)
Co-authored-by: zhaochenyang20 <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: zijiexia <[email protected]>
1 parent 85cde85 commit 1c81e5d

26 files changed: +792 -92 lines changed

.gitmodules

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+[submodule "examples/swe-agent/nemo-gym"]
+	path = examples/swe-agent/nemo-gym
+	url = https://github.com/yueming-yuan/Gym
+	branch = slime-swe-agent
+[submodule "examples/swe-agent/mini-swe-agent"]
+	path = examples/swe-agent/mini-swe-agent
+	url = https://github.com/yueming-yuan/nv-mini-swe-agent
+	branch = slime-swe-agent

examples/formal_math/single_round/run_minimal.py

Lines changed: 8 additions & 4 deletions
@@ -96,10 +96,14 @@
     )
 
     wandb_args = (
-        "--use-wandb "
-        "--wandb-project slime-formal-math-run-minimal "
-        "--wandb-group demo "
-        "--wandb-key ${WANDB_API_KEY} "
+        (
+            "--use-wandb "
+            "--wandb-project slime-formal-math-run-minimal "
+            "--wandb-group demo "
+            f"--wandb-key '{wandb_api_key}' "
+        )
+        if (wandb_api_key := os.environ.get("WANDB_API_KEY"))
+        else ""
     )
 
     train_args = (

examples/geo3k_vlm_multi_turn/run_geo3k_vlm_multi_turn.py

Lines changed: 8 additions & 4 deletions
@@ -43,10 +43,14 @@ def execute():
     ckpt_args = f"--hf-checkpoint /root/models/{MODEL_NAME} "
 
     wandb_args = (
-        "--use-wandb "
-        "--wandb-project slime-dev "
-        "--wandb-group geo3k_vlm_multi_turn "
-        "--wandb-key ${WANDB_API_KEY} "
+        (
+            "--use-wandb "
+            "--wandb-project slime-dev "
+            "--wandb-group geo3k_vlm_multi_turn "
+            f"--wandb-key '{wandb_api_key}' "
+        )
+        if (wandb_api_key := os.environ.get("WANDB_API_KEY"))
+        else ""
     )
 
     rollout_args = (
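
Note on the `wandb_args` change: both run scripts now emit the W&B flags only when `WANDB_API_KEY` is set, instead of passing a literal `${WANDB_API_KEY}` through to the shell. A minimal sketch of the same pattern as a reusable helper follows. The function name `build_wandb_args` and its parameters are hypothetical and not part of this commit, and `shlex.quote` is swapped in here as a hardening step where the commit uses plain single quotes.

```python
import os
import shlex


def build_wandb_args(project: str, group: str, env_var: str = "WANDB_API_KEY") -> str:
    """Return W&B CLI flags, or an empty string when no API key is set.

    Hypothetical helper: the commit inlines this logic in each run script.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # No key in the environment: emit no W&B flags at all.
        return ""
    return (
        "--use-wandb "
        f"--wandb-project {project} "
        f"--wandb-group {group} "
        # shlex.quote only adds quotes when the value needs them.
        f"--wandb-key {shlex.quote(api_key)} "
    )
```

With the key unset the helper returns `""`, so the training command can be concatenated the same way in both cases.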

examples/swe-agent/README.md

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+### Introduction
+
+This is an example of SWE-agent training. It uses NVIDIA's Nemo-Gym as the Gym environment implementation, SWE-Gym as the training data, and SWE-bench for evaluation.
+
+Part of this example's implementation lives in the submodules below:
+- Nemo-Gym: https://github.com/yueming-yuan/Gym/tree/slime-swe-agent
+- mini-swe-agent: https://github.com/yueming-yuan/nv-mini-swe-agent/tree/slime-swe-agent
+
+
+### Prepare environment
+#### Update submodules
+```bash
+git submodule update --init --recursive .
+```
+#### Docker settings
+```bash
+# 1. create a docker network
+docker network create swe-net
+
+# 2. create environment docker
+docker run -itd \
+  --name swe_env \
+  --shm-size 16g \
+  -v /var/run/docker.sock:/var/run/docker.sock \
+  -v /mnt/data:/data \
+  -v /home/sglang-rl/<your_name>:/workspace \
+  --ipc=host \
+  --ulimit nofile=65536:65536 \
+  --ulimit memlock=-1 \
+  --ulimit stack=67108864 \
+  --network swe-net \
+  ubuntu:latest \
+  /bin/bash
+
+# 3. create slime docker
+docker run -itd \
+  --shm-size 32g \
+  --gpus all \
+  -v /mnt/data/cache/huggingface:/root/.cache/huggingface \
+  -v /mnt/data:/data \
+  -v /home/sglang-rl/<your_name>:/workspace \
+  --ipc=host \
+  --ulimit nofile=65536:65536 \
+  --ulimit memlock=-1 \
+  --ulimit stack=67108864 \
+  --privileged \
+  --network swe-net \
+  --name slime_<your_name> \
+  slimerl/slime:latest \
+  /bin/zsh
+
+# 4. install utils in environment docker
+docker exec -it swe_env /bin/bash
+apt update && apt install -y zsh curl git python3 python3-pip docker.io
+```
+Note: `-v /var/run/docker.sock:/var/run/docker.sock` is required for Docker-in-Docker SWE environment execution; `--network swe-net` lets the training and environment containers communicate.
+
+#### Installation
+
+In the **environment docker**, install Gym:
+```bash
+git clone https://github.com/yueming-yuan/Gym
+cd Gym
+
+curl -LsSf https://astral.sh/uv/install.sh | sh
+source $HOME/.local/bin/env
+uv venv --python 3.12 && source .venv/bin/activate
+uv sync --extra dev --group docs
+
+# configure env.yaml
+echo "policy_base_url: https://api.openai.com/v1
+policy_api_key: your-openai-api-key
+policy_model_name: gpt-4.1-2025-04-14
+default_host: 0.0.0.0" > env.yaml
+```
+Note: set `default_host` to `0.0.0.0` so the containers can communicate with each other.
+
+Then set up the SWE-agent server:
+```bash
+cd responses_api_agents/mini_swe_agent
+uv pip install -r requirements.txt
+```
+Now you should be able to run the SWE-agent server.
+
+For the **slime docker** setup, please follow the standard setup process.
+
+### Preparing data
+In the **slime docker**, download the **SWE-Gym** data from Hugging Face and convert it to slime's prompt data format with this script.
+```
+cd slime/examples/swe-agent
+python download_and_process_data.py --input SWE-Gym/SWE-Gym --output /root/swe_train.jsonl
+```
+
+### Running train
+1. In the environment docker, launch the agent server:
+```bash
+cd Gym
+source .venv/bin/activate
+cd responses_api_agents/mini_swe_agent
+./start_server.sh
+```
+
+
+2. In the slime docker:
+(1) Set `SWE_AGENT_GYM_URL` to the URL of the second server that Gym started in the environment docker, the one whose `server_type` is `responses_api_agents`. `swe_env` is the environment docker's name; replace it if you changed the name.
+(Minor TODO: change the port selection so this does not have to be set every time.) (2) Launch the training.
+```bash
+export SWE_AGENT_GYM_URL="http://swe_env:<port_of_responses_api_agents>"
+bash examples/swe-agent/run-qwen3-4b-instruct.sh
+```
+
+
+### Troubleshooting
+1. The first run of each SWE environment can be slow, and you may need to wait before generation starts, because each SWE-Gym task has its own Docker image and `docker pull` takes time.
+2. The environment can also be slow during evaluation. The evaluation timeout is 10 minutes by default. If the server appears stuck at `[EVAL]<instance> Running eval`, you may need to wait for it.
+
+### Metrics
+```
+agent/turns_mean, agent/turns_sum - Turn counts
+agent/tool_calls_mean, agent/tool_calls_sum - Tool call counts
+agent/total_time_mean/max/min - Total time statistics
+agent/model_query_time_sum_mean - Avg total model time per rollout
+agent/env_execution_time_sum_mean - Avg total env time per rollout
+agent/eval_time_mean - Avg evaluation time
+agent/overhead_time_mean - Avg overhead time
+agent/time_per_turn - Avg time per turn
+agent/model_query_time_avg - Avg model query time per turn
+agent/env_execution_time_avg - Avg env execution time per turn
+agent/model_time_ratio, agent/env_time_ratio - Time ratios
+```
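
Note on the Metrics list in the README: the names suggest per-rollout timings that are aggregated across a batch. The sketch below shows one way such aggregates could be computed. It is an illustration under stated assumptions, not the code in this commit: the `RolloutTiming` record and its fields are hypothetical, and the ratio definitions (time in model or env divided by total time) are an assumption about what "Time ratios" means.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RolloutTiming:
    """Hypothetical per-rollout record; field names are illustrative only."""

    turns: int
    model_query_times: list[float]  # seconds spent in each model call
    env_execution_times: list[float]  # seconds spent in each env step
    total_time: float  # wall-clock seconds for the whole rollout


def aggregate(rollouts: list[RolloutTiming]) -> dict[str, float]:
    """Aggregate a non-empty batch of rollouts into a few of the listed metrics."""
    model_sums = [sum(r.model_query_times) for r in rollouts]
    env_sums = [sum(r.env_execution_times) for r in rollouts]
    total = sum(r.total_time for r in rollouts)
    return {
        "agent/turns_mean": mean(r.turns for r in rollouts),
        "agent/turns_sum": sum(r.turns for r in rollouts),
        # "sum_mean": sum within each rollout, then mean across rollouts.
        "agent/model_query_time_sum_mean": mean(model_sums),
        "agent/env_execution_time_sum_mean": mean(env_sums),
        # Assumed definition: share of total wall-clock time.
        "agent/model_time_ratio": sum(model_sums) / total if total else 0.0,
        "agent/env_time_ratio": sum(env_sums) / total if total else 0.0,
    }
```

Reading `model_time_ratio` next to `env_time_ratio` then indicates whether rollouts are bound by generation or by environment execution.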
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+#!/usr/bin/env python3
+"""Download and process data to Slime format."""
+
+import argparse
+import json
+import tempfile
+from pathlib import Path
+from datasets import load_dataset
+
+
+def convert_to_slime_format(input_path: str, output_path: str, limit: int = None, split: str = "train"):
+    """Convert JSONL to Slime format.
+
+    Args:
+        input_path: Path to input JSONL file
+        output_path: Path to output JSONL file in Slime format
+        limit: Optional limit on number of samples
+        split: Dataset split name (used in metadata)
+    """
+    count = 0
+    with open(input_path) as fin, open(output_path, "w") as fout:
+        for line in fin:
+            if limit and count >= limit:
+                break
+
+            instance = json.loads(line)
+
+            # Add subset and split to metadata for Gym API
+            metadata = dict(instance)
+            metadata["subset"] = "gym"
+            metadata["split"] = split
+
+            slime_sample = {
+                "prompt": instance.get("problem_statement", ""),
+                "metadata": metadata,
+            }
+
+            fout.write(json.dumps(slime_sample) + "\n")
+            count += 1
+
+    print(f"Converted {count} samples: {input_path} -> {output_path}")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Download HuggingFace dataset and convert to Slime format")
+    parser.add_argument("--input", type=str, required=True, help="HuggingFace dataset path or local JSONL file")
+    parser.add_argument("--output", type=str, required=True, help="Output JSONL file path")
+    parser.add_argument(
+        "--split", type=str, default="train", help="Dataset split (default: train, only for HF datasets)"
+    )
+    parser.add_argument("--limit", type=int, help="Limit number of samples")
+
+    args = parser.parse_args()
+
+    input_path = Path(args.input)
+
+    if input_path.exists() and input_path.suffix == ".jsonl":
+        print(f"Processing local file: {args.input}")
+        convert_to_slime_format(args.input, args.output, args.limit, args.split)
+    else:
+        print(f"Loading HuggingFace dataset: {args.input} (split={args.split})")
+        ds = load_dataset(args.input, split=args.split)
+
+        if args.limit:
+            ds = ds.select(range(min(args.limit, len(ds))))
+
+        tmp_path = None
+        try:
+            with tempfile.NamedTemporaryFile(mode="w", suffix=".jsonl", delete=False) as tmp:
+                tmp_path = tmp.name
+
+            print(f"Downloading to temporary file: {tmp_path}")
+            ds.to_json(tmp_path)
+
+            print(f"Converting to Slime format: {args.output}")
+            convert_to_slime_format(tmp_path, args.output, split=args.split)
+        finally:
+            if tmp_path and Path(tmp_path).exists():
+                Path(tmp_path).unlink()
+
+    print("Done.")
+
+
+if __name__ == "__main__":
+    main()
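
Note on the output format: each output line is a JSON object with a `prompt` taken from `problem_statement` and a `metadata` dict that copies the whole input record and adds `subset` and `split`. The sketch below isolates that per-record mapping so it can be checked without downloading a dataset. The function name `to_slime_sample` and the sample record, including its `instance_id` field, are hypothetical and used only for illustration.

```python
import json


def to_slime_sample(instance: dict, split: str = "train") -> dict:
    """Mirror the per-record mapping done inside convert_to_slime_format."""
    metadata = dict(instance)  # shallow copy, so the input record is not mutated
    metadata["subset"] = "gym"
    metadata["split"] = split
    return {
        "prompt": instance.get("problem_statement", ""),
        "metadata": metadata,
    }


# Hypothetical input record, for illustration only.
example = {"instance_id": "demo-1", "problem_statement": "Fix the off-by-one error."}
print(json.dumps(to_slime_sample(example)))
```

A record without `problem_statement` still converts, but ends up with an empty prompt, which may be worth filtering out before training.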