| 1 | +### Introduction |
| 2 | + |
| 3 | +This is an example of SWE-agent training. It uses NVIDIA's Nemo-Gym as the Gym environment implementation, SWE-Gym as the training data, and SWE-bench for evaluation. |
| 4 | + |
| 5 | +Part of the implementation of this example lives in the submodules below: |
| 6 | +- Nemo-Gym: https://github.com/yueming-yuan/Gym/tree/slime-swe-agent |
| 7 | +- mini-swe-agent: https://github.com/yueming-yuan/nv-mini-swe-agent/tree/slime-swe-agent |
| 8 | + |
| 9 | + |
| 10 | +### Prepare environment |
| 11 | +#### Update submodules |
| 12 | +```bash |
| 13 | +git submodule update --init --recursive . |
| 14 | +``` |
| 15 | +#### Docker settings |
| 16 | +```bash |
| 17 | +# 1. create a docker network |
| 18 | +docker network create swe-net |
| 19 | + |
| 20 | +# 2. create environment docker |
| 21 | +docker run -itd \ |
| 22 | + --name swe_env \ |
| 23 | + --shm-size 16g \ |
| 24 | + -v /var/run/docker.sock:/var/run/docker.sock \ |
| 25 | + -v /mnt/data:/data \ |
| 26 | + -v /home/sglang-rl/<your_name>:/workspace \ |
| 27 | + --ipc=host \ |
| 28 | + --ulimit nofile=65536:65536 \ |
| 29 | + --ulimit memlock=-1 \ |
| 30 | + --ulimit stack=67108864 \ |
| 31 | + --network swe-net \ |
| 32 | + ubuntu:latest \ |
| 33 | + /bin/bash |
| 34 | + |
| 35 | +# 3. create slime docker |
| 36 | +docker run -itd \ |
| 37 | + --shm-size 32g \ |
| 38 | + --gpus all \ |
| 39 | + -v /mnt/data/cache/huggingface:/root/.cache/huggingface \ |
| 40 | + -v /mnt/data:/data \ |
| 41 | + -v /home/sglang-rl/<your_name>:/workspace \ |
| 42 | + --ipc=host \ |
| 43 | + --ulimit nofile=65536:65536 \ |
| 44 | + --ulimit memlock=-1 \ |
| 45 | + --ulimit stack=67108864 \ |
| 46 | + --privileged \ |
| 47 | + --network swe-net \ |
| 48 | + --name slime_<your_name> \ |
| 49 | + slimerl/slime:latest \ |
| 50 | + /bin/zsh |
| 51 | + |
| 52 | +# 4. install utils in environment docker |
| 53 | +docker exec -it swe_env /bin/bash |
| 54 | +apt update && apt install -y zsh curl git python3 python3-pip docker.io |
| 55 | +``` |
| 56 | +Note: `-v /var/run/docker.sock:/var/run/docker.sock` is required for Docker-in-Docker SWE environment execution, and `--network swe-net` enables communication between the training and environment containers. |
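To confirm that both containers actually joined `swe-net`, the network's member list can be inspected from the host. The following is a minimal sketch, not part of the example itself: the helper names `containers_on_network` and `is_on_network` are hypothetical, and the only Docker feature relied on is `docker network inspect` with a Go-template `--format`.

```bash
# Hypothetical helper: print the names of the containers attached to a
# Docker network, one per line.
containers_on_network() {
    local network="$1"
    docker network inspect "$network" \
        --format '{{range .Containers}}{{println .Name}}{{end}}'
}

# Hypothetical helper: return 0 if the container is attached to the
# network, 1 otherwise (exact, literal name match).
is_on_network() {
    local network="$1" container="$2"
    containers_on_network "$network" | grep -qxF "$container"
}

# Example (run on the host, after both `docker run` commands):
#   is_on_network swe-net swe_env && echo "swe_env is attached"
```

If the check fails for either container, recreating that container with `--network swe-net` is the simplest fix.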
| 57 | + |
| 58 | +#### Installation |
| 59 | + |
| 60 | +In the **environment docker**, install Gym: |
| 61 | +```bash |
| 62 | +git clone https://github.com/yueming-yuan/Gym |
| 63 | +cd Gym |
| 64 | + |
| 65 | +curl -LsSf https://astral.sh/uv/install.sh | sh |
| 66 | +source $HOME/.local/bin/env |
| 67 | +uv venv --python 3.12 && source .venv/bin/activate |
| 68 | +uv sync --extra dev --group docs |
| 69 | + |
| 70 | +# configure env.yaml |
| 71 | +echo "policy_base_url: https://api.openai.com/v1 |
| 72 | +policy_api_key: your-openai-api-key |
| 73 | +policy_model_name: gpt-4.1-2025-04-14 |
| 74 | +default_host: 0.0.0.0" > env.yaml |
| 75 | +``` |
| 76 | +Note: set the host IP to `0.0.0.0` so that the two containers can communicate with each other. |
| 77 | + |
| 78 | +Then set up the SWE-agent server: |
| 79 | +```bash |
| 80 | +cd responses_api_agents/mini_swe_agent |
| 81 | +uv pip install -r requirements.txt |
| 82 | +``` |
| 83 | +Now you should be able to run the SWE-agent server. |
| 84 | + |
| 85 | +For **slime docker** setup, please follow the standard setup process. |
| 86 | + |
| 87 | +### Preparing data |
| 88 | +In the **slime docker**, download the **SWE-Gym** data from Hugging Face and convert it to slime's prompt data format with this script: |
| 89 | +```bash |
| 90 | +cd slime/examples/swe-agent |
| 91 | +python download_and_process_data.py --input SWE-Gym/SWE-Gym --output /root/swe_train.jsonl |
| 92 | +``` |
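Before launching training, it can help to sanity-check the converted file. The sketch below assumes only that the output is in JSON Lines form (one record per line), as the `.jsonl` extension suggests. No field names are assumed, and the helper name `count_records` is hypothetical.

```bash
# Hypothetical helper: count the non-blank lines (records) in a JSON Lines file.
count_records() {
    grep -c '[^[:space:]]' "$1"
}

# Example (in slime docker):
#   count_records /root/swe_train.jsonl
#   head -n 1 /root/swe_train.jsonl | python3 -m json.tool   # pretty-print the first record
```

A count of zero, or a first record that fails to parse, suggests the download or conversion step did not complete.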
| 93 | + |
| 94 | +### Running training |
| 95 | +1. In the environment docker, launch the agent server: |
| 96 | +```bash |
| 97 | +cd Gym |
| 98 | +source .venv/bin/activate |
| 99 | +cd responses_api_agents/mini_swe_agent |
| 100 | +./start_server.sh |
| 101 | +``` |
| 102 | + |
| 103 | + |
| 104 | +2. In the slime docker: |
| 105 | +(1) Set `SWE_AGENT_GYM_URL` to the address of the second server that Gym started in the environment docker, i.e. the one whose `server_type` is `responses_api_agents`. `swe_env` is the environment docker's name; replace it if you changed the name. (Minor TODO: modify the port selection to avoid setting this every time.) |
| 106 | +(2) Launch the training. |
| 107 | +```bash |
| 108 | +export SWE_AGENT_GYM_URL="http://swe_env:<port_of_responses_api_agents>" |
| 109 | +bash examples/swe-agent/run-qwen3-4b-instruct.sh |
| 110 | +``` |
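Because the agent server's port has to be filled in by hand, a quick reachability check before launching training can save a failed run. The following is a hedged sketch: the helper names `gym_url` and `gym_reachable` are hypothetical, the port `11000` in the usage comment is a placeholder rather than a value from this example, and it assumes `curl` is available in the slime docker. The check only confirms that some HTTP server answers at the URL, not that it is the intended agent server.

```bash
# Hypothetical helper: compose the agent server URL from a container name and a port.
gym_url() {
    local host="$1" port="$2"
    printf 'http://%s:%s\n' "$host" "$port"
}

# Hypothetical helper: succeed if an HTTP server answers at the URL within 5 seconds.
# Any HTTP response (even an error status) counts as reachable; only
# connection-level failures or timeouts make curl return non-zero.
gym_reachable() {
    curl --silent --output /dev/null --max-time 5 "$1"
}

# Example (in slime docker; replace 11000 with the actual port):
#   export SWE_AGENT_GYM_URL="$(gym_url swe_env 11000)"
#   gym_reachable "$SWE_AGENT_GYM_URL" || echo "agent server not reachable" >&2
```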
| 111 | + |
| 112 | + |
| 113 | +### Troubleshooting |
| 114 | +1. The first run of each SWE environment can be slow, and generation may have to wait, because each SWE-Gym task has its own Docker image and `docker pull` takes time. |
| 115 | +2. The environment can also be slow during evaluation. The evaluation timeout is 10 minutes by default. If the server appears stuck at `[EVAL]<instance> Running eval`, you may simply need to wait for it. |
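One possible way to reduce the first-run delay is to pull the task images ahead of time in the environment docker. The sketch below assumes you have already collected the image names into a text file, one per line. How to obtain that list is not covered here, and both the file name `images.txt` and the helper name `prepull_images` are hypothetical.

```bash
# Hypothetical helper: pull every image listed in a file (one image name per line).
# Blank lines and lines starting with '#' are skipped. A failed pull is
# reported on stderr but does not stop the loop.
prepull_images() {
    local list="$1" image
    while IFS= read -r image || [ -n "$image" ]; do
        case "$image" in
            ''|'#'*) continue ;;
        esac
        # Redirect stdin so the pull cannot consume the rest of the list.
        docker pull "$image" </dev/null || echo "failed to pull: $image" >&2
    done < "$list"
}

# Example (in environment docker):
#   prepull_images images.txt
```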
| 116 | + |
| 117 | +### Metrics |
| 118 | +``` |
| 119 | +agent/turns_mean, agent/turns_sum - Turn counts |
| 120 | +agent/tool_calls_mean, agent/tool_calls_sum - Tool call counts |
| 121 | +agent/total_time_mean/max/min - Total time statistics |
| 122 | +agent/model_query_time_sum_mean - Avg total model time per rollout |
| 123 | +agent/env_execution_time_sum_mean - Avg total env time per rollout |
| 124 | +agent/eval_time_mean - Avg evaluation time |
| 125 | +agent/overhead_time_mean - Avg overhead time |
| 126 | +agent/time_per_turn - Avg time per turn |
| 127 | +agent/model_query_time_avg - Avg model query time per turn |
| 128 | +agent/env_execution_time_avg - Avg env execution time per turn |
| 129 | +agent/model_time_ratio, agent/env_time_ratio - Time ratios |
| 130 | +``` |