
Commit 3c0bd3d

fix: fix launching vLLM with ray (#420)
1 parent f53feed commit 3c0bd3d

6 files changed: +18 -14 lines changed

README.md

Lines changed: 1 addition & 1 deletion

@@ -74,7 +74,7 @@ state-of-the-art 7B and 32B models for mathematical reasoning. Check out our
 | Task | Description | Performance |
 | ------------------------------------------------ | ------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------- |
 | **[Math](examples/math/)** | Mathematical problem solving (SFT, GRPO, or PPO) | TBA |
-| **[Multi-Turn Math](examples/multi-turn-math/)** | Iterative mathematical problem solving with self-correction | [Training Curve](examples/multi-turn-math/reward_curve.png) |
+| **[Multi-Turn Math](examples/multi-turn-math/)** | Iterative mathematical problem solving with self-correction | [Training Curve](examples/multi-turn-math/reward_curve.png) |
 | **[LoRA Math](examples/lora/)** | Math Agent Trained With LoRA | TBA |
 | **[VLM Math](examples/vlm/)** | CLEVR visual counting tasks | TBA |
 | **[Reasoning](examples/countdown/)** | Countdown numbers game with custom rewards | [Training Curve](/examples/countdown/countdown_training_curve.png) |

areal/api/cli_args.py

Lines changed: 1 addition & 1 deletion

@@ -295,7 +295,7 @@ class TrainEngineConfig:
     lora_alpha: int = field(default=16, metadata={"help": "lora alpha"})
     target_modules: List[str] = field(
         default_factory=list,
-        metadata={"help": "lora target_modules. None defaults to 'all-linear'"},
+        metadata={"help": "lora target_modules."},
     )
     peft_type: str = field(
         default="lora",

areal/launcher/sglang_server.py

Lines changed: 4 additions & 2 deletions

@@ -6,6 +6,7 @@
 import time
 import uuid
 from concurrent.futures import ThreadPoolExecutor
+from copy import deepcopy
 from typing import Optional

 import psutil
@@ -182,9 +183,10 @@ def run(self):
             host_ip = gethostip()

             base_gpu_id = (server_local_idx - server_idx_offset) * gpus_per_server
-            self.config.random_seed = base_random_seed + server_local_idx
+            config = deepcopy(self.config)
+            config.random_seed = base_random_seed + server_local_idx
             cmd = SGLangConfig.build_cmd(
-                self.config,
+                config,
                 tp_size=self.allocation_mode.gen.tp_size,
                 base_gpu_id=base_gpu_id,
                 host=host_ip,
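
Both launcher hunks fix the same aliasing bug: the loop used to write the per-server seed onto the shared `self.config`, so when one process launches several servers, later iterations overwrite the seed that earlier servers still reference. Deep-copying the config before mutating it keeps each server's seed local. A minimal standalone sketch of the problem and the fix (`ServerConfig` and the helper names are hypothetical, simplified from the diff):

```python
from copy import deepcopy
from dataclasses import dataclass

@dataclass
class ServerConfig:
    random_seed: int = 0

def build_configs_buggy(shared: ServerConfig, n_servers: int, base_seed: int):
    # Buggy pattern: every iteration mutates the one shared config object,
    # so all servers end up observing whichever seed was written last.
    configs = []
    for idx in range(n_servers):
        shared.random_seed = base_seed + idx
        configs.append(shared)  # all entries alias the same object
    return configs

def build_configs_fixed(shared: ServerConfig, n_servers: int, base_seed: int):
    # Fixed pattern from the commit: deep-copy before mutating, so each
    # server gets its own config with its own seed.
    configs = []
    for idx in range(n_servers):
        config = deepcopy(shared)
        config.random_seed = base_seed + idx
        configs.append(config)
    return configs

shared = ServerConfig()
print([c.random_seed for c in build_configs_buggy(shared, 3, 100)])  # [102, 102, 102]
print([c.random_seed for c in build_configs_fixed(shared, 3, 100)])  # [100, 101, 102]
```

The vLLM launcher below applies the identical `deepcopy` pattern to `config.seed`.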

areal/launcher/vllm_server.py

Lines changed: 8 additions & 6 deletions

@@ -4,6 +4,7 @@
 import time
 import uuid
 from concurrent.futures import ThreadPoolExecutor
+from copy import deepcopy
 from typing import Optional

 import requests
@@ -105,6 +106,7 @@ def run(self):
             n_servers_per_proc = max(1, n_visible_devices // gpus_per_server)
             server_idx_offset = min(list(map(int, visible))) // gpus_per_server
         else:
+            visible = [str(i) for i in range(self.n_gpus_per_node)]
             n_servers_per_proc = n_servers_per_node
             server_idx_offset = 0

@@ -114,8 +116,8 @@ def run(self):
         launch_server_args = []
         server_addresses = []
         base_random_seed = self.config.seed
-        for server_local_idx in range(
-            server_idx_offset, server_idx_offset + n_servers_per_proc
+        for j, server_local_idx in enumerate(
+            range(server_idx_offset, server_idx_offset + n_servers_per_proc)
         ):
             port_range = (
                 server_local_idx * ports_per_server + 10000,
@@ -126,15 +128,15 @@ def run(self):
             dist_init_addr = f"localhost:{dist_init_port}"
             host_ip = gethostip()

-            base_gpu_id = (server_local_idx - server_idx_offset) * gpus_per_server
             custom_env = {
                 device_control_env_var: ",".join(
-                    map(str, range(base_gpu_id, base_gpu_id + gpus_per_server))
+                    visible[j * gpus_per_server : (j + 1) * gpus_per_server]
                )
            }
-            self.config.seed = base_random_seed + server_local_idx
+            config = deepcopy(self.config)
+            config.seed = base_random_seed + server_local_idx
             cmd = vLLMConfig.build_cmd(
-                self.config,
+                config,
                 tp_size=self.allocation_mode.gen.tp_size,
                 host=host_ip,
                 port=server_port,
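
Besides the `deepcopy` fix, this hunk changes how each server's device list is built: rather than expanding a computed `base_gpu_id` with `range` (which assumes the ids are contiguous and start at the offset), it slices the list of device ids that are actually visible to the process, which under Ray need not be contiguous. A sketch of the difference, using hypothetical values around the two expressions taken from the diff:

```python
# Suppose Ray handed this worker a non-contiguous device set.
visible = "2,3,6,7".split(",")  # parsed from the device-control env var
gpus_per_server = 2
server_idx_offset = min(map(int, visible)) // gpus_per_server  # 2 // 2 == 1
n_servers_per_proc = len(visible) // gpus_per_server           # 2

for j, server_local_idx in enumerate(
    range(server_idx_offset, server_idx_offset + n_servers_per_proc)
):
    # Old scheme: derives ids from the server index, assuming they are
    # contiguous and start at the computed base.
    base_gpu_id = (server_local_idx - server_idx_offset) * gpus_per_server
    old = ",".join(map(str, range(base_gpu_id, base_gpu_id + gpus_per_server)))

    # New scheme from the commit: take the ids that are actually visible.
    new = ",".join(visible[j * gpus_per_server : (j + 1) * gpus_per_server])

    print(f"server {server_local_idx}: old={old!r} new={new!r}")
# server 1: old='0,1' new='2,3'
# server 2: old='2,3' new='6,7'
```

The added `visible = [str(i) for i in range(self.n_gpus_per_node)]` in the `else:` branch makes the same slicing work when no device list was set, by treating all node GPUs as visible.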

docs/algorithms/rloo.md

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ Author: [Honghua DONG](https://github.com/dhh1995)

 ![rloo figure](../figures/reinforce.png)

-REINFORCE Leave One-Out (RLOO), introduced by Ahmadian et al. (2024), is an RL method that removes the need for a value function (critic).
+REINFORCE Leave One-Out (RLOO), introduced by Ahmadian et al. (2024), is an RL method that removes the need for a value function (critic).
 Instead, it estimates the baseline by averaging rewards of other sampled responses for the same prompt within the group.

 The overall core objective is:
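
For reference, the leave-one-out baseline the quoted passage describes is commonly written as follows; this is a sketch following Ahmadian et al. (2024), not text taken from the doc itself:

```latex
% Leave-one-out baseline: for k sampled responses y_1, ..., y_k to prompt x,
% each sample is judged against the mean reward of the other k-1 samples.
b_i = \frac{1}{k-1} \sum_{j \neq i} R(x, y_j),
\qquad
A_i = R(x, y_i) - b_i
% REINFORCE-style gradient estimate averaged over the group:
\nabla_\theta J(\theta) \approx \frac{1}{k} \sum_{i=1}^{k} A_i \, \nabla_\theta \log \pi_\theta(y_i \mid x)
```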

docs/cli_reference.md

Lines changed: 3 additions & 3 deletions

@@ -333,7 +333,7 @@ Configuration for PPO actor model, a subclass of a TrainEngine.
 | `use_lora` | boolean | `False` | Whether to use LoRA. Only support FSDP. Note that should be enabled together with vLLM/SGLang. |
 | `lora_rank` | integer | `32` | lora rank |
 | `lora_alpha` | integer | `16` | lora alpha |
-| `target_modules` | list of string | **Required** | lora target_modules. None defaults to 'all-linear' |
+| `target_modules` | list of string | **Required** | lora target_modules. |
 | `peft_type` | string | `"lora"` | peft method type. Only LoRA is supported for now. |
 | `group_size` | integer | `1` | Number of sequences in each group |
 | `ppo_n_minibatches` | integer | `4` | Number of minibatches for each PPO update |
@@ -388,7 +388,7 @@ Configuration for PPO critic model, a subclass of a TrainEngine.
 | `use_lora` | boolean | `False` | Whether to use LoRA. Only support FSDP. Note that should be enabled together with vLLM/SGLang. |
 | `lora_rank` | integer | `32` | lora rank |
 | `lora_alpha` | integer | `16` | lora alpha |
-| `target_modules` | list of string | **Required** | lora target_modules. None defaults to 'all-linear' |
+| `target_modules` | list of string | **Required** | lora target_modules. |
 | `peft_type` | string | `"lora"` | peft method type. Only LoRA is supported for now. |
 | `ppo_n_minibatches` | integer | `4` | Number of minibatches for each PPO update |
 | `eps_clip` | float | `0.5` | Clipping factor for value loss |
@@ -420,7 +420,7 @@ Core configuration for model training, including optimization and backend settin
 | `use_lora` | boolean | `False` | Whether to use LoRA. Only support FSDP. Note that should be enabled together with vLLM/SGLang. |
 | `lora_rank` | integer | `32` | lora rank |
 | `lora_alpha` | integer | `16` | lora alpha |
-| `target_modules` | list of string | **Required** | lora target_modules. None defaults to 'all-linear' |
+| `target_modules` | list of string | **Required** | lora target_modules. |
 | `peft_type` | string | `"lora"` | peft method type. Only LoRA is supported for now. |

 (section-generation-hyperparameters)=
