## Summary

The current `num_hosts` parameter in `ProcessConfig`/`ServiceConfig` forces host-level allocation, preventing efficient GPU packing when multiple replicas could share the same host. For example, a model that works best with TP=4 (vLLM) uses only 4 GPUs per replica, so two such replicas could share a single 8-GPU host, yet today each replica is given its own dedicated host.
## Current Behavior

```python
service = await Generator.options(
    procs=4,         # 4 GPU processes per replica
    with_gpus=True,
    num_hosts=1,     # each replica gets 1 dedicated host
    num_replicas=2,  # total: 2 hosts needed
).as_service()
```
## Desired Behavior

Allow efficient GPU-level packing:

```python
service = await Generator.options(
    procs=4,         # 4 GPU processes per replica
    with_gpus=True,
    num_replicas=2,  # provisioner packs replicas onto available GPUs
).as_service()
```
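The packing being asked for here amounts to first-fit bin packing of replicas onto hosts at GPU granularity. A minimal sketch of what a provisioner could do under the hood, assuming a fixed `device_per_host` GPU count; the `Host` and `pack_replicas` names are illustrative only, not part of the existing API:

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """Hypothetical view of one host with a fixed GPU capacity."""
    capacity: int
    used: int = 0
    replicas: list[int] = field(default_factory=list)

def pack_replicas(num_replicas: int, procs_per_replica: int,
                  device_per_host: int) -> list[Host]:
    """First-fit: reuse the first host with enough free GPUs, else add a host."""
    hosts: list[Host] = []
    for replica in range(num_replicas):
        target = next((h for h in hosts
                       if h.capacity - h.used >= procs_per_replica), None)
        if target is None:
            target = Host(capacity=device_per_host)
            hosts.append(target)
        target.used += procs_per_replica
        target.replicas.append(replica)
    return hosts

# With the example above: 2 replicas x 4 GPUs fit on a single 8-GPU host,
# instead of the 2 hosts that num_hosts=1 per replica currently forces.
hosts = pack_replicas(num_replicas=2, procs_per_replica=4, device_per_host=8)
assert len(hosts) == 1
```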
## Proposal

The current resource-allocation semantics are awkward when all three of these specifications need to be expressed at the same time:

- HostMesh
- ProcMesh (and which HostMesh it runs on)
- Service ProcMesh (and which HostMesh it runs on)

The proposal is to change the config semantics to something along these lines:
```yaml
device_per_host: 8

host_meshes:  # the following adds up to 7 hosts
  policy:
    num_hosts: 4
  ref_model:
    num_hosts: 1
  trainer:
    num_hosts: 2

services:
  policy:
    procs: 4
    num_replicas: 8  # fills the policy mesh: 8 replicas x 4 procs = 32 GPUs = 4 hosts
    with_gpus: true
    host_mesh: policy
  ref_model:
    procs: 4
    num_replicas: 1
    with_gpus: true
    host_mesh: ref_model
  reward_actor:
    procs: 4
    num_replicas: 1
    with_gpus: false
    host_mesh: default

actors:
  dataset:
    procs: 1
    with_gpus: false
    host_mesh: policy
  trainer:
    procs: 2
    with_gpus: true
    host_mesh: trainer
  replay_buffer:
    procs: 1
    with_gpus: false
    host_mesh: trainer
  compute_advantages:
    procs: 1
    with_gpus: false
    host_mesh: default
```
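To make the shape of the proposed config concrete, here is a rough sketch of how it could map onto typed config objects. All class and field names below simply mirror the YAML keys above and are otherwise hypothetical, not an existing schema:

```python
from dataclasses import dataclass

import yaml  # pip install pyyaml

@dataclass
class HostMeshSpec:
    num_hosts: int

@dataclass
class ServiceSpec:
    procs: int
    num_replicas: int
    with_gpus: bool
    host_mesh: str  # references a key under host_meshes (or "default")

@dataclass
class ActorSpec:
    procs: int
    with_gpus: bool
    host_mesh: str

@dataclass
class ClusterConfig:
    device_per_host: int
    host_meshes: dict[str, HostMeshSpec]
    services: dict[str, ServiceSpec]
    actors: dict[str, ActorSpec]

def load_config(path: str) -> ClusterConfig:
    with open(path) as f:
        raw = yaml.safe_load(f)
    return ClusterConfig(
        device_per_host=raw["device_per_host"],
        host_meshes={k: HostMeshSpec(**v) for k, v in raw["host_meshes"].items()},
        services={k: ServiceSpec(**v) for k, v in raw["services"].items()},
        actors={k: ActorSpec(**v) for k, v in raw["actors"].items()},
    )
```

A validation pass on top of this could check that every `host_mesh` reference names a declared mesh and that each GPU service fits its mesh (`procs * num_replicas <= num_hosts * device_per_host`).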