
Inefficient GPU allocation for multi-replica service (especially small models) #455

@JenniferWang


Summary

The current num_hosts parameter in ProcessConfig/ServiceConfig forces host-level allocation, which prevents efficient GPU packing when multiple replicas could share the same host. For example, a model that runs best with TP=4 (4-way tensor parallelism in vLLM) needs only 4 GPUs, yet each replica still reserves an entire host.

Current Behavior

service = await Generator.options(
    procs=4,           # 4 GPU processes per replica
    with_gpus=True,
    num_hosts=1,       # Each replica gets 1 dedicated host
    num_replicas=2     # Total: 2 hosts needed
).as_service()
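To make the cost concrete, here is a back-of-the-envelope model of host-level allocation. It assumes 8 GPUs per host (the device_per_host: 8 value used in the proposal) and one GPU per process, and it models the current rule as "every replica reserves num_hosts whole hosts". host_level_allocation and Allocation are hypothetical helpers for illustration, not part of the library's API.

```python
from dataclasses import dataclass

GPUS_PER_HOST = 8  # assumption: matches device_per_host in the proposal


@dataclass
class Allocation:
    hosts: int
    gpus_used: int
    gpus_idle: int


def host_level_allocation(procs: int, num_hosts: int, num_replicas: int,
                          gpus_per_host: int = GPUS_PER_HOST) -> Allocation:
    """Model of the current rule: each replica reserves num_hosts whole hosts."""
    hosts = num_hosts * num_replicas
    gpus_used = procs * num_replicas  # assumption: one GPU per process
    return Allocation(hosts, gpus_used, hosts * gpus_per_host - gpus_used)


# The "Current Behavior" example: procs=4, num_hosts=1, num_replicas=2
print(host_level_allocation(procs=4, num_hosts=1, num_replicas=2))
# Allocation(hosts=2, gpus_used=8, gpus_idle=8)
```

Under these assumptions, half of the reserved GPUs sit idle.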

Desired Behavior

Allow efficient GPU-level packing:

service = await Generator.options(
    procs=4,           # 4 GPU processes per replica
    with_gpus=True,
    num_replicas=2     # Provisioner packs onto available GPUs
).as_service()
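One way a provisioner could pack replicas at GPU granularity is first-fit: place each replica on the first host with enough free GPUs and open a new host only when none fits. This is a sketch under two stated assumptions: all of a replica's processes stay on a single host (so tensor-parallel ranks share a host), and hosts have 8 GPUs. pack_replicas is a hypothetical helper, not how the provisioner is actually implemented.

```python
from typing import List


def pack_replicas(procs: int, num_replicas: int, gpus_per_host: int = 8) -> List[List[int]]:
    """First-fit packing of replicas onto hosts.

    Returns one list per host holding the indices of the replicas placed on it.
    Assumes every replica needs `procs` GPUs on a single host.
    """
    if procs > gpus_per_host:
        raise ValueError("a replica must fit on one host in this sketch")
    free: List[int] = []             # free GPUs on each host opened so far
    placement: List[List[int]] = []  # replica indices per host
    for replica in range(num_replicas):
        for host, available in enumerate(free):
            if available >= procs:
                free[host] -= procs
                placement[host].append(replica)
                break
        else:
            # No opened host has room: open a new one.
            free.append(gpus_per_host - procs)
            placement.append([replica])
    return placement


# Desired behavior: 2 replicas x 4 GPUs share a single 8-GPU host.
print(pack_replicas(procs=4, num_replicas=2))  # [[0, 1]]
```

First-fit is only one choice; a real provisioner might also weigh network topology or spread replicas for fault tolerance.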

Proposal

The current resource-allocation semantics make it awkward to express these three specifications at the same time:

  1. HostMesh
  2. ProcMesh (and which HostMesh it runs on)
  3. ServiceProcMesh (and which HostMesh it runs on)
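The relationship between the three levels can be pictured as references: a ProcMesh, and each replica of a service, names the HostMesh it is placed on. The dataclasses below are a hypothetical illustration of that shape (assuming 8-GPU hosts and one GPU per GPU-backed process), not the library's actual HostMesh or ProcMesh types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostMesh:
    name: str
    num_hosts: int
    devices_per_host: int = 8  # assumption: 8-GPU hosts

    @property
    def num_devices(self) -> int:
        return self.num_hosts * self.devices_per_host


@dataclass(frozen=True)
class ProcMeshSpec:
    procs: int
    with_gpus: bool
    host_mesh: HostMesh  # which HostMesh the processes are placed on

    def gpus_needed(self) -> int:
        return self.procs if self.with_gpus else 0


@dataclass(frozen=True)
class ServiceSpec:
    replica: ProcMeshSpec  # every replica is one ProcMesh on the same HostMesh
    num_replicas: int

    def gpus_needed(self) -> int:
        return self.replica.gpus_needed() * self.num_replicas


policy_hosts = HostMesh("policy", num_hosts=4)
policy = ServiceSpec(ProcMeshSpec(procs=4, with_gpus=True, host_mesh=policy_hosts), num_replicas=8)
print(policy.gpus_needed(), policy_hosts.num_devices)  # 32 32
```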

The proposal is to change the config semantics to something along these lines:

device_per_host: 8
host_meshes: # the following adds up to 7 hosts
   policy: 
      num_hosts: 4 
   ref_model:
      num_hosts: 1
   trainer: 
      num_hosts: 2

services:
   policy:
      procs: 4
      num_replicas: 8  # 8 replicas x 4 GPUs = 32 GPUs, filling all 4 policy hosts
      with_gpus: true
      host_mesh: policy
   ref_model:
      procs: 4
      num_replicas: 1
      with_gpus: true
      host_mesh: ref_model
   reward_actor:
      procs: 4
      num_replicas: 1
      with_gpus: false
      host_mesh: default

actors:
   dataset:
      procs: 1
      with_gpus: false
      host_mesh: policy
   trainer:
      procs: 2
      with_gpus: true
      host_mesh: trainer
   replay_buffer:
      procs: 1
      with_gpus: false
      host_mesh: trainer
   compute_advantages:
      procs: 1
      with_gpus: false
      host_mesh: default
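Because every service and actor names its host mesh, a config like this could be checked before anything is launched. The sketch below encodes the example as plain Python dicts standing in for the parsed YAML (with every mesh reference keyed as host_mesh) and compares GPU demand per host mesh against device_per_host * num_hosts. The check itself, the one-GPU-per-process rule, and treating default as a mesh with no reserved GPU hosts are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

DEVICE_PER_HOST = 8

# Stand-in for the parsed YAML: host mesh name -> num_hosts.
HOST_MESHES: Dict[str, int] = {"policy": 4, "ref_model": 1, "trainer": 2}

# name -> (procs, num_replicas, with_gpus, host_mesh); actors count as 1 replica.
WORKLOADS: Dict[str, Tuple[int, int, bool, str]] = {
    "policy": (4, 8, True, "policy"),
    "ref_model": (4, 1, True, "ref_model"),
    "reward_actor": (4, 1, False, "default"),
    "dataset": (1, 1, False, "policy"),
    "trainer": (2, 1, True, "trainer"),
    "replay_buffer": (1, 1, False, "trainer"),
    "compute_advantages": (1, 1, False, "default"),
}


def gpu_demand(workloads) -> Dict[str, int]:
    """Sum the GPUs requested on each host mesh (one GPU per GPU-backed process)."""
    demand: Dict[str, int] = defaultdict(int)
    for procs, replicas, with_gpus, mesh in workloads.values():
        demand[mesh] += procs * replicas if with_gpus else 0
    return dict(demand)


def check(host_meshes, workloads, device_per_host=DEVICE_PER_HOST) -> None:
    """Raise if any host mesh is asked for more GPUs than it has."""
    for mesh, needed in gpu_demand(workloads).items():
        # Assumption: meshes not listed (e.g. "default") have no reserved GPUs.
        capacity = host_meshes.get(mesh, 0) * device_per_host
        if needed > capacity:
            raise ValueError(f"{mesh}: needs {needed} GPUs, has {capacity}")


print(sum(HOST_MESHES.values()))  # 7
print(gpu_demand(WORKLOADS))      # {'policy': 32, 'ref_model': 4, 'default': 0, 'trainer': 2}
check(HOST_MESHES, WORKLOADS)     # passes: no mesh is oversubscribed
```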

Metadata



Labels: Tracking Issue (Context for a long-tailed tracking)
