## Summary

The current `num_hosts` parameter in `ProcessConfig`/`ServiceConfig` forces host-level allocation, preventing efficient GPU packing when multiple replicas could share the same host. For example, a model that works best with TP=4 (vLLM) uses only 4 GPUs per replica, so two such replicas could share a single 8-GPU host, yet today each replica is given its own dedicated host.
## Current Behavior

```python
service = await Generator.options(
    procs=4,         # 4 GPU processes per replica
    with_gpus=True,
    num_hosts=1,     # each replica gets 1 dedicated host
    num_replicas=2,  # total: 2 hosts needed
).as_service()
```
## Desired Behavior

Allow efficient GPU-level packing:

```python
service = await Generator.options(
    procs=4,         # 4 GPU processes per replica
    with_gpus=True,
    num_replicas=2,  # provisioner packs replicas onto available GPUs
).as_service()
```
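The packing being asked for here amounts to first-fit bin packing of replicas onto hosts at GPU granularity. A minimal sketch of what a provisioner could do under the hood, assuming a fixed `device_per_host` GPU count; the `Host` and `pack_replicas` names are illustrative only, not part of the existing API:

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """Hypothetical view of one host with a fixed GPU capacity."""
    capacity: int
    used: int = 0
    replicas: list[int] = field(default_factory=list)

def pack_replicas(num_replicas: int, procs_per_replica: int,
                  device_per_host: int) -> list[Host]:
    """First-fit: reuse the first host with enough free GPUs, else add a host."""
    hosts: list[Host] = []
    for replica in range(num_replicas):
        target = next((h for h in hosts
                       if h.capacity - h.used >= procs_per_replica), None)
        if target is None:
            target = Host(capacity=device_per_host)
            hosts.append(target)
        target.used += procs_per_replica
        target.replicas.append(replica)
    return hosts

# With the example above: 2 replicas x 4 GPUs fit on a single 8-GPU host,
# instead of the 2 hosts that num_hosts=1 per replica currently forces.
hosts = pack_replicas(num_replicas=2, procs_per_replica=4, device_per_host=8)
assert len(hosts) == 1
```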
## Proposal

The current resource-allocation semantics are awkward when all three of these specifications need to be expressed at the same time:

- HostMesh
- ProcMesh (and which HostMesh it runs on)
- Service ProcMesh (and which HostMesh it runs on)

The proposal is to change the config semantics to something along these lines:
```yaml
device_per_host: 8

host_meshes:  # the following adds up to 7 hosts
  policy:
    num_hosts: 4
  ref_model:
    num_hosts: 1
  trainer:
    num_hosts: 2

services:
  policy:
    procs: 4
    num_replicas: 8  # fills the policy mesh: 8 replicas x 4 procs = 32 GPUs = 4 hosts
    with_gpus: true
    host_mesh: policy
  ref_model:
    procs: 4
    num_replicas: 1
    with_gpus: true
    host_mesh: ref_model
  reward_actor:
    procs: 4
    num_replicas: 1
    with_gpus: false
    host_mesh: default

actors:
  dataset:
    procs: 1
    with_gpus: false
    host_mesh: policy
  trainer:
    procs: 2
    with_gpus: true
    host_mesh: trainer
  replay_buffer:
    procs: 1
    with_gpus: false
    host_mesh: trainer
  compute_advantages:
    procs: 1
    with_gpus: false
    host_mesh: default
```
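To make the shape of the proposed config concrete, here is a rough sketch of how it could map onto typed config objects. All class and field names below simply mirror the YAML keys above and are otherwise hypothetical, not an existing schema:

```python
from dataclasses import dataclass

import yaml  # pip install pyyaml

@dataclass
class HostMeshSpec:
    num_hosts: int

@dataclass
class ServiceSpec:
    procs: int
    num_replicas: int
    with_gpus: bool
    host_mesh: str  # references a key under host_meshes (or "default")

@dataclass
class ActorSpec:
    procs: int
    with_gpus: bool
    host_mesh: str

@dataclass
class ClusterConfig:
    device_per_host: int
    host_meshes: dict[str, HostMeshSpec]
    services: dict[str, ServiceSpec]
    actors: dict[str, ActorSpec]

def load_config(path: str) -> ClusterConfig:
    with open(path) as f:
        raw = yaml.safe_load(f)
    return ClusterConfig(
        device_per_host=raw["device_per_host"],
        host_meshes={k: HostMeshSpec(**v) for k, v in raw["host_meshes"].items()},
        services={k: ServiceSpec(**v) for k, v in raw["services"].items()},
        actors={k: ActorSpec(**v) for k, v in raw["actors"].items()},
    )
```

A validation pass on top of this could check that every `host_mesh` reference names a declared mesh and that each GPU service fits its mesh (`procs * num_replicas <= num_hosts * device_per_host`).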