[RFC]: Omni Coordinator - OmniCoordinator & Load Balancer #87

@chickeyton

Motivation.

Parent RFC:
vllm-project#984

Proposed Change.

Create the central coordination service for instance discovery

in /vllm_omni/distributed/data_parallel/

1.1.1 Implement OmniCoordinator
1.1.2 Implement OmniCoordClientForStage
1.1.3 Implement OmniCoordClientForHub
1.1.4 Define exchange messages and enums
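
To make the discovery flow in 1.1.1 to 1.1.4 concrete, here is a minimal in-process sketch. Everything in it is an assumption for illustration: the names InstanceStatus, InstanceEvent, InMemoryCoordinator, report() and list_instances() are hypothetical and not taken from the design doc, and the real OmniCoordinator would talk to its stage and hub clients over a network transport rather than direct method calls.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class InstanceStatus(Enum):
    """Hypothetical status enum exchanged between a stage client and the coordinator."""
    UP = "up"
    DOWN = "down"


@dataclass(frozen=True)
class InstanceEvent:
    """Hypothetical exchange message: one stage instance reporting its state."""
    stage_id: int
    address: str
    status: InstanceStatus


class InMemoryCoordinator:
    """In-process stand-in for the coordinator: keeps a registry of live instances."""

    def __init__(self) -> None:
        # Keyed by address so a repeated UP report overwrites the old entry.
        self._instances: Dict[str, InstanceEvent] = {}

    def report(self, event: InstanceEvent) -> None:
        """What a stage-side client would call to announce or withdraw an instance."""
        if event.status is InstanceStatus.DOWN:
            self._instances.pop(event.address, None)
        else:
            self._instances[event.address] = event

    def list_instances(self, stage_id: int) -> List[InstanceEvent]:
        """What a hub-side client would call to get the instance list for one stage."""
        return [e for e in self._instances.values() if e.stage_id == stage_id]
```

The split mirrors the two client roles in the task list: stage instances only write to the registry, and the hub only reads from it.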

Create the LoadBalancer

in /vllm_omni/distributed/load_balancer/

1.2.1 Implement the LoadBalancer base class. LoadBalancer.select(*) takes the task and the instance list obtained from OmniCoordClientForHub and returns the chosen instance
1.2.2 Implement RandomBalancer as a concrete subclass of LoadBalancer
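
A rough sketch of how 1.2.1 and 1.2.2 could fit together, under stated assumptions: the RFC only fixes the class names and that select() receives the task and the instance list, so the InstanceInfo record, the exact select() signature, the seed parameter and the empty-list behaviour below are all hypothetical.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class InstanceInfo:
    """Hypothetical record for one stage instance; real fields are not specified here."""
    instance_id: str
    address: str


class LoadBalancer(ABC):
    """Base class: subclasses decide which instance should receive a task."""

    @abstractmethod
    def select(self, task: Any, instances: Sequence[InstanceInfo]) -> InstanceInfo:
        """Return one entry of `instances` for `task`."""


class RandomBalancer(LoadBalancer):
    """Picks an instance uniformly at random and ignores the task contents."""

    def __init__(self, seed: Optional[int] = None) -> None:
        # A private RNG keeps selection reproducible in tests when a seed is given.
        self._rng = random.Random(seed)

    def select(self, task: Any, instances: Sequence[InstanceInfo]) -> InstanceInfo:
        if not instances:
            # Assumed behaviour: fail loudly rather than return None.
            raise ValueError("no instances available for selection")
        return self._rng.choice(instances)
```

Usage would look like `RandomBalancer().select(task, instances)`, where `instances` is whatever list the hub client currently holds. Keeping select() free of side effects makes it easy to swap in other policies later.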

Unit Tests

in /tests/distributed/

1.3.1 Unit tests for DPCoordinator: test_dp_coordinator.py
1.3.2 Unit tests for LoadBalancer: test_load_balancer.py

Design Doc

vllm_omni_dp_router_desing.md
