
Add support for vLLM's Data Parallel #1519

@shmuelk

Description


What would you like to be added:
Support for running vLLM with the Data Parallel feature enabled (i.e. `--data-parallel-size` > 1).

At this time, to simplify things, we will only support multiple vLLM instances in one pod. In such a configuration, all of the vLLM instances in a pod share the same IP address, but each one has its own HTTP endpoint port. That port is used for both inference and metrics reporting.
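To make the addressing model concrete, here is a minimal Go sketch (not from this issue or the project code) of how a component such as an endpoint picker or metrics scraper might enumerate the per-rank endpoints of one pod. The `endpointsForPod` helper and the `basePort+rank` port layout are assumptions for illustration; the issue only states that all instances share the pod IP and differ by port.

```go
// Sketch: enumerating per-rank vLLM endpoints that share one pod IP.
// ASSUMPTION: rank i listens on basePort+i; the actual port scheme
// is not specified in this issue.
package main

import (
	"fmt"
	"io"
	"net/http"
)

// endpointsForPod returns one base URL per data-parallel rank in the pod.
func endpointsForPod(podIP string, basePort, dpSize int) []string {
	urls := make([]string, 0, dpSize)
	for rank := 0; rank < dpSize; rank++ {
		// Same pod IP for every rank; only the port differs.
		urls = append(urls, fmt.Sprintf("http://%s:%d", podIP, basePort+rank))
	}
	return urls
}

func main() {
	// A pod started with --data-parallel-size 4 would expose four
	// endpoints, each serving both inference and metrics on its port.
	for _, u := range endpointsForPod("10.0.0.12", 8000, 4) {
		resp, err := http.Get(u + "/metrics")
		if err != nil {
			fmt.Println(u, "unreachable:", err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("%s/metrics -> %d bytes\n", u, len(body))
	}
}
```

The point of the sketch is that, unlike the one-endpoint-per-pod case, the gateway must treat each (IP, port) pair as a distinct backend for both request routing and metrics collection.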

Why is this needed:

Enable users of the Inference Gateway to deploy vLLM pods that use the vLLM data parallel feature for improved scalability.

Labels: needs-triage (indicates an issue or PR lacks a `triage/foo` label and requires one)
