
Add support for vLLM's Data Parallel #1519

@shmuelk

Description


What would you like to be added:
Support for running vLLM with the Data Parallel feature enabled (i.e. `--data-parallel-size` > 1).

At this time, to simplify things, we will only support multiple vLLM instances in one pod. In such a configuration, all of the vLLM instances in a pod share the same IP address, but each one has its own HTTP endpoint port. That port is used for both inference and metrics reporting.
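To make the addressing model concrete, here is a minimal Go sketch (not from this issue or the project code) of how a component such as an endpoint picker or metrics scraper might enumerate the per-rank endpoints of one pod. The `endpointsForPod` helper and the `basePort+rank` port layout are assumptions for illustration; the issue only states that all instances share the pod IP and differ by port.

```go
// Sketch: enumerating per-rank vLLM endpoints that share one pod IP.
// ASSUMPTION: rank i listens on basePort+i; the actual port scheme
// is not specified in this issue.
package main

import (
	"fmt"
	"io"
	"net/http"
)

// endpointsForPod returns one base URL per data-parallel rank in the pod.
func endpointsForPod(podIP string, basePort, dpSize int) []string {
	urls := make([]string, 0, dpSize)
	for rank := 0; rank < dpSize; rank++ {
		// Same pod IP for every rank; only the port differs.
		urls = append(urls, fmt.Sprintf("http://%s:%d", podIP, basePort+rank))
	}
	return urls
}

func main() {
	// A pod started with --data-parallel-size 4 would expose four
	// endpoints, each serving both inference and metrics on its port.
	for _, u := range endpointsForPod("10.0.0.12", 8000, 4) {
		resp, err := http.Get(u + "/metrics")
		if err != nil {
			fmt.Println(u, "unreachable:", err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("%s/metrics -> %d bytes\n", u, len(body))
	}
}
```

The point of the sketch is that, unlike the one-endpoint-per-pod case, the gateway must treat each (IP, port) pair as a distinct backend for both request routing and metrics collection.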

Why is this needed:

Enable users of the Inference Gateway to deploy vLLM pods that use the vLLM data parallel feature for improved scalability.

Labels: needs-triage (indicates an issue or PR lacks a `triage/foo` label and requires one)
