Open
Labels
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
Description
What would you like to be added:
Support for running vLLM with the Data Parallel feature enabled (i.e. `--data-parallel-size` greater than 1).
For now, to keep things simple, we will only support multiple vLLM instances in one pod. In such a configuration, all of the vLLM instances in a pod share the same IP address, but each one has its own HTTP endpoint port. That port is used for both inference and metrics reporting.
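To make the addressing model concrete, here is a minimal sketch of how the endpoints of one such pod might be enumerated. It is illustrative only: the names `RankEndpoint` and `rank_endpoints` are hypothetical, the assumption that ports are consecutive starting from a base port is mine and not stated in this issue, and the sketch assumes an IPv4 pod address. The `/v1/completions` and `/metrics` paths are the ones vLLM's OpenAI-compatible server normally exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankEndpoint:
    """One vLLM instance (data-parallel rank) inside a pod. Hypothetical type."""
    rank: int
    ip: str
    port: int

    @property
    def inference_url(self) -> str:
        # Inference and metrics share the same IP and port; only the path differs.
        return f"http://{self.ip}:{self.port}/v1/completions"

    @property
    def metrics_url(self) -> str:
        return f"http://{self.ip}:{self.port}/metrics"


def rank_endpoints(pod_ip: str, base_port: int, data_parallel_size: int) -> list[RankEndpoint]:
    """Return one endpoint per vLLM instance in the pod.

    Assumption (not from the issue): rank r listens on base_port + r.
    """
    if data_parallel_size < 1:
        raise ValueError("data_parallel_size must be >= 1")
    return [
        RankEndpoint(rank=r, ip=pod_ip, port=base_port + r)
        for r in range(data_parallel_size)
    ]


if __name__ == "__main__":
    # Example: a pod at 10.0.0.12 running two vLLM instances.
    for ep in rank_endpoints("10.0.0.12", 8000, 2):
        print(ep.rank, ep.inference_url, ep.metrics_url)
```

Under this model, a scheduler or metrics scraper would treat each (IP, port) pair as a separate target rather than treating the pod as a single endpoint.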
Why is this needed:
Enable users of the Inference Gateway to deploy vLLM pods that use the vLLM data parallel feature for improved scalability.