WIP: Support for vLLM Data parallel #1663
base: main
Conversation
Pods are composed of one or more containers. Will each vLLM engine instance run as a separate container in the vLLM pod?
if p.ModelServerMetricsPort == 0 {
	p.ModelServerMetricsPort = targetPortNumber
}
I see that GetMetricsPort() does not implement this default behavior, which makes sense now that targetPortNumber is a list. We need to note this as a breaking change in the PR description.
There is no breaking change. See the code in pkg/epp/datastore/datastore.go, lines 242-244. If there is only one targetPort in the InferencePool and the ModelServerMetricsPort from the command line is not zero, it is used to fill the metricsPort in the PodInfo struct. GetMetricsPort() simply returns what was placed in the struct earlier.
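For illustration, a minimal Go sketch of the defaulting rule described above; the function and parameter names are hypothetical and do not reproduce the actual datastore.go code:

```go
package datastore

// resolveMetricsPort is a sketch (not the actual datastore.go code) of the
// defaulting rule described above: when the InferencePool has a single
// targetPort and a non-zero metrics port was given on the command line, that
// port is used; otherwise each rank's metrics are scraped on its own serving port.
func resolveMetricsPort(targetPorts []int32, cliMetricsPort int32, rank int) int32 {
	if len(targetPorts) == 1 && cliMetricsPort != 0 {
		return cliMetricsPort
	}
	return targetPorts[rank]
}
```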
@shmuelk please create a tracker issue for adding a conformance test that includes multiple InferencePool targetPorts.
As far as I know, they run as separate processes in the same container. Data parallelism in a real vLLM deployment is enabled by a parameter passed to vLLM, and I don't think a parameter can add containers to the pod.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds support for the vLLM Data Parallel feature, which causes the vLLM "launcher" to start multiple vLLM instances in the same Pod, each listening on a different port.
The InferencePool CRD has already been changed to support this by allowing up to eight TargetPorts to be specified. It is assumed that all pods in the InferencePool are configured identically with respect to Data Parallelism.
To minimize the amount of code change, the datastore has been modified to create "virtual pods" from the real pods found by the pod reconciler. Each virtual pod is named after the real pod with the suffix "-rank-N" appended, where N is a number from zero to seven (see the sketch below). The term rank is used because that is what each of the separate vLLM "servers" in a Data Parallel configuration is called.
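A minimal sketch of how the virtual pod names could be derived, assuming one rank per InferencePool targetPort; the function name and signature are illustrative rather than the actual datastore code:

```go
package datastore

import "fmt"

// virtualPodNames illustrates how virtual pod names could be derived from a
// real pod and the InferencePool's targetPorts; the name and signature are
// hypothetical, not the PR's actual implementation.
func virtualPodNames(realPodName string, targetPorts []int32) []string {
	names := make([]string, 0, len(targetPorts))
	for rank := range targetPorts {
		// One virtual pod per data-parallel rank, e.g. "vllm-abc123-rank-0".
		names = append(names, fmt.Sprintf("%s-rank-%d", realPodName, rank))
	}
	return names
}
```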
The former code assumed globally known ports for inference and metrics scraping. That assumption has been removed; instead, inference port and metrics port fields have been added to the PodInfo struct. In addition, a PodName field was added that contains the name of the real pod used to create the "virtual pod".
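A sketch of what such a PodInfo might look like; the field names here are assumptions, and only the GetPort() and GetMetricsPort() accessors are mentioned in the PR discussion:

```go
package datastore

// PodInfo is a sketch of the per-virtual-pod information described above;
// the actual struct and field names in the PR may differ.
type PodInfo struct {
	PodName     string // name of the real pod this virtual pod was created from
	Port        int32  // inference (serving) port for this rank
	MetricsPort int32  // port to scrape metrics from for this rank
}

// GetPort returns the inference port that PreRequest extensions should use.
func (p *PodInfo) GetPort() int32 { return p.Port }

// GetMetricsPort returns the metrics scrape port recorded for this virtual pod.
func (p *PodInfo) GetMetricsPort() int32 { return p.MetricsPort }
```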
Lastly, the API of the PreRequest extension point has been changed: the inference port parameter has been removed. Any PreRequest extension must now get the inference port of the pod(s) in question from the PodInfo's GetPort() API, as sketched below.
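A hedged sketch of how a PreRequest extension might obtain the port after this change; the extension signature and the Address() helper are assumptions made for illustration, not the real extension-point API:

```go
package prerequest

import "fmt"

// PodInfo mirrors the GetPort() accessor mentioned above; Address() is an
// assumed helper returning the pod's IP and is not part of the real API.
type PodInfo interface {
	GetPort() int32
	Address() string
}

// examplePreRequest sketches how a PreRequest extension might build a target
// endpoint now that the inference port parameter has been removed from the
// extension-point API; the signature is illustrative, not the real one.
func examplePreRequest(target PodInfo) string {
	// The port now comes from the selected pod itself rather than a global value.
	return fmt.Sprintf("%s:%d", target.Address(), target.GetPort())
}
```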
Which issue(s) this PR fixes:
Fixes #1519
Does this PR introduce a user-facing change?: