Conversation

@shmuelk (Contributor) commented Sep 28, 2025:

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds support for the vLLM Data Parallel feature, which causes the vLLM "launcher" to launch many vLLM instances in the same Pod, each listening on a different port.

The InferencePool CRD has already been changed to support this by allowing up to eight TargetPorts to be specified. It is assumed that all pods in the InferencePool have been configured the same way with respect to Data Parallelism.
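
For illustration only, a hedged Go sketch of what the relevant spec shape might look like; the type and field names below are simplified stand-ins, not the actual CRD definitions:

```go
// Hypothetical, simplified view of the InferencePool spec after the CRD
// change; the real API types differ in naming and structure.
type InferencePoolSpec struct {
	// Selector matches the model server pods belonging to this pool.
	Selector map[string]string
	// TargetPorts lists the ports every pod in the pool listens on,
	// one per Data Parallel rank. Up to eight entries are allowed.
	TargetPorts []int32
}
```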

In an attempt to minimize the amount of changes to the code, the datastore has been modified to create "virtual pods" from the real pods that are found by the pod reconciler. These virtual pods are all given names that are the real pod's name concatenated with the string "-rank-N", where N is a number from zero to seven. The term "rank" was used because that is what each of the separate vLLM "servers" in a Data Parallel configuration is called.
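
As a rough sketch of the naming scheme (not the actual datastore code), each real pod fans out into one virtual pod per configured target port:

```go
package main

import "fmt"

// virtualPodNames illustrates the fan-out: one virtual pod per target
// port, named <real-pod-name>-rank-<N>. Names and types are illustrative.
func virtualPodNames(realPodName string, targetPorts []int32) map[string]int32 {
	names := make(map[string]int32, len(targetPorts))
	for rank, port := range targetPorts {
		names[fmt.Sprintf("%s-rank-%d", realPodName, rank)] = port
	}
	return names
}

func main() {
	// A pod running two Data Parallel ranks on ports 8000 and 8001
	// (example values) yields two virtual pods.
	fmt.Println(virtualPodNames("vllm-abc123", []int32{8000, 8001}))
	// Output: map[vllm-abc123-rank-0:8000 vllm-abc123-rank-1:8001]
}
```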

The former code had a notion of globally known ports for inference and metrics scraping. This has been eliminated; instead, inference port and metrics port fields have been added to the PodInfo struct. In addition, a PodName field was added that contains the name of the real pod used to create the "virtual pods".
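
A hedged sketch of the resulting per-(virtual-)pod bookkeeping; apart from PodName, GetPort(), and GetMetricsPort(), which the description above mentions, the field names here are assumptions:

```go
// Simplified stand-in for the PodInfo struct described above; the actual
// struct in the codebase has more fields and different internals.
type PodInfo struct {
	Name        string // virtual pod name, e.g. "vllm-abc123-rank-0"
	PodName     string // name of the real pod backing this virtual pod
	Address     string // pod IP (assumed field, shared by all ranks)
	port        int32  // this rank's inference port
	metricsPort int32  // this rank's metrics-scraping port
}

// GetPort returns the inference port of this (virtual) pod.
func (p *PodInfo) GetPort() int32 { return p.port }

// GetMetricsPort returns the metrics-scraping port of this (virtual) pod.
func (p *PodInfo) GetMetricsPort() int32 { return p.metricsPort }
```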

Lastly, the API of the PreRequest extension point has been changed, removing the inference port parameter. Any PreRequest extension must now get the inference port of the pod(s) in question from the PodInfo's GetPort() API.
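
Illustratively, a PreRequest extension would now follow this pattern. The plugin type, signature, and header key below are hypothetical (only the GetPort() lookup comes from the description above), and the snippet builds on the PodInfo sketch earlier plus an `import "fmt"`:

```go
// examplePlugin is a hypothetical PreRequest extension; the inference port
// is no longer passed in as a parameter, it is read from the PodInfo.
type examplePlugin struct{}

func (e *examplePlugin) PreRequest(pod *PodInfo, headers map[string]string) {
	// Route to the selected rank using the pod's own inference port
	// rather than a globally configured one.
	headers["x-target-endpoint"] = fmt.Sprintf("%s:%d", pod.Address, pod.GetPort())
}
```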

Which issue(s) this PR fixes:
Fixes #1519

Does this PR introduce a user-facing change?:


@k8s-ci-robot added the do-not-merge/work-in-progress and kind/feature labels Sep 28, 2025
netlify bot commented Sep 28, 2025:

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: fec83a2
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68d9310bacaf920008bc440c
😎 Deploy Preview: https://deploy-preview-1663--gateway-api-inference-extension.netlify.app

@k8s-ci-robot added the cncf-cla: yes label Sep 28, 2025
@k8s-ci-robot (Contributor) commented:
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shmuelk
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/L label Sep 28, 2025
@k8s-ci-robot added the size/XL label and removed the size/L label Sep 28, 2025
Signed-off-by: Shmuel Kallner <[email protected]>
@danehans (Contributor) commented Oct 2, 2025:

In an attempt to minimize the amount of changes to the code, the datastore has been modified to create "virtual pods" from the real pods that are found by the pod reconciler.

Pods are comprised of one or more containers. Will each vLLM engine instance run as a separate container in the vLLM pod?

Comment on lines -78 to -80
```go
if p.ModelServerMetricsPort == 0 {
	p.ModelServerMetricsPort = targetPortNumber
}
```
Reviewer (Contributor) commented:

I see that GetMetricsPort() does not implement this default behavior, which makes sense now that targetPortNumber is a list. We need to note this as a breaking change in the PR description.

@shmuelk (Contributor, Author) replied:

There is no breaking change. See the code in pkg/epp/datastore/datastore.go, lines 242-244. If there is only one targetPort in the InferencePool and the ModelServerMetricsPort from the command line is not zero, it will be used to fill the metricsPort in the PodInfo struct. The function GetMetricsPort() simply returns what was placed in the struct earlier.
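
A hedged sketch of that defaulting logic (the function and variable names are illustrative, not the actual code in datastore.go):

```go
// resolveMetricsPort sketches the behavior described above: with exactly
// one target port and a non-zero ModelServerMetricsPort flag, the flag's
// value is stored as the metrics port; otherwise the rank's own inference
// port is used (assumed fallback, mirroring the old default in the diff).
func resolveMetricsPort(targetPorts []int32, rank int, flagMetricsPort int32) int32 {
	if len(targetPorts) == 1 && flagMetricsPort != 0 {
		return flagMetricsPort
	}
	return targetPorts[rank]
}
```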

@danehans (Contributor) commented Oct 4, 2025:

@shmuelk please create a tracker issue for adding a conformance test that includes multiple InferencePool targetPorts.

@shmuelk (Contributor, Author) commented Oct 5, 2025:

In an attempt to minimize the amount of changes to the code, the datastore has been modified to create "virtual pods" from the real pods that are found by the pod reconciler.

Pods are comprised of one or more containers. Will each vLLM engine instance run as a separate container in the vLLM pod?

They are run as separate processes in the same container, as far as I know. Data Parallel mode is enabled by a command-line parameter to vLLM, and I don't think a parameter can add containers to the Pod.
