
[Feat][Operator] Add prefixaware and kvaware routing options to VLLMRouter CRD #881

Open

keyuchen21 wants to merge 1 commit into vllm-project:main from keyuchen21:feat/operator-prefixaware-kvaware-routing

Conversation

@keyuchen21
Contributor

Summary

Fixes #862.

The VLLMRouter CRD's routingLogic enum only allowed roundrobin and session, even though the router binary also supports prefixaware and kvaware. Users on CRD-based deployments (e.g. OpenShift) had to use the extraArgs escape hatch to access these modes.

  • Add prefixaware and kvaware to the routingLogic enum in the Go types and the generated CRD YAML
  • Add a first-class lmcacheControllerPort field (default 9000) for kvaware routing; the controller passes --lmcache-controller-port to the router container when set
  • Update the sample manifest comment to list all four supported routing strategies
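The Go-side change can be sketched roughly as below. The field names and kubebuilder markers are inferred from the PR description, not copied from the actual diff in operator/api/v1alpha1/vllmrouter_types.go, so treat this as an illustration of the shape of the change:

```go
package main

import "fmt"

// VLLMRouterSpec is a trimmed-down sketch of the spec fields this PR
// touches (names inferred from the PR description, not the real types).
type VLLMRouterSpec struct {
	// RoutingLogic selects the router's scheduling strategy.
	// +kubebuilder:validation:Enum=roundrobin;session;prefixaware;kvaware
	RoutingLogic string `json:"routingLogic"`

	// LmcacheControllerPort is the LMCache controller port used by
	// kvaware routing. The default (9000) is applied server-side by the
	// API server from the +kubebuilder:default marker.
	// +kubebuilder:default=9000
	// +optional
	LmcacheControllerPort int32 `json:"lmcacheControllerPort,omitempty"`
}

func main() {
	spec := VLLMRouterSpec{RoutingLogic: "kvaware", LmcacheControllerPort: 9000}
	fmt.Printf("--routing-logic %s --lmcache-controller-port %d\n",
		spec.RoutingLogic, spec.LmcacheControllerPort)
}
```

Note that the `Enum` marker is what makes the API server reject any `routingLogic` value outside the four listed strategies at admission time, before the controller ever sees the object.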

Test plan

  • make manifests in operator/ regenerates the CRD YAML with the new enum values and lmcacheControllerPort field
  • kubectl apply -f operator/config/crd/bases/ applies cleanly
  • A VLLMRouter with routingLogic: prefixaware is accepted by the API server
  • A VLLMRouter with routingLogic: kvaware and lmcacheControllerPort: 9000 is accepted and the router pod starts with --routing-logic kvaware --lmcache-controller-port 9000 (verify via kubectl get deployment <name> -o yaml)
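A manifest for the kvaware test case above would look roughly like the following. The `apiVersion` and `kind` are taken from the CRD file paths in this PR; the metadata name is illustrative:

```yaml
apiVersion: production-stack.vllm.ai/v1alpha1
kind: VLLMRouter
metadata:
  name: vllm-router-sample
spec:
  # One of: roundrobin, session, prefixaware, kvaware
  routingLogic: kvaware
  # Only meaningful for kvaware routing; defaults to 9000 if omitted
  lmcacheControllerPort: 9000
```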

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the VLLMRouter Custom Resource Definition by integrating advanced routing capabilities directly into its schema. It allows users to configure prefixaware and kvaware routing strategies natively, eliminating the need for manual extraArgs workarounds. This change improves the usability and configurability of the VLLM router deployments, particularly for Kubernetes-native environments.

Highlights

  • CRD Enhancement: Extended the VLLMRouter CRD's routingLogic enum to include prefixaware and kvaware options, allowing users to leverage these routing strategies directly without extraArgs.
  • New Configuration Field: Introduced a new lmcacheControllerPort field in the VLLMRouter CRD, specifically for configuring the LMCache controller port when kvaware routing is used.
  • Controller Logic Update: Updated the VLLMRouter controller to correctly pass the lmcacheControllerPort as a command-line argument to the router container when specified.
  • Documentation Update: Revised the sample manifest comments to accurately reflect all four supported routing strategies: roundrobin, session, prefixaware, and kvaware.


Changelog
  • operator/api/v1alpha1/vllmrouter_types.go
    • Added prefixaware and kvaware to the RoutingLogic enum validation.
    • Introduced LmcacheControllerPort field with a default value of 9000.
  • operator/config/crd/bases/production-stack.vllm.ai_vllmrouters.yaml
    • Updated the routingLogic enum in the CRD YAML to include prefixaware and kvaware.
    • Added the lmcacheControllerPort definition to the CRD schema.
  • operator/config/samples/production-stack_v1alpha1_vllmrouter.yaml
    • Modified the comment for routingLogic to list all four supported strategies.
  • operator/internal/controller/vllmrouter_controller.go
    • Implemented logic to append --lmcache-controller-port argument to the router container if LmcacheControllerPort is set.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly adds support for prefixaware and kvaware routing to the VLLMRouter CRD. The changes to the CRD definition, controller, and sample files are logical and well-structured. However, there is a significant issue that needs to be addressed: changes to the new lmcacheControllerPort field will not trigger a deployment update. This is because the deploymentNeedsUpdate function in vllmrouter_controller.go does not compare container arguments when checking for changes. This is an existing bug, but this PR makes it more impactful. I strongly recommend updating deploymentNeedsUpdate to compare dep.Spec.Template.Spec.Containers[0].Args to ensure configuration changes are correctly applied. I have also added a specific comment to improve the robustness of the argument handling.
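The comparison the review asks for can be sketched as a plain slice equality check. The helper name `argsChanged` is made up for illustration; the real `deploymentNeedsUpdate` in vllmrouter_controller.go compares several other fields as well:

```go
package main

import (
	"fmt"
	"reflect"
)

// argsChanged reports whether the desired container args differ from the
// args currently set on the deployed container. A function like
// deploymentNeedsUpdate would call this against
// dep.Spec.Template.Spec.Containers[0].Args to decide whether a rollout
// is needed. (Illustrative sketch, not the operator's actual code.)
func argsChanged(current, desired []string) bool {
	return !reflect.DeepEqual(current, desired)
}

func main() {
	current := []string{"--routing-logic", "kvaware", "--lmcache-controller-port", "9000"}
	desired := []string{"--routing-logic", "kvaware", "--lmcache-controller-port", "9001"}
	fmt.Println(argsChanged(current, current)) // same args: no rollout
	fmt.Println(argsChanged(current, desired)) // port changed: rollout
}
```

Without such a check, editing `lmcacheControllerPort` on an existing VLLMRouter would update the CR but leave the running router pod on the old port.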

Comment on lines +291 to +293
if router.Spec.LmcacheControllerPort != 0 {
	args = append(args, "--lmcache-controller-port", fmt.Sprintf("%d", router.Spec.LmcacheControllerPort))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity: medium

To improve robustness and make the controller's behavior more explicit, it's better to only add the --lmcache-controller-port argument when the routingLogic is set to kvaware. This prevents passing a potentially unused argument to the router when other routing logics are selected.

Suggested change:

-if router.Spec.LmcacheControllerPort != 0 {
+if router.Spec.RoutingLogic == "kvaware" && router.Spec.LmcacheControllerPort != 0 {
 	args = append(args, "--lmcache-controller-port", fmt.Sprintf("%d", router.Spec.LmcacheControllerPort))
 }

…outer CRD

Extend the VLLMRouter CRD to expose prefixaware and kvaware routing
strategies, which were previously only reachable via the extraArgs
escape hatch.

Changes:
- Add prefixaware and kvaware to the RoutingLogic enum in Go types and CRD YAML
- Add LmcacheControllerPort field (default 9000) for kvaware routing
- Only pass --lmcache-controller-port when routingLogic is kvaware
- Add container args comparison to deploymentNeedsUpdate so routing
  config changes correctly trigger a deployment rollout
- Update sample manifest comment to list all supported routing strategies

Fixes vllm-project#862

Signed-off-by: Keyu Chen <54015474+keyuchen21@users.noreply.github.com>
@keyuchen21 force-pushed the feat/operator-prefixaware-kvaware-routing branch from a402b44 to a75f0da on March 9, 2026 at 17:51.


Development

Successfully merging this pull request may close these issues.

feature: operator vLLMRouter : missing routingLogic : prefixaware and kvaware

1 participant