Conversation

@davidbreitgand
Contributor

kind/documentation
Closes #1858

Extends documentation

…serve multiple LoRAs (many LoRAs per one model while having multiple models)
@netlify

netlify bot commented Nov 13, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 300fbbe
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/691626c07f41750008234c95
😎 Deploy Preview https://deploy-preview-1859--gateway-api-inference-extension.netlify.app

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: davidbreitgand
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 13, 2025
@k8s-ci-robot
Contributor

Hi @davidbreitgand. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 13, 2025
@elevran
Contributor

elevran commented Nov 17, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 17, 2025
        value: /
      headers:
      - type: Exact
        #Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.
Contributor

@nirrozenbaum nirrozenbaum Nov 19, 2025

Can we remove this comment from the YAML?
In theory, to test the HTTPRoute functionality one can inject the header manually.
This has nothing to do with BBR, which is an implementation detail and only one way to inject the model name header.

Suggested change
#Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.
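
For illustration, the manual header injection mentioned in this comment could look like the sketch below. It only assembles and prints a `curl` command; it does not send anything. The gateway address, the `/v1/completions` path, and the prompt are assumptions, since none of them appear in this conversation. The header name and LoRA name come from the HTTPRoute under review.

```shell
#!/bin/sh
set -eu

# Assumption: address of the inference gateway (not given in this conversation).
GATEWAY_URL="${GATEWAY_URL:-http://localhost:8080}"
# LoRA name taken from the HTTPRoute under review.
MODEL_NAME="food-review-1"

# Build a curl command that sets X-Gateway-Model-Name by hand,
# so the route can be exercised without BBR copying it from the body.
build_curl_command() {
  printf "curl -i %s/v1/completions -H 'Content-Type: application/json' -H 'X-Gateway-Model-Name: %s' -d '%s'\n" \
    "$GATEWAY_URL" "$1" "$2"
}

# Hypothetical request body; the prompt and max_tokens are placeholders.
BODY="{\"model\": \"$MODEL_NAME\", \"prompt\": \"Write a short food review\", \"max_tokens\": 20}"

CURL_CMD="$(build_curl_command "$MODEL_NAME" "$BODY")"
printf '%s\n' "$CURL_CMD"
```

To send the request, run the printed command against a reachable gateway.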

        value: /
      headers:
      - type: Exact
        #Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.
Contributor

ditto

Suggested change
#Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.

Comment on lines +69 to +93
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: vllm-llama3-8b-instruct-lora-food-review-1 #give this HTTPRoute any name that helps you to group and track the routes
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    matches:
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        #Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.
        name: X-Gateway-Model-Name
        value: 'food-review-1' #this is the name of LoRA as defined in vLLM deployment
    timeouts:
      request: 300s
Contributor

Why is this not part of the first HTTPRoute, llm-llama-route, which maps to InferencePool vllm-llama3-8b-instruct?
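
One way to read this question is as a proposal for a single HTTPRoute with one `matches` entry per LoRA name, all pointing at the same InferencePool. The sketch below renders such a manifest. The contents of `llm-llama-route` are not shown in this conversation, so the fields are copied from the reviewed snippet, and `food-review-2` is a hypothetical second adapter. The check uses the limit quoted in the PR (fewer than 128 matches per HTTPRoute); per-rule limits may be lower depending on the Gateway API version.

```shell
#!/bin/sh
set -eu

# Space-separated LoRA names; food-review-2 is hypothetical.
LORA_NAMES="food-review-1 food-review-2"

# Limit quoted in the PR: total matches per HTTPRoute must be less than 128.
MAX_MATCHES=128

# Print one HTTPRoute with one header match per LoRA name passed as an argument.
render_route() {
  if [ "$#" -ge "$MAX_MATCHES" ]; then
    echo "too many matches: $# (must be < $MAX_MATCHES)" >&2
    return 1
  fi
  cat <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-llama-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    matches:
EOF
  for name in "$@"; do
    cat <<EOF
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: '${name}'
EOF
  done
  cat <<'EOF'
    timeouts:
      request: 300s
EOF
}

# Word splitting of $LORA_NAMES is intentional: one argument per adapter.
ROUTE_YAML="$(render_route $LORA_NAMES)"
printf '%s\n' "$ROUTE_YAML"
```

The output can be reviewed and then piped to `kubectl apply -f -`. Whether separate routes or one combined route is preferable is the open question in this thread.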

- --max-loras
- "2"
- --lora-modules
- '{"name": "food-review"}'
Contributor

Maybe we can give the LoRA adapters completely different names to avoid confusion?
(The original deployment has food-review-1.)

@elevran
Contributor

elevran commented Nov 19, 2025

@davidbreitgand minor addition (letting @nirrozenbaum drive the review):
please consider changing the documentation comment in the PR's description to better reflect the change from a user's perspective.

### Serving multiple LoRAs per base AI model

<div style="border: 1px solid red; padding: 10px; border-radius: 5px;">
⚠️ Known Limitation : LoRA names must be unique across the base AI models (i.e., across the backend inference server deployments)
Collaborator

"Known limitation" almost implies it's wrong in some way... can we just drop the limitation part?

<div style="border: 1px solid red; padding: 10px; border-radius: 5px;">
⚠️ Known Limitation :
[Kubernetes API Gateway limits the total number of matchers per HTTPRoute to be less than 128](https://github.com/kubernetes-sigs/gateway-api/blob/df8c96c254e1ac6d5f5e0d70617f36143723d479/apis/v1/httproute_types.go#L128).
Collaborator


```
2. Send a few requests to the LoRA of the Llama model as follows:
```bash
Collaborator

The formatting is also strange here in the preview.

}'
```
2. Send a few requests to the LoRA of the Llama model as follows:
Collaborator

Suggest just using 1. for all ordered list entries.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Update the serving multiple AI models guide with multiple LoRA example

5 participants