
Conversation

davidbreitgand (Owner)

…ional name for LoRA: model-name/lora/lora-name

  1. The LoRA name used by a client: /lora/
  2. The "lora" keyword separator can also be defined via the LORA_TAG environment variable
  3. Known issue: it is better to use an "official" OpenAI struct for unmarshalling

… /pkg/epp/datalayer/factory.go:28:2: no required module provides package sigs.k8s.io/gateway-api-inference-extension/api/v1
One-liner: add a COPY api ./api command to resolve the import dependency in…
logger.V(logutil.DEFAULT).Info("Orig: " + string(orig))

if idx := bytes.Index(orig, []byte(loraTag)); idx != -1 {
lastSlash := bytes.LastIndex(orig[:idx], []byte("/"))
Collaborator

Is this assuming the adapter name doesn't contain '/'?

Owner Author

No, I take everything before loraTag as prefix and everything after loraTag as suffix.
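For illustration, the prefix/suffix behavior described above can be sketched as a small standalone function (simplified and string-based; the function name is mine, not the PR's byte-slice code):

```go
package main

import (
	"fmt"
	"strings"
)

// splitOnTag returns the part of model after the "/<tag>/" separator,
// or model unchanged if the separator is not present. Everything before
// the tag is treated as an opaque prefix, so slashes in the prefix are
// fine, and slashes in the suffix (the adapter name) are preserved.
func splitOnTag(model, tag string) string {
	sep := "/" + tag + "/"
	if idx := strings.Index(model, sep); idx != -1 {
		return model[idx+len(sep):]
	}
	return model
}

func main() {
	fmt.Println(splitOnTag("model-family/model-name/lora/my-adapter", "lora")) // my-adapter
	fmt.Println(splitOnTag("plain-model", "lora"))                             // plain-model
}
```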

// skip the slash itself by adding +1 so suffix doesn't start with '/'
suffix = afterTag[nextSlash+1:]
logger.V(logutil.DEFAULT).Info("Model name after mutation:" + string(suffix))
requestBody.Model = string(suffix)
Collaborator

I think you should update the content-length header when changing anything in the content

Owner Author

Good point

@elevran elevran left a comment

review comments provided

hub: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension
tag: main
pullPolicy: Always
#tag: main

a few comments

  • While fine for local development, these changes should not be part of the PR; the charts should continue to work with main.
  • Also for local dev, you want Always (even when running on kind), since you want newly built images to be downloaded to the workers (from kind, not Docker).

backendRefs:
- name: deepseek-r1
group: inference.networking.x-k8s.io
kind: InferencePool
\ No newline at end of file

missing newline at EoF


Which YAMLs are used by the current guides? Those are not checked in, so you created a new one?
If so, you might want to refer to the same file in the multi-model guide.

Same applies to other YAMLs in this PR

"bytes"
"context"
"encoding/json"
"os"

there's an IGW package for interacting with env vars, including setting defaults, casting to types, etc.

return nil, err
}

//The reason for this additional unmarshal is that I change the model name and then re-marshal RequestBody struct. But it has only one field, and I need to preserve original message at re-marshalling.

missing space after // in comments.
Applies to multiple places.


//The reason for this additional unmarshal is that I change the model name and then re-marshal RequestBody struct. But it has only one field, and I need to preserve original message at re-marshalling.
//This can be done more efficiently if a full "official" struct by OpenAI is used. In OpenAI v2 it should be ChatCompletionNewParams
var raw map[string]json.RawMessage

you could unmarshal from/to map[string]any and not lose the info.
Also, if there's a standard (or commonly used) OpenAI parsing package, I suggest a PR converting BBR to use that instead of custom code.

//Convention: [model-family]/<model-name>/lora/<lora-name>
//Model name definition (the vLLM side) does not change: <my-arbitrary-lora-name>
//Model name in request (the client side): lora-name (no change from before)
loraTag := os.Getenv("LORA_TAG") //set via environment

it would be preferable, IMO, to make that change in a separate function rather than mutating the current implementation.
Also, using a callback would be more aligned with readers' expectations, given the structure and use of plugins in EPP.

loraTag = "lora"
}

orig := []byte(requestBody.Model)

might be easier to parse using strings.Split after converting the byte slice to a string.
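A sketch of the strings.Split approach the comment suggests (illustrative, not the PR's code); rejoining the trailing segments also keeps adapter names that themselves contain "/" intact:

```go
package main

import (
	"fmt"
	"strings"
)

// afterTag splits the model name on "/" and returns everything after
// the tag segment, rejoined; the input is returned unchanged when the
// tag segment is absent or is the last segment.
func afterTag(model, tag string) string {
	parts := strings.Split(model, "/")
	for i, p := range parts {
		if p == tag && i < len(parts)-1 {
			return strings.Join(parts[i+1:], "/")
		}
	}
	return model
}

func main() {
	fmt.Println(afterTag("model-family/model-name/lora/my-adapter", "lora")) // my-adapter
}
```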

suffix = afterTag[nextSlash+1:]
logger.V(logutil.DEFAULT).Info("Model name after mutation:" + string(suffix))
requestBody.Model = string(suffix)
// update only the "model" field in the original raw map so other fields (e.g. prompt) are preserved

Can you explain the convention in the PR message?
I think you're expecting the model to be in the format <some-base-model-to-dispatch-to> <tag (e.g., /lora)> <model-name-as-known-by-vllm>.
Is that correct?

requestBody.Model = string(suffix)
// update only the "model" field in the original raw map so other fields (e.g. prompt) are preserved
modelBytes, merr := json.Marshal(requestBody.Model)
if merr != nil {

why not reuse err instead of merr and merr2?

//Mutate model name if it contains reserved keyword "lora" indicating that the requested model is lora (served from the same vLLM as the base model and from the same inferencepool)
//Convention: [model-family]/<model-name>/lora/<lora-name>
//Model name definition (the vLLM side) does not change: <my-arbitrary-lora-name>
//Model name in request (the client side): lora-name (no change from before)
Collaborator

  1. Is this comment (line 84) accurate? It says the client side has no change from before, but I think this feature requires the client prompt to carry the model name in the format listed in line 82, which will be a change for most clients, I believe.
  2. Also, it seems that clients will be required to know the exact backend LoRA name, which they may not usually know.
