Labels
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
Description
What would you like to be added:
I need a way to have a plugin that can modify the request body when the EPP picks the worker.
I'm not sure what the best way to go about it is. I envision:
- The director creates the `LLMRequest` once (with an empty map) and hands the same object through scheduling, so the scorer's annotation persists. This requires adding an `Annotations` field to `LLMRequest`.
- The EPP routing plugin fills in `req.Annotations[tokenDataAnnotationKey] =`
- After scheduling succeeds, `runPreRequestPlugins` invokes each plugin and, for mutators, passes the live body map. This step needs to support body mutations.
- A second PreRequest plugin (`pkg/epp/requestcontrol/plugins/new/plugin.go`) reads the annotation and copies its data into the request body sent to the workers.
- Once the director returns, `StreamingServer.Process` marshals `reqCtx.Request.Body` (now containing `token_data`) and stores it in the ext-proc responses that go back to Envoy.
- From there, Envoy forwards the mutated bytes to the worker.
In short: router plugin → request annotations → PreRequest mutator → marshalled body → Envoy → worker.
Why is this needed:
I have an EPP routing plugin that does token-aware KV-cache routing using the model's tokenizer. While identifying the best worker, the plugin also tokenizes the prompt. I want to pass these tokens to the serving workers in the request body; otherwise, the workers tokenize the prompt again, which adds latency.