Provide a way not to double tokenize the prompt in case of the token-aware KV routing

**What would you like to be added**:
I need a way to have a plugin that can modify the request body once the EPP picks the worker.

I'm not sure what the best way to go about it is. I envision:
1. The director creates the LLMRequest once (with an empty map) and hands the same object through scheduling so the scorer's annotation sticks around. The LLMRequest needs an extra `annotations` field.
2. The EPP routing plugin fills in req.Annotations[tokenDataAnnotationKey] = <tokenized prompt>
3. After scheduling succeeds, runPreRequestPlugins invokes each plugin and (for mutators) passes the live body map, so that any body mutations they make are kept.
4. Another pre-request plugin, pkg/epp/requestcontrol/plugins/new/plugin.go, reads this annotation and copies the token data into the body to send to the workers.
5. Once the director returns, StreamingServer.Process marshals reqCtx.Request.Body (now containing token_data) and stores it in the ext-proc responses that go back to Envoy.
6. From there Envoy forwards those mutated bytes downstream. In short: router plugin → request annotations → PreRequest mutator → marshalled body → Envoy → worker.

**Why is this needed**:

I have an EPP routing plugin that does token-aware KV routing. It uses the model's tokenizer, so while identifying the best worker it also tokenizes the prompt. I want to be able to pass these tokens to the serving workers in the request body. Otherwise my workers will tokenize the same prompt again, which introduces additional latency.

Provide a way not to double tokenize the prompt in case of the token-aware KV routing #1875