EPP: Not Routing Based on InferenceObjective #1339

@danehans

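EPP does not route a request correctly when the x-gateway-inference-objective header is absent and only the model name in the request body identifies the InferenceObjective: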

$ k get svc -n inf-ext-e2e
NAME                          TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
inference-gateway             LoadBalancer   10.96.55.22    172.18.255.0   8080:32262/TCP   7h33m

$ kubectl exec -n curl po/curl -- curl -i "10.96.55.22:8080/v1/completions" -H 'Content-Type: application/json' -d '{"model": "food-review","prompt": "Write as if you were a critic: San Francisco","max_tokens": 100,"temperature": 0}'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   233    0   117  100   116   8778   8703 --:--:-- --:--:-- --:--:-- 19416
HTTP/1.1 404 Not Found
x-envoy-upstream-service-time: 0
x-went-into-resp-headers: true
server: envoy
date: Fri, 08 Aug 2025 23:18:21 GMT
content-type: application/json
transfer-encoding: chunked

{"code":404,"message":"The model `food-review` does not exist.","object":"error","param":null,"type":"NotFoundError"}

This is because infObjective := d.datastore.ObjectiveGet(reqCtx.ObjectiveKey) returns nil when HandleRequest() is called in pkg/epp/requestcontrol/director.go.

When the x-gateway-inference-objective header is set to the model name, routing works as expected.

$ kubectl exec -n curl po/curl -- curl -i "10.96.55.22:8080/v1/completions" -H 'x-gateway-inference-objective: food-review' -H 'Content-Type: application/json' -d '{"model": "food-review","prompt": "Write as if you were a critic: San Francisco","max_tokens": 100,"temperature": 0}'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   538    0   422  100   116  25114   6903 --:--:-- --:--:-- --:--:-- 33625
HTTP/1.1 200 OK
x-envoy-upstream-service-time: 1
x-went-into-resp-headers: true
server: envoy
date: Fri, 08 Aug 2025 23:21:09 GMT
content-type: application/json
transfer-encoding: chunked

{"choices":[{"finish_reason":"stop","index":0,"text":"To be or not to be that is the question."}],"created":1754695269,"do_remote_decode":false,"do_remote_prefill":false,"id":"chatcmpl-ecf34cf3-2b55-4540-905b-cb35e0cc2fa5","model":"food-review-1","object":"text_completion","remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"usage":{"completion_tokens":11,"prompt_tokens":9,"total_tokens":20}}%
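
To make the failure mode concrete, here is a minimal, self-contained sketch of the lookup described above. It is not the code in pkg/epp/requestcontrol/director.go: the Objective and Datastore types and the objectiveKey helper are hypothetical stand-ins, and it assumes the objective key is taken from the x-gateway-inference-objective header, which is what the two requests above suggest but which this issue does not confirm.

```go
package main

import "fmt"

// Objective is a hypothetical stand-in for a stored InferenceObjective.
type Objective struct {
	Name string
}

// Datastore is a hypothetical stand-in for the EPP datastore.
type Datastore struct {
	objectives map[string]*Objective
}

// newDatastore builds a datastore holding one objective per given name.
func newDatastore(names ...string) *Datastore {
	ds := &Datastore{objectives: make(map[string]*Objective)}
	for _, n := range names {
		ds.objectives[n] = &Objective{Name: n}
	}
	return ds
}

// ObjectiveGet returns the objective stored under key, or nil if none exists.
func (d *Datastore) ObjectiveGet(key string) *Objective {
	return d.objectives[key]
}

// objectiveKey is a hypothetical helper. ASSUMPTION: the key comes only from
// the x-gateway-inference-objective header, so it is empty when the header
// is not sent.
func objectiveKey(headers map[string]string) string {
	return headers["x-gateway-inference-objective"]
}

func main() {
	ds := newDatastore("food-review")

	// Request without the header: the key is empty and the lookup returns nil.
	noHeader := map[string]string{"content-type": "application/json"}
	fmt.Println(ds.ObjectiveGet(objectiveKey(noHeader)) == nil)

	// Request with the header: the lookup finds the objective.
	withHeader := map[string]string{"x-gateway-inference-objective": "food-review"}
	fmt.Println(ds.ObjectiveGet(objectiveKey(withHeader)) != nil)
}
```

Under these assumptions, the first lookup mirrors the 404 case (no objective is found, so nothing maps food-review to a served model) and the second mirrors the 200 case.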

Labels: kind/bug, triage/accepted
