Closed
Labels
kind/bug: Categorizes issue or PR as related to a bug.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Description
EPP is not routing based solely on InferenceObjective:
$ k get svc -n inf-ext-e2e
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
inference-gateway LoadBalancer 10.96.55.22 172.18.255.0 8080:32262/TCP 7h33m
$ kubectl exec -n curl po/curl -- curl -i "10.96.55.22:8080/v1/completions" -H 'Content-Type: application/json' -d '{"model": "food-review","prompt": "Write as if you were a critic: San Francisco","max_tokens": 100,"temperature": 0}'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 233 0 117 100 116 8778 8703 --:--:-- --:--:-- --:--:-- 19416
HTTP/1.1 404 Not Found
x-envoy-upstream-service-time: 0
x-went-into-resp-headers: true
server: envoy
date: Fri, 08 Aug 2025 23:18:21 GMT
content-type: application/json
transfer-encoding: chunked
{"code":404,"message":"The model `food-review` does not exist.","object":"error","param":null,"type":"NotFoundError"}
This is because infObjective := d.datastore.ObjectiveGet(reqCtx.ObjectiveKey) returns nil when HandleRequest() is called in pkg/epp/requestcontrol/director.go.
When the x-gateway-inference-objective header is set to the model name, routing works as expected.
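To make the reported behavior easier to reason about, here is a minimal Go sketch of the lookup. It is not the EPP code: the Datastore and InferenceObjective types, objectiveKey, and route are hypothetical stand-ins, and the sketch assumes the objective key is taken only from the x-gateway-inference-objective header and never from the "model" field of the request body. Under that assumption, a request without the header yields an empty key and a nil objective, which matches the two curl results in this report.

```go
package main

import "fmt"

// objectiveHeader is the request header named in this report.
const objectiveHeader = "x-gateway-inference-objective"

// InferenceObjective is a hypothetical stand-in for the real API type.
type InferenceObjective struct {
	Name string
}

// Datastore is a hypothetical, simplified objective store keyed by name.
type Datastore struct {
	objectives map[string]*InferenceObjective
}

// ObjectiveGet returns the objective stored under key, or nil if none exists.
func (d *Datastore) ObjectiveGet(key string) *InferenceObjective {
	return d.objectives[key]
}

// objectiveKey encodes the assumption of this sketch: the key comes only
// from the header, so a missing header produces an empty key.
func objectiveKey(headers map[string]string) string {
	return headers[objectiveHeader]
}

// route reports whether an objective could be resolved for the request.
func route(d *Datastore, headers map[string]string) bool {
	return d.ObjectiveGet(objectiveKey(headers)) != nil
}

func main() {
	ds := &Datastore{objectives: map[string]*InferenceObjective{
		"food-review": &InferenceObjective{Name: "food-review"},
	}}

	// No header: empty key, nil objective.
	fmt.Println(route(ds, map[string]string{})) // prints "false"

	// Header set to the model name: the objective is found.
	fmt.Println(route(ds, map[string]string{objectiveHeader: "food-review"})) // prints "true"
}
```

If this assumption holds for the real director, the first curl call corresponds to the first route call and the second curl call to the second.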
$ kubectl exec -n curl po/curl -- curl -i "10.96.55.22:8080/v1/completions" -H 'x-gateway-inference-objective: food-review' -H 'Content-Type: application/json' -d '{"model": "food-review","prompt": "Write as if you were a critic: San Francisco","max_tokens": 100,"temperature": 0}'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 538 0 422 100 116 25114 6903 --:--:-- --:--:-- --:--:-- 33625
HTTP/1.1 200 OK
x-envoy-upstream-service-time: 1
x-went-into-resp-headers: true
server: envoy
date: Fri, 08 Aug 2025 23:21:09 GMT
content-type: application/json
transfer-encoding: chunked
{"choices":[{"finish_reason":"stop","index":0,"text":"To be or not to be that is the question."}],"created":1754695269,"do_remote_decode":false,"do_remote_prefill":false,"id":"chatcmpl-ecf34cf3-2b55-4540-905b-cb35e0cc2fa5","model":"food-review-1","object":"text_completion","remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"usage":{"completion_tokens":11,"prompt_tokens":9,"total_tokens":20}}%