site-src/guides/troubleshooting.md
This guide provides troubleshooting steps and solutions for common issues encountered.
### `model not found in request body` or `prompt not found in request`
If the OpenAI API endpoint you're using isn't working as expected, the issue might be related to the request body format. The endpoint picker (EPP) expects the body of every POST request to contain the `model` and `prompt` fields, because the gateway currently assumes all requests are for Large Language Models (LLMs).
**Solution**: Make sure your request body includes the field named in the error.
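
For reference, a minimal completions-style request body that includes both fields might look like the following (the model name and prompt are placeholders):

```
{
  "model": "my-model",
  "prompt": "Write a haiku about Kubernetes.",
  "max_tokens": 50
}
```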
## 404 Not Found
This is a default gateway error, meaning the request never reached a backend service. It usually indicates that no HTTPRoute is configured to match the request path (e.g. `/v1/completions`), so the gateway doesn't know where to send the traffic.
**Solution**: Ensure you have an HTTPRoute resource deployed that specifies the correct host, path, and backendRef to your InferencePool.
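
A minimal sketch of such an HTTPRoute, assuming a gateway named `inference-gateway` and an InferencePool named `my-pool` in the same namespace (adjust the names, path match, and the InferencePool API group to match your installation):

```
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: inference.networking.k8s.io   # match the API group of your InferencePool version
      kind: InferencePool
      name: my-pool
```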
## 429 Too Many Requests
### `system saturated, sheddable request dropped`
This error indicates that the entire request pool has exceeded its saturation threshold.
### `fault filter abort`
This internal error suggests a misconfiguration in the gateway's backend routing. Your HTTPRoute is configured to point to an InferencePool that does not exist or cannot be found by the gateway. The gateway recognizes the route but fails when trying to send traffic to the non-existent backend.
**Solution**: Verify that the backendRef in your HTTPRoute correctly names an InferencePool resource that is deployed and accessible in the same namespace. If you wish to route to an InferencePool in a different namespace, you can create a `ReferenceGrant` like the one below:
```
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-routes-to-inferencepool
  namespace: pool-namespace          # namespace of the InferencePool
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: route-namespace       # namespace of the HTTPRoute
  to:
  - group: inference.networking.k8s.io   # match the API group of your InferencePool version
    kind: InferencePool
```
### `upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused`
This error indicates that the gateway successfully identified the correct model server pod but failed to establish a connection to it. This usually happens when the port number specified in the InferencePool's configuration doesn't match the port your model server is listening on, so the gateway tries to connect to the wrong port and is refused.
**Solution**: Verify the port specified in your InferencePool matches the port number exposed by your model server container, and update your InferencePool accordingly.
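
For example, if the model server listens on port 8000, the InferencePool must target port 8000 as well. A rough sketch, with the caveat that the exact field name (`targetPortNumber` here) can differ between InferencePool API versions:

```
# Model server container (excerpt) -- listening on 8000
ports:
- containerPort: 8000

---
# InferencePool (excerpt) -- must point at the same port
spec:
  targetPortNumber: 8000
```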
### `no healthy upstream`
This error indicates that the HTTPRoute and InferencePool are correctly configured, but there are no healthy pods in the pool to route traffic to. This can happen if the pods are crashing, still starting up, or failing their health checks.
**Solution**: Check the status of your model server pods. Investigate the pod logs for any startup errors or health check failures. Ensure your model server is running and listening on the correct port and that any configured health checks / readiness probes are succeeding.
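
For instance, a readiness probe for a model server that exposes a `/health` endpoint on port 8000 (both the path and port are assumptions; match what your server actually serves) could look like:

```
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
```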
## The Endpoint Picker (EPP) Is Crashlooping
When EPP is crashlooping, check the logs of your EPP pod. Some common errors include:
### `failed to list <InferencePool or InferenceObjective or Pod>: … is forbidden`
The EPP needs to watch the InferencePool, its InferenceObjectives, and the Pods that belong to it. This constant watching and reconciliation allows the EPP to maintain an up-to-date view of the environment, enabling it to make dynamic decisions. This particular error indicates that the service account used by the EPP doesn't have the necessary permissions to list the resources it's watching.
**Solution**: Create or update the RBAC configuration to grant the [required permissions](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/137a0b4660b96487caac626ed135b3600be876ed/config/manifests/inferencepool-resources.yaml#L129) to the EPP service account.
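
The rules in that manifest boil down to read access on the watched resources. An illustrative (not authoritative) excerpt of what the EPP's ClusterRole needs, with the API group adjusted to whichever version of the inference CRDs you have installed:

```
rules:
- apiGroups: ["inference.networking.k8s.io"]   # use the group of your installed CRDs
  resources: ["inferencepools", "inferenceobjectives"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
```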
### `Pool is not initialized, skipping refreshing metrics`
This error indicates that the EPP has not yet initialized the InferencePool it is configured to watch.
**Solution**: Check that the EPP startup argument `--pool-name` specifies the correct InferencePool name and that the InferencePool exists.
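
A quick sketch of the relevant part of the EPP Deployment, where `my-pool` and the container name stand in for your actual values:

```
containers:
- name: epp
  args:
  - --pool-name=my-pool   # must match the name of an existing InferencePool
```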
## Unexpected Routing Behaviors
The EPP's core function is to intelligently route requests to the optimal model server pod in a pool. It uses a score-based algorithm that considers several metrics (such as queue depth and KV cache utilization) to choose the best pod for each request.
For more information, check out [EPP scale testing](https://docs.google.com/docu
When performance degrades under high load (for example, a high latency tail or significantly lower-than-expected successful QPS) while resources remain underutilized, the issue may be related to excessive logging in the endpoint picker (EPP). Higher verbosity levels (e.g., `--v=2` or greater) generate a large volume of logs. This floods the log buffer and standard output, leading to heavy write-lock contention. In extreme cases, this can cause the kubelet to kill the pod due to health check timeouts, leading to a restart cycle.
**Solution**: Ensure the log level for the EPP is set to `--v=1`.
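
In the EPP Deployment args, that means keeping verbosity at 1 in production, for example:

```
args:
- --v=1   # avoid --v=2 or higher under sustained load
```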