
Commit fe87eed

Make solutions bold and clear
1 parent c271df2 commit fe87eed

1 file changed: +8 -8 lines changed

site-src/guides/troubleshooting.md

Lines changed: 8 additions & 8 deletions
@@ -7,12 +7,12 @@ This guide provides troubleshooting steps and solutions for common issues encoun
### `model not found in request body` or `prompt not found in request`
If the OpenAI API endpoint you're using isn't working as expected, the issue might be related to the request body format. The endpoint picker (EPP) assumes that if a request is a POST, its body must contain the `model` and `prompt` fields. This is because the gateway currently assumes the requests are for Large Language Models (LLMs).

-Solution: Make sure your request body contains the missing field.
+**Solution**: Make sure your request body contains the missing field.
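
For illustration, a minimal `/v1/completions` request body containing both fields might look like the sketch below (the model name and prompt are placeholders):

```
{
  "model": "my-model",
  "prompt": "Write a haiku about Kubernetes."
}
```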

## 404 Not Found
This is a default gateway error, meaning the request never reached a backend service. This usually means that there is no HTTPRoute configured to match the request path (e.g. /v1/completions). The gateway doesn't know where to send the traffic.

-Solution: Ensure you have an HTTPRoute resource deployed that specifies the correct host, path, and backendRef to your InferencePool.
+**Solution**: Ensure you have an HTTPRoute resource deployed that specifies the correct host, path, and backendRef to your InferencePool.
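
As a rough sketch, an HTTPRoute wiring completions traffic to an InferencePool might look like the following; names are placeholders, and the backendRef `group` depends on the InferencePool API version installed in your cluster (e.g. `inference.networking.x-k8s.io`):

```
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: my-inference-pool
```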

## 429 Too Many Requests
### `system saturated, sheddable request dropped`
@@ -31,7 +31,7 @@ This error indicates that the entire request pool has exceeded its saturation th
### `fault filter abort`
This internal error suggests a misconfiguration in the gateway's backend routing. Your HTTPRoute is configured to point to an InferencePool that does not exist or cannot be found by the gateway. The gateway recognizes the route but fails when trying to send traffic to the non-existent backend.

-Solution: Verify that the backendRef in your HTTPRoute correctly names an InferencePool resource that is deployed and accessible in the same namespace. If you wish to route to an InferencePool in a different namespace, you can create a `ReferenceGrant` like below:
+**Solution**: Verify that the backendRef in your HTTPRoute correctly names an InferencePool resource that is deployed and accessible in the same namespace. If you wish to route to an InferencePool in a different namespace, you can create a `ReferenceGrant` like below:

```
apiVersion: gateway.networking.k8s.io/v1beta1
…
```
@@ -54,25 +54,25 @@ spec:
### `upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused`
This error indicates that the gateway successfully identified the correct model server pod but failed to establish a connection to it. This is likely because the port number specified in the InferencePool's configuration doesn't match the port your model server is listening on. The gateway tries to connect to the wrong port and is refused.

-Solution: Verify the port specified in your InferencePool matches the port number exposed by your model server container, and update your InferencePool accordingly.
+**Solution**: Verify the port specified in your InferencePool matches the port number exposed by your model server container, and update your InferencePool accordingly.
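
As an illustrative sketch (field names follow the v1alpha2 API and may differ in other versions; names and ports are placeholders), the pool's target port must match what the model server container actually listens on:

```
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: my-inference-pool
spec:
  targetPortNumber: 8000      # must equal the model server's listening port
  selector:
    app: my-model-server
  extensionRef:
    name: my-epp-service
```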

### `no healthy upstream`
This error indicates that the HTTPRoute and InferencePool are correctly configured, but there are no healthy pods in the pool to route traffic to. This can happen if the pods are crashing, still starting up, or failing their health checks.

-Solution: Check the status of your model server pods. Investigate the pod logs for any startup errors or health check failures. Ensure your model server is running and listening on the correct port and that any configured health checks / readiness probes are succeeding.
+**Solution**: Check the status of your model server pods. Investigate the pod logs for any startup errors or health check failures. Ensure your model server is running and listening on the correct port and that any configured health checks / readiness probes are succeeding.
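
If your model server exposes a health endpoint, a readiness probe along these lines on the model server container (path and port are placeholders) keeps unready pods out of rotation:

```
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
```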

## The endpoint picker (EPP) Crashlooping
When the EPP is crashlooping, check the logs of your EPP pod. Some common errors include:

### `failed to list <InferencePool or InferenceObjective or Pod>: … is forbidden`
The EPP needs to watch the InferencePool, the InferenceObjectives, and the Pods that belong to it. This constant watching and reconciliation allows the EPP to maintain an up-to-date view of the environment, enabling it to make dynamic decisions. This particular error indicates that the service account used by the EPP doesn't have the necessary permissions to list the resources it’s watching.

-Solution: Create or update the RBAC configuration to grant the [required permissions](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/137a0b4660b96487caac626ed135b3600be876ed/config/manifests/inferencepool-resources.yaml#L129) to the EPP service account.
+**Solution**: Create or update the RBAC configuration to grant the [required permissions](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/137a0b4660b96487caac626ed135b3600be876ed/config/manifests/inferencepool-resources.yaml#L129) to the EPP service account.
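
A rough sketch of the read permissions involved (the API group shown is the v1alpha2 group and may differ for other versions):

```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: epp-reader
rules:
- apiGroups: ["inference.networking.x-k8s.io"]
  resources: ["inferencepools", "inferenceobjectives"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
```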

### `Pool is not initialized, skipping refreshing metrics`
This error indicates that the InferencePool pods are not initialized.

-Solution: Check that the EPP startup argument `--pool-name` specifies the correct InferencePool name and that the InferencePool exists.
+**Solution**: Check that the EPP startup argument `--pool-name` specifies the correct InferencePool name and that the InferencePool exists.
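
In the EPP Deployment this is a container argument, for example (the pool name is a placeholder):

```
containers:
- name: epp
  args:
  - --pool-name=my-inference-pool   # must name an existing InferencePool
```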

## Unexpected Routing Behaviors
The EPP's core function is to intelligently route requests to the most optimal model server pod in a pool. It uses a score-based algorithm that considers several metrics (such as queue depth, KV cache utilization, etc.) to choose the best pod for each request.
@@ -87,4 +87,4 @@ For more information, check out [EPP scale testing](https://docs.google.com/docu

When performance degrades under high load (for example, a high-latency tail or significantly lower-than-expected successful QPS) with underutilized resources, the issue may be related to excessive logging in the endpoint picker (EPP). Higher verbosity levels (e.g., `--v=2` or greater) generate a large volume of logs. This floods the log buffer and standard output, leading to heavy write-lock contention. In extreme cases, this can cause the kubelet to kill the pod due to health check timeouts, leading to a restart cycle.

-Solution: Ensure the log level for the EPP is set to `--v=1`.
+**Solution**: Ensure the log level for the EPP is set to `--v=1`.
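
For example, on the EPP container:

```
args:
- --v=1   # keep verbosity at 1; higher levels can flood stdout under load
```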
