Below are common issues that you may encounter when using Generative APIs, their causes, and recommended solutions.

## 400: Bad Request - You exceeded maximum context window for this model

### Cause
- You provided an input exceeding the maximum context window (also known as context length) for the model you are using.
- You provided a long input and requested a long output (via the `max_completion_tokens` field); added together, these exceed the maximum context window of the model you are using.

### Solution
- Reduce your input size below what is [supported by the model](/generative-apis/reference-content/supported-models/) (see the sketch after this list for one way to estimate token counts).
- Use a model supporting a longer context window.
- Use [Managed Inference](/managed-inference/), where the context window can be increased for [several configurations with additional GPU vRAM](/managed-inference/reference-content/supported-models/). For instance, the `llama-3.3-70b-instruct` model in `fp8` quantization can be served with:
  - a `15k`-token context window on `H100` instances
  - a `128k`-token context window on `H100-2` instances
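
As a rough illustration, the snippet below estimates the prompt's token count before sending a request, so that input plus requested completion stays under the window. This is a minimal sketch assuming the `tiktoken` tokenizer as an approximation (models served by Generative APIs use their own tokenizers, so counts are indicative only) and a hypothetical `128k` context window:
```python
import tiktoken  # approximate tokenizer; the model's own tokenizer may count differently

CONTEXT_WINDOW = 128_000       # hypothetical limit: check the supported models page for real values
MAX_COMPLETION_TOKENS = 8_000  # tokens reserved for the model's answer

def fits_context_window(prompt: str) -> bool:
    """Return True if prompt tokens plus requested completion tokens fit the window."""
    encoding = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + MAX_COMPLETION_TOKENS <= CONTEXT_WINDOW
```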

## 403: Forbidden - Insufficient permissions to access the resource

### Cause
- The API key you are using does not have sufficient permissions on the targeted resource, or it targets the wrong Project.

### Solution
- The URL format is: `https://api.scaleway.ai/{project_id}/v1`
- If no `project_id` is specified in the URL (`https://api.scaleway.ai/v1`), your `default` Project is used (see the sketch below for setting the URL with an OpenAI-compatible client).
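
For example, with the OpenAI Python client, the Project ID goes directly into `base_url`. This is a minimal sketch; the Project ID and API key values below are placeholders:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/your-project-id/v1",  # placeholder Project ID
    api_key="your-scaleway-api-key",  # placeholder: an API key with access to this Project
)
```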

## 416: Range Not Satisfiable - max_completion_tokens is limited for this model

### Cause
- You provided a `max_completion_tokens` value that is higher than what the model you are using supports.

### Solution
- Remove the `max_completion_tokens` field from your request or client library (see the sketch after this list), or reduce its value below what is [supported by the model](/generative-apis/reference-content/supported-models/).
- As an example, when using [`init_chat_model` from LangChain](https://python.langchain.com/api_reference/_modules/langchain/chat_models/base.html#init_chat_model), you should edit the `max_tokens` value in the following configuration:
```python
from langchain.chat_models import init_chat_model

# max_tokens maps to max_completion_tokens and must stay within the model's limit
llm = init_chat_model("llama-3.3-70b-instruct", max_tokens=8000, model_provider="openai", base_url="https://api.scaleway.ai/v1", temperature=0.7)
```
- Use a model supporting a higher `max_completion_tokens` value.
- Use [Managed Inference](/managed-inference/), where these limits on completion tokens do not apply (the number of completion tokens is still limited by the maximum context window supported by the model).
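
For instance, when calling the OpenAI-compatible endpoint directly, omitting `max_tokens`/`max_completion_tokens` lets the server apply the model's own limit. A minimal sketch with a placeholder API key:
```python
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key="your-scaleway-api-key")  # placeholder key

# No max_tokens / max_completion_tokens field: the server applies the model's default limit
response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```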

## 429: Too Many Requests - You exceeded your current quota of requests/tokens per minute

### Cause
- You sent more requests or tokens per minute than your current quota allows.