diff --git a/pages/generative-apis/troubleshooting/fixing-common-issues.mdx b/pages/generative-apis/troubleshooting/fixing-common-issues.mdx
index db78589a3e..f24c25791d 100644
--- a/pages/generative-apis/troubleshooting/fixing-common-issues.mdx
+++ b/pages/generative-apis/troubleshooting/fixing-common-issues.mdx
@@ -13,6 +13,45 @@ dates:
 
 Below are common issues that you may encounter when using Generative APIs, their causes, and recommended solutions.
 
+## 400: Bad Request - You exceeded maximum context window for this model
+
+### Cause
+- You provided an input exceeding the maximum context window (also known as context length) for the model you are using.
+- You provided a long input and requested a long output (in the `max_completion_tokens` field), which, added together, exceed the maximum context window of the model you are using.
+
+### Solution
+- Reduce your input size below what is [supported by the model](/generative-apis/reference-content/supported-models/) (see the example below).
+- Use a model supporting longer context window values.
+- Use [Managed Inference](/managed-inference/), where the context window can be increased for [several configurations with additional GPU vRAM](/managed-inference/reference-content/supported-models/). For instance, the `llama-3.3-70b-instruct` model in `fp8` quantization can be served with:
+  - a `15k` token context window on `H100` Instances
+  - a `128k` token context window on `H100-2` Instances.
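+
+The following minimal sketch uses the OpenAI Python SDK to keep the input and the requested output within the context window (the model name, context window size, and characters-per-token ratio are assumptions to adapt to your setup):
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="https://api.scaleway.ai/v1",  # default Project is used when no project_id is set
+    api_key="<SCW_SECRET_KEY>",  # placeholder: your Scaleway API secret key
+)
+
+CONTEXT_WINDOW = 128_000       # assumed model limit: check the supported models page
+MAX_COMPLETION_TOKENS = 4_096  # tokens reserved for the model's answer
+
+prompt = "..."  # your (potentially very long) input
+# Rough safeguard: around 4 characters per token for English text.
+max_input_chars = (CONTEXT_WINDOW - MAX_COMPLETION_TOKENS) * 4
+prompt = prompt[:max_input_chars]
+
+response = client.chat.completions.create(
+    model="llama-3.3-70b-instruct",
+    messages=[{"role": "user", "content": prompt}],
+    max_completion_tokens=MAX_COMPLETION_TOKENS,
+)
+print(response.choices[0].message.content)
+```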
+
 ## 403: Forbidden - Insufficient permissions to access the resource
 
 ### Cause
@@ -27,6 +66,20 @@ Below are common issues that you may encounter when using Generative APIs, their
 - The URL format is: `https://api.scaleway.ai/{project_id}/v1"`
 - If no `project_id` is specified in the URL (`https://api.scaleway.ai/v1"`), your `default` Project will be used.
 
+## 416: Range Not Satisfiable - max_completion_tokens is limited for this model
+
+### Cause
+- You provided a value for `max_completion_tokens` that is too high and not supported by the model you are using.
+
+### Solution
+- Remove the `max_completion_tokens` field from your request or client library, or reduce its value below what is [supported by the model](https://www.scaleway.com/en/docs/generative-apis/reference-content/supported-models/).
+  - As an example, when using the [init_chat_model from Langchain](https://python.langchain.com/api_reference/_modules/langchain/chat_models/base.html#init_chat_model), you should edit the `max_tokens` value in the following configuration:
+  ```python
+  llm = init_chat_model("llama-3.3-70b-instruct", max_tokens=8000, model_provider="openai", base_url="https://api.scaleway.ai/v1", temperature=0.7)
+  ```
+- Use a model supporting a higher `max_completion_tokens` value.
+- Use [Managed Inference](/managed-inference/), where these limits on completion tokens do not apply (the number of completion tokens will still be limited by the maximum context window supported by the model).
+
 ## 429: Too Many Requests - You exceeded your current quota of requests/tokens per minute
 
 ### Cause