Below are common issues that you may encounter when using Generative APIs, their causes, and recommended solutions.

## 400: Bad Request - You exceeded maximum context window for this model

### Cause
- You provided an input exceeding the maximum context window (also known as context length) for the model you are using.
- You provided a long input and requested a long output (via the `max_completion_tokens` field); added together, these exceed the maximum context window of the model you are using.

### Solution
- Reduce your input size below what is [supported by the model](/generative-apis/reference-content/supported-models/) (see the sketch after this list for one way to estimate token counts).
- Use a model supporting a longer context window.
- Use [Managed Inference](/managed-inference/), where the context window can be increased for [several configurations with additional GPU vRAM](/managed-inference/reference-content/supported-models/). For instance, the `llama-3.3-70b-instruct` model in `fp8` quantization can be served with:
  - a `15k`-token context window on `H100` instances
  - a `128k`-token context window on `H100-2` instances
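
As a rough illustration, the snippet below estimates the prompt's token count before sending a request, so that input plus requested completion stays under the window. This is a minimal sketch assuming the `tiktoken` tokenizer as an approximation (models served by Generative APIs use their own tokenizers, so counts are indicative only) and a hypothetical `128k` context window:
```python
import tiktoken  # approximate tokenizer; the model's own tokenizer may count differently

CONTEXT_WINDOW = 128_000       # hypothetical limit: check the supported models page for real values
MAX_COMPLETION_TOKENS = 8_000  # tokens reserved for the model's answer

def fits_context_window(prompt: str) -> bool:
    """Return True if prompt tokens plus requested completion tokens fit the window."""
    encoding = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + MAX_COMPLETION_TOKENS <= CONTEXT_WINDOW
```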

## 403: Forbidden - Insufficient permissions to access the resource

### Cause
- The API key you are using does not have sufficient permissions on the targeted resource, or it targets the wrong Project.

### Solution
- The URL format is: `https://api.scaleway.ai/{project_id}/v1`
- If no `project_id` is specified in the URL (`https://api.scaleway.ai/v1`), your `default` Project is used (see the sketch below for setting the URL with an OpenAI-compatible client).
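
For example, with the OpenAI Python client, the Project ID goes directly into `base_url`. This is a minimal sketch; the Project ID and API key values below are placeholders:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/your-project-id/v1",  # placeholder Project ID
    api_key="your-scaleway-api-key",  # placeholder: an API key with access to this Project
)
```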

## 416: Range Not Satisfiable - max_completion_tokens is limited for this model

### Cause
- You provided a `max_completion_tokens` value that is higher than what the model you are using supports.

### Solution
- Remove the `max_completion_tokens` field from your request or client library (see the sketch after this list), or reduce its value below what is [supported by the model](/generative-apis/reference-content/supported-models/).
- As an example, when using [`init_chat_model` from LangChain](https://python.langchain.com/api_reference/_modules/langchain/chat_models/base.html#init_chat_model), you should edit the `max_tokens` value in the following configuration:
```python
from langchain.chat_models import init_chat_model

# max_tokens maps to max_completion_tokens and must stay within the model's limit
llm = init_chat_model("llama-3.3-70b-instruct", max_tokens=8000, model_provider="openai", base_url="https://api.scaleway.ai/v1", temperature=0.7)
```
- Use a model supporting a higher `max_completion_tokens` value.
- Use [Managed Inference](/managed-inference/), where these limits on completion tokens do not apply (the number of completion tokens is still limited by the maximum context window supported by the model).
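
For instance, when calling the OpenAI-compatible endpoint directly, omitting `max_tokens`/`max_completion_tokens` lets the server apply the model's own limit. A minimal sketch with a placeholder API key:
```python
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key="your-scaleway-api-key")  # placeholder key

# No max_tokens / max_completion_tokens field: the server applies the model's default limit
response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```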

## 429: Too Many Requests - You exceeded your current quota of requests/tokens per minute

### Cause
- You sent more requests or tokens per minute than your current quota allows.