pages/generative-apis/troubleshooting/fixing-common-issues.mdx
1 addition & 14 deletions
@@ -17,6 +17,7 @@ Below are common issues that you may encounter when using Generative APIs, their
 
 ### Solution
 - Reduce your input size below what is [supported by the model](/generative-apis/reference-content/supported-models/).
+- If you are using a third-party tool such as an IDE, edit its configuration to set an appropriate maximum context window for the model (see the sketch below). More information for [VS Code (Continue)](/generative-apis/reference-content/adding-ai-to-vscode-using-continue/#configure-continue-through-a-configuration-file), [IntelliJ (Continue)](/generative-apis/reference-content/adding-ai-to-intellij-using-continue/#configure-continue-through-configuration-file) and [Zed](/generative-apis/reference-content/adding-ai-to-zed-ide/).
 - Use a model supporting longer context window values.
 - Use [Managed Inference](/managed-inference/), where the context window can be increased for [several configurations with additional GPU vRAM](/managed-inference/reference-content/supported-models/). For instance, the `llama-3.3-70b-instruct` model in `fp8` quantization can be served with:
   - `15k` tokens context window on `H100` Instances
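
A minimal sketch of what such a limit can look like in Continue's JSON configuration file, assuming an OpenAI-compatible provider block; the model name, endpoint, key placeholder, and `contextLength` value are illustrative, not values from this page:

```json
{
  "models": [
    {
      "title": "Llama 3.1 8B (Scaleway)",
      "provider": "openai",
      "model": "llama-3.1-8b-instruct",
      "apiBase": "https://api.scaleway.ai/v1",
      "apiKey": "<YOUR_API_SECRET_KEY>",
      "contextLength": 32000
    }
  ]
}
```

Keeping `contextLength` at or below the model's maximum context window lets the tool trim its prompts instead of sending requests the API will reject.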
@@ -51,20 +52,6 @@ Below are common issues that you may encounter when using Generative APIs, their
 
 ## 416: Range Not Satisfiable - max_completion_tokens is limited for this model
 
-### Cause
-
-- You provided a value for `max_completion_tokens` that is too high and not supported by the model you are using.
-
-### Solution
-
-- Remove the `max_completion_tokens` field from your request or client library, or reduce its value below what is [supported by the model](https://www.scaleway.com/en/docs/generative-apis/reference-content/supported-models/).
-- As an example, when using [init_chat_model from Langchain](https://python.langchain.com/api_reference/_modules/langchain/chat_models/base.html#init_chat_model), you should edit the `max_tokens` value in your configuration (see the sketch below this hunk).
-- Use a model supporting a higher `max_completion_tokens` value.
-- Use [Managed Inference](/managed-inference/), where these limits on completion tokens do not apply (your completion token amount will still be limited by the maximum context window supported by the model).
-
-## 416: Range Not Satisfiable - max_completion_tokens is limited for this model
-
 ### Cause
 
 - You provided a value for `max_completion_tokens` which is too high and not supported by the model you are using.
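
A minimal sketch of the Langchain configuration referenced in the removed bullet above, assuming an OpenAI-compatible endpoint; the model name, base URL, environment variable, and `512` cap are illustrative assumptions, not the page's original values:

```python
import os

from langchain.chat_models import init_chat_model

# Sketch: cap max_tokens below the model's supported max_completion_tokens
# so the API does not return 416: Range Not Satisfiable.
llm = init_chat_model(
    "llama-3.1-8b-instruct",                # illustrative model name
    model_provider="openai",                # OpenAI-compatible provider
    base_url="https://api.scaleway.ai/v1",  # illustrative endpoint
    api_key=os.environ["SCW_SECRET_KEY"],   # illustrative env variable
    max_tokens=512,                         # keep below the model's limit
)

print(llm.invoke("Hello!").content)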