fix(genapi): update model context length (#5715)

fpagny · web-flow · commit 2b5d57e7298e · 2025-10-27T15:54:39.000+01:00
* fix(genapi): update model context length

* fix(inference): qwen 235b context length

* fix(genapi): bge-multilingual-gemma2 context window

* fix(inference): bge-multilingual-gemma2 context window
diff --git a/pages/generative-apis/reference-content/supported-models.mdx b/pages/generative-apis/reference-content/supported-models.mdx
@@ -39,7 +39,7 @@ Our API supports the most popular models for [Chat](/generative-apis/how-to/quer
 | Meta        | `llama-3.3-70b-instruct`  | 100k  | 4096 | [Llama 3.3 Community](https://www.llama.com/llama3_3/license/) | [HF](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
 | Meta        | `llama-3.1-8b-instruct`  | 128k  | 16384 | [Llama 3.1 Community](https://llama.meta.com/llama3_1/license/) | [HF](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |
 | Mistral      | `mistral-nemo-instruct-2407`   | 128k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) |
-| Qwen      | `qwen3-235b-a22b-instruct-2507`     | 260k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507) |
+| Qwen      | `qwen3-235b-a22b-instruct-2507`     | 250k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507) |
 | Qwen      | `qwen3-coder-30b-a3b-instruct`     | 128k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) |
 | DeepSeek  | `deepseek-r1-distill-llama-70b`     | 32k | 4096 | [MIT](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md) | [HF](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) |
 
@@ -63,7 +63,7 @@ Our [Embeddings API](/generative-apis/how-to/query-embedding-models) provides bu
 
 | Provider | Model string | Model size | Embedding dimension | Context window |  License | Model card |
 |-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
-| BAAI        | `bge-multilingual-gemma2`  | 9B  | 3584 | 4096 | [Gemma](https://ai.google.dev/gemma/terms) | [HF](https://huggingface.co/BAAI/bge-multilingual-gemma2) |
+| BAAI        | `bge-multilingual-gemma2`  | 9B  | 3584 | 8192 | [Gemma](https://ai.google.dev/gemma/terms) | [HF](https://huggingface.co/BAAI/bge-multilingual-gemma2) |
 
 ## Request a model
 
diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -18,7 +18,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 |------------|----------|--------------|------------|-----------|---------|
 | [`gpt-oss-120b`](#gpt-oss-120b) | OpenAI | 128k | Text | H100 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`whisper-large-v3`](#whisper-large-v3) | OpenAI | - | Audio transcription | L4, L40S, H100, H100-SXM-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
-| [`qwen3-235b-a22b-instruct-2507`](#qwen3-235b-a22b-instruct-2507) | Qwen | 40k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`qwen3-235b-a22b-instruct-2507`](#qwen3-235b-a22b-instruct-2507) | Qwen | 250k | Text | H100-SXM-2 (40k), H100-SXM-4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 40k | Text, Vision | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
 | [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | 128k | Text | H100 (15k), H100-2 | [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
 | [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | Meta | 128k | Text | H100 (15k), H100-2 | [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) |
@@ -40,7 +40,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`molmo-72b-0924`](#molmo-72b-0924) | Allen AI | 50k | Text, Vision | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) and [Tongyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)|
 | [`qwen3-coder-30b-a3b-instruct`](#qwen3-coder-30b-a3b-instruct) | Qwen | 128k | Code | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | Qwen | 32k | Code | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
-| [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) |  BAAI | 4k | Embeddings | L4, L40S, H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
+| [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) |  BAAI | 8k | Embeddings | L4, L40S, H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
 | [`sentence-t5-xxl`](#sentence-t5-xxl) | Sentence transformers | 512 | Embeddings | L4 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 
 \*Maximum context length is only mentioned when the instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.