feat(inference): update whisper properties

fpagny · web-flow · commit e2a8ddb6ff7c · 2025-10-24T15:21:07.000+02:00
diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -17,6 +17,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | Model name | Provider | Maximum Context length (tokens) | Modalities | Compatible Instances (Max Context in tokens\*) | License |
 |------------|----------|--------------|------------|-----------|---------|
 | [`gpt-oss-120b`](#gpt-oss-120b) | OpenAI | 128k | Text | H100 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`whisper-large-v3`](#whisper-large-v3) | OpenAI | - | Audio transcription | L4, L40S, H100, H100-SXM-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`qwen3-235b-a22b-instruct-2507`](#qwen3-235b-a22b-instruct-2507) | Qwen | 40k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 40k | Text, Vision | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
 | [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | 128k | Text | H100 (15k), H100-2 | [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
@@ -48,6 +49,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | Model name | Structured output supported | Function calling | Supported languages |
 | --- | --- | --- | --- |
 | `gpt-oss-120b` | Yes | Yes | English |
+| `whisper-large-v3` | - | - | English, French, German, Chinese, Japanese, Korean and 81 additional languages |
 | `qwen3-235b-a22b-instruct-2507` | Yes | Yes | English, French, German, Chinese, Japanese, Korean and 113 additional languages and dialects |
 | `gemma-3-27b-it` | Yes | Partial | English, Chinese, Japanese, Korean and 31 additional languages |
 | `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
@@ -192,6 +194,26 @@ mistral/voxtral-small-24b-2507:fp8
   - If audio sent is less than 30 seconds, the rest of the chunk will be considered silent. 
   - 80ms is equal to 1 input token
 
+## Audio transcription models
+
+### Whisper-large-v3
+Whisper-large-v3 is a speech recognition model developed by OpenAI for multilingual audio transcription.
+It supports English, French, German, Chinese, Japanese, Korean and 81 additional languages.
+
+| Attribute | Value |
+|-----------|-------|
+| Supported audio formats | WAV and MP3 |
+| Audio chunk duration | 30 seconds |
+
+#### Model names
+```
+openai/whisper-large-v3:bf16
+```
+
+- Mono and stereo audio are both supported. For stereo audio, the left and right channels are merged before processing.
+- Audio files are processed in 30-second chunks:
+  - If the audio sent is shorter than 30 seconds, the rest of the chunk is treated as silence.
+
 ## Text models
 
 ### Qwen3-235b-a22b-instruct-2507