Commit e2a8ddb

feat(inference): update whisper properties
1 parent 5de53f4 commit e2a8ddb

1 file changed: +22 -0 lines changed

pages/managed-inference/reference-content/model-catalog.mdx

Lines changed: 22 additions & 0 deletions
@@ -17,6 +17,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | Model name | Provider | Maximum Context length (tokens) | Modalities | Compatible Instances (Max Context in tokens\*) | License |
 |------------|----------|--------------|------------|-----------|---------|
 | [`gpt-oss-120b`](#gpt-oss-120b) | OpenAI | 128k | Text | H100 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`whisper-large-v3`](#whisper-large-v3) | OpenAI | - | Audio transcription | L4, L40S, H100, H100-SXM-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`qwen3-235b-a22b-instruct-2507`](#qwen3-235b-a22b-instruct-2507) | Qwen | 40k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 40k | Text, Vision | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
 | [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | 128k | Text | H100 (15k), H100-2 | [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
@@ -48,6 +49,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | Model name | Structured output supported | Function calling | Supported languages |
 | --- | --- | --- | --- |
 | `gpt-oss-120b` | Yes | Yes | English |
+| `whisper-large-v3` | - | - | English, French, German, Chinese, Japanese, Korean and 81 additional languages |
 | `qwen3-235b-a22b-instruct-2507` | Yes | Yes | English, French, German, Chinese, Japanese, Korean and 113 additional languages and dialects |
 | `gemma-3-27b-it` | Yes | Partial | English, Chinese, Japanese, Korean and 31 additional languages |
 | `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
@@ -192,6 +194,26 @@ mistral/voxtral-small-24b-2507:fp8
 - If audio sent is less than 30 seconds, the rest of the chunk will be considered silent.
 - 80ms is equal to 1 input token

+## Audio transcription models
+
+### Whisper-large-v3
+Whisper-large-v3 is a model developed by OpenAI to perform audio transcription in many languages.
+This model is optimized for transcription in many languages.
+
+| Attribute | Value |
+|-----------|-------|
+| Supported audio formats | WAV and MP3 |
+| Audio chunk duration | 30 seconds |
+
+#### Model names
+```
+openai/whisper-large-v3:bf16
+```
+
+- Mono and stereo audio formats are supported. For stereo formats, both left and right channels are merged before being processed.
+- Audio files are processed in 30-second chunks:
+- If audio sent is less than 30 seconds, the rest of the chunk will be considered silent.
+
 ## Text models

 ### Qwen3-235b-a22b-instruct-2507
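
The chunking and channel-merging notes added for `whisper-large-v3` can be illustrated with a short sketch. It is not taken from the commit and does not describe how Scaleway or Whisper implement these steps. The function names `chunk_plan` and `downmix_to_mono` are hypothetical, averaging the two channels is an assumed merge strategy (the documentation only says the channels are "merged"), and padding the final partial chunk of a longer file with silence is an assumption extrapolated from the rule stated for audio shorter than 30 seconds.

```python
import math

# Chunk duration stated in the catalog entry for whisper-large-v3.
CHUNK_SECONDS = 30


def chunk_plan(duration_s: float) -> tuple[int, float]:
    """Return (number of 30-second chunks, seconds of implied silence).

    Assumption: the last partial chunk is padded with silence, as the
    documentation states for audio shorter than 30 seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    n_chunks = math.ceil(duration_s / CHUNK_SECONDS)
    padding_s = n_chunks * CHUNK_SECONDS - duration_s
    return n_chunks, padding_s


def downmix_to_mono(left: list[float], right: list[float]) -> list[float]:
    """Merge stereo channels into one by averaging each sample pair.

    Assumption: averaging is only one plausible way to "merge" channels;
    the source does not say which method is used.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    return [(l + r) / 2 for l, r in zip(left, right)]


if __name__ == "__main__":
    # A 12-second clip occupies one chunk; the remaining 18 seconds count as silence.
    print(chunk_plan(12))
    # A 75-second file spans three chunks under the padding assumption.
    print(chunk_plan(75))
    print(downmix_to_mono([1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions, a clip shorter than 30 seconds still occupies a full chunk, which may matter when estimating processing cost for many short recordings.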
