@@ -17,6 +17,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | Model name | Provider | Maximum Context length (tokens) | Modalities | Compatible Instances (Max Context in tokens\*) | License |
 | ------------| ----------| --------------| ------------| -----------| ---------|
 | [`gpt-oss-120b`](#gpt-oss-120b) | OpenAI | 128k | Text | H100 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`whisper-large-v3`](#whisper-large-v3) | OpenAI | - | Audio transcription | L4, L40S, H100, H100-SXM-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`qwen3-235b-a22b-instruct-2507`](#qwen3-235b-a22b-instruct-2507) | Qwen | 40k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 40k | Text, Vision | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
 | [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | 128k | Text | H100 (15k), H100-2 | [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
@@ -48,6 +49,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | Model name | Structured output supported | Function calling | Supported languages |
 | --- | --- | --- | --- |
 | `gpt-oss-120b` | Yes | Yes | English |
+| `whisper-large-v3` | - | - | English, French, German, Chinese, Japanese, Korean and 81 additional languages |
 | `qwen3-235b-a22b-instruct-2507` | Yes | Yes | English, French, German, Chinese, Japanese, Korean and 113 additional languages and dialects |
 | `gemma-3-27b-it` | Yes | Partial | English, Chinese, Japanese, Korean and 31 additional languages |
 | `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
@@ -192,6 +194,26 @@ mistral/voxtral-small-24b-2507:fp8
   - If the audio is shorter than 30 seconds, the remainder of the chunk is treated as silence.
   - 80 ms of audio corresponds to 1 input token.

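The chunking and token rules above can be combined into a rough input-token estimate. The Python sketch below assumes that the silence padding at the end of the last chunk is counted as input tokens, which the rules above imply but do not state outright; the helper `estimate_input_tokens` is hypothetical and is not part of any Scaleway API.

```python
CHUNK_MS = 30_000  # audio is processed in 30-second chunks
MS_PER_TOKEN = 80  # 80 ms of audio corresponds to 1 input token


def estimate_input_tokens(duration_ms: int) -> int:
    """Hypothetical helper: estimate input tokens for a clip of `duration_ms`.

    Assumption: every chunk, including a final partial chunk padded with
    silence, counts as a full 30-second chunk.
    """
    if duration_ms <= 0:
        return 0
    chunks = -(-duration_ms // CHUNK_MS)  # integer ceiling division
    tokens_per_chunk = CHUNK_MS // MS_PER_TOKEN  # 30000 // 80 = 375 tokens per chunk
    return chunks * tokens_per_chunk


if __name__ == "__main__":
    print(estimate_input_tokens(10_000))  # 10 s clip -> 1 chunk -> 375
    print(estimate_input_tokens(45_000))  # 45 s clip -> 2 chunks -> 750
```

Under this assumption, a 10-second clip is estimated at the same 375 tokens as a full 30-second clip.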
+## Audio transcription models
+
+### Whisper-large-v3
+Whisper-large-v3 is an automatic speech recognition (ASR) model developed by OpenAI.
+It is designed to transcribe audio in many languages.
+
+| Attribute | Value |
+|-----------|-------|
+| Supported audio formats | WAV and MP3 |
+| Audio chunk duration | 30 seconds |
+
+#### Model names
+```
+openai/whisper-large-v3:bf16
+```
+
+- Mono and stereo audio formats are supported. For stereo audio, the left and right channels are merged before processing.
+- Audio files are processed in 30-second chunks:
+  - If the audio is shorter than 30 seconds, the remainder of the chunk is treated as silence.
+
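As an illustration of the preprocessing described above, the Python sketch below downmixes interleaved stereo samples to mono and splits the result into 30-second chunks, padding the last chunk with silence (zeros). It assumes 16 kHz audio, the sample rate Whisper models operate on, and it assumes the channels are merged by averaging, since the merge method is not specified above. The helpers `downmix_stereo` and `split_into_chunks` are hypothetical and do not reflect Scaleway's internal pipeline.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # Hz; assumed input rate (Whisper models operate on 16 kHz audio)
CHUNK_SECONDS = 30  # processing chunk length from the documentation
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480000 samples per chunk


def downmix_stereo(interleaved: Sequence[float]) -> List[float]:
    """Merge interleaved L/R samples into mono by averaging (assumed merge method)."""
    if len(interleaved) % 2:
        raise ValueError("interleaved stereo data must have an even number of samples")
    return [(interleaved[i] + interleaved[i + 1]) / 2 for i in range(0, len(interleaved), 2)]


def split_into_chunks(mono: Sequence[float], chunk_samples: int = CHUNK_SAMPLES) -> List[List[float]]:
    """Split mono audio into fixed-size chunks, padding the last one with silence (0.0)."""
    chunks = []
    for start in range(0, len(mono), chunk_samples):
        chunk = list(mono[start:start + chunk_samples])
        chunk.extend([0.0] * (chunk_samples - len(chunk)))  # pad a short chunk with silence
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    stereo = [0.5, 1.5, 1.0, 0.0]  # two stereo frames: L, R, L, R
    mono = downmix_stereo(stereo)
    print(mono)  # [1.0, 0.5]
    chunks = split_into_chunks(mono)
    print(len(chunks), len(chunks[0]))  # 1 480000
```

A clip of a few samples and a 10-second clip both yield a single padded 30-second chunk, which matches the rule that short audio is completed with silence.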
 ## Text models
 
 ### Qwen3-235b-a22b-instruct-2507