Commit a22525d

feat(genapi): update model catalog with voxtral
1 parent ba33bb1 commit a22525d

1 file changed, with 26 additions and 0 deletions

pages/managed-inference/reference-content/model-catalog.mdx

Lines changed: 26 additions & 0 deletions
@@ -30,6 +30,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | [`mistral-small-3.2-24b-instruct-2506`](#mistral-small-32-24b-instruct-2506) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`mistral-small-3.1-24b-instruct-2503`](#mistral-small-31-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`mistral-small-24b-instruct-2501`](#mistral-small-24b-instruct-2501) | Mistral | 32k | Text | L40S (20k), H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`voxtral-small-24b-2507`](#voxtral-small-24b-2507) | Mistral | 32k | Text | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`magistral-small-2506`](#magistral-small-2506) | Mistral | 32k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
@@ -60,6 +61,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | `mistral-small-3.2-24b-instruct-2506` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
 | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
 | `mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
+| `voxtral-small-24b-2507` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Portuguese, Hindi |
 | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
 | `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish |
 | `magistral-small-2506` | Yes | Yes | English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Persian, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, Bengali |
@@ -164,6 +166,30 @@ Vision-language models like Molmo can analyze an image and offer insights from v
 allenai/molmo-72b-0924:fp8
 ```

+## Multimodal models (Text and Audio)
+
+### Voxtral-small-24b-2507
+Voxtral-small-24b-2507 is a model developed by Mistral for text processing and audio analysis in many languages.
+This model was optimized for multilingual transcription while retaining conversational capabilities (translation, classification, and so on).
+
+| Attribute | Value |
+|-----------|-------|
+| Supports parallel tool calling | Yes |
+| Supported audio formats | WAV and MP3 |
+| Audio chunk duration | 30 seconds |
+| Token duration (audio) | 80 ms |
+
+#### Model names
+```
+mistral/voxtral-small-24b-2507:bf16
+mistral/voxtral-small-24b-2507:fp8
+```
+
+- Mono and stereo audio formats are supported. For stereo formats, the left and right channels are merged before processing.
+- Audio files are processed in 30-second chunks:
+  - If the audio sent is shorter than 30 seconds, the rest of the chunk is treated as silence.
+  - 80 ms of audio equals 1 input token.
+
 ## Text models

 ### Qwen3-235b-a22b-instruct-2507
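
Note on the Voxtral audio rules added in this commit: the two figures (30-second chunks, 80 ms per input token) suggest a simple way to estimate how many input tokens an audio clip consumes. The sketch below is illustrative only. The function name `estimate_audio_input_tokens` is hypothetical, and it assumes that the silent padding at the end of a partial chunk is counted as input tokens, which the catalog text implies but does not state outright.

```python
import math

CHUNK_SECONDS = 30  # from the catalog entry: audio is processed in 30-second chunks
TOKEN_MS = 80       # from the catalog entry: 80 ms of audio equals 1 input token


def estimate_audio_input_tokens(duration_seconds: float) -> int:
    """Estimate input tokens for one audio clip.

    Assumption (not confirmed by the catalog): the silent padding that
    completes the last chunk is counted like regular audio.
    """
    if duration_seconds <= 0:
        return 0
    # Round the clip up to a whole number of 30-second chunks.
    chunks = math.ceil(duration_seconds / CHUNK_SECONDS)
    padded_ms = chunks * CHUNK_SECONDS * 1000
    # Each 80 ms slice of the padded audio maps to one input token.
    return padded_ms // TOKEN_MS


if __name__ == "__main__":
    for seconds in (10, 30, 45):
        print(seconds, "s ->", estimate_audio_input_tokens(seconds), "tokens")
```

Under these assumptions one full chunk is 30,000 ms / 80 ms = 375 tokens, so a 10-second clip and a 30-second clip cost the same, while a 45-second clip spans two chunks. Actual accounting by the service may differ.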
