diff --git a/menu/navigation.json b/menu/navigation.json
index 97b07114d4..693274ba6a 100644
--- a/menu/navigation.json
+++ b/menu/navigation.json
@@ -812,6 +812,10 @@
"label": "Query code models",
"slug": "query-code-models"
},
+ {
+ "label": "Query audio models",
+ "slug": "query-audio-models"
+ },
{
"label": "Use structured outputs",
"slug": "use-structured-outputs"
diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx
index 84010a2d6f..97e9eb3b70 100644
--- a/pages/generative-apis/faq.mdx
+++ b/pages/generative-apis/faq.mdx
@@ -83,9 +83,12 @@ Note that in this example, the first line where the free tier applies will not d
### What are tokens and how are they counted?
A token is the minimum unit of content that is seen and processed by a model. Hence, token definitions depend on input types:
- For text, on average, `1` token corresponds to `~4` characters, and thus `0.75` words (as words are on average five characters long)
-- For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens of `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total).
+- For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens are `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total).
+- For audio:
+ - `1` token corresponds to a duration of time. For example, `voxtral-small-24b-2507` model audio tokens are `80` milliseconds long.
+ - Some models process audio in chunks with a minimum duration. For example, the `voxtral-small-24b-2507` model processes audio in `30`-second chunks. This means audio lasting `13` seconds counts as `375` tokens (`30` seconds / `0.08` seconds), and audio lasting `178` seconds counts as `2,250` tokens (`30` seconds * `6` / `0.08` seconds).
-The exact token count and definition depend on [tokenizers](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file.
+The exact token count and definition depend on the [tokenizer](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file.
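As a minimal sketch, the chunk-based arithmetic above can be expressed in Python. The constants come from the `voxtral-small-24b-2507` figures given here; the helper name is ours:

```python
import math

CHUNK_SECONDS = 30    # voxtral-small-24b-2507 processes audio in 30-second chunks
TOKEN_SECONDS = 0.08  # 1 audio token corresponds to 80 ms

def audio_tokens(duration_seconds: float) -> int:
    """Estimate the input token count for an audio clip of the given duration.

    Partial chunks are padded (the remainder is considered silent), so the
    clip is billed for whole 30-second chunks.
    """
    chunks = max(1, math.ceil(duration_seconds / CHUNK_SECONDS))
    return int(chunks * CHUNK_SECONDS / TOKEN_SECONDS)

print(audio_tokens(13))   # one 30 s chunk  -> 375 tokens
print(audio_tokens(178))  # six 30 s chunks -> 2250 tokens
```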
### How can I monitor my token consumption?
You can see your token consumption in [Scaleway Cockpit](/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics).
diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx
new file mode 100644
index 0000000000..5ab51a1212
--- /dev/null
+++ b/pages/generative-apis/how-to/query-audio-models.mdx
@@ -0,0 +1,167 @@
+---
+title: How to query audio models
+description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service.
+tags: generative-apis ai-data audio-models voxtral
+dates:
+ validation: 2025-09-22
+ posted: 2025-09-22
+---
+import Requirements from '@macros/iam/requirements.mdx'
+
+Scaleway's Generative APIs service allows users to interact with powerful audio models hosted on the platform.
+
+There are several ways to interact with audio models:
+- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), letting you test models, adjust parameters, and observe how these changes affect the output in real time.
+- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion)
+
+
+<Requirements />
+
+- A Scaleway account logged into the [console](https://console.scaleway.com)
+- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
+- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
+- Python 3.7+ installed on your system
+
+## Accessing the playground
+
+Scaleway provides a web playground for instruct-based models hosted on Generative APIs.
+
+1. Navigate to **Generative APIs** under the **AI** section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays.
+2. Click the name of the chat model you want to try. Alternatively, click the menu icon next to the chat model, then click **Try model** in the menu.
+
+The web playground displays.
+
+## Using the playground
+
+1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area.
+2. Edit the hyperparameters listed in the right column, for example the default temperature, for more or less randomness in the outputs.
+3. Switch models at the top of the page to observe the capabilities of chat models offered via Generative APIs.
+4. Click **View code** to get code snippets configured according to your settings in the playground.
+
+
+You can also use the upload button to send audio files in supported formats, such as MP3, to audio models for transcription purposes.
+
+
+## Querying audio models via API
+
+You can query the models programmatically using your favorite tools or languages.
+In the examples that follow, we will use the OpenAI Python client.
+
+### Installing the OpenAI SDK
+
+Install the OpenAI SDK using pip:
+
+```bash
+pip install openai
+```
+
+### Initializing the client
+
+Initialize the OpenAI client with your base URL and API key:
+
+```python
+from openai import OpenAI
+
+# Initialize the client with your base URL and API key
+client = OpenAI(
+ base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
+ api_key="" # Your unique API secret key from Scaleway
+)
+```
+
+### Transcribing audio
+
+You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be remote or local.
+
+#### Transcribing a remote audio file
+
+In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen.
+
+```python
+import base64
+import requests
+
+MODEL = "voxtral-small-24b-2507"
+
+url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"
+response = requests.get(url)
+audio_data = response.content
+encoded_string = base64.b64encode(audio_data).decode("utf-8")
+
+content = [
+ {
+ "role": "user",
+ "content": [
+ {
+ "type": "text",
+ "text": "Transcribe this audio"
+ },
+ {
+ "type": "input_audio",
+ "input_audio": {
+ "data": encoded_string,
+ "format": "mp3"
+ }
+ }
+ ]
+ }
+ ]
+
+response = client.chat.completions.create(
+ model=MODEL,
+ messages=content,
+ temperature=0.2, # Adjusts creativity
+ max_tokens=2048, # Limits the length of the output
+ top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature.
+)
+
+print(response.choices[0].message.content)
+```
+
+Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
+
+#### Transcribing a local audio file
+
+In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base64-encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen.
+
+```python
+import base64
+
+MODEL = "voxtral-small-24b-2507"
+
+with open('scaleway-ai-revolution.mp3', 'rb') as raw_file:
+ audio_data = raw_file.read()
+encoded_string = base64.b64encode(audio_data).decode("utf-8")
+
+content = [
+ {
+ "role": "user",
+ "content": [
+ {
+ "type": "text",
+ "text": "Transcribe this audio"
+ },
+ {
+ "type": "input_audio",
+ "input_audio": {
+ "data": encoded_string,
+ "format": "mp3"
+ }
+ }
+ ]
+ }
+ ]
+
+response = client.chat.completions.create(
+ model=MODEL,
+ messages=content,
+ temperature=0.2, # Adjusts creativity
+ max_tokens=2048, # Limits the length of the output
+ top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature.
+)
+
+print(response.choices[0].message.content)
+```
+
+Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
\ No newline at end of file
diff --git a/pages/generative-apis/reference-content/supported-models.mdx b/pages/generative-apis/reference-content/supported-models.mdx
index 6236874f1e..61976f99c1 100644
--- a/pages/generative-apis/reference-content/supported-models.mdx
+++ b/pages/generative-apis/reference-content/supported-models.mdx
@@ -3,19 +3,27 @@ title: Supported models
description: This page lists which open-source chat or embedding models Scaleway is currently hosting
tags: generative-apis ai-data supported-models
dates:
- validation: 2025-08-20
+ validation: 2025-09-12
posted: 2024-09-02
---
-Our API supports the most popular models for [Chat](/generative-apis/how-to/query-language-models), [Vision](/generative-apis/how-to/query-vision-models/) and [Embeddings](/generative-apis/how-to/query-embedding-models/).
+Our API supports the most popular models for [Chat](/generative-apis/how-to/query-language-models), [Vision](/generative-apis/how-to/query-vision-models/), [Audio](/generative-apis/how-to/query-audio-models/), and [Embeddings](/generative-apis/how-to/query-embedding-models/).
-## Multimodal models (chat and vision)
+## Multimodal models
+
+### Chat and Vision models
| Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card |
|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| Google (Preview) | `gemma-3-27b-it` | 40k | 8192 | [Gemma](https://ai.google.dev/gemma/terms) | [HF](https://huggingface.co/google/gemma-3-27b-it) |
| Mistral | `mistral-small-3.2-24b-instruct-2506` | 128k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506) |
+
+### Chat and Audio models
+
+| Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card |
+|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
+| Mistral | `voxtral-small-24b-2507` | 32k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) |
+
## Chat models
| Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card |
diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index 68ac784629..5c2ef3b6b5 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -30,6 +30,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
| [`mistral-small-3.2-24b-instruct-2506`](#mistral-small-32-24b-instruct-2506) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`mistral-small-3.1-24b-instruct-2503`](#mistral-small-31-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`mistral-small-24b-instruct-2501`](#mistral-small-24b-instruct-2501) | Mistral | 32k | Text | L40S (20k), H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`voxtral-small-24b-2507`](#voxtral-small-24b-2507) | Mistral | 32k | Text, Audio | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| [`magistral-small-2506`](#magistral-small-2506) | Mistral | 32k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
@@ -60,6 +61,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
| `mistral-small-3.2-24b-instruct-2506` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
| `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi |
| `mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean |
+| `voxtral-small-24b-2507` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Portuguese, Hindi |
| `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese |
| `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish |
| `magistral-small-2506` | Yes | Yes | English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Persian, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, Bengali |
@@ -164,6 +166,30 @@ Vision-language models like Molmo can analyze an image and offer insights from v
allenai/molmo-72b-0924:fp8
```
+## Multimodal models (Text and Audio)
+
+### Voxtral-small-24b-2507
+Voxtral-small-24b-2507 is a model developed by Mistral to perform text processing and audio analysis in many languages.
+This model is optimized for transcription in many languages while retaining conversational capabilities (translation, classification, etc.).
+
+| Attribute | Value |
+|-----------|-------|
+| Supports parallel tool calling | Yes |
+| Supported audio formats | WAV and MP3 |
+| Audio chunk duration | 30 seconds |
+| Token duration (audio) | 80 ms |
+
+#### Model names
+```
+mistral/voxtral-small-24b-2507:bf16
+mistral/voxtral-small-24b-2507:fp8
+```
+
+- Mono and stereo audio formats are supported. For stereo formats, the left and right channels are merged before processing.
+- Audio files are processed in 30-second chunks:
+  - If the audio sent is shorter than 30 seconds, the rest of the chunk is considered silent.
+  - `80` ms of audio equals `1` input token.
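As a rough sketch, the chunking rule above translates into the following token estimate (the helper name is ours; the constants are the chunk and token durations listed in the table):

```python
import math

CHUNK_S = 30    # Voxtral processes audio in 30-second chunks
TOKEN_S = 0.08  # 1 input token per 80 ms of audio

def voxtral_audio_tokens(duration_s: float) -> int:
    """Estimate input tokens for an audio clip; partial chunks count as full chunks."""
    chunks = max(1, math.ceil(duration_s / CHUNK_S))
    return int(chunks * CHUNK_S / TOKEN_S)

# A clip shorter than one chunk still consumes a full chunk of tokens.
for seconds in (13, 30, 31, 178):
    print(f"{seconds:>3} s -> {voxtral_audio_tokens(seconds)} tokens")
```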
+
## Text models
### Qwen3-235b-a22b-instruct-2507
diff --git a/pages/organizations-and-projects/additional-content/organization-quotas.mdx b/pages/organizations-and-projects/additional-content/organization-quotas.mdx
index 0b4e9ace02..f02f9f552a 100644
--- a/pages/organizations-and-projects/additional-content/organization-quotas.mdx
+++ b/pages/organizations-and-projects/additional-content/organization-quotas.mdx
@@ -203,6 +203,7 @@ Generative APIs are rate limited based on:
| mistral-small-3.1-24b-instruct-2503 | 200k | 400k |
| mistral-small-3.2-24b-instruct-2506 | 200k | 400k |
| mistral-nemo-instruct-2407 | 200k | 400k |
+| voxtral-small-24b-2507 | 200k | 400k |
| pixtral-12b-2409 | 200k | 400k |
| qwen3-235b-a22b-instruct-2507 | 200k | 400k |
| qwen2.5-coder-32b-instruct | 200k | 400k |
@@ -221,6 +222,7 @@ Generative APIs are rate limited based on:
| mistral-small-3.1-24b-instruct-2503 | 300 | 600 |
| mistral-small-3.2-24b-instruct-2506 | 300 | 600 |
| mistral-nemo-instruct-2407 | 300 | 600 |
+| voxtral-small-24b-2507 | 300 | 600 |
| pixtral-12b-2409 | 300 | 600 |
| qwen3-235b-a22b-instruct-2507 | 300 | 600 |
| qwen2.5-coder-32b-instruct | 300 | 600 |