From 682639dbc88b4c7b8e4e5af73192dfec9bd6a1d7 Mon Sep 17 00:00:00 2001 From: fpagny Date: Fri, 12 Sep 2025 15:56:50 +0200 Subject: [PATCH 01/10] feat(genapi): update supported models --- .../reference-content/supported-models.mdx | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/pages/generative-apis/reference-content/supported-models.mdx b/pages/generative-apis/reference-content/supported-models.mdx index 6236874f1e..61976f99c1 100644 --- a/pages/generative-apis/reference-content/supported-models.mdx +++ b/pages/generative-apis/reference-content/supported-models.mdx @@ -3,19 +3,27 @@ title: Supported models description: This page lists which open-source chat or embedding models Scaleway is currently hosting tags: generative-apis ai-data supported-models dates: - validation: 2025-08-20 + validation: 2025-09-12 posted: 2024-09-02 --- -Our API supports the most popular models for [Chat](/generative-apis/how-to/query-language-models), [Vision](/generative-apis/how-to/query-vision-models/) and [Embeddings](/generative-apis/how-to/query-embedding-models/). +Our API supports the most popular models for [Chat](/generative-apis/how-to/query-language-models), [Vision](/generative-apis/how-to/query-vision-models/), [Audio](/generative-apis/how-to/query-audio-models/) and [Embeddings](/generative-apis/how-to/query-embedding-models/). 
-## Multimodal models (chat and vision) +## Multimodal models + +### Chat and Vision models | Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card | |-----------------|-----------------|-----------------|-----------------|-----------------|-----------------| | Google (Preview) | `gemma-3-27b-it` | 40k | 8192 | [Gemma](https://ai.google.dev/gemma/terms) | [HF](https://huggingface.co/google/gemma-3-27b-it) | | Mistral | `mistral-small-3.2-24b-instruct-2506` | 128k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506) | +### Chat and Audio models + +| Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card | +|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------| +| Mistral | `voxtral-small-24b-2507` | 32k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) | + ## Chat models | Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card | From 8a545d9965ea40efc4bd72065c12ea983a483461 Mon Sep 17 00:00:00 2001 From: fpagny Date: Fri, 12 Sep 2025 16:45:41 +0200 Subject: [PATCH 02/10] feat(genapi): update model catalog with voxtral --- .../reference-content/model-catalog.mdx | 26 +++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index 68ac784629..e16fb6f671 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -30,6 +30,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`mistral-small-3.2-24b-instruct-2506`](#mistral-small-32-24b-instruct-2506) | Mistral | 128k | Text, 
Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-small-3.1-24b-instruct-2503`](#mistral-small-31-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-small-24b-instruct-2501`](#mistral-small-24b-instruct-2501) | Mistral | 32k | Text | L40S (20k), H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`voxtral-small-24b-2507`](#voxtral-small-24b-2507) | Mistral | 32k | Text, Audio | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`magistral-small-2506`](#magistral-small-2506) | Mistral | 32k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | @@ -60,6 +61,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib | `mistral-small-3.2-24b-instruct-2506` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | | `mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean | +| `voxtral-small-24b-2507` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Portuguese, 
Hindi | | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese | | `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | | `magistral-small-2506` | Yes | Yes | English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Persian, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, Bengali | @@ -164,6 +166,30 @@ Vision-language models like Molmo can analyze an image and offer insights from v allenai/molmo-72b-0924:fp8 ``` +## Multimodal models (Text and Audio) + +### Voxtral-small-24b-2507 +Voxtral-small-24b-2507 is a model developed by Mistral to perform text processing and audio analysis in many languages. +This model was optimized to enable transcription in many languages while keeping conversational capabilities (translations, classification...) + +| Attribute | Value | +|-----------|-------| +| Supports parallel tool calling | Yes | +| Supported audio formats | WAV and MP3 | +| Audio chunk duration | 30 seconds | +| Token duration (audio) | 80ms | + +#### Model names +``` +mistral/voxtral-small-24b-2507:bf16 +mistral/voxtral-small-24b-2507:fp8 +``` + +- Mono and stereo audio formats are supported. For stereo formats, both left and right channels are merged before being processed. +- Audio files are processed by 30 seconds chunks: + - If audio sent is less than 30 seconds, the rest of a chunk will be considered silent. 
+ - 80ms is equal to 1 input token + ## Text models ### Qwen3-235b-a22b-instruct-2507 From d3f274ab71d886bef5de1b4d1d93683997355ede Mon Sep 17 00:00:00 2001 From: fpagny Date: Fri, 12 Sep 2025 16:49:48 +0200 Subject: [PATCH 03/10] feat(genapi): update rate limits with voxstral --- .../additional-content/organization-quotas.mdx | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/pages/organizations-and-projects/additional-content/organization-quotas.mdx b/pages/organizations-and-projects/additional-content/organization-quotas.mdx index 0b4e9ace02..eb95ee67ff 100644 --- a/pages/organizations-and-projects/additional-content/organization-quotas.mdx +++ b/pages/organizations-and-projects/additional-content/organization-quotas.mdx @@ -203,6 +203,7 @@ Generative APIs are rate limited based on: | mistral-small-3.1-24b-instruct-2503 | 200k | 400k | | mistral-small-3.2-24b-instruct-2506 | 200k | 400k | | mistral-nemo-instruct-2407 | 200k | 400k | +| voxtral-small-24b-2507 | 200k | 400k | | pixtral-12b-2409 | 200k | 400k | | qwen3-235b-a22b-instruct-2507 | 200k | 400k | | qwen2.5-coder-32b-instruct | 200k | 400k | @@ -221,6 +222,10 @@ Generative APIs are rate limited based on: | mistral-small-3.1-24b-instruct-2503 | 300 | 600 | | mistral-small-3.2-24b-instruct-2506 | 300 | 600 | | mistral-nemo-instruct-2407 | 300 | 600 | +<<<<<<< HEAD +======= +| voxtral-small-24b-2507 | 300 | 600 | +>>>>>>> 238655665 (feat(genapi): update rate limits with voxstral) | pixtral-12b-2409 | 300 | 600 | | qwen3-235b-a22b-instruct-2507 | 300 | 600 | | qwen2.5-coder-32b-instruct | 300 | 600 | From 086bdb914aff6672da08ad04c94c70c14c8681b7 Mon Sep 17 00:00:00 2001 From: fpagny Date: Fri, 12 Sep 2025 17:09:00 +0200 Subject: [PATCH 04/10] feat(genapi): update faq for audio models --- pages/generative-apis/faq.mdx | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx index 84010a2d6f..bacca1277e 100644 --- 
a/pages/generative-apis/faq.mdx +++ b/pages/generative-apis/faq.mdx @@ -83,7 +83,10 @@ Note that in this example, the first line where the free tier applies will not d ### What are tokens and how are they counted? A token is the minimum unit of content that is seen and processed by a model. Hence, token definitions depend on input types: - For text, on average, `1` token corresponds to `~4` characters, and thus `0.75` words (as words are on average five characters long) -- For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens of `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total). +- For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens are `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total). +- For audio: + - `1` token corresponds to a time duration. For example, `voxtral-small-24b-2507` model audio tokens are `80` milliseconds. + - Some models process audio by chunks having a minimum duration. For example, `voxtral-small-24b-2507` model process audio by `30` seconds chunks. This means an audio of `13` seconds will be considered `375` tokens (`30` seconds / `0.08` seconds). And an audio of `178` seconds will considered `2 250` tokens (`30` seconds * `6` / `0.08` seconds). The exact token count and definition depend on [tokenizers](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file. 
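The chunk-based audio token arithmetic added to the FAQ in the patch above can be sketched in a few lines. This is an illustrative helper, not part of any Scaleway SDK; the function name is invented, and the only assumptions are the rules stated in the FAQ text (30-second chunks, 80 ms per token, partial chunks padded to a full chunk):

```python
import math

CHUNK_SECONDS = 30     # voxtral-small-24b-2507 processes audio in 30-second chunks
TOKEN_SECONDS = 0.08   # one audio token covers 80 milliseconds

def audio_tokens(duration_seconds: float) -> int:
    """Estimate input tokens for an audio file: partial chunks count as a full chunk."""
    chunks = math.ceil(duration_seconds / CHUNK_SECONDS)
    return int(chunks * CHUNK_SECONDS / TOKEN_SECONDS)

print(audio_tokens(13))   # 375, as in the FAQ example (one 30-second chunk)
print(audio_tokens(178))  # 2250, as in the FAQ example (six 30-second chunks)
```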
From 037820acd37844de92619c9b9f474f75d2a77d3d Mon Sep 17 00:00:00 2001 From: Rowena Date: Tue, 23 Sep 2025 17:59:34 +0200 Subject: [PATCH 05/10] feat(genapis): add audio model info --- menu/navigation.json | 4 + pages/generative-apis/faq.mdx | 6 +- .../how-to/query-audio-models.mdx | 169 ++++++++++++++++++ 3 files changed, 176 insertions(+), 3 deletions(-) create mode 100644 pages/generative-apis/how-to/query-audio-models.mdx diff --git a/menu/navigation.json b/menu/navigation.json index 97b07114d4..693274ba6a 100644 --- a/menu/navigation.json +++ b/menu/navigation.json @@ -812,6 +812,10 @@ "label": "Query code models", "slug": "query-code-models" }, + { + "label": "Query audio models", + "slug": "query-audio-models" + }, { "label": "Use structured outputs", "slug": "use-structured-outputs" diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx index bacca1277e..97e9eb3b70 100644 --- a/pages/generative-apis/faq.mdx +++ b/pages/generative-apis/faq.mdx @@ -85,10 +85,10 @@ A token is the minimum unit of content that is seen and processed by a model. He - For text, on average, `1` token corresponds to `~4` characters, and thus `0.75` words (as words are on average five characters long) - For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens are `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total). - For audio: - - `1` token corresponds to a time duration. For example, `voxtral-small-24b-2507` model audio tokens are `80` milliseconds. - - Some models process audio by chunks having a minimum duration. For example, `voxtral-small-24b-2507` model process audio by `30` seconds chunks. This means an audio of `13` seconds will be considered `375` tokens (`30` seconds / `0.08` seconds). And an audio of `178` seconds will considered `2 250` tokens (`30` seconds * `6` / `0.08` seconds). + - `1` token corresponds to a duration of time. 
For example, the `voxtral-small-24b-2507` model's audio tokens are `80` milliseconds. + - Some models process audio in chunks with a minimum duration. For example, the `voxtral-small-24b-2507` model processes audio in `30`-second chunks. This means audio lasting `13` seconds will be considered `375` tokens (`30` seconds / `0.08` seconds), and audio lasting `178` seconds will be considered `2 250` tokens (`30` seconds * `6` / `0.08` seconds). -The exact token count and definition depend on [tokenizers](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file. +The exact token count and definition depend on the [tokenizer](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file. ### How can I monitor my token consumption? You can see your token consumption in [Scaleway Cockpit](/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics). 
diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx new file mode 100644 index 0000000000..800d7b9a1d --- /dev/null +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -0,0 +1,169 @@ +--- +title: How to query audio models +description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service. +tags: generative-apis ai-data audio-models voxtral audio-model +dates: + validation: 2025-08-22 + posted: 2024-08-28 +--- +import Requirements from '@macros/iam/requirements.mdx' +import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx' + +Scaleway's Generative APIs service allows users to interact with powerful audio models hosted on the platform. + +There are several ways to interact with audio models: +- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real time. +- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) + + + +- A Scaleway account logged into the [console](https://console.scaleway.com) +- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization +- A valid [API key](/iam/how-to/create-api-keys/) for API authentication +- Python 3.7+ installed on your system + +## Accessing the Playground + +Scaleway provides a web playground for instruct-based models hosted on Generative APIs. + +1. Navigate to **Generative APIs** under the **AI** section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays. +2. Click the name of the audio model you want to try. 
Alternatively, click next to the audio model, and click **Try model** in the menu. + +The web playground displays. + +## Using the playground + +1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area. +2. Edit the hyperparameters listed in the right column, for example the default temperature for more or less randomness in the outputs. +3. Switch models at the top of the page to observe the capabilities of audio models offered via Generative APIs. +4. Click **View code** to get code snippets configured according to your settings in the playground. + + +You can also use the upload button to send supported audio file formats, such as MP3, to the model for transcription purposes. + + +## Querying audio models via API + +You can query the models programmatically using your favorite tools or languages. +In the example that follows, we will use the OpenAI Python client. + +### Installing the OpenAI SDK + +Install the OpenAI SDK using pip: + +```bash +pip install openai +``` + +### Initializing the client + +Initialize the OpenAI client with your base URL and API key: + +```python +from openai import OpenAI + +# Initialize the client with your base URL and API key +client = OpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API secret key from Scaleway +) +``` + +### Transcribing audio + +You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be local or remote. + +### Transcribing a local audio file + +In the example below, a local audio file called `scaleway-ai-revolution.mp3` is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. 
+ +```python +import base64 + +MODEL = "voxtral-small-24b-2507" + +with open('scaleway-ai-revolution.mp3', 'rb') as raw_file: + audio_data = raw_file.read() +encoded_string = base64.b64encode(audio_data).decode("utf-8") + +content = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "Transcribe this audio" + }, + { + "type": "input_audio", + "input_audio": { + "data": encoded_string, + "format": "mp3" + } + } + ] + } + ] + + +response = client.chat.completions.create( + model=MODEL, + messages=content, + temperature=0.2, # Adjusts creativity + max_tokens=2048, # Limits the length of the output + top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. +) + +print(response.choices[0].message.content) +``` + +Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. + +### Transcribing a remote audio file + +In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. 
+ +```python +import base64 +import requests + +MODEL = "voxtral-small-24b-2507" + +url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3" +response = requests.get(url) +audio_data = response.content +encoded_string = base64.b64encode(audio_data).decode("utf-8") + +content = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "Transcribe this audio" + }, + { + "type": "input_audio", + "input_audio": { + "data": encoded_string, + "format": "mp3" + } + } + ] + } + ] + + +response = client.chat.completions.create( + model=MODEL, + messages=content, + temperature=0.2, # Adjusts creativity + max_tokens=2048, # Limits the length of the output + top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. +) + +print(response.choices[0].message.content) + +``` + +Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. From a08cc94610773c37881e953f23fcec10c3b9c55f Mon Sep 17 00:00:00 2001 From: Rowena Date: Tue, 23 Sep 2025 18:01:22 +0200 Subject: [PATCH 06/10] fix(genapis): fix dates --- pages/generative-apis/how-to/query-audio-models.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index 800d7b9a1d..a36b7aee4f 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -3,8 +3,8 @@ title: How to query audio models description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service. 
tags: generative-apis ai-data audio-models voxtral audio-model dates: - validation: 2025-08-22 - posted: 2024-08-28 + validation: 2025-09-22 + posted: 2025-09-22 --- import Requirements from '@macros/iam/requirements.mdx' import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx' @@ -73,7 +73,7 @@ client = OpenAI( You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be local or remote. -### Transcribing a local audio file +#### Transcribing a local audio file In the example below, a local audio file called `scaleway-ai-revolution.mp3` is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. @@ -119,7 +119,7 @@ print(response.choices[0].message.content) Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. -### Transcribing a remote audio file +#### Transcribing a remote audio file In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. 
From d37d26106f8f2e2bb94a0a4258395334a64bb4fd Mon Sep 17 00:00:00 2001 From: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> Date: Wed, 24 Sep 2025 10:55:32 +0200 Subject: [PATCH 07/10] Apply suggestions from code review --- pages/generative-apis/how-to/query-audio-models.mdx | 6 ++---- pages/managed-inference/reference-content/model-catalog.mdx | 6 +++--- 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index a36b7aee4f..9b94b98d10 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -1,13 +1,12 @@ --- title: How to query audio models description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service. -tags: generative-apis ai-data audio-models voxtral audio-model +tags: generative-apis ai-data audio-models voxtral dates: validation: 2025-09-22 posted: 2025-09-22 --- import Requirements from '@macros/iam/requirements.mdx' -import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx' Scaleway's Generative APIs service allows users to interact with powerful audio models hosted on the platform. @@ -39,7 +38,7 @@ The web playground displays. 4. Click **View code** to get code snippets configured according to your settings in the playground. -You can also use the upload button to send supported audio file formats, such as MP3, to the model for transcription purposes. +You can also use the upload button to send supported audio file formats, such as MP3, to audio models for transcription purposes. ## Querying audio models via API @@ -163,7 +162,6 @@ response = client.chat.completions.create( ) print(response.choices[0].message.content) - ``` Various parameters such as `temperature` and `max_tokens` control the output. 
See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index e16fb6f671..5c2ef3b6b5 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -170,7 +170,7 @@ allenai/molmo-72b-0924:fp8 ``` ### Voxtral-small-24b-2507 Voxtral-small-24b-2507 is a model developed by Mistral to perform text processing and audio analysis in many languages. -This model was optimized to enable transcription in many languages while keeping conversational capabilities (translations, classification...) +This model was optimized to enable transcription in many languages while keeping conversational capabilities (translations, classification, etc.). | Attribute | Value | |-----------|-------| @@ -186,8 +186,8 @@ mistral/voxtral-small-24b-2507:fp8 ``` - Mono and stereo audio formats are supported. For stereo formats, both left and right channels are merged before being processed. -- Audio files are processed by 30 seconds chunks: - - If audio sent is less than 30 seconds, the rest of a chunk will be considered silent. +- Audio files are processed in 30-second chunks: + - If the audio sent is shorter than 30 seconds, the rest of the chunk will be considered silent. 
- 80ms is equal to 1 input token ## Text models From 97c5b626529976c72adc6a653cfe501d1a00e6d2 Mon Sep 17 00:00:00 2001 From: Rowena Date: Wed, 24 Sep 2025 10:59:27 +0200 Subject: [PATCH 08/10] fix(genapis): fix conflcit --- .../additional-content/organization-quotas.mdx | 3 --- 1 file changed, 3 deletions(-) diff --git a/pages/organizations-and-projects/additional-content/organization-quotas.mdx b/pages/organizations-and-projects/additional-content/organization-quotas.mdx index eb95ee67ff..f02f9f552a 100644 --- a/pages/organizations-and-projects/additional-content/organization-quotas.mdx +++ b/pages/organizations-and-projects/additional-content/organization-quotas.mdx @@ -222,10 +222,7 @@ Generative APIs are rate limited based on: | mistral-small-3.1-24b-instruct-2503 | 300 | 600 | | mistral-small-3.2-24b-instruct-2506 | 300 | 600 | | mistral-nemo-instruct-2407 | 300 | 600 | -<<<<<<< HEAD -======= | voxtral-small-24b-2507 | 300 | 600 | ->>>>>>> 238655665 (feat(genapi): update rate limits with voxstral) | pixtral-12b-2409 | 300 | 600 | | qwen3-235b-a22b-instruct-2507 | 300 | 600 | | qwen2.5-coder-32b-instruct | 300 | 600 | From 36f6aafcf52850d851c083e09fada2954778ed6d Mon Sep 17 00:00:00 2001 From: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> Date: Wed, 24 Sep 2025 14:02:21 +0200 Subject: [PATCH 09/10] Update pages/generative-apis/how-to/query-audio-models.mdx Co-authored-by: fpagny --- pages/generative-apis/how-to/query-audio-models.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index 9b94b98d10..90ed459187 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -74,7 +74,7 @@ You can now generate a text transcription of a given audio file using the Chat C #### Transcribing a local audio file -In the example below, a local audio file called 
`scaleway-ai-revolution.mp3` is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. +In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. ```python import base64 From ab9d0c6823569d144a147d30ce3a86ed1d8bd93e Mon Sep 17 00:00:00 2001 From: Rowena Date: Wed, 24 Sep 2025 14:09:12 +0200 Subject: [PATCH 10/10] fix(gen): switch order --- .../how-to/query-audio-models.mdx | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index 90ed459187..5ab51a1212 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -70,19 +70,21 @@ client = OpenAI( ### Transcribing audio -You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be local or remote. +You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be remote or local. -#### Transcribing a local audio file +#### Transcribing a remote audio file -In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. 
+In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. ```python import base64 +import requests MODEL = "voxtral-small-24b-2507" -with open('scaleway-ai-revolution.mp3', 'rb') as raw_file: - audio_data = raw_file.read() +url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3" +response = requests.get(url) +audio_data = response.content encoded_string = base64.b64encode(audio_data).decode("utf-8") content = [ @@ -118,19 +120,17 @@ print(response.choices[0].message.content) Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. -#### Transcribing a remote audio file +#### Transcribing a local audio file -In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. +In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. 
```python import base64 -import requests MODEL = "voxtral-small-24b-2507" -url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3" -response = requests.get(url) -audio_data = response.content +with open('scaleway-ai-revolution.mp3', 'rb') as raw_file: + audio_data = raw_file.read() encoded_string = base64.b64encode(audio_data).decode("utf-8") content = [ @@ -164,4 +164,4 @@ response = client.chat.completions.create( print(response.choices[0].message.content) ``` -Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. +Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. \ No newline at end of file
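For readers who want to see the request body behind the SDK calls in the page above, the Chat Completions payload for an audio transcription can be assembled by hand. This is a minimal sketch: the helper name is invented, and the payload shape simply mirrors the message structure used in the transcription examples in this patch series:

```python
import base64
import json

def build_transcription_payload(audio_bytes: bytes, audio_format: str = "mp3",
                                model: str = "voxtral-small-24b-2507") -> dict:
    """Build the Chat Completions request body used in the transcription examples."""
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Transcribe this audio"},
                    {
                        "type": "input_audio",
                        "input_audio": {"data": encoded, "format": audio_format},
                    },
                ],
            }
        ],
        "temperature": 0.2,
        "max_tokens": 2048,
    }

# Inspect the payload shape with placeholder bytes (not a real MP3 file)
payload = build_transcription_payload(b"fake-mp3-bytes")
print(json.dumps(payload, indent=2)[:120])
```

This dict is what the OpenAI client serializes and sends to the `/chat/completions` route under the `https://api.scaleway.ai/v1` base URL shown above, with the API key passed as a Bearer token.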