
Commit 037820a

feat(genapis): add audio model info
1 parent 086bdb9 commit 037820a

3 files changed: +176 -3 lines changed

menu/navigation.json

Lines changed: 4 additions & 0 deletions
@@ -812,6 +812,10 @@
       "label": "Query code models",
       "slug": "query-code-models"
     },
+    {
+      "label": "Query audio models",
+      "slug": "query-audio-models"
+    },
     {
       "label": "Use structured outputs",
       "slug": "use-structured-outputs"

pages/generative-apis/faq.mdx

Lines changed: 3 additions & 3 deletions
@@ -85,10 +85,10 @@ A token is the minimum unit of content that is seen and processed by a model. He
 - For text, on average, `1` token corresponds to `~4` characters, and thus `0.75` words (as words are on average five characters long)
 - For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens are `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total).
 - For audio:
-    - `1` token corresponds to a time duration. For example, `voxtral-small-24b-2507` model audio tokens are `80` milliseconds.
-    - Some models process audio by chunks having a minimum duration. For example, `voxtral-small-24b-2507` model process audio by `30` seconds chunks. This means an audio of `13` seconds will be considered `375` tokens (`30` seconds / `0.08` seconds). And an audio of `178` seconds will considered `2 250` tokens (`30` seconds * `6` / `0.08` seconds).
+    - `1` token corresponds to a duration of time. For example, audio tokens for the `voxtral-small-24b-2507` model are `80` milliseconds long.
+    - Some models process audio in chunks with a minimum duration. For example, the `voxtral-small-24b-2507` model processes audio in `30`-second chunks. This means audio lasting `13` seconds will be counted as `375` tokens (`30` seconds / `0.08` seconds), and audio lasting `178` seconds will be counted as `2 250` tokens (`30` seconds * `6` chunks / `0.08` seconds).
 
-The exact token count and definition depend on [tokenizers](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file.
+The exact token count and definition depend on the [tokenizer](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in the [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file.
 
 ### How can I monitor my token consumption?
 You can see your token consumption in [Scaleway Cockpit](/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics).
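As a rough illustration of the chunk-based audio token arithmetic in the updated FAQ entry above, the following minimal sketch reproduces the two quoted figures (assuming `80` ms tokens and a `30`-second minimum chunk, as stated for `voxtral-small-24b-2507`):

```python
import math

TOKEN_DURATION_S = 0.08   # one voxtral-small-24b-2507 audio token = 80 milliseconds
CHUNK_DURATION_S = 30     # audio is processed in 30-second chunks

def audio_tokens(duration_s: float) -> int:
    # Round the duration up to a whole number of 30-second chunks,
    # then divide the padded duration by the per-token duration.
    chunks = math.ceil(duration_s / CHUNK_DURATION_S)
    return int(chunks * CHUNK_DURATION_S / TOKEN_DURATION_S)

print(audio_tokens(13))   # 375 tokens (1 chunk of 30 s)
print(audio_tokens(178))  # 2250 tokens (6 chunks of 30 s)
```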
Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
---
title: How to query audio models
description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service.
tags: generative-apis ai-data audio-models voxtral audio-model
dates:
  validation: 2025-08-22
  posted: 2024-08-28
---
import Requirements from '@macros/iam/requirements.mdx'
import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'

Scaleway's Generative APIs service allows users to interact with powerful audio models hosted on the platform.

There are several ways to interact with audio models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), where you can test models, adjust parameters, and observe how these changes affect the output in real time.
- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion)

<Requirements />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
- Python 3.7+ installed on your system

## Accessing the Playground

Scaleway provides a web playground for instruct-based models hosted on Generative APIs.

1. Navigate to **Generative APIs** under the **AI** section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays.
2. Click the name of the audio model you want to try. Alternatively, click <Icon name="more" /> next to the model, and click **Try model** in the menu.

The web playground displays.

## Using the playground

1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area.
2. Edit the hyperparameters listed in the right column, for example the default temperature, to introduce more or less randomness into the outputs.
3. Switch models at the top of the page to compare the capabilities of the models offered via Generative APIs.
4. Click **View code** to get code snippets configured according to your settings in the playground.

<Message type="tip">
  You can also use the upload button to send supported audio file formats, such as MP3, to the model for transcription.
</Message>

## Querying audio models via API

You can query the models programmatically using your favorite tools or languages.
In the examples that follow, we use the OpenAI Python client.

### Installing the OpenAI SDK

Install the OpenAI SDK using pip:

```bash
pip install openai
```
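The remote-file example further down also uses the `requests` library. If it is not already available in your environment, it can be installed the same way (this step is not required for the local-file example):

```bash
pip install requests
```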

### Initializing the client

Initialize the OpenAI client with your base URL and API key:

```python
from openai import OpenAI

# Initialize the client with your base URL and API key
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
)
```
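If you prefer not to hard-code the secret key, one option is to read it from an environment variable instead. Below is a minimal sketch; the `SCW_SECRET_KEY` variable name is only an example and must be exported in your shell beforehand:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",   # Scaleway's Generative APIs service URL
    api_key=os.environ["SCW_SECRET_KEY"],    # e.g. export SCW_SECRET_KEY=<your key> beforehand
)
```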

### Transcribing audio

You can now generate a text transcription of a given audio file using the Chat Completions API. The audio file can be local or remote.

### Transcribing a local audio file

In the example below, a local audio file called `scaleway-ai-revolution.mp3` is base64-encoded and sent to the model alongside a transcription prompt. The resulting text transcription is printed to the screen.

```python
import base64

MODEL = "voxtral-small-24b-2507"

# Read the local audio file and encode it as base64
with open('scaleway-ai-revolution.mp3', 'rb') as raw_file:
    audio_data = raw_file.read()
encoded_string = base64.b64encode(audio_data).decode("utf-8")

content = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Transcribe this audio"
            },
            {
                "type": "input_audio",
                "input_audio": {
                    "data": encoded_string,
                    "format": "mp3"
                }
            }
        ]
    }
]

response = client.chat.completions.create(
    model=MODEL,
    messages=content,
    temperature=0.2,  # Adjusts creativity
    max_tokens=2048,  # Limits the length of the output
    top_p=0.95  # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

print(response.choices[0].message.content)
```

Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of available parameters.
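If you also want to see how many tokens the audio input and the generated transcription consumed, you can inspect the `usage` object returned with the completion. This is a small sketch assuming the OpenAI-compatible `usage` field is populated in the response:

```python
# Token counts reported alongside the completion (if provided by the API)
usage = response.usage
if usage is not None:
    print(f"Prompt tokens:     {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    print(f"Total tokens:      {usage.total_tokens}")
```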

### Transcribing a remote audio file

In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen.

```python
import base64
import requests

MODEL = "voxtral-small-24b-2507"

# Download the remote audio file and encode it as base64
url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"
response = requests.get(url)
audio_data = response.content
encoded_string = base64.b64encode(audio_data).decode("utf-8")

content = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Transcribe this audio"
            },
            {
                "type": "input_audio",
                "input_audio": {
                    "data": encoded_string,
                    "format": "mp3"
                }
            }
        ]
    }
]

response = client.chat.completions.create(
    model=MODEL,
    messages=content,
    temperature=0.2,  # Adjusts creativity
    max_tokens=2048,  # Limits the length of the output
    top_p=0.95  # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

print(response.choices[0].message.content)
```

Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of available parameters.
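Because the Chat Completions API is OpenAI-compatible, the same request can also be sent without the SDK, using only the `requests` library. The following is a minimal sketch that assumes the `/chat/completions` path under the same base URL and reuses the message structure shown above:

```python
import base64
import requests

API_KEY = "<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
AUDIO_URL = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"

# Download the audio file and encode it as base64
encoded_string = base64.b64encode(requests.get(AUDIO_URL).content).decode("utf-8")

payload = {
    "model": "voxtral-small-24b-2507",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {"type": "input_audio", "input_audio": {"data": encoded_string, "format": "mp3"}},
            ],
        }
    ],
    "max_tokens": 2048,
}

# Send the request directly to the OpenAI-compatible endpoint
response = requests.post(
    "https://api.scaleway.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```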
