
Commit 037820a

feat(genapis): add audio model info
1 parent 086bdb9 commit 037820a

3 files changed: +176 -3 lines changed

menu/navigation.json

Lines changed: 4 additions & 0 deletions
@@ -812,6 +812,10 @@
       "label": "Query code models",
       "slug": "query-code-models"
     },
+    {
+      "label": "Query audio models",
+      "slug": "query-audio-models"
+    },
     {
       "label": "Use structured outputs",
       "slug": "use-structured-outputs"

pages/generative-apis/faq.mdx

Lines changed: 3 additions & 3 deletions
@@ -85,10 +85,10 @@ A token is the minimum unit of content that is seen and processed by a model. He
 - For text, on average, `1` token corresponds to `~4` characters, and thus `0.75` words (as words are on average five characters long)
 - For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens are `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total).
 - For audio:
-    - `1` token corresponds to a time duration. For example, `voxtral-small-24b-2507` model audio tokens are `80` milliseconds.
-    - Some models process audio by chunks having a minimum duration. For example, `voxtral-small-24b-2507` model process audio by `30` seconds chunks. This means an audio of `13` seconds will be considered `375` tokens (`30` seconds / `0.08` seconds). And an audio of `178` seconds will considered `2 250` tokens (`30` seconds * `6` / `0.08` seconds).
+    - `1` token corresponds to a duration of time. For example, audio tokens for the `voxtral-small-24b-2507` model are `80` milliseconds long.
+    - Some models process audio in chunks with a minimum duration. For example, the `voxtral-small-24b-2507` model processes audio in `30`-second chunks. This means audio lasting `13` seconds will be counted as `375` tokens (`30` seconds / `0.08` seconds), and audio lasting `178` seconds will be counted as `2 250` tokens (`30` seconds * `6` chunks / `0.08` seconds).
 
-The exact token count and definition depend on [tokenizers](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file.
+The exact token count and definition depend on the [tokenizer](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in the [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file.
 
 ### How can I monitor my token consumption?
 You can see your token consumption in [Scaleway Cockpit](/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics).
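As a rough illustration of the chunk-based audio token arithmetic in the updated FAQ entry above, the following minimal sketch reproduces the two quoted figures (assuming `80` ms tokens and a `30`-second minimum chunk, as stated for `voxtral-small-24b-2507`):

```python
import math

TOKEN_DURATION_S = 0.08   # one voxtral-small-24b-2507 audio token = 80 milliseconds
CHUNK_DURATION_S = 30     # audio is processed in 30-second chunks

def audio_tokens(duration_s: float) -> int:
    # Round the duration up to a whole number of 30-second chunks,
    # then divide the padded duration by the per-token duration.
    chunks = math.ceil(duration_s / CHUNK_DURATION_S)
    return int(chunks * CHUNK_DURATION_S / TOKEN_DURATION_S)

print(audio_tokens(13))   # 375 tokens (1 chunk of 30 s)
print(audio_tokens(178))  # 2250 tokens (6 chunks of 30 s)
```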
Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
---
title: How to query audio models
description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service.
tags: generative-apis ai-data audio-models voxtral audio-model
dates:
  validation: 2025-08-22
  posted: 2024-08-28
---
import Requirements from '@macros/iam/requirements.mdx'
import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'

Scaleway's Generative APIs service allows users to interact with powerful audio models hosted on the platform.

There are several ways to interact with audio models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), where you can test models, adjust parameters, and observe how these changes affect the output in real time.
- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion)

<Requirements />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
- Python 3.7+ installed on your system

## Accessing the Playground

Scaleway provides a web playground for instruct-based models hosted on Generative APIs.

1. Navigate to **Generative APIs** under the **AI** section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays.
2. Click the name of the audio model you want to try. Alternatively, click <Icon name="more" /> next to the model, and click **Try model** in the menu.

The web playground displays.

## Using the playground

1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area.
2. Edit the hyperparameters listed in the right column, for example the default temperature, to introduce more or less randomness into the outputs.
3. Switch models at the top of the page to compare the capabilities of the models offered via Generative APIs.
4. Click **View code** to get code snippets configured according to your settings in the playground.

<Message type="tip">
  You can also use the upload button to send supported audio file formats, such as MP3, to the model for transcription.
</Message>

## Querying audio models via API

You can query the models programmatically using your favorite tools or languages.
In the examples that follow, we use the OpenAI Python client.

### Installing the OpenAI SDK

Install the OpenAI SDK using pip:

```bash
pip install openai
```
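The remote-file example further down also uses the `requests` library. If it is not already available in your environment, it can be installed the same way (this step is not required for the local-file example):

```bash
pip install requests
```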

### Initializing the client

Initialize the OpenAI client with your base URL and API key:

```python
from openai import OpenAI

# Initialize the client with your base URL and API key
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
)
```
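If you prefer not to hard-code the secret key, one option is to read it from an environment variable instead. Below is a minimal sketch; the `SCW_SECRET_KEY` variable name is only an example and must be exported in your shell beforehand:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",   # Scaleway's Generative APIs service URL
    api_key=os.environ["SCW_SECRET_KEY"],    # e.g. export SCW_SECRET_KEY=<your key> beforehand
)
```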

### Transcribing audio

You can now generate a text transcription of a given audio file using the Chat Completions API. The audio file can be local or remote.

### Transcribing a local audio file

In the example below, a local audio file called `scaleway-ai-revolution.mp3` is base64-encoded and sent to the model alongside a transcription prompt. The resulting text transcription is printed to the screen.

```python
import base64

MODEL = "voxtral-small-24b-2507"

# Read the local audio file and encode it as base64
with open('scaleway-ai-revolution.mp3', 'rb') as raw_file:
    audio_data = raw_file.read()
encoded_string = base64.b64encode(audio_data).decode("utf-8")

content = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Transcribe this audio"
            },
            {
                "type": "input_audio",
                "input_audio": {
                    "data": encoded_string,
                    "format": "mp3"
                }
            }
        ]
    }
]

response = client.chat.completions.create(
    model=MODEL,
    messages=content,
    temperature=0.2,  # Adjusts creativity
    max_tokens=2048,  # Limits the length of the output
    top_p=0.95  # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

print(response.choices[0].message.content)
```

Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of available parameters.
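If you also want to see how many tokens the audio input and the generated transcription consumed, you can inspect the `usage` object returned with the completion. This is a small sketch assuming the OpenAI-compatible `usage` field is populated in the response:

```python
# Token counts reported alongside the completion (if provided by the API)
usage = response.usage
if usage is not None:
    print(f"Prompt tokens:     {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    print(f"Total tokens:      {usage.total_tokens}")
```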

### Transcribing a remote audio file

In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen.

```python
import base64
import requests

MODEL = "voxtral-small-24b-2507"

# Download the remote audio file and encode it as base64
url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"
response = requests.get(url)
audio_data = response.content
encoded_string = base64.b64encode(audio_data).decode("utf-8")

content = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Transcribe this audio"
            },
            {
                "type": "input_audio",
                "input_audio": {
                    "data": encoded_string,
                    "format": "mp3"
                }
            }
        ]
    }
]

response = client.chat.completions.create(
    model=MODEL,
    messages=content,
    temperature=0.2,  # Adjusts creativity
    max_tokens=2048,  # Limits the length of the output
    top_p=0.95  # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

print(response.choices[0].message.content)
```

Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of available parameters.
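Because the Chat Completions API is OpenAI-compatible, the same request can also be sent without the SDK, using only the `requests` library. The following is a minimal sketch that assumes the `/chat/completions` path under the same base URL and reuses the message structure shown above:

```python
import base64
import requests

API_KEY = "<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
AUDIO_URL = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"

# Download the audio file and encode it as base64
encoded_string = base64.b64encode(requests.get(AUDIO_URL).content).decode("utf-8")

payload = {
    "model": "voxtral-small-24b-2507",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {"type": "input_audio", "input_audio": {"data": encoded_string, "format": "mp3"}},
            ],
        }
    ],
    "max_tokens": 2048,
}

# Send the request directly to the OpenAI-compatible endpoint
response = requests.post(
    "https://api.scaleway.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```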
