Commit e08c529

RoRoJ and fpagny authored
feat(gen apis): add audio transcription API (#5686)
* feat(genapis): add how to query audio models
* feat(genapis): add audio transcriptions api
* Apply suggestions from code review

Co-authored-by: fpagny <[email protected]>

* fix(genapis): review

---------

Co-authored-by: fpagny <[email protected]>
1 parent 59b1d02 commit e08c529

1 file changed

+134
-87
lines changed


pages/generative-apis/how-to/query-audio-models.mdx

Lines changed: 134 additions & 87 deletions
Original file line number | Diff line number | Diff line change
@@ -3,7 +3,7 @@ title: How to query audio models
33
description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service.
44
tags: generative-apis ai-data audio-models voxtral
55
dates:
6-
validation: 2025-09-22
6+
validation: 2025-10-17
77
posted: 2025-09-22
88
---
99
import Requirements from '@macros/iam/requirements.mdx'
@@ -12,7 +12,7 @@ Scaleway's Generative APIs service allows users to interact with powerful audio
1212

1313
There are several ways to interact with audio models:
1414
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), where you can test models, adjust parameters, and observe how these changes affect the output in real time.
15-
- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion)
15+
- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription)
1616

1717
<Requirements />
1818

@@ -21,7 +21,7 @@ There are several ways to interact with audio models:
2121
- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
2222
- Python 3.7+ installed on your system
2323

24-
## Accessing the Playground
24+
## Accessing the playground
2525

2626
Scaleway provides a web playground for instruct-based models hosted on Generative APIs.
2727

@@ -46,6 +46,20 @@ You can also use the upload button to send supported audio file formats, such as
4646
You can query the models programmatically using your favorite tools or languages.
4747
In the example that follows, we will use the OpenAI Python client.
4848

49+
### Audio Transcriptions API or Chat Completions API?
50+
51+
Both the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription) and the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) are OpenAI-compatible REST APIs that accept audio input.
52+
53+
The **Audio Transcriptions API** is designed for pure speech-to-text (audio transcription) tasks, such as transcribing a voice note or meeting recording file. It can be used with compatible audio models, such as `whisper-large-v3`.
54+
55+
The **Chat Completions API** is more suitable for understanding audio input as part of a broader task, rather than a pure transcription task. Examples include building a voice chat assistant that listens and responds in natural language, or sending multiple inputs (audio and text) to be interpreted or classified (answering questions such as "Is this audio a ringtone?"). This API can be used for audio tasks with compatible multimodal models, such as `voxtral-small-24b`.
56+
57+
<Message type="note">
58+
Scaleway's support for the Audio Transcriptions API is currently in beta. The full feature set will be supported incrementally.
59+
</Message>
60+
61+
For full details on these APIs, see the [reference documentation](https://www.scaleway.com/en/developers/api/generative-apis/).
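
To illustrate the kind of non-transcription task the Chat Completions API is suited to, here is a minimal sketch that builds a combined text and audio message asking a question about a recording. The helper name `build_audio_question` is hypothetical, and the placeholder bytes stand in for real audio so that the snippet runs offline. The message layout mirrors the `input_audio` structure used in the examples on this page.

```python
import base64


def build_audio_question(audio_bytes: bytes, question: str, audio_format: str = "mp3") -> list:
    """Build a Chat Completions `messages` list pairing a text question with audio.

    Hypothetical helper for illustration only.
    """
    # The API examples on this page send audio as a base64-encoded string.
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "input_audio",
                    "input_audio": {"data": encoded, "format": audio_format},
                },
            ],
        }
    ]


# Placeholder bytes (an assumption) stand in for the content of a real audio file.
messages = build_audio_question(b"placeholder-bytes", "Is this audio a ringtone?")
print(messages[0]["content"][0]["text"])
```

Once a client is configured, the resulting list could be passed as the `messages` argument of `client.chat.completions.create()` together with a multimodal model such as `voxtral-small-24b-2507`.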
62+
4963
### Installing the OpenAI SDK
5064

5165
Install the OpenAI SDK using pip:
@@ -70,98 +84,131 @@ client = OpenAI(
7084

7185
### Transcribing audio
7286

73-
You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be remote or local.
87+
You can now generate a text transcription of a given audio file using a suitable API / model combination of your choice.
7488

75-
#### Transcribing a remote audio file
89+
<Tabs id="transcribing-audio">
7690

77-
In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen.
91+
<TabsTab label="Audio Transcriptions API (Beta)">
92+
93+
<Message type="note">
94+
The Audio Transcriptions API accepts local audio files only. It does not support passing the URL of a remote audio file.
95+
</Message>
7896

79-
```python
80-
import base64
81-
import requests
82-
83-
MODEL = "voxtral-small-24b-2507"
84-
85-
url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"
86-
response = requests.get(url)
87-
audio_data = response.content
88-
encoded_string = base64.b64encode(audio_data).decode("utf-8")
89-
90-
content = [
91-
{
92-
"role": "user",
93-
"content": [
94-
{
95-
"type": "text",
96-
"text": "Transcribe this audio"
97-
},
98-
{
99-
"type": "input_audio",
100-
"input_audio": {
101-
"data": encoded_string,
102-
"format": "mp3"
103-
}
104-
}
105-
]
106-
}
107-
]
108-
109-
110-
response = client.chat.completions.create(
111-
model=MODEL,
112-
messages=content,
113-
temperature=0.2, # Adjusts creativity
114-
max_tokens=2048, # Limits the length of the output
115-
top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature.
116-
)
97+
In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is sent to the model. The resulting text transcription is printed to the screen.
11798

118-
print(response.choices[0].message.content)
119-
```
99+
```python
100+
MODEL = "openai/whisper-large-v3:fp16"
101+
AUDIO = 'scaleway-ai-revolution.mp3'
120102

121-
Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
103+
audio_file = open(AUDIO, "rb")
122104

123-
#### Transcribing a local audio file
105+
response = client.audio.transcriptions.create(
106+
model=MODEL,
107+
file=audio_file,
108+
language='en'
109+
)
124110

125-
In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen.
111+
print(response.text)
112+
```
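
The example above leaves the file handle open until the script exits. As a possible variant, the sketch below wraps the same call in a helper that closes the file with a context manager. The function name `transcribe_file` and its default arguments are assumptions made for illustration. The `client` argument is expected to be an OpenAI client configured as shown earlier on this page.

```python
from pathlib import Path


def transcribe_file(client, path, model="openai/whisper-large-v3:fp16", language="en"):
    """Send a local audio file to the Audio Transcriptions API and return the text.

    Hypothetical helper: `client` is an already configured OpenAI client.
    """
    # The context manager closes the file even if the request raises an error.
    with Path(path).open("rb") as audio_file:
        response = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            language=language,
        )
    return response.text
```

With a configured client, `print(transcribe_file(client, "scaleway-ai-revolution.mp3"))` would print the transcription, as in the example above.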
126113

127-
```python
128-
import base64
129-
130-
MODEL = "voxtral-small-24b-2507"
131-
132-
with open('scaleway-ai-revolution.mp3', 'rb') as raw_file:
133-
    audio_data = raw_file.read()
134-
encoded_string = base64.b64encode(audio_data).decode("utf-8")
135-
136-
content = [
137-
{
138-
"role": "user",
139-
"content": [
140-
{
141-
"type": "text",
142-
"text": "Transcribe this audio"
143-
},
144-
{
145-
"type": "input_audio",
146-
"input_audio": {
147-
"data": encoded_string,
148-
"format": "mp3"
114+
See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription) for a full list of all available parameters.
115+
116+
</TabsTab>
117+
118+
<TabsTab label="Chat Completions API">
119+
120+
#### Transcribing a remote audio file
121+
122+
In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen.
123+
124+
```python
125+
import base64
126+
import requests
127+
128+
MODEL = "voxtral-small-24b-2507"
129+
130+
url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"
131+
response = requests.get(url)
132+
audio_data = response.content
133+
encoded_string = base64.b64encode(audio_data).decode("utf-8")
134+
135+
content = [
136+
{
137+
"role": "user",
138+
"content": [
139+
{
140+
"type": "text",
141+
"text": "Transcribe this audio"
142+
},
143+
{
144+
"type": "input_audio",
145+
"input_audio": {
146+
"data": encoded_string,
147+
"format": "mp3"
148+
}
149149
}
150-
}
151-
]
152-
}
153-
]
154-
155-
156-
response = client.chat.completions.create(
157-
model=MODEL,
158-
messages=content,
159-
temperature=0.2, # Adjusts creativity
160-
max_tokens=2048, # Limits the length of the output
161-
top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature.
162-
)
150+
]
151+
}
152+
]
153+
154+
155+
response = client.chat.completions.create(
156+
model=MODEL,
157+
messages=content,
158+
temperature=0.2, # Adjusts creativity
159+
max_tokens=2048, # Limits the length of the output
160+
top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature.
161+
)
162+
163+
print(response.choices[0].message.content)
164+
```
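
The example above does not check whether the download succeeded before encoding the response body. With `requests`, calling `response.raise_for_status()` after `requests.get(url)` raises an exception on HTTP error statuses. As an alternative, the sketch below uses only the Python standard library, where `urllib.request.urlopen` raises `urllib.error.HTTPError` for error responses. The helper name `fetch_base64` and the 30-second timeout are assumptions made for illustration.

```python
import base64
import urllib.request


def fetch_base64(url: str, timeout: float = 30.0) -> str:
    """Download the resource at `url` and return its content base64-encoded.

    Hypothetical helper: urlopen raises urllib.error.HTTPError on HTTP error
    statuses, so a failed download is never encoded by mistake.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return base64.b64encode(resp.read()).decode("utf-8")
```

The returned string could then be used as the `data` field of the `input_audio` entry, in place of `encoded_string` in the example above.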
165+
166+
See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
167+
168+
#### Transcribing a local audio file
169+
170+
In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base64-encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen.
171+
172+
```python
173+
import base64
174+
175+
MODEL = "voxtral-small-24b-2507"
176+
177+
with open('scaleway-ai-revolution.mp3', 'rb') as raw_file:
178+
    audio_data = raw_file.read()
179+
encoded_string = base64.b64encode(audio_data).decode("utf-8")
180+
181+
content = [
182+
{
183+
"role": "user",
184+
"content": [
185+
{
186+
"type": "text",
187+
"text": "Transcribe this audio"
188+
},
189+
{
190+
"type": "input_audio",
191+
"input_audio": {
192+
"data": encoded_string,
193+
"format": "mp3"
194+
}
195+
}
196+
]
197+
}
198+
]
163199

164-
print(response.choices[0].message.content)
165-
```
166200

167-
Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
201+
response = client.chat.completions.create(
202+
model=MODEL,
203+
messages=content,
204+
temperature=0.2, # Adjusts creativity
205+
max_tokens=2048, # Limits the length of the output
206+
top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature.
207+
)
208+
209+
print(response.choices[0].message.content)
210+
```
211+
212+
Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
213+
</TabsTab>
214+
</Tabs>
