You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@@ -12,7 +12,7 @@ Scaleway's Generative APIs service allows users to interact with powerful audio
12
12
13
13
There are several ways to interact with audio models:
14
14
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), aiming to test models, adapt parameters, and observe how these changes affect the output in real-time.
15
-
- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion)
15
+
- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-TODO)
16
16
17
17
<Requirements />
18
18
@@ -46,6 +46,20 @@ You can also use the upload button to send supported audio file formats, such as
46
46
You can query the models programmatically using your favorite tools or languages.
47
47
In the example that follows, we will use the OpenAI Python client.
48
48
49
+
### Chat Completions API or Audio Transcriptions API?
50
+
51
+
Both the [Chat Completions API](TODO) and the [Audio Transcriptions API](TODO) are OpenAI-compatible REST APIs that accept audio input.
52
+
53
+
The **Chat Completions API** is more suitable when transcribing audio input is part of a broader task, rather than pure transcription. Examples could include building a voice chat assistant which listens and responds in natural language, or sending multiple inputs (audio and text) to be interpreted and commented on. This API can be used with compatible multimodal models, such as `voxtral-small-24b`.
54
+
55
+
The **Audio Transcriptions API** is designed for pure speech-to-text (audio transcription) tasks, such as transcribing a voice note or meeting recording file. It can be used with compatible audio models, such as `whisper-large-v3`.
56
+
57
+
<Messagetype="note">
58
+
Scaleway's support for the Audio Transcriptions API is currently at beta stage. TODO CHECK: incremental support of feature set?
59
+
</Message>
60
+
61
+
For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/audio#choosing-the-right-api).
62
+
49
63
### Installing the OpenAI SDK
50
64
51
65
Install the OpenAI SDK using pip:
@@ -70,98 +84,129 @@ client = OpenAI(
70
84
71
85
### Transcribing audio
72
86
73
-
You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be remote or local.
87
+
You can now generate a text transcription of a given audio file using a suitable API / model combination of your choice.
74
88
75
-
#### Transcribing a remote audio file
89
+
<Tabsid="transcribing-audio">
76
90
77
-
In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen.
api_key=os.getenv("SCW_SECRET_KEY") # Your unique API secret key from Scaleway
99
+
)
120
100
121
-
Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
101
+
MODEL = "openai/whisper-large-v3:fp16"
102
+
AUDIO = 'interview-jbk-62s.mp3'
122
103
123
-
#### Transcribing a local audio file
104
+
audio_file = open(AUDIO, "rb")
124
105
125
-
In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen.
106
+
response = client.audio.transcriptions.create(
107
+
model=MODEL,
108
+
file=audio_file,
109
+
language='fr'
110
+
)
126
111
127
-
```python
128
-
import base64
129
-
130
-
MODEL="voxtral-small-24b-2507"
131
-
132
-
withopen('scaleway-ai-revolution.mp3', 'rb') as raw_file:
In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen.
max_tokens=2048, # Limits the length of the output
161
-
top_p=0.95# Controls diversity through nucleus sampling. You usually only need to use temperature.
162
-
)
148
+
]
149
+
}
150
+
]
151
+
152
+
153
+
response = client.chat.completions.create(
154
+
model=MODEL,
155
+
messages=content,
156
+
temperature=0.2, # Adjusts creativity
157
+
max_tokens=2048, # Limits the length of the output
158
+
top_p=0.95# Controls diversity through nucleus sampling. You usually only need to use temperature.
159
+
)
160
+
161
+
print(response.choices[0].message.content)
162
+
```
163
+
164
+
Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
165
+
166
+
#### Transcribing a local audio file
167
+
168
+
In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen.
169
+
170
+
```python
171
+
import base64
172
+
173
+
MODEL="voxtral-small-24b-2507"
174
+
175
+
withopen('scaleway-ai-revolution.mp3', 'rb') as raw_file:
Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
199
+
response = client.chat.completions.create(
200
+
model=MODEL,
201
+
messages=content,
202
+
temperature=0.2, # Adjusts creativity
203
+
max_tokens=2048, # Limits the length of the output
204
+
top_p=0.95# Controls diversity through nucleus sampling. You usually only need to use temperature.
205
+
)
206
+
207
+
print(response.choices[0].message.content)
208
+
```
209
+
210
+
Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
0 commit comments