pages/generative-apis/how-to/query-audio-models.mdx
Scaleway's Generative APIs service allows users to interact with powerful audio models.
There are several ways to interact with audio models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real time.
- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription)
<Requirements />
- A valid [API key](/iam/how-to/create-api-keys/) for API authentication
- Python 3.7+ installed on your system
## Accessing the playground
Scaleway provides a web playground for instruct-based models hosted on Generative APIs.
In the example that follows, we will use the OpenAI Python client.
### Chat Completions API or Audio Transcriptions API?
Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription) are OpenAI-compatible REST APIs that accept audio input.
The **Chat Completions API** is more suitable when transcribing audio input is part of a broader task, rather than a pure transcription task. For example, building a voice chat assistant which listens and responds in natural language, or sending multiple inputs (audio and text) to be interpreted. This API can be used for audio tasks with compatible multimodal models, such as `voxtral-small-24b`.
The **Audio Transcriptions API** is designed for pure speech-to-text (audio transcription) tasks, such as transcribing a voice note or meeting recording file. It can be used with compatible audio models, such as `whisper-large-v3`.
<Message type="note">
Scaleway's support for the Audio Transcriptions API is currently in beta. Support for the full feature set will be added incrementally.
</Message>
For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/audio#choosing-the-right-api).
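As an illustrative sketch of the first case, a Chat Completions request can pair a text prompt with base64-encoded audio. The helper below builds such a user message following the OpenAI `input_audio` content convention; the helper itself, the file name, and the exact content shape accepted by Scaleway are assumptions, not confirmed API details.

```python
import base64

def audio_user_message(path: str, prompt: str, fmt: str = "mp3") -> dict:
    """Build an OpenAI-style Chat Completions user message that pairs a
    text prompt with base64-encoded audio (an "input_audio" content part).
    This shape follows the OpenAI convention; confirm it against the
    Scaleway API reference before relying on it."""
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "input_audio", "input_audio": {"data": data, "format": fmt}},
        ],
    }
```

A call could then look like `client.chat.completions.create(model="voxtral-small-24b", messages=[audio_user_message("note.mp3", "Summarize this recording.")])`, where `note.mp3` is a hypothetical local file.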
You can now generate a text transcription of a given audio file using a suitable model.
<Tabs id="transcribing-audio">
<TabsTab label="Audio Transcriptions API (Beta)">
<Message type="note">
The Audio Transcriptions API expects audio files to be stored locally. It does not support passing the URL of a remote audio file.
</Message>
In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is sent to the model. The resulting text transcription is printed to the screen.
```python
MODEL = "openai/whisper-large-v3:fp16"
AUDIO = 'scaleway-ai-revolution.mp3'

audio_file = open(AUDIO, "rb")

response = client.audio.transcriptions.create(
    model=MODEL,
    file=audio_file,
    language='en'
)

print(response.text)
```
See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription) for a full list of all available parameters.
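The OpenAI Audio Transcriptions API also defines optional parameters beyond `language`. The sketch below lists common ones as a plain dictionary; since Scaleway's support is in beta, treat the availability of each parameter as an assumption to verify against the API reference above.

```python
# Optional parameters defined by the OpenAI Audio Transcriptions API.
# Availability on Scaleway's beta implementation is assumed, not confirmed;
# check the API reference before relying on any of them.
transcription_kwargs = {
    "model": "openai/whisper-large-v3:fp16",
    "language": "en",           # ISO-639-1 hint for the spoken language
    "temperature": 0.0,         # lower values make decoding more deterministic
    "response_format": "json",  # OpenAI also defines "text", "srt", "verbose_json", "vtt"
}

# Usage would then look like (requires a configured `client`):
# with open("scaleway-ai-revolution.mp3", "rb") as f:
#     response = client.audio.transcriptions.create(file=f, **transcription_kwargs)
```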
</TabsTab>
print(response.choices[0].message.content)
```
See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
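Among those parameters, standard OpenAI sampling controls such as `temperature` and `max_tokens` shape the generated output. The sketch below groups them into a request dictionary; the parameter names follow the OpenAI convention, and their support here is an assumption to confirm against the API reference.

```python
# Common Chat Completions parameters (names follow the OpenAI convention;
# support on Scaleway is assumed and should be checked in the API reference).
request = {
    "model": "voxtral-small-24b",  # multimodal model used in this guide
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.2,   # lower values make the output more deterministic
    "max_tokens": 512,    # upper bound on the number of generated tokens
}

# Usage would then look like (requires a configured `client`):
# response = client.chat.completions.create(**request)
```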