From 5ad8732dc36243cd9b1de180c594a5a7b1dae8f4 Mon Sep 17 00:00:00 2001 From: Rowena Date: Fri, 17 Oct 2025 17:55:26 +0200 Subject: [PATCH 1/4] feat(genapis): add how to query audio models --- .../how-to/query-audio-models.mdx | 217 +++++++++++------- 1 file changed, 131 insertions(+), 86 deletions(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index 5ab51a1212..007512e87c 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -3,7 +3,7 @@ title: How to query audio models description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service. tags: generative-apis ai-data audio-models voxtral dates: - validation: 2025-09-22 + validation: 2025-10-17 posted: 2025-09-22 --- import Requirements from '@macros/iam/requirements.mdx' @@ -12,7 +12,7 @@ Scaleway's Generative APIs service allows users to interact with powerful audio There are several ways to interact with audio models: - The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), aiming to test models, adapt parameters, and observe how these changes affect the output in real-time. -- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) +- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-TODO) @@ -46,6 +46,20 @@ You can also use the upload button to send supported audio file formats, such as You can query the models programmatically using your favorite tools or languages. In the example that follows, we will use the OpenAI Python client. +### Chat Completions API or Audio Transcriptions API? + +Both the [Chat Completions API](TODO) and the [Audio Transcriptions API](TODO) are OpenAI-compatible REST APIs that accept audio input. + +The **Chat Completions API** is more suitable when transcribing audio input is part of a broader task, rather than pure transcription. Examples could include building a voice chat assistant which listens and responds in natural language, or sending multiple inputs (audio and text) to be interpreted and commented on. This API can be used with compatible multimodal models, such as `voxtral-small-24b`. + +The **Audio Transcriptions API** is designed for pure speech-to-text (audio transcription) tasks, such as transcribing a voice note or meeting recording file. It can be used with compatible audio models, such as `whisper-large-v3`. + + +Scaleway's support for the Audio Transcriptions API is currently at beta stage. TODO CHECK: incremental support of feature set? + + +For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/audio#choosing-the-right-api). + ### Installing the OpenAI SDK Install the OpenAI SDK using pip: @@ -70,98 +84,129 @@ client = OpenAI( ### Transcribing audio -You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be remote or local. +You can now generate a text transcription of a given audio file using a suitable API / model combination of your choice. 
-#### Transcribing a remote audio file + -In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. + -```python -import base64 -import requests - -MODEL = "voxtral-small-24b-2507" - -url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3" -response = requests.get(url) -audio_data = response.content -encoded_string = base64.b64encode(audio_data).decode("utf-8") - -content = [ - { - "role": "user", - "content": [ - { - "type": "text", - "text": "Transcribe this audio" - }, - { - "type": "input_audio", - "input_audio": { - "data": encoded_string, - "format": "mp3" - } - } - ] - } - ] - - -response = client.chat.completions.create( - model=MODEL, - messages=content, - temperature=0.2, # Adjusts creativity - max_tokens=2048, # Limits the length of the output - top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. -) + from openai import OpenAI + import os -print(response.choices[0].message.content) -``` + client = OpenAI( + base_url="https://aa2cee79-0e20-4515-8ec0-0a8084dfbd9e.ifr.fr-par.scaleway.com/v1", + api_key=os.getenv("SCW_SECRET_KEY") # Your unique API secret key from Scaleway + ) -Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. + MODEL = "openai/whisper-large-v3:fp16" + AUDIO = 'interview-jbk-62s.mp3' -#### Transcribing a local audio file + audio_file = open(AUDIO, "rb") -In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. + response = client.audio.transcriptions.create( + model=MODEL, + file=audio_file, + language='fr' + ) -```python -import base64 - -MODEL = "voxtral-small-24b-2507" - -with open('scaleway-ai-revolution.mp3', 'rb') as raw_file: - audio_data = raw_file.read() -encoded_string = base64.b64encode(audio_data).decode("utf-8") - -content = [ - { - "role": "user", - "content": [ - { - "type": "text", - "text": "Transcribe this audio" - }, - { - "type": "input_audio", - "input_audio": { - "data": encoded_string, - "format": "mp3" + print(response.text) + + + + + + #### Transcribing a remote audio file + + In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. 
+ + ```python + import base64 + import requests + + MODEL = "voxtral-small-24b-2507" + + url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3" + response = requests.get(url) + audio_data = response.content + encoded_string = base64.b64encode(audio_data).decode("utf-8") + + content = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "Transcribe this audio" + }, + { + "type": "input_audio", + "input_audio": { + "data": encoded_string, + "format": "mp3" + } } - } - ] - } - ] - - -response = client.chat.completions.create( - model=MODEL, - messages=content, - temperature=0.2, # Adjusts creativity - max_tokens=2048, # Limits the length of the output - top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. -) + ] + } + ] + + + response = client.chat.completions.create( + model=MODEL, + messages=content, + temperature=0.2, # Adjusts creativity + max_tokens=2048, # Limits the length of the output + top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. + ) + + print(response.choices[0].message.content) + ``` + + Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. + + #### Transcribing a local audio file + + In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. + + ```python + import base64 + + MODEL = "voxtral-small-24b-2507" + + with open('scaleway-ai-revolution.mp3', 'rb') as raw_file: + audio_data = raw_file.read() + encoded_string = base64.b64encode(audio_data).decode("utf-8") + + content = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "Transcribe this audio" + }, + { + "type": "input_audio", + "input_audio": { + "data": encoded_string, + "format": "mp3" + } + } + ] + } + ] -print(response.choices[0].message.content) -``` -Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. \ No newline at end of file + response = client.chat.completions.create( + model=MODEL, + messages=content, + temperature=0.2, # Adjusts creativity + max_tokens=2048, # Limits the length of the output + top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. + ) + + print(response.choices[0].message.content) + ``` + + Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. 
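The same request shape extends beyond pure transcription, which is the main reason to pick the Chat Completions API here. As a minimal variation on the example above (assuming the same `client` and `encoded_string` as in the local-file example), changing only the text prompt turns transcription into summarization:

```python
# Minimal variation on the local-file example above: only the text prompt
# changes, so the same request now asks for a summary instead of a transcript.
# Assumes `client` and `encoded_string` are already defined as shown above.
content = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Summarize this audio in two sentences."
            },
            {
                "type": "input_audio",
                "input_audio": {
                    "data": encoded_string,
                    "format": "mp3"
                }
            }
        ]
    }
]

response = client.chat.completions.create(
    model="voxtral-small-24b-2507",
    messages=content,
    max_tokens=512  # Limits the length of the output
)

print(response.choices[0].message.content)
```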
+ + \ No newline at end of file From 9ac06ee242c81121bbbb83869cc91df6a807288c Mon Sep 17 00:00:00 2001 From: Rowena Date: Mon, 20 Oct 2025 17:16:09 +0200 Subject: [PATCH 2/4] feat(genapis): add audio transcriptions api --- .../how-to/query-audio-models.mdx | 34 ++++++++++--------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index 007512e87c..462b19e8de 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -12,7 +12,7 @@ Scaleway's Generative APIs service allows users to interact with powerful audio There are several ways to interact with audio models: - The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), aiming to test models, adapt parameters, and observe how these changes affect the output in real-time. -- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-TODO) +- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription) @@ -21,7 +21,7 @@ There are several ways to interact with audio models: - A valid [API key](/iam/how-to/create-api-keys/) for API authentication - Python 3.7+ installed on your system -## Accessing the Playground +## Accessing the playground Scaleway provides a web playground for instruct-based models hosted on Generative APIs. @@ -48,14 +48,14 @@ In the example that follows, we will use the OpenAI Python client. ### Chat Completions API or Audio Transcriptions API? -Both the [Chat Completions API](TODO) and the [Audio Transcriptions API](TODO) are OpenAI-compatible REST APIs that accept audio input. +Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription) are OpenAI-compatible REST APIs that accept audio input. -The **Chat Completions API** is more suitable when transcribing audio input is part of a broader task, rather than pure transcription. Examples could include building a voice chat assistant which listens and responds in natural language, or sending multiple inputs (audio and text) to be interpreted and commented on. This API can be used with compatible multimodal models, such as `voxtral-small-24b`. +The **Chat Completions API** is more suitable when transcribing audio input is part of a broader task, rather than a pure transcription task. For example, building a voice chat assistant which listens and responds in natural language, or sending multiple inputs (audio and text) to be interpreted. This API can be used for audio tasks with compatible multimodal models, such as `voxtral-small-24b`. The **Audio Transcriptions API** is designed for pure speech-to-text (audio transcription) tasks, such as transcribing a voice note or meeting recording file. It can be used with compatible audio models, such as `whisper-large-v3`. 
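To make the difference concrete, the sketch below shows the two call shapes side by side. It is illustrative only: it assumes an OpenAI `client` already configured for Scaleway as shown later on this page, and `meeting.mp3` is a hypothetical local audio file.

```python
import base64

# Audio Transcriptions API: pure speech-to-text, the file is sent directly.
# `meeting.mp3` is a hypothetical local recording.
with open("meeting.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="openai/whisper-large-v3:fp16",
        file=audio_file
    )
print(transcription.text)

# Chat Completions API: the audio is one part of a prompt, so the model can
# do more than transcribe (summarize, answer questions, classify, and so on).
with open("meeting.mp3", "rb") as raw_file:
    encoded_string = base64.b64encode(raw_file.read()).decode("utf-8")

chat_response = client.chat.completions.create(
    model="voxtral-small-24b-2507",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "List the action items mentioned in this audio."},
                {"type": "input_audio", "input_audio": {"data": encoded_string, "format": "mp3"}}
            ]
        }
    ]
)
print(chat_response.choices[0].message.content)
```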
-Scaleway's support for the Audio Transcriptions API is currently at beta stage. TODO CHECK: incremental support of feature set? +Scaleway's support for the Audio Transcriptions API is currently at beta stage. Support of the full feature set will be incremental. For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/audio#choosing-the-right-api). @@ -88,28 +88,30 @@ You can now generate a text transcription of a given audio file using a suitable - + + + + The Audio Transcriptions API expects audio files to be found locally. It does not support passing the URL of a remote audio file. + - from openai import OpenAI - import os - - client = OpenAI( - base_url="https://aa2cee79-0e20-4515-8ec0-0a8084dfbd9e.ifr.fr-par.scaleway.com/v1", - api_key=os.getenv("SCW_SECRET_KEY") # Your unique API secret key from Scaleway - ) + In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is sent to the model. The resulting text transcription is printed to the screen. + ```python MODEL = "openai/whisper-large-v3:fp16" - AUDIO = 'interview-jbk-62s.mp3' + AUDIO = 'scaleway-ai-revolution.mp3' audio_file = open(AUDIO, "rb") response = client.audio.transcriptions.create( model=MODEL, file=audio_file, - language='fr' + language='en' ) print(response.text) + ``` + + See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription) for a full list of all available parameters. @@ -161,7 +163,7 @@ You can now generate a text transcription of a given audio file using a suitable print(response.choices[0].message.content) ``` - Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. + See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-an) for a full list of all available parameters. #### Transcribing a local audio file From e41e82e50d7231e09ab515af7591b0696348a6cf Mon Sep 17 00:00:00 2001 From: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> Date: Tue, 21 Oct 2025 11:32:04 +0200 Subject: [PATCH 3/4] Apply suggestions from code review Co-authored-by: fpagny --- pages/generative-apis/how-to/query-audio-models.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index 462b19e8de..d3994980e7 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -46,14 +46,14 @@ You can also use the upload button to send supported audio file formats, such as You can query the models programmatically using your favorite tools or languages. In the example that follows, we will use the OpenAI Python client. -### Chat Completions API or Audio Transcriptions API? +### Audio Transcriptions API or Chat Completions API? 
-Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription) are OpenAI-compatible REST APIs that accept audio input. - -The **Chat Completions API** is more suitable when transcribing audio input is part of a broader task, rather than a pure transcription task. For example, building a voice chat assistant which listens and responds in natural language, or sending multiple inputs (audio and text) to be interpreted. This API can be used for audio tasks with compatible multimodal models, such as `voxtral-small-24b`. +Both the [Audio Transcriptions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-audio-create-an-audio-transcription) and the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) are OpenAI-compatible REST APIs that accept audio input. The **Audio Transcriptions API** is designed for pure speech-to-text (audio transcription) tasks, such as transcribing a voice note or meeting recording file. It can be used with compatible audio models, such as `whisper-large-v3`. +The **Chat Completions API** is more suitable for understanding audio input as part of a broader task, rather than a pure transcription task. For example, building a voice chat assistant which listens and responds in natural language, or sending multiple inputs (audio and text) to be interpreted or classified (answering questions like "Is this audio a ringtone?"). This API can be used for audio tasks with compatible multimodal models, such as `voxtral-small-24b`. + Scaleway's support for the Audio Transcriptions API is currently at beta stage. Support of the full feature set will be incremental. From eeecf71184e7c84c2bf3eae8045f57b416e6c0fe Mon Sep 17 00:00:00 2001 From: Rowena Date: Tue, 21 Oct 2025 11:35:56 +0200 Subject: [PATCH 4/4] fix(genapis): review --- pages/generative-apis/how-to/query-audio-models.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index d3994980e7..fca47fdc51 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -58,7 +58,7 @@ The **Chat Completions API** is more suitable for understanding audio input as p Scaleway's support for the Audio Transcriptions API is currently at beta stage. Support of the full feature set will be incremental. -For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/audio#choosing-the-right-api). +For full details on these APIs, see the [reference documentation](https://www.scaleway.com/en/developers/api/generative-apis/). ### Installing the OpenAI SDK
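As noted above, the Audio Transcriptions API only accepts local files. A minimal workaround sketch for transcribing a remote recording (assuming the `client` configuration shown earlier and the `requests` library) is to download the file first:

```python
import requests

# The Audio Transcriptions API does not accept remote URLs, so download the
# file first, then send the local copy for transcription.
url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"
local_path = "scaleway-ai-revolution.mp3"

with open(local_path, "wb") as f:
    f.write(requests.get(url).content)

with open(local_path, "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="openai/whisper-large-v3:fp16",
        file=audio_file,
        language="en"
    )

print(response.text)
```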