From 682639dbc88b4c7b8e4e5af73192dfec9bd6a1d7 Mon Sep 17 00:00:00 2001 From: fpagny Date: Fri, 12 Sep 2025 15:56:50 +0200 Subject: [PATCH 01/10] feat(genapi): update supported models --- .../reference-content/supported-models.mdx | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/pages/generative-apis/reference-content/supported-models.mdx b/pages/generative-apis/reference-content/supported-models.mdx index 6236874f1e..61976f99c1 100644 --- a/pages/generative-apis/reference-content/supported-models.mdx +++ b/pages/generative-apis/reference-content/supported-models.mdx @@ -3,19 +3,27 @@ title: Supported models description: This page lists which open-source chat or embedding models Scaleway is currently hosting tags: generative-apis ai-data supported-models dates: - validation: 2025-08-20 + validation: 2025-09-12 posted: 2024-09-02 --- -Our API supports the most popular models for [Chat](/generative-apis/how-to/query-language-models), [Vision](/generative-apis/how-to/query-vision-models/) and [Embeddings](/generative-apis/how-to/query-embedding-models/). +Our API supports the most popular models for [Chat](/generative-apis/how-to/query-language-models), [Vision](/generative-apis/how-to/query-vision-models/), [Audio](/generative-apis/how-to/query-audio-models/) and [Embeddings](/generative-apis/how-to/query-embedding-models/). 
-## Multimodal models (chat and vision) +## Multimodal models + +### Chat and Vision models | Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card | |-----------------|-----------------|-----------------|-----------------|-----------------|-----------------| | Google (Preview) | `gemma-3-27b-it` | 40k | 8192 | [Gemma](https://ai.google.dev/gemma/terms) | [HF](https://huggingface.co/google/gemma-3-27b-it) | | Mistral | `mistral-small-3.2-24b-instruct-2506` | 128k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506) | +### Chat and Audio models + +| Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card | +|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------| +| Mistral | `voxtral-small-24b-2507` | 32k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) | + ## Chat models | Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card | From 8a545d9965ea40efc4bd72065c12ea983a483461 Mon Sep 17 00:00:00 2001 From: fpagny Date: Fri, 12 Sep 2025 16:45:41 +0200 Subject: [PATCH 02/10] feat(genapi): update model catalog with voxtral --- .../reference-content/model-catalog.mdx | 26 +++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index 68ac784629..e16fb6f671 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -30,6 +30,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib | [`mistral-small-3.2-24b-instruct-2506`](#mistral-small-32-24b-instruct-2506) | Mistral | 128k | Text, 
Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-small-3.1-24b-instruct-2503`](#mistral-small-31-24b-instruct-2503) | Mistral | 128k | Text, Vision | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-small-24b-instruct-2501`](#mistral-small-24b-instruct-2501) | Mistral | 32k | Text | L40S (20k), H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`voxtral-small-24b-2507`](#voxtral-small-24b-2507) | Mistral | 32k | Text, Audio | H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-nemo-instruct-2407`](#mistral-nemo-instruct-2407) | Mistral | 128k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mixtral-8x7b-instruct-v0.1`](#mixtral-8x7b-instruct-v01) | Mistral | 32k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`magistral-small-2506`](#magistral-small-2506) | Mistral | 32k | Text | L40S, H100, H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | @@ -60,6 +61,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib | `mistral-small-3.2-24b-instruct-2506` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | | `mistral-small-3.1-24b-instruct-2503` | Yes | Yes | English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi | | `mistral-small-24b-instruct-2501` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Polish, Portuguese, Chinese, Japanese, Korean | +| `voxtral-small-24b-2507` | Yes | Yes | English, French, German, Dutch, Spanish, Italian, Portuguese, 
Hindi | | `mistral-nemo-instruct-2407` | Yes | Yes | English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese | | `mixtral-8x7b-instruct-v0.1` | Yes | No | English, French, German, Italian, Spanish | | `magistral-small-2506` | Yes | Yes | English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Persian, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, Bengali | @@ -164,6 +166,30 @@ Vision-language models like Molmo can analyze an image and offer insights from v allenai/molmo-72b-0924:fp8 ``` +## Multimodal models (Text and Audio) + +### Voxtral-small-24b-2507 +Voxtral-small-24b-2507 is a model developed by Mistral to perform text processing and audio analysis in many languages. +This model was optimized to enable transcription in many languages while keeping conversational capabilities (translations, classification...) + +| Attribute | Value | +|-----------|-------| +| Supports parallel tool calling | Yes | +| Supported audio formats | WAV and MP3 | +| Audio chunk duration | 30 seconds | +| Token duration (audio) | 80ms | + +#### Model names +``` +mistral/voxtral-small-24b-2507:bf16 +mistral/voxtral-small-24b-2507:fp8 +``` + +- Mono and stereo audio formats are supported. For stereo formats, both left and right channels are merged before being processed. +- Audio files are processed by 30 seconds chunks: + - If audio sent is less than 30 seconds, the rest of a chunk will be considered silent. 
+ - 80ms is equal to 1 input token + ## Text models ### Qwen3-235b-a22b-instruct-2507 From d3f274ab71d886bef5de1b4d1d93683997355ede Mon Sep 17 00:00:00 2001 From: fpagny Date: Fri, 12 Sep 2025 16:49:48 +0200 Subject: [PATCH 03/10] feat(genapi): update rate limits with voxstral --- .../additional-content/organization-quotas.mdx | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/pages/organizations-and-projects/additional-content/organization-quotas.mdx b/pages/organizations-and-projects/additional-content/organization-quotas.mdx index 0b4e9ace02..eb95ee67ff 100644 --- a/pages/organizations-and-projects/additional-content/organization-quotas.mdx +++ b/pages/organizations-and-projects/additional-content/organization-quotas.mdx @@ -203,6 +203,7 @@ Generative APIs are rate limited based on: | mistral-small-3.1-24b-instruct-2503 | 200k | 400k | | mistral-small-3.2-24b-instruct-2506 | 200k | 400k | | mistral-nemo-instruct-2407 | 200k | 400k | +| voxtral-small-24b-2507 | 200k | 400k | | pixtral-12b-2409 | 200k | 400k | | qwen3-235b-a22b-instruct-2507 | 200k | 400k | | qwen2.5-coder-32b-instruct | 200k | 400k | @@ -221,6 +222,10 @@ Generative APIs are rate limited based on: | mistral-small-3.1-24b-instruct-2503 | 300 | 600 | | mistral-small-3.2-24b-instruct-2506 | 300 | 600 | | mistral-nemo-instruct-2407 | 300 | 600 | +<<<<<<< HEAD +======= +| voxtral-small-24b-2507 | 300 | 600 | +>>>>>>> 238655665 (feat(genapi): update rate limits with voxstral) | pixtral-12b-2409 | 300 | 600 | | qwen3-235b-a22b-instruct-2507 | 300 | 600 | | qwen2.5-coder-32b-instruct | 300 | 600 | From 086bdb914aff6672da08ad04c94c70c14c8681b7 Mon Sep 17 00:00:00 2001 From: fpagny Date: Fri, 12 Sep 2025 17:09:00 +0200 Subject: [PATCH 04/10] feat(genapi): update faq for audio models --- pages/generative-apis/faq.mdx | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx index 84010a2d6f..bacca1277e 100644 --- 
a/pages/generative-apis/faq.mdx +++ b/pages/generative-apis/faq.mdx @@ -83,7 +83,10 @@ Note that in this example, the first line where the free tier applies will not d ### What are tokens and how are they counted? A token is the minimum unit of content that is seen and processed by a model. Hence, token definitions depend on input types: - For text, on average, `1` token corresponds to `~4` characters, and thus `0.75` words (as words are on average five characters long) -- For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens of `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total). +- For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens are `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total). +- For audio: + - `1` token corresponds to a time duration. For example, `voxtral-small-24b-2507` model audio tokens are `80` milliseconds. + - Some models process audio by chunks having a minimum duration. For example, `voxtral-small-24b-2507` model process audio by `30` seconds chunks. This means an audio of `13` seconds will be considered `375` tokens (`30` seconds / `0.08` seconds). And an audio of `178` seconds will considered `2 250` tokens (`30` seconds * `6` / `0.08` seconds). The exact token count and definition depend on [tokenizers](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file. 
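The chunk-based audio token arithmetic added to the FAQ in the patch above can be sketched in a few lines. This is an illustrative helper, not part of any Scaleway SDK; the function name is invented, and the only assumptions are the rules stated in the FAQ text (30-second chunks, 80 ms per token, partial chunks padded to a full chunk):

```python
import math

CHUNK_SECONDS = 30     # voxtral-small-24b-2507 processes audio in 30-second chunks
TOKEN_SECONDS = 0.08   # one audio token covers 80 milliseconds

def audio_tokens(duration_seconds: float) -> int:
    """Estimate input tokens for an audio file: partial chunks count as a full chunk."""
    chunks = math.ceil(duration_seconds / CHUNK_SECONDS)
    return int(chunks * CHUNK_SECONDS / TOKEN_SECONDS)

print(audio_tokens(13))   # 375, as in the FAQ example (one 30-second chunk)
print(audio_tokens(178))  # 2250, as in the FAQ example (six 30-second chunks)
```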
From 037820acd37844de92619c9b9f474f75d2a77d3d Mon Sep 17 00:00:00 2001 From: Rowena Date: Tue, 23 Sep 2025 17:59:34 +0200 Subject: [PATCH 05/10] feat(genapis): add audio model info --- menu/navigation.json | 4 + pages/generative-apis/faq.mdx | 6 +- .../how-to/query-audio-models.mdx | 169 ++++++++++++++++++ 3 files changed, 176 insertions(+), 3 deletions(-) create mode 100644 pages/generative-apis/how-to/query-audio-models.mdx diff --git a/menu/navigation.json b/menu/navigation.json index 97b07114d4..693274ba6a 100644 --- a/menu/navigation.json +++ b/menu/navigation.json @@ -812,6 +812,10 @@ "label": "Query code models", "slug": "query-code-models" }, + { + "label": "Query audio models", + "slug": "query-audio-models" + }, { "label": "Use structured outputs", "slug": "use-structured-outputs" diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx index bacca1277e..97e9eb3b70 100644 --- a/pages/generative-apis/faq.mdx +++ b/pages/generative-apis/faq.mdx @@ -85,10 +85,10 @@ A token is the minimum unit of content that is seen and processed by a model. He - For text, on average, `1` token corresponds to `~4` characters, and thus `0.75` words (as words are on average five characters long) - For images, `1` token corresponds to a square of pixels. For example, `mistral-small-3.1-24b-instruct-2503` model image tokens are `28x28` pixels (28-pixels height, and 28-pixels width, hence `784` pixels in total). - For audio: - - `1` token corresponds to a time duration. For example, `voxtral-small-24b-2507` model audio tokens are `80` milliseconds. - - Some models process audio by chunks having a minimum duration. For example, `voxtral-small-24b-2507` model process audio by `30` seconds chunks. This means an audio of `13` seconds will be considered `375` tokens (`30` seconds / `0.08` seconds). And an audio of `178` seconds will considered `2 250` tokens (`30` seconds * `6` / `0.08` seconds). + - `1` token corresponds to a duration of time. 
For example, the `voxtral-small-24b-2507` model's audio tokens are `80` milliseconds. + - Some models process audio in chunks with a minimum duration. For example, the `voxtral-small-24b-2507` model processes audio in `30`-second chunks. This means audio lasting `13` seconds will be considered `375` tokens (`30` seconds / `0.08` seconds), and audio lasting `178` seconds will be considered `2 250` tokens (`30` seconds * `6` / `0.08` seconds). -The exact token count and definition depend on [tokenizers](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file. +The exact token count and definition depend on the [tokenizer](https://huggingface.co/learn/llm-course/en/chapter2/4) used by each model. When this difference is significant (such as for image processing), you can find detailed information in each model's documentation (for instance in [`mistral-small-3.1-24b-instruct-2503` size limit documentation](/managed-inference/reference-content/model-catalog/#mistral-small-31-24b-instruct-2503)). When the model is open, you can also find this information in the model files on platforms such as Hugging Face, usually in the `tokenizer_config.json` file. ### How can I monitor my token consumption? You can see your token consumption in [Scaleway Cockpit](/cockpit/). You can access it from the Scaleway console under the [Metrics tab](https://console.scaleway.com/generative-api/metrics). 
diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx new file mode 100644 index 0000000000..800d7b9a1d --- /dev/null +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -0,0 +1,169 @@ +--- +title: How to query audio models +description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service. +tags: generative-apis ai-data audio-models voxtral audio-model +dates: + validation: 2025-08-22 + posted: 2024-08-28 +--- +import Requirements from '@macros/iam/requirements.mdx' +import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx' + +Scaleway's Generative APIs service allows users to interact with powerful audio models hosted on the platform. + +There are several ways to interact with audio models: +- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real time. +- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) + + + +- A Scaleway account logged into the [console](https://console.scaleway.com) +- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization +- A valid [API key](/iam/how-to/create-api-keys/) for API authentication +- Python 3.7+ installed on your system + +## Accessing the Playground + +Scaleway provides a web playground for instruct-based models hosted on Generative APIs. + +1. Navigate to **Generative APIs** under the **AI** section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays. +2. Click the name of the audio model you want to try. 
Alternatively, click next to the audio model, and click **Try model** in the menu. + +The web playground displays. + +## Using the playground + +1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area. +2. Edit the hyperparameters listed in the right column, for example the default temperature for more or less randomness in the outputs. +3. Switch models at the top of the page to observe the capabilities of audio models offered via Generative APIs. +4. Click **View code** to get code snippets configured according to your settings in the playground. + + +You can also use the upload button to send supported audio file formats, such as MP3, to the model for transcription purposes. + + +## Querying audio models via API + +You can query the models programmatically using your favorite tools or languages. +In the example that follows, we will use the OpenAI Python client. + +### Installing the OpenAI SDK + +Install the OpenAI SDK using pip: + +```bash +pip install openai +``` + +### Initializing the client + +Initialize the OpenAI client with your base URL and API key: + +```python +from openai import OpenAI + +# Initialize the client with your base URL and API key +client = OpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API secret key from Scaleway +) +``` + +### Transcribing audio + +You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be local or remote. + +### Transcribing a local audio file + +In the example below, a local audio file called `scaleway-ai-revolution.mp3` is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. 
+ +```python +import base64 + +MODEL = "voxtral-small-24b-2507" + +with open('scaleway-ai-revolution.mp3', 'rb') as raw_file: + audio_data = raw_file.read() +encoded_string = base64.b64encode(audio_data).decode("utf-8") + +content = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "Transcribe this audio" + }, + { + "type": "input_audio", + "input_audio": { + "data": encoded_string, + "format": "mp3" + } + } + ] + } + ] + + +response = client.chat.completions.create( + model=MODEL, + messages=content, + temperature=0.2, # Adjusts creativity + max_tokens=2048, # Limits the length of the output + top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. +) + +print(response.choices[0].message.content) +``` + +Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. + +### Transcribing a remote audio file + +In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. 
+ +```python +import base64 +import requests + +MODEL = "voxtral-small-24b-2507" + +url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3" +response = requests.get(url) +audio_data = response.content +encoded_string = base64.b64encode(audio_data).decode("utf-8") + +content = [ + { + "role": "user", + "content": [ + { + "type": "text", + "text": "Transcribe this audio" + }, + { + "type": "input_audio", + "input_audio": { + "data": encoded_string, + "format": "mp3" + } + } + ] + } + ] + + +response = client.chat.completions.create( + model=MODEL, + messages=content, + temperature=0.2, # Adjusts creativity + max_tokens=2048, # Limits the length of the output + top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. +) + +print(response.choices[0].message.content) + +``` + +Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. From a08cc94610773c37881e953f23fcec10c3b9c55f Mon Sep 17 00:00:00 2001 From: Rowena Date: Tue, 23 Sep 2025 18:01:22 +0200 Subject: [PATCH 06/10] fix(genapis): fix dates --- pages/generative-apis/how-to/query-audio-models.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index 800d7b9a1d..a36b7aee4f 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -3,8 +3,8 @@ title: How to query audio models description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service. 
tags: generative-apis ai-data audio-models voxtral audio-model dates: - validation: 2025-08-22 - posted: 2024-08-28 + validation: 2025-09-22 + posted: 2025-09-22 --- import Requirements from '@macros/iam/requirements.mdx' import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx' @@ -73,7 +73,7 @@ client = OpenAI( You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be local or remote. -### Transcribing a local audio file +#### Transcribing a local audio file In the example below, a local audio file called `scaleway-ai-revolution.mp3` is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. @@ -119,7 +119,7 @@ print(response.choices[0].message.content) Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. -### Transcribing a remote audio file +#### Transcribing a remote audio file In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. 
From d37d26106f8f2e2bb94a0a4258395334a64bb4fd Mon Sep 17 00:00:00 2001 From: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> Date: Wed, 24 Sep 2025 10:55:32 +0200 Subject: [PATCH 07/10] Apply suggestions from code review --- pages/generative-apis/how-to/query-audio-models.mdx | 6 ++---- pages/managed-inference/reference-content/model-catalog.mdx | 6 +++--- 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index a36b7aee4f..9b94b98d10 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -1,13 +1,12 @@ --- title: How to query audio models description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service. -tags: generative-apis ai-data audio-models voxtral audio-model +tags: generative-apis ai-data audio-models voxtral dates: validation: 2025-09-22 posted: 2025-09-22 --- import Requirements from '@macros/iam/requirements.mdx' -import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx' Scaleway's Generative APIs service allows users to interact with powerful audio models hosted on the platform. @@ -39,7 +38,7 @@ The web playground displays. 4. Click **View code** to get code snippets configured according to your settings in the playground. -You can also use the upload button to send supported audio file formats, such as MP3, to the model for transcription purposes. +You can also use the upload button to send supported audio file formats, such as MP3, to audio models for transcription purposes. ## Querying audio models via API @@ -163,7 +162,6 @@ response = client.chat.completions.create( ) print(response.choices[0].message.content) - ``` Various parameters such as `temperature` and `max_tokens` control the output. 
See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx index e16fb6f671..5c2ef3b6b5 100644 --- a/pages/managed-inference/reference-content/model-catalog.mdx +++ b/pages/managed-inference/reference-content/model-catalog.mdx @@ -170,7 +170,7 @@ allenai/molmo-72b-0924:fp8 ``` ### Voxtral-small-24b-2507 Voxtral-small-24b-2507 is a model developed by Mistral to perform text processing and audio analysis in many languages. -This model was optimized to enable transcription in many languages while keeping conversational capabilities (translations, classification...) +This model was optimized to enable transcription in many languages while keeping conversational capabilities (translations, classification, etc.). | Attribute | Value | |-----------|-------| @@ -186,8 +186,8 @@ mistral/voxtral-small-24b-2507:fp8 ``` - Mono and stereo audio formats are supported. For stereo formats, both left and right channels are merged before being processed. -- Audio files are processed by 30 seconds chunks: - - If audio sent is less than 30 seconds, the rest of a chunk will be considered silent. +- Audio files are processed in 30-second chunks: + - If the audio sent is shorter than 30 seconds, the rest of the chunk will be considered silent. 
- 80ms is equal to 1 input token ## Text models From 97c5b626529976c72adc6a653cfe501d1a00e6d2 Mon Sep 17 00:00:00 2001 From: Rowena Date: Wed, 24 Sep 2025 10:59:27 +0200 Subject: [PATCH 08/10] fix(genapis): fix conflcit --- .../additional-content/organization-quotas.mdx | 3 --- 1 file changed, 3 deletions(-) diff --git a/pages/organizations-and-projects/additional-content/organization-quotas.mdx b/pages/organizations-and-projects/additional-content/organization-quotas.mdx index eb95ee67ff..f02f9f552a 100644 --- a/pages/organizations-and-projects/additional-content/organization-quotas.mdx +++ b/pages/organizations-and-projects/additional-content/organization-quotas.mdx @@ -222,10 +222,7 @@ Generative APIs are rate limited based on: | mistral-small-3.1-24b-instruct-2503 | 300 | 600 | | mistral-small-3.2-24b-instruct-2506 | 300 | 600 | | mistral-nemo-instruct-2407 | 300 | 600 | -<<<<<<< HEAD -======= | voxtral-small-24b-2507 | 300 | 600 | ->>>>>>> 238655665 (feat(genapi): update rate limits with voxstral) | pixtral-12b-2409 | 300 | 600 | | qwen3-235b-a22b-instruct-2507 | 300 | 600 | | qwen2.5-coder-32b-instruct | 300 | 600 | From 36f6aafcf52850d851c083e09fada2954778ed6d Mon Sep 17 00:00:00 2001 From: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> Date: Wed, 24 Sep 2025 14:02:21 +0200 Subject: [PATCH 09/10] Update pages/generative-apis/how-to/query-audio-models.mdx Co-authored-by: fpagny --- pages/generative-apis/how-to/query-audio-models.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index 9b94b98d10..90ed459187 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -74,7 +74,7 @@ You can now generate a text transcription of a given audio file using the Chat C #### Transcribing a local audio file -In the example below, a local audio file called 
`scaleway-ai-revolution.mp3` is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. +In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. ```python import base64 From ab9d0c6823569d144a147d30ce3a86ed1d8bd93e Mon Sep 17 00:00:00 2001 From: Rowena Date: Wed, 24 Sep 2025 14:09:12 +0200 Subject: [PATCH 10/10] fix(gen): switch order --- .../how-to/query-audio-models.mdx | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/pages/generative-apis/how-to/query-audio-models.mdx b/pages/generative-apis/how-to/query-audio-models.mdx index 90ed459187..5ab51a1212 100644 --- a/pages/generative-apis/how-to/query-audio-models.mdx +++ b/pages/generative-apis/how-to/query-audio-models.mdx @@ -70,19 +70,21 @@ client = OpenAI( ### Transcribing audio -You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be local or remote. +You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be remote or local. -#### Transcribing a local audio file +#### Transcribing a remote audio file -In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. 
+In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. ```python import base64 +import requests MODEL = "voxtral-small-24b-2507" -with open('scaleway-ai-revolution.mp3', 'rb') as raw_file: - audio_data = raw_file.read() +url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3" +response = requests.get(url) +audio_data = response.content encoded_string = base64.b64encode(audio_data).decode("utf-8") content = [ @@ -118,19 +120,17 @@ print(response.choices[0].message.content) Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. -#### Transcribing a remote audio file +#### Transcribing a local audio file -In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. +In the example below, a local audio file [scaleway-ai-revolution.mp3](https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3) is base-64 encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. 
```python import base64 -import requests MODEL = "voxtral-small-24b-2507" -url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3" -response = requests.get(url) -audio_data = response.content +with open('scaleway-ai-revolution.mp3', 'rb') as raw_file: + audio_data = raw_file.read() encoded_string = base64.b64encode(audio_data).decode("utf-8") content = [ @@ -164,4 +164,4 @@ response = client.chat.completions.create( print(response.choices[0].message.content) ``` -Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. +Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. \ No newline at end of file
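For readers who want to see the request body behind the SDK calls in the page above, the Chat Completions payload for an audio transcription can be assembled by hand. This is a minimal sketch: the helper name is invented, and the payload shape simply mirrors the message structure used in the transcription examples in this patch series:

```python
import base64
import json

def build_transcription_payload(audio_bytes: bytes, audio_format: str = "mp3",
                                model: str = "voxtral-small-24b-2507") -> dict:
    """Build the Chat Completions request body used in the transcription examples."""
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Transcribe this audio"},
                    {
                        "type": "input_audio",
                        "input_audio": {"data": encoded, "format": audio_format},
                    },
                ],
            }
        ],
        "temperature": 0.2,
        "max_tokens": 2048,
    }

# Inspect the payload shape with placeholder bytes (not a real MP3 file)
payload = build_transcription_payload(b"fake-mp3-bytes")
print(json.dumps(payload, indent=2)[:120])
```

This dict is what the OpenAI client serializes and sends to the `/chat/completions` route under the `https://api.scaleway.ai/v1` base URL shown above, with the API key passed as a Bearer token.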