|
| 1 | +--- |
| 2 | +title: How to query audio models |
| 3 | +description: Learn how to interact with powerful audio models using Scaleway's Generative APIs service. |
| 4 | +tags: generative-apis ai-data audio-models voxtral audio-model |
| 5 | +dates: |
| 6 | + validation: 2025-08-22 |
| 7 | + posted: 2024-08-28 |
| 8 | +--- |
| 9 | +import Requirements from '@macros/iam/requirements.mdx' |
| 10 | +import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx' |
| 11 | + |
| 12 | +Scaleway's Generative APIs service allows users to interact with powerful audio models hosted on the platform. |
| 13 | + |
| 14 | +There are several ways to interact with audio models: |
| 15 | +- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground) where you can test models, adjust parameters, and observe how these changes affect the output in real time. |
| 16 | +- The [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) lets you query audio models programmatically. |
| 17 | + |
| 18 | +<Requirements /> |
| 19 | + |
| 20 | +- A Scaleway account logged into the [console](https://console.scaleway.com) |
| 21 | +- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization |
| 22 | +- A valid [API key](/iam/how-to/create-api-keys/) for API authentication |
| 23 | +- Python 3.7+ installed on your system |
| 24 | + |
| 25 | +## Accessing the playground |
| 26 | + |
| 27 | +Scaleway provides a web playground for instruct-based models hosted on Generative APIs. |
| 28 | + |
| 29 | +1. Navigate to **Generative APIs** under the **AI** section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays. |
| 30 | +2. Click the name of the chat model you want to try. Alternatively, click <Icon name="more" /> next to the chat model, and click **Try model** in the menu. |
| 31 | + |
| 32 | +The web playground displays. |
| 33 | + |
| 34 | +## Using the playground |
| 35 | + |
| 36 | +1. Enter a prompt at the bottom of the page, or use one of the suggested prompts in the conversation area. |
| 37 | +2. Edit the hyperparameters listed in the right column, for example the default temperature, to make the outputs more or less random. |
| 38 | +3. Switch models at the top of the page to observe the capabilities of the chat models offered via Generative APIs. |
| 39 | +4. Click **View code** to get code snippets configured according to your settings in the playground. |
| 40 | + |
| 41 | +<Message type="tip"> |
| 42 | +You can also use the upload button to send audio files in a supported format, such as MP3, to the model for transcription. |
| 43 | +</Message> |
| 44 | + |
| 45 | +## Querying audio models via API |
| 46 | + |
| 47 | +You can query the models programmatically using your favorite tools or languages. |
| 48 | +In the example that follows, we will use the OpenAI Python client. |
| 49 | + |
| 50 | +### Installing the OpenAI SDK |
| 51 | + |
| 52 | +Install the OpenAI SDK using pip: |
| 53 | + |
| 54 | +```bash |
| 55 | +pip install openai |
| 56 | +``` |
| 57 | + |
| 58 | +### Initializing the client |
| 59 | + |
| 60 | +Initialize the OpenAI client with your base URL and API key: |
| 61 | + |
| 62 | +```python |
| 63 | +from openai import OpenAI |
| 64 | + |
| 65 | +# Initialize the client with your base URL and API key |
| 66 | +client = OpenAI( |
| 67 | + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL |
| 68 | + api_key="<SCW_SECRET_KEY>" # Your unique API secret key from Scaleway |
| 69 | +) |
| 70 | +``` |
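
To avoid hardcoding the secret key in source files, the client settings can be read from the environment instead. The snippet below is a minimal sketch: the environment variable name `SCW_SECRET_KEY` and the helper `scaleway_client_settings` are assumptions chosen for illustration, not names defined by Scaleway or the OpenAI SDK.

```python
import os


def scaleway_client_settings(env=os.environ) -> dict:
    """Build keyword arguments for OpenAI(...) from environment variables.

    SCW_SECRET_KEY is a hypothetical variable name used for this example.
    """
    api_key = env.get("SCW_SECRET_KEY")
    if not api_key:
        raise RuntimeError("Set the SCW_SECRET_KEY environment variable first.")
    return {
        "base_url": "https://api.scaleway.ai/v1",  # Same base URL as above
        "api_key": api_key,
    }


# Usage, assuming the OpenAI SDK is installed:
# from openai import OpenAI
# client = OpenAI(**scaleway_client_settings())
```
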
| 71 | + |
| 72 | +### Transcribing audio |
| 73 | + |
| 74 | +You can now generate a text transcription of a given audio file using the Chat Completions API. This audio file can be local or remote. |
| 75 | + |
| 76 | +### Transcribing a local audio file |
| 77 | + |
| 78 | +In the example below, a local audio file called `scaleway-ai-revolution.mp3` is base64-encoded and sent to the model, alongside a transcription prompt. The resulting text transcription is printed to the screen. |
| 79 | + |
| 80 | +```python |
| 81 | +import base64 |
| 82 | + |
| 83 | +MODEL = "voxtral-small-24b-2507" |
| 84 | + |
| 85 | +with open('scaleway-ai-revolution.mp3', 'rb') as raw_file: |
| 86 | + audio_data = raw_file.read() |
| 87 | +encoded_string = base64.b64encode(audio_data).decode("utf-8") |
| 88 | + |
| 89 | +content = [ |
| 90 | + { |
| 91 | + "role": "user", |
| 92 | + "content": [ |
| 93 | + { |
| 94 | + "type": "text", |
| 95 | + "text": "Transcribe this audio" |
| 96 | + }, |
| 97 | + { |
| 98 | + "type": "input_audio", |
| 99 | + "input_audio": { |
| 100 | + "data": encoded_string, |
| 101 | + "format": "mp3" |
| 102 | + } |
| 103 | + } |
| 104 | + ] |
| 105 | + } |
| 106 | + ] |
| 107 | + |
| 108 | + |
| 109 | +response = client.chat.completions.create( |
| 110 | + model=MODEL, |
| 111 | + messages=content, |
| 112 | + temperature=0.2, # Adjusts creativity |
| 113 | + max_tokens=2048, # Limits the length of the output |
| 114 | + top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. |
| 115 | +) |
| 116 | + |
| 117 | +print(response.choices[0].message.content) |
| 118 | +``` |
| 119 | + |
| 120 | +Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. |
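
If you transcribe several files, the message structure from the example can be wrapped in a small helper. This is a sketch only: `build_audio_messages` is a hypothetical function name, and the `audio_format` value is passed through unchanged, so it is up to you to supply a format the model supports.

```python
import base64


def build_audio_messages(audio_data: bytes, audio_format: str,
                         prompt: str = "Transcribe this audio") -> list:
    """Return a Chat Completions `messages` list with a text prompt and audio."""
    encoded_string = base64.b64encode(audio_data).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "input_audio",
                    "input_audio": {"data": encoded_string, "format": audio_format},
                },
            ],
        }
    ]


# Usage with the client and model from the example:
# with open("scaleway-ai-revolution.mp3", "rb") as raw_file:
#     messages = build_audio_messages(raw_file.read(), "mp3")
# response = client.chat.completions.create(model=MODEL, messages=messages)
```
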
| 121 | + |
| 122 | +### Transcribing a remote audio file |
| 123 | + |
| 124 | +In the example below, an audio file from a remote URL (`https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3`) is downloaded using the `requests` library, base64-encoded, and then sent to the model in a chat completion request alongside a transcription prompt. The resulting text transcription is printed to the screen. |
| 125 | + |
| 126 | +```python |
| 127 | +import base64 |
| 128 | +import requests |
| 129 | + |
| 130 | +MODEL = "voxtral-small-24b-2507" |
| 131 | + |
| 132 | +url = "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3" |
| 133 | +response = requests.get(url) |
| 134 | +audio_data = response.content |
| 135 | +encoded_string = base64.b64encode(audio_data).decode("utf-8") |
| 136 | + |
| 137 | +content = [ |
| 138 | + { |
| 139 | + "role": "user", |
| 140 | + "content": [ |
| 141 | + { |
| 142 | + "type": "text", |
| 143 | + "text": "Transcribe this audio" |
| 144 | + }, |
| 145 | + { |
| 146 | + "type": "input_audio", |
| 147 | + "input_audio": { |
| 148 | + "data": encoded_string, |
| 149 | + "format": "mp3" |
| 150 | + } |
| 151 | + } |
| 152 | + ] |
| 153 | + } |
| 154 | + ] |
| 155 | + |
| 156 | + |
| 157 | +response = client.chat.completions.create( |
| 158 | + model=MODEL, |
| 159 | + messages=content, |
| 160 | + temperature=0.2, # Adjusts creativity |
| 161 | + max_tokens=2048, # Limits the length of the output |
| 162 | + top_p=0.95 # Controls diversity through nucleus sampling. You usually only need to use temperature. |
| 163 | +) |
| 164 | + |
| 165 | +print(response.choices[0].message.content) |
| 166 | + |
| 167 | +``` |
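
The download step in the example does not check whether the request succeeded, so an error page could end up being encoded and sent to the model. The sketch below shows one way to guard against that. It swaps `requests` for the standard library's `urllib.request`, which raises `urllib.error.HTTPError` on 4xx and 5xx responses, and the helper name `fetch_and_encode` is hypothetical. If you stay with `requests`, calling `response.raise_for_status()` after `requests.get(url)` serves the same purpose.

```python
import base64
import urllib.request


def fetch_and_encode(url: str, timeout: float = 30.0) -> str:
    """Download a file and return its content as a base64 string."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx HTTP responses
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        audio_data = resp.read()
    if not audio_data:
        raise ValueError(f"No data received from {url}")
    return base64.b64encode(audio_data).decode("utf-8")


# Usage with the URL from the example:
# encoded_string = fetch_and_encode(
#     "https://genapi-documentation-assets.s3.fr-par.scw.cloud/scaleway-ai-revolution.mp3"
# )
```
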
| 168 | + |
| 169 | +Various parameters such as `temperature` and `max_tokens` control the output. See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. |
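
Because `max_tokens` caps the output length, the transcription of a long recording may be cut off. In the OpenAI Chat Completions response format, a choice whose `finish_reason` is `"length"` indicates that the token limit was reached. The sketch below assumes Scaleway's endpoint reports `finish_reason` in the same way. The helper `extract_transcript` is hypothetical, and the fake response object exists only so the snippet can run without calling the API.

```python
from types import SimpleNamespace


def extract_transcript(response):
    """Return (text, truncated) from a Chat Completions style response."""
    choice = response.choices[0]
    # "length" means generation stopped because max_tokens was reached
    truncated = choice.finish_reason == "length"
    return choice.message.content, truncated


# Stand-in for a real API response, for demonstration only
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="Partial transcript"),
        )
    ]
)

text, truncated = extract_transcript(fake_response)
if truncated:
    print("Warning: transcription may be incomplete; consider raising max_tokens.")
print(text)
```
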