
Commit d7e9af3

feat(genapi): introducing vision models
1 parent 8b839ba commit d7e9af3

File tree

3 files changed: +255 -14 lines changed


ai-data/generative-apis/how-to/query-text-models.mdx renamed to ai-data/generative-apis/how-to/query-language-models.mdx

Lines changed: 11 additions & 12 deletions
@@ -1,25 +1,24 @@
 ---
 meta:
-  title: How to query text models
-  description: Learn how to interact with powerful text models using Scaleway's Generative APIs service.
+  title: How to query language models
+  description: Learn how to interact with powerful language models using Scaleway's Generative APIs service.
 content:
-  h1: How to query text models
-  paragraph: Learn how to interact with powerful text models using Scaleway's Generative APIs service.
-tags: generative-apis ai-data text-models
+  h1: How to query language models
+  paragraph: Learn how to interact with powerful language models using Scaleway's Generative APIs service.
+tags: generative-apis ai-data language-models
 dates:
-  validation: 2024-08-28
+  validation: 2024-09-30
   posted: 2024-08-28
 ---

-Scaleway's Generative APIs service allows users to interact with powerful text models hosted on the platform.
+Scaleway's Generative APIs service allows users to interact with powerful language models hosted on the platform.

-There are several ways to interact with text models:
-- The Scaleway [console](https://console.scaleway.com) will soon provide a complete [playground](/ai-data/generative-apis/how-to/query-text-models/#accessing-the-playground), aiming to test models, adapt parameters, and observe how these changes affect the output in real-time.
-- Via the [Chat API](/ai-data/generative-apis/how-to/query-text-models/#querying-text-models-via-api)
+There are several ways to interact with language models:
+- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/ai-data/generative-apis/how-to/query-language-models/#accessing-the-playground) for testing models, adapting parameters, and observing how these changes affect the output in real time.
+- Via the [Chat API](/ai-data/generative-apis/how-to/query-language-models/#querying-language-models-via-api)

 <Macro id="requirements" />

-- Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
 - A Scaleway account logged into the [console](https://console.scaleway.com)
 - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
 - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication

@@ -40,7 +39,7 @@ The web playground displays.
 3. Switch model at the top of the page, to observe the capabilities of chat models offered via Generative APIs.
 4. Click **View code** to get code snippets configured according to your settings in the playground.

-## Querying text models via API
+## Querying language models via API

 The [Chat API](/ai-data/generative-apis/api-cli/using-chat-api/) is an OpenAI-compatible REST API for generating and manipulating conversations.
Lines changed: 238 additions & 0 deletions
@@ -0,0 +1,238 @@
---
meta:
  title: How to query vision models
  description: Learn how to interact with powerful vision models using Scaleway's Generative APIs service.
content:
  h1: How to query vision models
  paragraph: Learn how to interact with powerful vision models using Scaleway's Generative APIs service.
tags: generative-apis ai-data vision-models
dates:
  validation: 2024-09-30
  posted: 2024-09-30
---

Scaleway's Generative APIs service allows users to interact with powerful vision models hosted on the platform.

<Message type="note">
Vision models can understand and analyze images, not generate them.
</Message>

There are several ways to interact with vision models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/ai-data/generative-apis/how-to/query-vision-models/#accessing-the-playground) for testing models, adapting parameters, and observing how these changes affect the output in real time.
- Via the [Chat API](/ai-data/generative-apis/how-to/query-vision-models/#querying-vision-models-via-api)

<Macro id="requirements" />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication
- Python 3.7+ installed on your system

## Accessing the Playground

Scaleway provides a web playground for vision models hosted on Generative APIs.

1. Navigate to Generative APIs under the AI section of the [Scaleway console](https://console.scaleway.com/) side menu. The list of models you can query displays.
2. Click the name of the vision model you want to try. Alternatively, click <Icon name="more" /> next to the vision model, and click **Try model** in the menu.

The web playground displays.

## Using the Playground
1. Upload one or multiple images to the prompt area at the bottom of the page. Enter a prompt, for example, to describe the image(s) you attached.
2. Edit the hyperparameters listed in the right-hand column, for example the default temperature, to get more or less randomness in the outputs.
3. Switch models at the top of the page to observe the capabilities of the chat and vision models offered via Generative APIs.
4. Click **View code** to get code snippets configured according to your settings in the playground.

## Querying vision models via API

The [Chat API](/ai-data/generative-apis/api-cli/using-chat-api/) is an OpenAI-compatible REST API for generating and manipulating conversations.

You can query the vision models programmatically using your favorite tools or languages.
Vision models take both text and images as inputs.

<Message type="tip">
Unlike traditional language models, vision models take a content array for the user role, structuring text and images as inputs.
</Message>

In the following example, we will use the OpenAI Python client.

### Installing the OpenAI SDK

Install the OpenAI SDK using pip:

```bash
pip install openai
```

### Initializing the client

Initialize the OpenAI client with your base URL and API key:

```python
from openai import OpenAI

# Initialize the client with your base URL and API key
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_SECRET_KEY>"  # Your unique API secret key from Scaleway
)
```

### Generating a chat completion

You can now create a chat completion, for example with the `pixtral-12b-2409` model:

```python
# Create a chat completion using the 'pixtral-12b-2409' model
response = client.chat.completions.create(
    model="pixtral-12b-2409",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image?"},
                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
            ]  # Vision models will take a content array with text and image_url objects.
        }
    ],
    temperature=0.7,  # Adjusts creativity
    max_tokens=2048,  # Limits the length of the output
    top_p=0.9  # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

# Print the generated response
print(response.choices[0].message.content)
```

This code sends a message containing both a prompt and an image to the vision model, and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
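
If you want to check how close a response came to the `max_tokens` limit, the OpenAI client exposes token counts on the response object; a minimal sketch, assuming the OpenAI-compatible endpoint returns usage data:

```python
# Sketch: inspect token usage on the previous response (assumes usage data is returned)
if response.usage is not None:
    print(f"Prompt tokens: {response.usage.prompt_tokens}")
    print(f"Completion tokens: {response.usage.completion_tokens}")
```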

A conversation style may include a default system prompt. You can set this prompt by passing a first message with the `system` role. For example:

```python
[
    {
        "role": "system",
        "content": "You are Xavier Niel."
    }
]
```
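
For instance, a minimal sketch (reusing the `client` and example image URL from above, with an illustrative prompt) that combines this system prompt with a vision request:

```python
# Sketch: a system prompt combined with a user message that contains text and an image
response = client.chat.completions.create(
    model="pixtral-12b-2409",
    messages=[
        {"role": "system", "content": "You are Xavier Niel."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image?"},
                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)
```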

### Passing images to Pixtral

1. **Image URLs**: If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding.
2. **Base64 encoded image**: Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet.

The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload.

```python
import base64
from io import BytesIO
from PIL import Image

def encode_image(img):
    buffered = BytesIO()
    img.save(buffered, format="JPEG")
    encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return encoded_string

img = Image.open("path_to_your_image.jpg")
base64_img = encode_image(img)

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_img}"
                    }
                }
            ]
        }
    ],
    ... # other parameters
}
```
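
If you prefer to keep using the OpenAI client rather than building a raw payload, the same base64 data URL can be passed in the `image_url` field; a minimal sketch, reusing the `client` and `base64_img` defined above:

```python
# Sketch: send the base64-encoded image through the OpenAI client initialized earlier
response = client.chat.completions.create(
    model="pixtral-12b-2409",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image?"},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```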

### Model parameters and their effects

The following parameters will influence the output of the model:

- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`. The content is an array that can contain text and/or image objects.
- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
- **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output, as shown in the sketch below.
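
As an illustration, a minimal sketch of a request that sets `stop` alongside the other parameters (the prompt and the stop sequence `"\n\n"` are only example values):

```python
# Sketch: stop generating at the first blank line in the answer
response = client.chat.completions.create(
    model="pixtral-12b-2409",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one short paragraph."},
                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
            ],
        }
    ],
    temperature=0.2,   # more deterministic output
    max_tokens=512,    # cap the answer length
    stop=["\n\n"],     # example stop sequence: end at the first blank line
)
print(response.choices[0].message.content)
```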

<Message type="warning">
If you encounter an error such as "Forbidden 403", refer to the [API documentation](/ai-data/generative-apis/api-cli/understanding-errors) for troubleshooting tips.
</Message>

## Streaming

By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.
The following is an example using the chat completions API:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
)

response = client.chat.completions.create(
    model="pixtral-12b-2409",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is this image?"},
            {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
        ]
    }],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

## Async

The service also supports asynchronous mode for any chat completion.

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
)

async def main():
    stream = await client.chat.completions.create(
        model="pixtral-12b-2409",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image?"},
                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
            ]
        }],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
```

menu/navigation.json

Lines changed: 6 additions & 2 deletions
@@ -660,13 +660,17 @@
     {
       "items": [
         {
-          "label": "Query text models",
-          "slug": "query-text-models"
+          "label": "Query language models",
+          "slug": "query-language-models"
         },
         {
           "label": "Query embedding models",
           "slug": "query-embedding-models"
         },
+        {
+          "label": "Query vision models",
+          "slug": "query-vision-models"
+        },
         {
           "label": "Use structured outputs",
           "slug": "use-structured-outputs"
