Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are OpenAI-compatible REST APIs that can be used for generating and manipulating conversations. The Chat Completions API is focused on generating conversational responses, while the Responses API is a more general REST API for chat, structured outputs, tool use, and multimodal inputs.
The **Chat Completions** API was released in 2023 and is an industry standard for building AI applications, designed specifically for handling multi-turn conversations. It is stateless, but lets users manage conversation history by appending each new message to the ongoing conversation. Messages in the conversation can include text, images, and audio extracts. The API supports `function` tool-calling, allowing developers to define functions that the model can choose to call. When it does, it returns the function name and arguments, which the developer's code must execute and feed back into the conversation, as sketched below.
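As an illustrative sketch of that round trip (the model choice and the `get_weather` helper here are hypothetical, not part of the API):

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key="<SCW_API_KEY>")

# Hypothetical local function the model may ask us to call
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "forecast": "sunny"})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct", messages=messages, tools=tools
)

# If the model chose to call the function, execute it ourselves...
tool_call = response.choices[0].message.tool_calls[0]
result = get_weather(**json.loads(tool_call.function.arguments))

# ...then feed the result back into the conversation for the final answer
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})
final = client.chat.completions.create(
    model="llama-3.1-8b-instruct", messages=messages, tools=tools
)
print(final.choices[0].message.content)
```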
The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with the ability to carry out more agentic tasks and reasoning. It supports statefulness, maintaining context without needing to resend the entire conversation history, and it offers built-in tools (e.g. web or file search) that the model can execute itself while generating a response.
<Message type="note">
Scaleway's support for the Responses API is currently in beta. Support for the full feature set will be incremental: currently, statefulness and tools other than `function` calling are not supported.
</Message>
Most models supported by Generative APIs can be used with both the Chat Completions and Responses APIs. For the **`gpt-oss-120b`** model, use of the Responses API is recommended, as it gives you access to all of the model's features, especially tool-calling.
For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
No, you cannot increase maximum output tokens above the limits set for each model.
These limits are in place to protect you against:
- Long generations, which may be cut off by an HTTP timeout. Limits are designed to ensure a model sends its HTTP response in less than 5 minutes.
- Uncontrolled billing, as several models are known to enter infinite generation loops, where specific prompts can make the model generate the same sentence over and over without stopping.

If you require higher maximum output tokens, you can use [Managed Inference](https://console.scaleway.com/inference/deployments), where these limits do not apply (as your bill is bounded by the size of your deployment).
### Can I use OpenAI libraries and APIs with Scaleway's Generative APIs?
Yes, Scaleway's Generative APIs are designed to be compatible with OpenAI libraries and SDKs, including the OpenAI Python client library and LangChain SDKs. This allows for seamless integration with existing workflows.
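For example, existing OpenAI-based code typically only needs the endpoint and key swapped. A minimal sketch, using the base URL and key placeholder shown in the examples elsewhere in this documentation:

```python
from openai import OpenAI

# The standard OpenAI client, pointed at Scaleway's Generative APIs endpoint
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway endpoint instead of OpenAI's
    api_key="<SCW_API_KEY>",  # Your unique API key from Scaleway
)

# Any OpenAI-style call then works unchanged
reply = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```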
Scaleway's Generative APIs service allows users to interact with powerful language models hosted on the platform.
There are several ways to interact with language models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real time.
- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response)
<Requirements />
## Querying language models via API
You can query the models programmatically using your favorite tools or languages.
In the examples that follow, we will use the OpenAI Python client.
### Chat Completions API or Responses API?
<ChatCompVsResponsesApi />
### Installing the OpenAI SDK
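Install the SDK by running `pip install openai` in your environment, then configure the client to target Scaleway's endpoint. A minimal setup, matching the configuration used by the examples below:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>",  # Your unique API key from Scaleway
)
```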
### Generating a chat completion
You can now create a chat completion using either the Chat Completions or Responses API, as shown in the following examples:
<Tabs id="generating-chat-completion">
<TabsTab label="Chat Completions API">
```python
# Create a chat completion using the 'llama-3.1-8b-instruct' model
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
    temperature=0.2,  # Adjusts creativity
    max_completion_tokens=100,  # Limits the length of the output
    top_p=0.7  # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

# Print the generated response
print(response.choices[0].message.content)
```

This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_completion_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
</TabsTab>
<TabsTab label="Responses API (Beta)">
```python
# Create a model response using the 'gpt-oss-120b' model
response = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "Briefly describe a futuristic city with advanced technology and green energy solutions."}],
    temperature=0.2,  # Adjusts creativity
    max_output_tokens=100,  # Limits the length of the output
    top_p=0.7  # Controls diversity through nucleus sampling. You usually only need to use temperature.
)

# Print the generated response. Here, the last output message will contain the final content.
# Previous outputs will contain reasoning content.
print(response.output[-1].content[0].text)
```
</TabsTab>
</Tabs>
A conversation style may include a default system prompt. You may set this prompt by setting the first message with the role `system`. For example:
```python
[
    {
        "role": "system",
        "content": "You are Xavier Niel."
    },
    {
        "role": "user",
        "content": "Hello, what is your name?"
    }
]
```
Adding such a system prompt can also help resolve issues if you receive responses such as `I'm not sure what tools are available to me. Can you please provide a library of tools that I can use to generate a response?`.
### Model parameters and their effects
The following parameters will influence the output of the model:
<Tabs id="model-params">
<TabsTab label="Chat Completions API">
- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
- **`max_completion_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.

See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
</TabsTab>
<TabsTab label="Responses API (Beta)">
- **`input`**: A single text string, or an array of string/multimodal inputs to provide to the model to generate a response. When using the array option, you can define a `role` and a list of `content` inputs of different types (text, files, images, etc.).
- **`max_output_tokens`**: The maximum number of output tokens that can be generated for a completion. Different default maximums are enforced for each model, to avoid edge cases where tokens are generated indefinitely.
- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.

See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) for a full list of all available parameters.
</TabsTab>
</Tabs>
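For instance, the `stop` parameter can end generation at a chosen marker. A minimal sketch, reusing the client configured above (the stop string and prompt here are arbitrary):

```python
# Stop generating as soon as the model emits a blank line,
# keeping the answer to its first paragraph.
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Describe a futuristic city."}],
    max_completion_tokens=200,  # Hard cap on output length
    stop=["\n\n"],  # End the completion at the first blank line
)
print(response.choices[0].message.content)
```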
<Message type="warning">
If you encounter an error such as "Forbidden 403", refer to the [API documentation](/generative-apis/api-cli/understanding-errors) for troubleshooting tips.
</Message>
## Streaming
By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.
The following example uses the Chat Completions API; the `stream` parameter can be set in the same way with the Responses API, as sketched after the example.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Sing me a song"}],
    stream=True,
)

for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
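With the Responses API, the stream is delivered as typed events rather than chunks. A minimal sketch, reusing the client configured above and assuming the event types used by the OpenAI Python client, where text arrives as `response.output_text.delta` events:

```python
response = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "Sing me a song"}],
    stream=True,
)

for event in response:
    # Print each text fragment as it is generated
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
```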
The service also supports asynchronous mode for any chat completion.
<Tabs id="async">
<TabsTab label="Chat Completions API">
```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
)

async def main():
    stream = await client.chat.completions.create(
        model="llama-3.1-8b-instruct",
        messages=[{
            "role": "user",
            "content": "Sing me a song",
        }],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

asyncio.run(main())
```
</TabsTab>
<TabsTab label="Responses API (Beta)">
```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
)

async def main():
    stream = await client.responses.create(
        model="gpt-oss-120b",
        input=[{
            "role": "user",
            "content": "Sing me a song",
        }],
        stream=True,
    )
    # Text arrives incrementally as output_text delta events
    async for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="")

asyncio.run(main())
```
</TabsTab>
</Tabs>