macros/ai/chat-comp-vs-responses-api.mdx (1 addition, 1 deletion)
@@ -6,6 +6,6 @@ Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/gener
The **Chat Completions** API was released in 2023, and is an industry standard for building AI applications, being specifically designed for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. It supports `function` tool-calling, where the developer defines a set of functions, which the model can decide whether to call when generating a response. If it decides to call one of these functions, it returns the function name and arguments, and the developer's own code must actually execute the function and feed the result back into the conversation for use by the model.
- The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with support for more agentic tasks and reasoning. It is stateful, able to maintain context without resending the entire conversation history. It offers built-in tools (e.g. web or file search) that the model can execute itself while generating a response, though currently only `function` tools are supported by Scaleway. Overall, Scaleway's support for the Responses API is currently at beta stage. All supported Generative API models can be used with the Responses API; note that for the `gpt-oss-120b` model, only the Responses API gives you access to all of its features.
+ The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with support for more agentic tasks and reasoning. It is stateful, able to maintain context without resending the entire conversation history. It offers built-in tools (e.g. web or file search) that the model can execute itself while generating a response, though currently only `function` tools are supported by Scaleway. Overall, **Scaleway's support for the Responses API is currently at beta stage**. All supported Generative API models can be used with the Responses API; note that for the `gpt-oss-120b` model, only the Responses API gives you access to all of its features.
For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
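The contrast between the two request shapes can be sketched as follows. This is a hedged illustration, not Scaleway's official sample: the helper names and prompt are made up, while the base URL, key placeholder, and model name follow this page's other examples.

```python
def build_chat_payload(prompt: str) -> dict:
    # Chat Completions is stateless: the full message history is sent on each call.
    return {
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": prompt}],
    }

def build_responses_payload(prompt: str) -> dict:
    # The Responses API accepts a flat `input` instead of a `messages` list.
    return {
        "model": "llama-3.1-8b-instruct",
        "input": prompt,
    }

def main() -> None:
    # Not invoked here: requires `pip install openai` and a valid Scaleway key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
        api_key="<SCW_SECRET_KEY>",             # your Scaleway API secret key
    )
    chat = client.chat.completions.create(**build_chat_payload("Say hello."))
    print(chat.choices[0].message.content)

    resp = client.responses.create(**build_responses_payload("Say hello."))
    print(resp.output_text)  # convenience accessor for the concatenated text output
```

Both calls go through the same OpenAI-compatible client; only the endpoint and payload shape differ.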
pages/generative-apis/how-to/query-language-models.mdx (7 additions, 5 deletions)
@@ -82,15 +82,15 @@ You can now create a chat completion using either the Chat Completions or Respon
model="llama-3.1-8b-instruct",
messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
temperature=0.2, # Adjusts creativity
- max_tokens=100, # Limits the length of the output
+ max_completion_tokens=100, # Limits the length of the output
top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature.
)
# Print the generated response
print(response.choices[0].message.content)
```
- This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
+ This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_completion_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
</TabsTab>
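The fragments shown in this hunk come from a longer snippet; assembled into one self-contained sketch (the helper name is made up, and the client setup repeats the pattern used elsewhere on this page), the request might look like:

```python
def build_request() -> dict:
    # Same parameters as the documented snippet: temperature (creativity),
    # max_completion_tokens (output length cap), top_p (nucleus sampling).
    return {
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
        "temperature": 0.2,
        "max_completion_tokens": 100,
        "top_p": 0.7,
    }

def main() -> None:
    # Not invoked here: requires `pip install openai` and a valid Scaleway key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
        api_key="<SCW_SECRET_KEY>",             # your Scaleway API secret key
    )
    response = client.chat.completions.create(**build_request())
    print(response.choices[0].message.content)
```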
@@ -129,6 +129,8 @@ A conversation style may include a default system prompt. You may set this promp
]
```
+ Adding such a system prompt can also help resolve issues if you receive responses such as `I'm not sure what tools are available to me. Can you please provide a library of tools that I can use to generate a response?`.
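A minimal sketch of such a message list follows; the system prompt wording here is illustrative, not Scaleway's default:

```python
# Conversation history with an explicit system prompt prepended.
# The prompt text below is an illustrative assumption.
messages = [
    {"role": "system", "content": "You are a helpful assistant. Answer directly; do not ask for a library of tools."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Each subsequent turn is appended, keeping the system prompt first:
messages.append({"role": "assistant", "content": "The capital of France is Paris."})
messages.append({"role": "user", "content": "And its population?"})
```

Because the Chat Completions API is stateless, this growing list is what gets resent with every request.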
### Model parameters and their effects
The following parameters will influence the output of the model:
@@ -139,7 +141,7 @@ The following parameters will influence the output of the model:
- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
- - **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
+ - **`max_completion_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.
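As a worked example, the parameters listed above can be combined into a single request payload. The helper name, prompt, and specific values are illustrative assumptions:

```python
def build_request(prompt: str) -> dict:
    # Illustrative combination of the documented parameters.
    return {
        "model": "llama-3.1-8b-instruct",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},  # illustrative system prompt
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,           # mostly deterministic output
        "max_completion_tokens": 50,  # hard cap on generated tokens
        "stop": ["\n\n"],             # stop generating at the first blank line
    }
```

The resulting dictionary can be passed to `client.chat.completions.create(**build_request(...))`.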
@@ -210,7 +212,7 @@ The service also supports asynchronous mode for any chat completion.
)
async def main():
- stream = await client.chat.completions.create(
+ stream = client.chat.completions.create(
model="llama-3.1-8b-instruct",
messages=[{
"role": "user",
@@ -237,7 +239,7 @@ The service also supports asynchronous mode for any chat completion.
@@ -109,15 +109,10 @@ You can now create a chat completion:
print(response.choices[0].message.content)
```
</TabsTab>
- <TabsTab label="Responses API">
+ <TabsTab label="Responses API (Beta)">
```python
from openai import OpenAI
- # Initialize the client with your base URL and API key
- client = OpenAI(
-     base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
-     api_key="<SCW_SECRET_KEY>" # Your unique API secret key from Scaleway
- )
# Create a chat completion using the 'mistral-small-3.2-24b-instruct-2506' model
response = client.responses.create(
model="mistral-small-3.2-24b-instruct-2506",
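The truncated Responses API call above can be assembled into a complete sketch. The function name and prompt are illustrative, and the client setup mirrors this page's other examples:

```python
def responses_example() -> None:
    # Not invoked here: requires `pip install openai` and a valid Scaleway key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
        api_key="<SCW_SECRET_KEY>",             # your Scaleway API secret key
    )
    response = client.responses.create(
        model="mistral-small-3.2-24b-instruct-2506",
        input="Describe a futuristic city with advanced technology and green energy solutions.",
    )
    # The SDK exposes the concatenated text output directly.
    print(response.output_text)
```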
@@ -169,7 +164,7 @@ To encode Base64 images in Python, you first need to install `Pillow` library:
pip install pillow
```
- Then, the following Python code sample shows you how to encode an image in Base64 format and pass it to your request payload:
+ Then, the following Python code sample shows you how to encode an image in Base64 format and pass it to a request payload for the Chat Completions API:
```python
import base64
@@ -207,9 +202,9 @@ payload = {
```
- ### Model parameters and their effects
+ ### Model parameters and their effects
- The following parameters will influence the output of the model:
+ When using the Chat Completions API, the following parameters will influence the output of the model:
- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`. The content is an array that can contain text and/or image objects.
- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
@@ -225,142 +220,65 @@ The following parameters will influence the output of the model:
By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.
- Examples are provided below:
-
- <Tabs id="vision-streaming">
- <TabsTab label="Chat Completions API">
- ```python
- from openai import OpenAI
-
- client = OpenAI(
-     base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
-     api_key="<SCW_API_KEY>" # Your unique API key from Scaleway
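The incremental streaming flow described in this section can be sketched as follows. The function name and prompt are illustrative, the call is wrapped in a function rather than executed (it needs a valid key), and the client setup repeats this page's pattern:

```python
def stream_chat(prompt: str) -> None:
    # Not invoked here: requires `pip install openai` and a valid Scaleway key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
        api_key="<SCW_SECRET_KEY>",             # your Scaleway API secret key
    )
    stream = client.chat.completions.create(
        model="llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # tokens arrive incrementally instead of as one final message
    )
    for chunk in stream:
        # Each chunk carries a delta holding the newly generated tokens, if any.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```

Printing each delta as it arrives is what lets a chat UI render the answer token by token.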