Commit 4ce46b9

feat(ai): integrate examples and explanations for Responses API (#5459)
* fix(genai): responses api * feat(ai): continue to integrate responses api * fix(ai): finish updating for responses * Apply suggestions from code review Co-authored-by: Guillaume Calmettes <[email protected]> Co-authored-by: Jessica <[email protected]> Co-authored-by: Benedikt Rollik <[email protected]> * fix(responses): remove from vision * fix(ai): final corrections --------- Co-authored-by: Guillaume Calmettes <[email protected]> Co-authored-by: Jessica <[email protected]> Co-authored-by: Benedikt Rollik <[email protected]>
1 parent 1efa41b commit 4ce46b9

File tree

9 files changed: +381 additions, -211 deletions

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
---
macro: chat-comp-vs-responses-api
---

Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are OpenAI-compatible REST APIs for generating and manipulating conversations. The Chat Completions API focuses on generating conversational responses, while the Responses API is a more general REST API for chat, structured outputs, tool use, and multimodal inputs.

The **Chat Completions** API was released in 2023 and is an industry standard for building AI applications, designed specifically for handling multi-turn conversations. It is stateless, but lets users manage conversation history by appending each new message to the ongoing conversation. Messages in the conversation can include text, images, and audio extracts. The API supports `function` tool-calling, allowing developers to define functions that the model can choose to call. When the model does so, it returns the function name and arguments, which the developer's code must execute and feed back into the conversation.
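The `function` tool-calling loop described above can be sketched locally. In this sketch, the `get_weather` tool, its schema, and the hard-coded tool call are hypothetical stand-ins: a real flow would read the call from `response.choices[0].message.tool_calls` and include its `tool_call_id` in the follow-up message.

```python
import json

# Schema for a hypothetical 'get_weather' function tool, declared in the
# OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stand-in for a real weather lookup.
    return f"Sunny in {city}"

# The model does not run the tool itself: it returns the function name and
# JSON-encoded arguments. Hard-coded here for illustration.
tool_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}

# The developer's code executes the function...
result = get_weather(**json.loads(tool_call["arguments"]))

# ...and feeds the result back as a 'tool' message for the next
# chat.completions.create() call (a real message also carries tool_call_id).
messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {"role": "tool", "content": result},
]
print(result)
```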

The **Responses** API was released in 2025 and is designed to combine the simplicity of Chat Completions with the ability to perform more agentic tasks and reasoning. It supports statefulness, maintaining context without needing to resend the entire conversation history. It also offers built-in tools (e.g. web or file search) that the model can execute itself while generating a response.
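In practice, the difference shows up in the request body: Chat Completions takes a `messages` array, while Responses takes an `input` string or array and uses `max_output_tokens` rather than `max_completion_tokens`. A minimal sketch of two equivalent payloads (model names and the prompt are placeholders):

```python
prompt = "Summarize the plot of Hamlet in one sentence."

# Chat Completions: conversation history goes in 'messages'.
chat_completions_payload = {
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": prompt}],
    "max_completion_tokens": 100,
}

# Responses: the equivalent request uses 'input' and 'max_output_tokens'.
responses_payload = {
    "model": "gpt-oss-120b",
    "input": [{"role": "user", "content": prompt}],
    "max_output_tokens": 100,
}
```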

<Message type="note">
Scaleway's support for the Responses API is currently in beta. Support for the full feature set will be incremental: statefulness and tools other than `function` calling are not yet supported.
</Message>

Most supported Generative APIs models can be used with both the Chat Completions and Responses APIs. For the **`gpt-oss-120b`** model, the Responses API is recommended, as it gives you access to all of the model's features, especially tool-calling.

For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).

pages/generative-apis/faq.mdx

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ No, you cannot increase maximum output tokens above [limits for each models](/ge
 These limits are in place to protect you against:
 - Long generation which may be ended by an HTTP timeout. Limits are designed to ensure a model will send its HTTP response in less than 5 minutes.
 - Uncontrolled billing, as several models are known to be able to enter infinite generation loops (specific prompts can make the model generate the same sentence over and over, without stopping at all).
-If you require higher maximum output tokens, you can use [Managed Inference](https://console.scaleway.com/inference/deployments) where these limts do not apply (as your bill will be limited by the size of your deployment).
+If you require higher maximum output tokens, you can use [Managed Inference](https://console.scaleway.com/inference/deployments) where these limits do not apply (as your bill will be limited by the size of your deployment).

 ### Can I use OpenAI libraries and APIs with Scaleway's Generative APIs?
 Yes, Scaleway's Generative APIs are designed to be compatible with OpenAI libraries and SDKs, including the OpenAI Python client library and LangChain SDKs. This allows for seamless integration with existing workflows.

pages/generative-apis/how-to/query-language-models.mdx

Lines changed: 149 additions & 63 deletions
@@ -3,17 +3,17 @@ title: How to query language models
 description: Learn how to interact with powerful language models using Scaleway's Generative APIs service.
 tags: generative-apis ai-data language-models chat-completions-api
 dates:
-  validation: 2025-05-12
+  validation: 2025-08-22
   posted: 2024-08-28
 ---
 import Requirements from '@macros/iam/requirements.mdx'
-
+import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'

 Scaleway's Generative APIs service allows users to interact with powerful language models hosted on the platform.

 There are several ways to interact with language models:
 - The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real time.
-- Via the [Chat API](/generative-apis/how-to/query-language-models/#querying-language-models-via-api)
+- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response)

 <Requirements />

@@ -39,10 +39,12 @@ The web playground displays.

 ## Querying language models via API

-The [Chat API](/generative-apis/api-cli/using-chat-api/) is an OpenAI-compatible REST API for generating and manipulating conversations.
-
 You can query the models programmatically using your favorite tools or languages.
-In the following example, we will use the OpenAI Python client.
+In the following examples, we will use the OpenAI Python client.
+
+### Chat Completions API or Responses API?
+
+<ChatCompVsResponsesApi />

 ### Installing the OpenAI SDK

@@ -68,48 +70,97 @@ client = OpenAI(

 ### Generating a chat completion

-You can now create a chat completion, for example with the `llama-3.1-8b-instruct` model:
-
-```python
-# Create a chat completion using the 'llama-3.1-8b-instruct' model
-response = client.chat.completions.create(
-    model="llama-3.1-8b-instruct",
-    messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
-    temperature=0.2,  # Adjusts creativity
-    max_tokens=100,  # Limits the length of the output
-    top_p=0.7  # Controls diversity through nucleus sampling. You usually only need to use temperature.
-)
-
-# Print the generated response
-print(response.choices[0].message.content)
-```
-
-This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
+You can now create a chat completion using either the Chat Completions or the Responses API, as shown in the following examples:
+
+<Tabs id="generating-chat-completion">
+  <TabsTab label="Chat Completions API">
+    ```python
+    # Create a chat completion using the 'llama-3.1-8b-instruct' model
+    response = client.chat.completions.create(
+        model="llama-3.1-8b-instruct",
+        messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
+        temperature=0.2,  # Adjusts creativity
+        max_completion_tokens=100,  # Limits the length of the output
+        top_p=0.7  # Controls diversity through nucleus sampling. You usually only need to use temperature.
+    )
+
+    # Print the generated response
+    print(response.choices[0].message.content)
+    ```
+
+    This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_completion_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
+  </TabsTab>
+  <TabsTab label="Responses API (Beta)">
+    ```python
+    # Create a response using the 'gpt-oss-120b' model
+    response = client.responses.create(
+        model="gpt-oss-120b",
+        input=[{"role": "user", "content": "Briefly describe a futuristic city with advanced technology and green energy solutions."}],
+        temperature=0.2,  # Adjusts creativity
+        max_output_tokens=100,  # Limits the length of the output
+        top_p=0.7  # Controls diversity through nucleus sampling. You usually only need to use temperature.
+    )
+
+    # Print the generated response. Here, the last output message will contain the final content.
+    # Previous outputs will contain reasoning content.
+    print(response.output[-1].content[0].text)
+    ```
+  </TabsTab>
+</Tabs>

 A conversation style may include a default system prompt. You may set this prompt by setting the first message with the role system. For example:

-```python
-[
-  {
-    "role": "system",
-    "content": "You are Xavier Niel."
-  },
-  {
-    "role": "user",
-    "content": "Hello, what is your name?"
-  }
-]
-```
+```python
+[
+  {
+    "role": "system",
+    "content": "You are Xavier Niel."
+  },
+  {
+    "role": "user",
+    "content": "Hello, what is your name?"
+  }
+]
+```
+
+Adding such a system prompt can also help resolve issues if you receive responses such as `I'm not sure what tools are available to me. Can you please provide a library of tools that I can use to generate a response?`.
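One way to apply this fix is to prepend the system message to an existing history before sending it. The `with_system_prompt` helper below is a hypothetical convenience function, not part of the OpenAI SDK:

```python
def with_system_prompt(history, system_content):
    """Return a copy of the conversation with a system message prepended."""
    return [{"role": "system", "content": system_content}] + list(history)

history = [{"role": "user", "content": "Hello, what is your name?"}]
messages = with_system_prompt(history, "You are Xavier Niel.")
# 'messages' can now be passed to client.chat.completions.create(...)
```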

 ### Model parameters and their effects

 The following parameters will influence the output of the model:

-- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
-- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
-- **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
-- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
-- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.
+<Tabs id="model-params">
+  <TabsTab label="Chat Completions API">
+    - **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
+    - **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
+    - **`max_completion_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
+    - **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
+    - **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.
+
+    See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
+  </TabsTab>
+  <TabsTab label="Responses API (Beta)">
+    - **`input`**: A single text string, or an array of string/multimodal inputs to provide to the model to generate a response. When using the array option, you can define a `role` and a list of `content` inputs of different types (text, files, images, etc.).
+    - **`max_output_tokens`**: The maximum number of output tokens that can be generated for a completion. Different default maximum values are enforced for each model, to avoid edge cases where tokens are generated indefinitely.
+    - **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
+    - **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
+
+    See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) for a full list of all available parameters.
+  </TabsTab>
+</Tabs>

 <Message type="warning">
 If you encounter an error such as "Forbidden 403" refer to the [API documentation](/generative-apis/api-cli/understanding-errors) for troubleshooting tips.
@@ -118,7 +169,8 @@ The following parameters will influence the output of the model:

 ## Streaming

 By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.
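The client-side accumulation this describes can be sketched with mock chunks. The mock objects below stand in for what iterating a `stream=True` chat completion yields; in the real API, only `choices[0].delta.content` carries the newly generated tokens:

```python
from types import SimpleNamespace

# Mock chunks standing in for what iterating a stream=True response yields.
mock_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=part))])
    for part in ["Once ", "upon ", "a ", "time"]
]

full_text = ""
for chunk in mock_chunks:
    # Guard against keep-alive chunks with no content, as in the real API.
    if chunk.choices and chunk.choices[0].delta.content:
        full_text += chunk.choices[0].delta.content

print(full_text)  # Once upon a time
```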
-Following is an example using the chat completions API:
+
+Following is an example using the Chat Completions API, but the `stream` parameter can be set in the same way with the Responses API.
@@ -145,28 +197,62 @@ for chunk in response:

 The service also supports asynchronous mode for any chat completion.

-```python
-import asyncio
-from openai import AsyncOpenAI
-
-client = AsyncOpenAI(
-    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
-    api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
-)
-
-async def main():
-    stream = await client.chat.completions.create(
-        model="llama-3.1-8b-instruct",
-        messages=[{
-            "role": "user",
-            "content": "Sing me a song",
-        }],
-        stream=True,
-    )
-    async for chunk in stream:
-        if chunk.choices and chunk.choices[0].delta.content:
-            print(chunk.choices[0].delta.content, end="")
-
-asyncio.run(main())
-```
+<Tabs id="async">
+  <TabsTab label="Chat Completions API">
+    ```python
+    import asyncio
+    from openai import AsyncOpenAI
+
+    client = AsyncOpenAI(
+        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
+        api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
+    )
+
+    async def main():
+        # 'await' is required here: AsyncOpenAI returns a coroutine
+        stream = await client.chat.completions.create(
+            model="llama-3.1-8b-instruct",
+            messages=[{
+                "role": "user",
+                "content": "Sing me a song",
+            }],
+            stream=True,
+        )
+        async for chunk in stream:
+            if chunk.choices and chunk.choices[0].delta.content:
+                print(chunk.choices[0].delta.content, end="")
+
+    asyncio.run(main())
+    ```
+  </TabsTab>
+  <TabsTab label="Responses API (Beta)">
+    ```python
+    import asyncio
+    from openai import AsyncOpenAI
+
+    client = AsyncOpenAI(
+        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
+        api_key="<SCW_API_KEY>"  # Your unique API key from Scaleway
+    )
+
+    async def main():
+        stream = await client.responses.create(
+            model="gpt-oss-120b",
+            input=[{
+                "role": "user",
+                "content": "Sing me a song"
+            }],
+            stream=True,
+        )
+        async for event in stream:
+            if event.type == "response.output_text.delta":
+                print(event.delta, end="")
+            elif event.type == "response.completed":
+                break
+
+    asyncio.run(main())
+    ```
+  </TabsTab>
+</Tabs>
