Commit 78ef48c

feat(ai): continue to integrate responses api

1 parent e2cbfff commit 78ef48c

5 files changed (+792 -334 lines)
macros/ai/chat-comp-vs-responses-api.mdx

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+---
+macro: chat-comp-vs-responses-api
+---
+
+Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are OpenAI-compatible REST APIs that can be used for generating and manipulating conversations. The Chat Completions API is focused on generating conversational responses, while the Responses API is a more general REST API for chat, structured outputs, tool use, and multimodal inputs.
+
+The **Chat Completions** API was released in 2023 and is an industry standard for building AI applications, designed specifically for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. It supports `function` tool-calling, where the developer defines a set of functions that the model can decide to call when generating a response. If the model decides to call one of these functions, it returns the function name and arguments; the developer's own code must then execute the function and feed the result back into the conversation for use by the model.
+
+The **Responses** API was released in 2025 and is designed to combine the simplicity of Chat Completions with the ability to perform more agentic tasks and reasoning. It supports statefulness, maintaining context without needing to resend the entire conversation history. It offers tool-calling with built-in tools (e.g. web or file search) that the model can execute itself while generating a response, though currently only `function` tools are supported by Scaleway. Overall, Scaleway's support for the Responses API is currently at beta stage. All supported Generative APIs models can be used with the Responses API, and note that for the `gpt-oss-120b` model, only the Responses API gives you access to all of its features.
+
+For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
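
To make the `function` tool-calling round-trip described above concrete, here is a minimal sketch (not part of this commit) using the OpenAI Python client with the Chat Completions API. The `get_weather` function, its JSON schema, and the model choice are illustrative assumptions:

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"                 # Your unique API key from Scaleway
)

# Hypothetical local function the model may ask us to call
def get_weather(city: str) -> str:
    return f"Sunny and 22°C in {city}"

# Describe the function to the model (illustrative schema)
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# First call: the model may return a tool call instead of a final answer
response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # assuming a model with function-calling support
    messages=messages,
    tools=tools,
)
message = response.choices[0].message

if message.tool_calls:
    # Execute the requested function ourselves and feed the result back
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_weather(**args)
    messages.append(message)  # the assistant message containing the tool call
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

    # Second call: the model produces a final answer using the tool result
    final = client.chat.completions.create(
        model="llama-3.1-8b-instruct",
        messages=messages,
        tools=tools,
    )
    print(final.choices[0].message.content)
else:
    print(message.content)
```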

pages/generative-apis/how-to/query-language-models.mdx

Lines changed: 144 additions & 75 deletions
@@ -3,11 +3,11 @@ title: How to query language models
 description: Learn how to interact with powerful language models using Scaleway's Generative APIs service.
 tags: generative-apis ai-data language-models chat-completions-api
 dates:
-  validation: 2025-05-12
+  validation: 2025-08-22
   posted: 2024-08-28
 ---
 import Requirements from '@macros/iam/requirements.mdx'
-
+import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'
 
 Scaleway's Generative APIs service allows users to interact with powerful language models hosted on the platform.
 
@@ -39,25 +39,12 @@ The web playground displays.
 
 ## Querying language models via API
 
-Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are OpenAI-compatible REST APIs for generating and manipulating conversations.
+You can query the models programmatically using your favorite tools or languages.
+In the example that follows, we will use the OpenAI Python client.
 
 ### Chat Completions API or Responses API?
 
-The table below compares the Chat Completions API to the Responses API. TODO CONTINUE HERE
-
-| Aspect | Responses API | Chat Completions API |
-|-------------------------|------------------------------------------------|-----------------------------------------|
-| **Description** | Unified API for model responses (successor to Chat + Assistants). Offers tool-calling with built-in tools (e.g. web or file search) while the model generates a response, though currently only `function` tools are supported by Scaleway. | Older API for chat-style completions. Offers only `function` tool-calling. |
-| **Status** | Beta | GA |
-| **Endpoint** | `/v1/{project_id}/responses` | `/v1/{project_id}/chat/completions` |
-| **Use cases** | Agentic apps, tool-augmented workflows, multi-step tasks, future-proof apps | Simple chatbots, Q&A, summarization, stateless interactions |
-| **Features** | - Plain chat completions<br>- Tool calling (Code Interpreter, image gen, file search, web search)<br>- MCP tool integration<br>- Background mode<br>- Reasoning summaries | - Plain chat completions only<br>- Function calling (basic tool use)<br>- No MCP, no background mode, no reasoning summaries |
-| **Long-term support** | Future standard API (will replace Chat & Assistants) | Maintained for now, EOL expected mid-2026 |
-| **Complexity** | More powerful but requires newer SDK methods (`client.responses.create`) | Simpler, lighter (`client.chat.completions.create`) |
-
-You can query the models programmatically using your favorite tools or languages.
-In the following example, we will use the OpenAI Python client.
+<ChatCompVsResponsesApi />
 
 ### Installing the OpenAI SDK
 
@@ -83,48 +70,95 @@ client = OpenAI(
 
 ### Generating a chat completion
 
-You can now create a chat completion, for example with the `llama-3.1-8b-instruct` model:
+You can now create a chat completion using either the Chat Completions API or the Responses API, as shown in the following examples:
 
-```python
-# Create a chat completion using the 'llama-3.1-8b-instruct' model
-response = client.chat.completions.create(
-    model="llama-3.1-8b-instruct",
-    messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
-    temperature=0.2, # Adjusts creativity
-    max_tokens=100, # Limits the length of the output
-    top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature.
-)
+<Tabs id="generating-chat-completion">
 
-# Print the generated response
-print(response.choices[0].message.content)
-```
+<TabsTab label="Chat Completions API">
+
+```python
+# Create a chat completion using the 'llama-3.1-8b-instruct' model
+response = client.chat.completions.create(
+    model="llama-3.1-8b-instruct",
+    messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
+    temperature=0.2, # Adjusts creativity
+    max_tokens=100, # Limits the length of the output
+    top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature.
+)
+
+# Print the generated response
+print(response.choices[0].message.content)
+```
+
+This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
+
+</TabsTab>
+
+<TabsTab label="Responses API (Beta)">
+
+```python
+# Create a chat completion using the 'gpt-oss-120b' model
+response = client.responses.create(
+    model="gpt-oss-120b",
+    input=[{"role": "user", "content": "Briefly describe a futuristic city with advanced technology and green energy solutions."}],
+    temperature=0.2, # Adjusts creativity
+    max_output_tokens=100, # Limits the length of the output
+    top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature.
+)
+
+# Print the generated response. Here, the last output message will contain the final content.
+# Previous outputs will contain reasoning content.
+print(response.output[-1].content[0].text)
+```
+</TabsTab>
+</Tabs>
 
-This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
 
 A conversation style may include a default system prompt. You may set this prompt by setting the first message with the role system. For example:
 
-```python
-[
-    {
-        "role": "system",
-        "content": "You are Xavier Niel."
-    },
-    {
-        "role": "user",
-        "content": "Hello, what is your name?"
-    }
-]
-```
+```python
+[
+    {
+        "role": "system",
+        "content": "You are Xavier Niel."
+    },
+    {
+        "role": "user",
+        "content": "Hello, what is your name?"
+    }
+]
+```
 
 ### Model parameters and their effects
 
 The following parameters will influence the output of the model:
 
-- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
-- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
-- **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
-- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
-- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.
+<Tabs id="model-params">
+
+<TabsTab label="Chat Completions API">
+
+- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
+- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
+- **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
+- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
+- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.
+
+See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
+
+</TabsTab>
+
+<TabsTab label="Responses API (Beta)">
+
+- **`input`**: A single text string, or an array of string/multimodal inputs, to provide to the model to generate a response. When using the array option, you can define a `role` and a list of `content` inputs of different types (text, files, images, etc.).
+- **`max_output_tokens`**: The maximum number of output tokens that can be generated for a completion. Different default maximum values are enforced for each model, to avoid edge cases where tokens are generated indefinitely.
+- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
+- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
+
+See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) for a full list of all available parameters.
+
+</TabsTab>
+</Tabs>
 
 <Message type="warning">
   If you encounter an error such as "Forbidden 403" refer to the [API documentation](/generative-apis/api-cli/understanding-errors) for troubleshooting tips.
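
The statefulness described for the Responses API (maintaining context without resending history) could look like the following minimal sketch, not part of this commit, assuming Scaleway's beta exposes the OpenAI-style `previous_response_id` parameter; verify against the Responses API reference before relying on it:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"                 # Your unique API key from Scaleway
)

# First turn: ask a question
first = client.responses.create(
    model="gpt-oss-120b",  # illustrative model choice
    input=[{"role": "user", "content": "Name three green energy sources."}],
)

# Follow-up turn: chain to the previous response instead of resending history.
# `previous_response_id` is OpenAI's chaining mechanism; Scaleway beta support
# is an assumption here.
follow_up = client.responses.create(
    model="gpt-oss-120b",
    previous_response_id=first.id,
    input=[{"role": "user", "content": "Which of those is cheapest to deploy?"}],
)

# The last output message contains the final content
print(follow_up.output[-1].content[0].text)
```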
@@ -133,7 +167,8 @@ The following parameters will influence the output of the model:
 ## Streaming
 
 By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.
-Following is an example using the chat completions API:
+
+The following example uses the Chat Completions API, but the `stream` parameter can be set in the same way with the Responses API.
 
 ```python
 from openai import OpenAI
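
For the Responses API, a hedged sketch of the synchronous equivalent (not part of this commit): the event types mirror those used in the asynchronous example further down, and the model choice is illustrative.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"                 # Your unique API key from Scaleway
)

# Setting stream=True returns an iterator of server-sent events
stream = client.responses.create(
    model="gpt-oss-120b",  # illustrative; any supported model should work
    input=[{"role": "user", "content": "Sing me a song"}],
    stream=True,
)

# Print text deltas as they arrive; stop once the response completes
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
    elif event.type == "response.completed":
        break
```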
@@ -160,28 +195,62 @@ for chunk in response:
 
 The service also supports asynchronous mode for any chat completion.
 
-```python
-
-import asyncio
-from openai import AsyncOpenAI
-
-client = AsyncOpenAI(
-    base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
-    api_key="<SCW_API_KEY>" # Your unique API key from Scaleway
-)
-
-async def main():
-    stream = await client.chat.completions.create(
-        model="llama-3.1-8b-instruct",
-        messages=[{
-            "role": "user",
-            "content": "Sing me a song",
-        }],
-        stream=True,
-    )
-    async for chunk in stream:
-        if chunk.choices and chunk.choices[0].delta.content:
-            print(chunk.choices[0].delta.content, end="")
-
-asyncio.run(main())
-```
+<Tabs id="async">
+
+<TabsTab label="Chat Completions API">
+
+```python
+import asyncio
+from openai import AsyncOpenAI
+
+client = AsyncOpenAI(
+    base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
+    api_key="<SCW_API_KEY>" # Your unique API key from Scaleway
+)
+
+async def main():
+    stream = await client.chat.completions.create(
+        model="llama-3.1-8b-instruct",
+        messages=[{
+            "role": "user",
+            "content": "Sing me a song",
+        }],
+        stream=True,
+    )
+    async for chunk in stream:
+        if chunk.choices and chunk.choices[0].delta.content:
+            print(chunk.choices[0].delta.content, end="")
+
+asyncio.run(main())
+```
+</TabsTab>
+<TabsTab label="Responses API (Beta)">
+```python
+import asyncio
+from openai import AsyncOpenAI
+
+client = AsyncOpenAI(
+    base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
+    api_key="<SCW_API_KEY>" # Your unique API key from Scaleway
+)
+
+async def main():
+    stream = await client.responses.create(
+        model="llama-3.1-8b-instruct",
+        input=[{
+            "role": "user",
+            "content": "Sing me a song"
+        }],
+        stream=True,
+    )
+    async for event in stream:
+        if event.type == "response.output_text.delta":
+            print(event.delta, end="")
+        elif event.type == "response.completed":
+            break
+
+asyncio.run(main())
+```
+</TabsTab>
+</Tabs>
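
The system-prompt pattern shown earlier for the Chat Completions API has a close analogue here; a minimal sketch, not part of this commit, assuming the Responses API accepts a "system" role entry in `input` the way the role-based user entries shown above are accepted:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="<SCW_API_KEY>"                 # Your unique API key from Scaleway
)

response = client.responses.create(
    model="gpt-oss-120b",  # illustrative model choice
    input=[
        # Assumption: a "system" role entry plays the same role as the
        # system message in the Chat Completions example above.
        {"role": "system", "content": "You are Xavier Niel."},
        {"role": "user", "content": "Hello, what is your name?"},
    ],
)
print(response.output[-1].content[0].text)
```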
