diff --git a/macros/ai/chat-comp-vs-responses-api.mdx b/macros/ai/chat-comp-vs-responses-api.mdx
new file mode 100644
index 0000000000..01c1cd7837
--- /dev/null
+++ b/macros/ai/chat-comp-vs-responses-api.mdx
@@ -0,0 +1,17 @@
+---
+macro: chat-comp-vs-responses-api
+---
+
+Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are OpenAI-compatible REST APIs that can be used for generating and manipulating conversations. The Chat Completions API is focused on generating conversational responses, while the Responses API is a more general REST API for chat, structured outputs, tool use, and multimodal inputs.
+
+The **Chat Completions** API was released in 2023 and is an industry standard for building AI applications, specifically designed for handling multi-turn conversations. It is stateless, but lets users manage conversation history by appending each new message to the ongoing conversation. Messages in the conversation can include text, images, and audio extracts. The API supports `function` tool-calling, allowing developers to define functions that the model can choose to call. If the model calls a function, it returns the function name and arguments, which the developer's code must execute and feed back into the conversation.
+
+The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with the ability to handle more agentic tasks and reasoning. It supports statefulness, maintaining context without the need to resend the entire conversation history. It also offers built-in tools (e.g. web or file search) that the model can execute itself while generating a response.
+
+<Message type="note">
+  Scaleway's support for the Responses API is currently at beta stage. Support for the full feature set will be added incrementally: statefulness and tools other than `function` calling are not currently supported.
+</Message>
+
+Most supported Generative API models can be used with both the Chat Completions and Responses APIs. For the **`gpt-oss-120b`** model, use of the Responses API is recommended, as it gives you access to all of the model's features, in particular tool-calling.
+
+For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
\ No newline at end of file
diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx
index 99cd33b139..84010a2d6f 100644
--- a/pages/generative-apis/faq.mdx
+++ b/pages/generative-apis/faq.mdx
@@ -40,7 +40,7 @@ No, you cannot increase maximum output tokens above [limits for each models](/ge
These limits are in place to protect you against:
- Long generation which may be ended by an HTTP timeout. Limits are designed to ensure a model will send its HTTP response in less than 5 minutes.
- Uncontrolled billing, as several models are known to be able to enter infinite generation loops (specific prompts can make the model generate the same sentence over and over, without stopping at all).
-If you require higher maximum output tokens, you can use [Managed Inference](https://console.scaleway.com/inference/deployments) where these limts do not apply (as your bill will be limited by the size of your deployment).
+If you require higher maximum output tokens, you can use [Managed Inference](https://console.scaleway.com/inference/deployments) where these limits do not apply (as your bill will be limited by the size of your deployment).
### Can I use OpenAI libraries and APIs with Scaleway's Generative APIs?
Yes, Scaleway's Generative APIs are designed to be compatible with OpenAI libraries and SDKs, including the OpenAI Python client library and LangChain SDKs. This allows for seamless integration with existing workflows.
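+
+For example, a minimal sketch of pointing the OpenAI Python client at Scaleway's endpoint (assuming your API key is stored in the `SCW_SECRET_KEY` environment variable):
+
+```python
+import os
+from openai import OpenAI
+
+# The OpenAI client works unchanged; only the base URL and API key differ
+client = OpenAI(
+    base_url="https://api.scaleway.ai/v1",
+    api_key=os.environ["SCW_SECRET_KEY"],
+)
+```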
diff --git a/pages/generative-apis/how-to/query-language-models.mdx b/pages/generative-apis/how-to/query-language-models.mdx
index 72c31e7e73..ea5239835e 100644
--- a/pages/generative-apis/how-to/query-language-models.mdx
+++ b/pages/generative-apis/how-to/query-language-models.mdx
@@ -3,17 +3,17 @@ title: How to query language models
description: Learn how to interact with powerful language models using Scaleway's Generative APIs service.
tags: generative-apis ai-data language-models chat-completions-api
dates:
- validation: 2025-05-12
+ validation: 2025-08-22
posted: 2024-08-28
---
import Requirements from '@macros/iam/requirements.mdx'
-
+import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'
+
Scaleway's Generative APIs service allows users to interact with powerful language models hosted on the platform.
There are several ways to interact with language models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real time.
-- Via the [Chat API](/generative-apis/how-to/query-language-models/#querying-language-models-via-api)
+- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response)
@@ -39,10 +39,12 @@ The web playground displays.
## Querying language models via API
-The [Chat API](/generative-apis/api-cli/using-chat-api/) is an OpenAI-compatible REST API for generating and manipulating conversations.
-
You can query the models programmatically using your favorite tools or languages.
-In the following example, we will use the OpenAI Python client.
+In the example that follows, we will use the OpenAI Python client.
+
+### Chat Completions API or Responses API?
+<ChatCompVsResponsesApi />
+
### Installing the OpenAI SDK
@@ -68,48 +70,97 @@ client = OpenAI(
### Generating a chat completion
-You can now create a chat completion, for example with the `llama-3.1-8b-instruct` model:
+You can now create a chat completion using either the Chat Completions or Responses API, as shown in the following examples:
-```python
-# Create a chat completion using the 'llama-3.1-8b-instruct' model
-response = client.chat.completions.create(
- model="llama-3.1-8b-instruct",
- messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
- temperature=0.2, # Adjusts creativity
- max_tokens=100, # Limits the length of the output
- top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature.
-)
+
-# Print the generated response
-print(response.choices[0].message.content)
-```
+<Tabs id="generate-chat-completion">
+  <TabsTab label="Chat Completions API">
+ ```python
+ # Create a chat completion using the 'llama-3.1-8b-instruct' model
+ response = client.chat.completions.create(
+ model="llama-3.1-8b-instruct",
+ messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
+ temperature=0.2, # Adjusts creativity
+ max_completion_tokens=100, # Limits the length of the output
+ top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature.
+ )
+
+ # Print the generated response
+ print(response.choices[0].message.content)
+ ```
+
+ This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_completion_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
+
+  </TabsTab>
+
+  <TabsTab label="Responses API">
+
+ ```python
+    # Create a response using the 'gpt-oss-120b' model
+ response = client.responses.create(
+ model="gpt-oss-120b",
+ input=[{"role": "user", "content": "Briefly describe a futuristic city with advanced technology and green energy solutions."}],
+ temperature=0.2, # Adjusts creativity
+ max_output_tokens=100, # Limits the length of the output
+ top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature.
+    )
+ # Print the generated response. Here, the last output message will contain the final content.
+ # Previous outputs will contain reasoning content.
+ print(response.output[-1].content[0].text)
+ ```
+  </TabsTab>
+</Tabs>
-This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
A conversation style may include a default system prompt. You may set this prompt by setting the first message with the role system. For example:
-```python
-[
- {
- "role": "system",
- "content": "You are Xavier Niel."
- },
- {
- "role": "user",
- "content": "Hello, what is your name?"
- }
-]
-```
+ ```python
+ [
+ {
+ "role": "system",
+ "content": "You are Xavier Niel."
+ },
+ {
+ "role": "user",
+ "content": "Hello, what is your name?"
+ }
+ ]
+ ```
+
+Adding such a system prompt can also help resolve issues if you receive responses such as `I'm not sure what tools are available to me. Can you please provide a library of tools that I can use to generate a response?`.
### Model parameters and their effects
The following parameters will influence the output of the model:
-- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
-- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
-- **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
-- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
-- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.
+
+<Tabs id="model-parameters">
+  <TabsTab label="Chat Completions API">
+
+ - **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
+ - **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
+ - **`max_completion_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
+ - **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
+ - **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.
+
+ See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters.
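+
+    For example, a minimal sketch (the prompt and stop sequence are illustrative, reusing the `client` created above) showing `stop` ending generation at the first blank line:
+
+    ```python
+    response = client.chat.completions.create(
+        model="llama-3.1-8b-instruct",
+        messages=[{"role": "user", "content": "List three benefits of green energy."}],
+        stop=["\n\n"]  # Generation halts as soon as a blank line is produced
+    )
+    print(response.choices[0].message.content)
+    ```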
+
+  </TabsTab>
+
+  <TabsTab label="Responses API">
+
+ - **`input`**: A single text string, or an array of string/multi-modal inputs to provide to the model to generate a response. When using the array option, you can define a `role` and list of `content` inputs of different types (texts, files, images etc.)
+ - **`max_output_tokens`**: The maximum number of output tokens that can be generated for a completion. Different default maximum values are enforced for each model, to avoid edge cases where tokens are generated indefinitely.
+ - **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
+ - **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
+
+ See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) for a full list of all available parameters.
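+
+    For example, `input` also accepts a plain string as shorthand for a single user message (a minimal sketch):
+
+    ```python
+    response = client.responses.create(
+        model="gpt-oss-120b",
+        input="Summarize the benefits of green energy in one sentence."
+    )
+    print(response.output[-1].content[0].text)
+    ```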
+
+  </TabsTab>
+</Tabs>
If you encounter an error such as "Forbidden 403" refer to the [API documentation](/generative-apis/api-cli/understanding-errors) for troubleshooting tips.
@@ -118,7 +169,8 @@ The following parameters will influence the output of the model:
## Streaming
By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.
-Following is an example using the chat completions API:
+
+The following example uses the Chat Completions API, but the `stream` parameter can be set in the same way with the Responses API.
```python
from openai import OpenAI
@@ -145,28 +197,62 @@ for chunk in response:
The service also supports asynchronous mode for any chat completion.
-```python
-
-import asyncio
-from openai import AsyncOpenAI
-
-client = AsyncOpenAI(
- base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
- api_key="" # Your unique API key from Scaleway
-)
-
-async def main():
- stream = await client.chat.completions.create(
- model="llama-3.1-8b-instruct",
- messages=[{
- "role": "user",
- "content": "Sing me a song",
- }],
- stream=True,
- )
- async for chunk in stream:
- if chunk.choices and chunk.choices[0].delta.content:
- print(chunk.choices[0].delta.content, end="")
-
-asyncio.run(main())
-```
+
+<Tabs id="async-chat-completion">
+  <TabsTab label="Chat Completions API">
+
+ ```python
+
+ import asyncio
+ from openai import AsyncOpenAI
+
+ client = AsyncOpenAI(
+ base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
+ api_key="" # Your unique API key from Scaleway
+ )
+
+ async def main():
+        stream = await client.chat.completions.create(
+ model="llama-3.1-8b-instruct",
+ messages=[{
+ "role": "user",
+ "content": "Sing me a song",
+ }],
+ stream=True,
+ )
+ async for chunk in stream:
+ if chunk.choices and chunk.choices[0].delta.content:
+ print(chunk.choices[0].delta.content, end="")
+
+ asyncio.run(main())
+ ```
+  </TabsTab>
+  <TabsTab label="Responses API">
+ ```python
+ import asyncio
+ from openai import AsyncOpenAI
+
+ client = AsyncOpenAI(
+ base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
+ api_key="" # Your unique API key from Scaleway
+ )
+
+ async def main():
+ stream = await client.responses.create(
+ model="gpt-oss-120b",
+ input=[{
+ "role": "user",
+ "content": "Sing me a song"
+ }],
+ stream=True,
+ )
+ async for event in stream:
+ if event.type == "response.output_text.delta":
+ print(event.delta, end="")
+ elif event.type == "response.completed":
+ break
+
+ asyncio.run(main())
+ ```
+  </TabsTab>
+</Tabs>
diff --git a/pages/generative-apis/how-to/query-vision-models.mdx b/pages/generative-apis/how-to/query-vision-models.mdx
index cc7b1ef08e..1bac3e7330 100644
--- a/pages/generative-apis/how-to/query-vision-models.mdx
+++ b/pages/generative-apis/how-to/query-vision-models.mdx
@@ -8,7 +8,6 @@ dates:
---
import Requirements from '@macros/iam/requirements.mdx'
Scaleway's Generative APIs service allows users to interact with powerful vision models hosted on the platform.
@@ -17,7 +16,7 @@ Scaleway's Generative APIs service allows users to interact with powerful vision
There are several ways to interact with vision models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-vision-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real time.
-- Via the [Chat API](/generative-apis/how-to/query-vision-models/#querying-vision-models-via-the-api)
+- The [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion).
@@ -43,17 +42,16 @@ The web playground displays.
## Querying vision models via the API
-The [Chat API](/generative-apis/api-cli/using-chat-api/) is an OpenAI-compatible REST API for generating and manipulating conversations.
+You can query vision models programmatically using your favorite tools or languages.
-You can query the vision models programmatically using your favorite tools or languages.
Vision models take both text and images as inputs.
+In the example that follows, we will use the OpenAI Python client.
+
Unlike traditional language models, vision models will take a content array for the user role, structuring text and images as inputs.
-In the following example, we will use the OpenAI Python client.
-
### Installing the OpenAI SDK
Install the OpenAI SDK using pip:
@@ -78,21 +76,21 @@ client = OpenAI(
### Generating a chat completion
-You can now create a chat completion, for example with the `pixtral-12b-2409` model:
+You can now create a chat completion:
```python
# Create a chat completion using the 'pixtral-12b-2409' model
response = client.chat.completions.create(
model="pixtral-12b-2409",
messages=[
- {
+ {
"role": "user",
"content": [
- {"type": "text", "text": "What is this image?"},
- {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
- ] # Vision models will take a content array with text and image_url objects.
+ {"type": "text", "text": "What is this image?"},
+ {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
+ ] # Vision models will take a content array with text and image_url objects.
- }
+ }
],
temperature=0.7, # Adjusts creativity
max_tokens=2048, # Limits the length of the output
@@ -127,7 +125,7 @@ To encode Base64 images in Python, you first need to install `Pillow` library:
pip install pillow
```
-Then, the following Python code sample shows you how to encode an image in Base64 format and pass it to your request payload:
+Then, the following Python code sample shows you how to encode an image in Base64 format and pass it to a request payload for the Chat Completions API:
```python
import base64
@@ -165,9 +163,9 @@ payload = {
```
-### Model parameters and their effects
+### Model parameters and their effects
-The following parameters will influence the output of the model:
+When using the Chat Completions API, the following parameters will influence the output of the model:
- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`. The content is an array that can contain text and/or image objects.
- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
@@ -182,7 +180,8 @@ The following parameters will influence the output of the model:
## Streaming
By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.
-The following example shows how to use the chat completion API:
+
+An example for the Chat Completions API is provided below:
```python
from openai import OpenAI
@@ -192,15 +191,15 @@ client = OpenAI(
api_key="" # Your unique API key from Scaleway
)
response = client.chat.completions.create(
- model="pixtral-12b-2409",
- messages=[{
- "role": "user",
- "content": [
- {"type": "text", "text": "What is this image?"},
- {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
- ]
- }],
- stream=True,
+    model="pixtral-12b-2409",
+    messages=[{
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "What is this image?"},
+            {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
+        ]
+    }],
+    stream=True,
)
for chunk in response:
@@ -208,9 +207,10 @@ for chunk in response:
print(chunk.choices[0].delta.content, end="")
```
+
## Async
-The service also supports asynchronous mode for any chat completion.
+The service also supports asynchronous mode for any chat completion. An example for the Chat Completions API is provided below:
```python
diff --git a/pages/generative-apis/how-to/use-function-calling.mdx b/pages/generative-apis/how-to/use-function-calling.mdx
index 2bb3bb6397..f62b7ce712 100644
--- a/pages/generative-apis/how-to/use-function-calling.mdx
+++ b/pages/generative-apis/how-to/use-function-calling.mdx
@@ -3,13 +3,13 @@ title: How to use function calling
description: Learn how to implement function calling capabilities using Scaleway's Chat Completions API service.
tags: chat-completions-api
dates:
- validation: 2025-05-26
+ validation: 2025-08-22
posted: 2024-09-24
---
import Requirements from '@macros/iam/requirements.mdx'
-Scaleway's Chat Completions API supports function calling as introduced by OpenAI.
+Scaleway's Chat Completions API supports function calling as introduced by OpenAI. The Responses API allows not only function calling, but also direct tool use by the model (e.g. web and file search). However, as Scaleway's support for the Responses API is currently at beta stage, only `function` calling is supported. [Read more about the Chat Completions API vs the Responses API](/generative-apis/how-to/query-language-models/#chat-completions-api-or-responses-api).
## What is function calling?
@@ -39,7 +39,7 @@ The workflow typically follows these steps:
4. Execute selected functions
5. Return results to model for final response
-## Code examples
+## Code examples for the Chat Completions API
Before diving into the code examples, ensure you have the necessary libraries installed:
@@ -48,7 +48,7 @@ The workflow typically follows these steps:
```
-We will demonstrate function calling using a flight scheduling system that allows users to check available flights between European airports.
+We will demonstrate function calling with the Chat Completions API using a flight scheduling system that allows users to check available flights between European airports.
### Basic function definition
@@ -152,42 +152,43 @@ As the model detects properly that a tool call is required to answer the questio
Some models must be told they can use external functions in the system prompt. If you do not provide a system prompt when using tools, Scaleway will automatically add one that works best for that specific model.
-
+
### Call the tool and provide a final answer
To provide the answer, or for more complex interactions, you will need to handle multiple turns of conversation:
+
```python
# Process the tool call
if response.choices[0].message.tool_calls:
- tool_call = response.choices[0].message.tool_calls[0]
+    tool_call = response.choices[0].message.tool_calls[0]
+
+    # Execute the function
+    if tool_call.function.name == "get_flight_schedule":
+        function_args = json.loads(tool_call.function.arguments)
+        function_response = get_flight_schedule(**function_args)
- # Execute the function
- if tool_call.function.name == "get_flight_schedule":
- function_args = json.loads(tool_call.function.arguments)
- function_response = get_flight_schedule(**function_args)
-
- # Add results to the conversation
- messages.extend([
- {
- "role": "assistant",
- "content": None,
- "tool_calls": [tool_call]
- },
- {
- "role": "tool",
- "name": tool_call.function.name,
- "content": json.dumps(function_response),
- "tool_call_id": tool_call.id
- }
- ])
-
- # Get final response
- final_response = client.chat.completions.create(
- model="llama-3.1-70b-instruct",
- messages=messages
- )
- print(final_response.choices[0].message.content)
+        # Add results to the conversation
+        messages.extend([
+            {
+                "role": "assistant",
+                "content": None,
+                "tool_calls": [tool_call]
+            },
+            {
+                "role": "tool",
+                "name": tool_call.function.name,
+                "content": json.dumps(function_response),
+                "tool_call_id": tool_call.id
+            }
+        ])
+
+        # Get final response
+        final_response = client.chat.completions.create(
+            model="llama-3.1-70b-instruct",
+            messages=messages
+        )
+        print(final_response.choices[0].message.content)
```
### Parallel function calling
@@ -294,6 +295,11 @@ messages = [
]
```
+## Code example for the Responses API
+
+See the official OpenAI documentation for a fully worked example of [function calling with the Responses API](https://platform.openai.com/docs/guides/function-calling#function-tool-example). Note that Scaleway's support for the Responses API is currently at beta stage ([find out more](/generative-apis/how-to/query-language-models/#chat-completions-api-or-responses-api)).
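+
+As a brief illustration, the following minimal sketch (adapted from the flight example above; it assumes the `client` created earlier on this page and follows the OpenAI Responses API tool format, which is flattened rather than nested under a `function` key) shows how a function tool can be declared and how the resulting call is read back:
+
+```python
+tools = [{
+    "type": "function",
+    "name": "get_flight_schedule",
+    "description": "Get the flight schedule between two airports on a given date",
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "departure": {"type": "string", "description": "Departure airport code"},
+            "arrival": {"type": "string", "description": "Arrival airport code"},
+            "date": {"type": "string", "description": "Flight date in YYYY-MM-DD format"},
+        },
+        "required": ["departure", "arrival", "date"],
+    },
+}]
+
+response = client.responses.create(
+    model="gpt-oss-120b",
+    input=[{"role": "user", "content": "Which flights leave AMS for CDG on 2024-10-01?"}],
+    tools=tools,
+)
+
+# Function calls are returned as output items of type "function_call";
+# execute the function yourself and send the result back in a follow-up request.
+for item in response.output:
+    if item.type == "function_call":
+        print(item.name, item.arguments)
+```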
+
+
## Best practices
When implementing function calling, follow these guidelines for optimal results:
diff --git a/pages/generative-apis/how-to/use-structured-outputs.mdx b/pages/generative-apis/how-to/use-structured-outputs.mdx
index c1c05fd6bf..7868f3d858 100644
--- a/pages/generative-apis/how-to/use-structured-outputs.mdx
+++ b/pages/generative-apis/how-to/use-structured-outputs.mdx
@@ -3,22 +3,22 @@ title: How to use structured outputs
description: Learn how to get consistent JSON format responses using Scaleway's Chat Completions API service.
tags: chat-completions-api
dates:
- validation: 2025-05-12
+ validation: 2025-08-22
posted: 2024-09-17
---
import Requirements from '@macros/iam/requirements.mdx'
-
+import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'
+
Structured outputs allow users to get consistent, machine-readable JSON format responses from language models.
JSON, as a widely-used format, enables seamless integration with a variety of platforms and applications. Its interoperability is crucial for developers aiming to incorporate AI functionality into their current systems with minimal adjustments.
-By specifying a response format when using the [Chat Completions API](/generative-apis/api-cli/using-chat-api/), you can ensure that responses are returned in a JSON structure.
+By specifying a response format when using the Chat Completions API or Responses API, you can ensure that responses are returned in a JSON structure.
There are two main modes for generating JSON: **Object Mode** (schemaless) and **Schema Mode** (deterministic, structured output).
There are several ways to interact with language models:
- The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), allowing you to test models, adapt parameters, and observe how these changes affect the output in real time.
-- Via the [Chat API](/generative-apis/how-to/query-language-models/#querying-language-models-via-api)
+- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response)
@@ -41,9 +41,13 @@ There are several ways to interact with language models:
- JSON mode is older and has been used by developers since early API implementations, but lacks reliability in response formats.
- - All LLMs in the Scaleway library support **Structured outputs** and **JSON mode**. However, a schemaless **JSON mode** will produce lower quality results and is not recommended.
+ - All LLMs in the Scaleway library support **Structured outputs** and **JSON mode**. However, a schemaless **JSON mode** will produce lower quality results and is not recommended. Note that structured outputs are validated more reliably and parsed in more detail when using the Responses API.
+## Chat Completions API or Responses API?
+
+<ChatCompVsResponsesApi />
+
## Code examples
@@ -55,80 +59,132 @@ There are several ways to interact with language models:
The following Python examples demonstrate how to use **Structured outputs** to generate structured responses.
-We are using the base code below to send our LLM a voice note transcript to structure:
+We are using the base code below to send our LLM a voice note transcript to structure:
-```python
-import json
-from openai import OpenAI
-from pydantic import BaseModel, Field
+### Setting up the client, schema, and transcript
-# Set your preferred model
-MODEL = "llama-3.1-8b-instruct"
+ ```python
+ import json
+ from openai import OpenAI
+ from pydantic import BaseModel, Field
-# Set your API key
-API_KEY = ""
+ # Set your preferred model
+    MODEL = "llama-3.1-8b-instruct" # or "gpt-oss-120b" for the Responses API
-client = OpenAI(
- base_url="https://api.scaleway.ai/v1",
- api_key=API_KEY,
-)
+ # Set your API key
+ API_KEY = ""
-# Define the schema for the output using Pydantic
-class VoiceNote(BaseModel):
- title: str = Field(description="A title for the voice note")
- summary: str = Field(description="A short one sentence summary of the voice note.")
- actionItems: list[str] = Field(description="A list of action items from the voice note")
-
-# Transcript to use for the output
-TRANSCRIPT = (
- "Good evening! It's 6:30 PM, and I'm just getting home from work. I have a few things to do "
- "before I can relax. First, I'll need to water the plants in the garden since they've been in the sun all day. "
- "Then, I'll start preparing dinner. I think a simple pasta dish with some garlic bread should be good. "
- "While that's cooking, I'll catch up on a couple of phone calls I missed earlier."
-)
-```
+ client = OpenAI(
+ base_url="https://api.scaleway.ai/v1",
+ api_key=API_KEY,
+ )
+
+ # Define the schema for the output using Pydantic
+ class VoiceNote(BaseModel):
+ title: str = Field(description="A title for the voice note")
+ summary: str = Field(description="A short one sentence summary of the voice note.")
+ actionItems: list[str] = Field(description="A list of action items from the voice note")
+
+ # Transcript to use for the output
+ TRANSCRIPT = (
+ "Good evening! It's 6:30 PM, and I'm just getting home from work. I have a few things to do "
+ "before I can relax. First, I'll need to water the plants in the garden since they've been in the sun all day. "
+ "Then, I'll start preparing dinner. I think a simple pasta dish with some garlic bread should be good. "
+ "While that's cooking, I'll catch up on a couple of phone calls I missed earlier."
+ )
+ ```
### Using structured outputs with JSON schema (Pydantic)
Using [Pydantic](https://docs.pydantic.dev/latest/concepts/models/), users can define the schema as a Python class and enforce the model to return results adhering to this schema.
-```python
-extract = client.chat.completions.create(
- messages=[
- {
- "role": "system",
- "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.",
- },
- {
- "role": "user",
- "content": TRANSCRIPT,
+
+<Tabs id="structured-outputs-pydantic">
+  <TabsTab label="Chat Completions API">
+ ```python
+ extract = client.chat.completions.create(
+ messages=[
+ {
+ "role": "system",
+ "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.",
+ },
+ {
+ "role": "user",
+ "content": TRANSCRIPT,
+ },
+ ],
+ model=MODEL,
+ response_format={
+ "type": "json_schema",
+ "json_schema": {
+ "name": "VoiceNote",
+ "schema": VoiceNote.model_json_schema(),
+ }
},
- ],
- model=MODEL,
- response_format={
- "type": "json_schema",
- "json_schema": {
- "name": "VoiceNote",
- "schema": VoiceNote.model_json_schema(),
+ )
+ output = json.loads(extract.choices[0].message.content)
+ print(json.dumps(output, indent=2))
+ ```
+
+ Output example:
+ ```json
+ {
+ "title": "To-Do List",
+ "summary": "Returning from work, need to complete tasks before relaxing",
+ "actionItems": [
+ "Water garden",
+ "Prepare dinner: pasta dish with garlic bread",
+ "Catch up on missed phone calls"
+ ]
+ }
+ ```
+  </TabsTab>
+  <TabsTab label="Responses API">
+ ```python
+
+ extract = client.responses.create(
+ input=[
+ {
+ "role": "system",
+ "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.",
+ },
+ {
+ "role": "user",
+ "content": TRANSCRIPT,
+ }
+ ],
+ model=MODEL,
+ text={
+ "format": {
+ "type": "json_schema",
+ "name": "VoiceNote",
+ "schema": VoiceNote.model_json_schema()
+ }
}
- },
-)
-output = json.loads(extract.choices[0].message.content)
-print(json.dumps(output, indent=2))
-```
+ )
+
+ # Print the generated response. Here, the last output message will contain the final content.
+ # Previous outputs will contain reasoning content.
+ output = json.loads(extract.output[-1].content[0].text)
+ print(json.dumps(output, indent=2))
+ ```
+
+ Output example:
+
+ ```json
+ {
+ "actionItems": [
+ "Water the plants in the garden",
+ "Prepare a simple pasta dish with garlic bread",
+ "Catch up on missed phone calls while dinner is cooking"
+ ],
+ "summary": "The user plans to water plants, cook dinner, and make phone calls after arriving home at 6:30\u202fPM.",
+ "title": "Evening Tasks"
+ }
+ ```
+  </TabsTab>
+</Tabs>
-Output example:
-```json
-{
- "title": "To-Do List",
- "summary": "Returning from work, need to complete tasks before relaxing",
- "actionItems": [
- "Water garden",
- "Prepare dinner: pasta dish with garlic bread",
- "Catch up on missed phone calls"
- ]
-}
-```
Structured outputs accuracy may vary between models. For instance, with Llama models, we suggest adding a description of the field looked for in `response_format` and in `system` or `user` messages. In our example this would mean adding a system prompt similar to:
@@ -140,7 +196,7 @@ Output example:
### Using structured outputs with JSON schema (manual definition)
-Alternatively, users can manually define the JSON schema inline when calling the model.
+Alternatively, users can manually define the JSON schema inline when calling the model. The example below shows how to do this with the Chat Completions API:
```python
extract = client.chat.completions.create(
@@ -182,13 +238,13 @@ print(json.dumps(output, indent=2))
Output example:
```json
{
- "title": "Evening Routine",
- "actionItems": [
+"title": "Evening Routine",
+"actionItems": [
"Water the plants",
"Cook dinner (pasta and garlic bread)",
"Make phone calls"
- ],
- "summary": "Made a list of tasks to accomplish before relaxing tonight"
+  ],
+  "summary": "Made a list of tasks to accomplish before relaxing tonight"
}
```
@@ -199,11 +255,10 @@ Output example:
### Using JSON mode (schemaless, Legacy method)
- - When using the OpenAI SDKs as in the examples above, you are expected to set `additionalProperties` to false, and to specify all your properties as required.
- - JSON mode: It is important to explicitly ask the model to generate a JSON output either in the system prompt or user prompt. To prevent infinite generations, model providers most often encourage users to ask the model for short JSON objects. Prompt example: `Only answer in JSON using '{' as the first character.`.
+ JSON mode: It is important to explicitly ask the model to generate a JSON output either in the system prompt or user prompt. To prevent infinite generations, model providers most often encourage users to ask the model for short JSON objects. Prompt example: `Only answer in JSON using '{' as the first character.`.
-In JSON mode, you can prompt the model to output a JSON object without enforcing a strict schema.
+In JSON mode, you can prompt the model to output a JSON object without enforcing a strict schema. The example below uses the Chat Completions API:
```python
extract = client.chat.completions.create(
@@ -229,21 +284,21 @@ print(json.dumps(output, indent=2))
Output example:
```json
{
- "current_time": "6:30 PM",
- "tasks": [
+"current_time": "6:30 PM",
+"tasks": [
{
- "task": "water the plants in the garden",
- "priority": "high"
+      "task": "water the plants in the garden",
+      "priority": "high"
},
{
- "task": "prepare dinner (pasta with garlic bread)",
- "priority": "high"
+      "task": "prepare dinner (pasta with garlic bread)",
+      "priority": "high"
},
{
- "task": "catch up on phone calls",
- "priority": "medium"
+      "task": "catch up on phone calls",
+      "priority": "medium"
}
- ]
+  ]
}
```
diff --git a/pages/generative-apis/index.mdx b/pages/generative-apis/index.mdx
index 48e9e96d47..a48aad7e31 100644
--- a/pages/generative-apis/index.mdx
+++ b/pages/generative-apis/index.mdx
@@ -44,6 +44,14 @@ description: Dive into Scaleway Generative APIs with our quickstart guides, how-
/>
+
+
## Changelog
- This service is free while in beta. [Specific terms and conditions](https://www.scaleway.com/en/contracts/) apply.
-
-
- A Scaleway account logged into the [console](https://console.scaleway.com)
diff --git a/pages/managed-inference/reference-content/openai-compatibility.mdx b/pages/managed-inference/reference-content/openai-compatibility.mdx
index 95a572c3a2..d2a8b3e7b1 100644
--- a/pages/managed-inference/reference-content/openai-compatibility.mdx
+++ b/pages/managed-inference/reference-content/openai-compatibility.mdx
@@ -7,12 +7,14 @@ dates:
posted: 2024-05-06
---
+import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'
+
You can use any of the OpenAI [official libraries](https://platform.openai.com/docs/libraries/), for example, the [OpenAI Python client library](https://github.com/openai/openai-python) to interact with your Scaleway Managed Inference deployment.
This feature is especially beneficial for those looking to seamlessly transition applications already utilizing OpenAI.
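+
+For example, a minimal sketch of pointing the OpenAI Python client at a Managed Inference deployment (the endpoint URL and API key below are placeholders for your own deployment endpoint and IAM API key):
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="<your-deployment-endpoint>/v1",  # Endpoint of your Managed Inference deployment
+    api_key="<your-iam-api-key>",              # Your IAM API key
+)
+```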
-## Chat Completions API
+## Chat Completions API or Responses API?
-The Chat Completions API is designed for models fine-tuned for conversational tasks (such as X-chat and X-instruct variants).
+<ChatCompVsResponsesApi />
+
### CURL