macros/ai/chat-comp-vs-responses-api.mdx (1 addition, 1 deletion)
@@ -6,6 +6,6 @@ Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/gener
The **Chat Completions** API was released in 2023, and is an industry standard for building AI applications, being specifically designed for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. It supports `function` tool-calling, where the developer defines a set of functions, which the model can decide whether to call when generating a response. If it decides to call one of these functions, it returns the function name and arguments, and the developer's own code must actually execute the function and feed the result back into the conversation for use by the model.
- The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with support for more agentic tasks and reasoning. It is stateful, able to maintain context without resending the entire conversation history. It offers built-in tools (e.g. web or file search) that the model can execute itself while generating a response, though currently only `function` tools are supported by Scaleway. Overall, Scaleway's support for the Responses API is currently at beta stage. All supported Generative API models can be used with the Responses API; note that for the `gpt-oss-120b` model, only the Responses API gives you access to all of its features.
+ The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with support for more agentic tasks and reasoning. It is stateful, able to maintain context without resending the entire conversation history. It offers built-in tools (e.g. web or file search) that the model can execute itself while generating a response, though currently only `function` tools are supported by Scaleway. Overall, **Scaleway's support for the Responses API is currently at beta stage**. All supported Generative API models can be used with the Responses API; note that for the `gpt-oss-120b` model, only the Responses API gives you access to all of its features.
For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
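The contrast between the two request shapes can be sketched as follows. This is a hedged illustration, not Scaleway's official sample: the helper names and prompt are made up, while the base URL, key placeholder, and model name follow this page's other examples.

```python
def build_chat_payload(prompt: str) -> dict:
    # Chat Completions is stateless: the full message history is sent on each call.
    return {
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": prompt}],
    }

def build_responses_payload(prompt: str) -> dict:
    # The Responses API accepts a flat `input` instead of a `messages` list.
    return {
        "model": "llama-3.1-8b-instruct",
        "input": prompt,
    }

def main() -> None:
    # Not invoked here: requires `pip install openai` and a valid Scaleway key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
        api_key="<SCW_SECRET_KEY>",             # your Scaleway API secret key
    )
    chat = client.chat.completions.create(**build_chat_payload("Say hello."))
    print(chat.choices[0].message.content)

    resp = client.responses.create(**build_responses_payload("Say hello."))
    print(resp.output_text)  # convenience accessor for the concatenated text output
```

Both calls go through the same OpenAI-compatible client; only the endpoint and payload shape differ.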
pages/generative-apis/how-to/query-language-models.mdx (7 additions, 5 deletions)
@@ -82,15 +82,15 @@ You can now create a chat completion using either the Chat Completions or Respon
model="llama-3.1-8b-instruct",
messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
temperature=0.2, # Adjusts creativity
- max_tokens=100, # Limits the length of the output
+ max_completion_tokens=100, # Limits the length of the output
top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature.
)
# Print the generated response
print(response.choices[0].message.content)
```
- This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
+ This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_completion_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
</TabsTab>
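The fragments shown in this hunk come from a longer snippet; assembled into one self-contained sketch (the helper name is made up, and the client setup repeats the pattern used elsewhere on this page), the request might look like:

```python
def build_request() -> dict:
    # Same parameters as the documented snippet: temperature (creativity),
    # max_completion_tokens (output length cap), top_p (nucleus sampling).
    return {
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
        "temperature": 0.2,
        "max_completion_tokens": 100,
        "top_p": 0.7,
    }

def main() -> None:
    # Not invoked here: requires `pip install openai` and a valid Scaleway key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
        api_key="<SCW_SECRET_KEY>",             # your Scaleway API secret key
    )
    response = client.chat.completions.create(**build_request())
    print(response.choices[0].message.content)
```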
@@ -129,6 +129,8 @@ A conversation style may include a default system prompt. You may set this promp
]
```
+ Adding such a system prompt can also help resolve issues if you receive responses such as `I'm not sure what tools are available to me. Can you please provide a library of tools that I can use to generate a response?`.
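A minimal sketch of such a message list follows; the system prompt wording here is illustrative, not Scaleway's default:

```python
# Conversation history with an explicit system prompt prepended.
# The prompt text below is an illustrative assumption.
messages = [
    {"role": "system", "content": "You are a helpful assistant. Answer directly; do not ask for a library of tools."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Each subsequent turn is appended, keeping the system prompt first:
messages.append({"role": "assistant", "content": "The capital of France is Paris."})
messages.append({"role": "user", "content": "And its population?"})
```

Because the Chat Completions API is stateless, this growing list is what gets resent with every request.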
### Model parameters and their effects
The following parameters will influence the output of the model:
@@ -139,7 +141,7 @@ The following parameters will influence the output of the model:
- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
- - **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
+ - **`max_completion_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.
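As a worked example, the parameters listed above can be combined into a single request payload. The helper name, prompt, and specific values are illustrative assumptions:

```python
def build_request(prompt: str) -> dict:
    # Illustrative combination of the documented parameters.
    return {
        "model": "llama-3.1-8b-instruct",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},  # illustrative system prompt
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,           # mostly deterministic output
        "max_completion_tokens": 50,  # hard cap on generated tokens
        "stop": ["\n\n"],             # stop generating at the first blank line
    }
```

The resulting dictionary can be passed to `client.chat.completions.create(**build_request(...))`.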
@@ -210,7 +212,7 @@ The service also supports asynchronous mode for any chat completion.
)
async def main():
- stream = await client.chat.completions.create(
+ stream = client.chat.completions.create(
model="llama-3.1-8b-instruct",
messages=[{
"role": "user",
@@ -237,7 +239,7 @@ The service also supports asynchronous mode for any chat completion.
@@ -109,15 +109,10 @@ You can now create a chat completion:
print(response.choices[0].message.content)
```
</TabsTab>
- <TabsTab label="Responses API">
+ <TabsTab label="Responses API (Beta)">
```python
from openai import OpenAI
- # Initialize the client with your base URL and API key
- client = OpenAI(
-     base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
-     api_key="<SCW_SECRET_KEY>" # Your unique API secret key from Scaleway
- )
# Create a chat completion using the 'mistral-small-3.2-24b-instruct-2506' model
response = client.responses.create(
model="mistral-small-3.2-24b-instruct-2506",
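The truncated Responses API call above can be assembled into a complete sketch. The function name and prompt are illustrative, and the client setup mirrors this page's other examples:

```python
def responses_example() -> None:
    # Not invoked here: requires `pip install openai` and a valid Scaleway key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
        api_key="<SCW_SECRET_KEY>",             # your Scaleway API secret key
    )
    response = client.responses.create(
        model="mistral-small-3.2-24b-instruct-2506",
        input="Describe a futuristic city with advanced technology and green energy solutions.",
    )
    # The SDK exposes the concatenated text output directly.
    print(response.output_text)
```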
@@ -169,7 +164,7 @@ To encode Base64 images in Python, you first need to install `Pillow` library:
pip install pillow
```
- Then, the following Python code sample shows you how to encode an image in Base64 format and pass it to your request payload:
+ Then, the following Python code sample shows you how to encode an image in Base64 format and pass it to a request payload for the Chat Completions API:
```python
import base64
@@ -207,9 +202,9 @@ payload = {
```
- ### Model parameters and their effects
+ ### Model parameters and their effects
- The following parameters will influence the output of the model:
+ When using the Chat Completions API, the following parameters will influence the output of the model:
- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`. The content is an array that can contain text and/or image objects.
- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
@@ -225,142 +220,65 @@ The following parameters will influence the output of the model:
By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced.
- Examples are provided below:
-
- <Tabs id="vision-streaming">
- <TabsTab label="Chat Completions API">
- ```python
- from openai import OpenAI
-
- client = OpenAI(
-     base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
-     api_key="<SCW_API_KEY>" # Your unique API key from Scaleway
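The incremental streaming flow described in this section can be sketched as follows. The function name and prompt are illustrative, the call is wrapped in a function rather than executed (it needs a valid key), and the client setup repeats this page's pattern:

```python
def stream_chat(prompt: str) -> None:
    # Not invoked here: requires `pip install openai` and a valid Scaleway key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
        api_key="<SCW_SECRET_KEY>",             # your Scaleway API secret key
    )
    stream = client.chat.completions.create(
        model="llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # tokens arrive incrementally instead of as one final message
    )
    for chunk in stream:
        # Each chunk carries a delta holding the newly generated tokens, if any.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```

Printing each delta as it arrives is what lets a chat UI render the answer token by token.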