From e2cbfff29b638bd4d5c8c730e275b9295128183a Mon Sep 17 00:00:00 2001 From: Rowena Date: Thu, 21 Aug 2025 17:49:00 +0200 Subject: [PATCH 1/6] fix(genai): responses api --- pages/generative-apis/faq.mdx | 2 +- .../how-to/query-language-models.mdx | 19 +++++++++++++++++-- pages/generative-apis/index.mdx | 8 ++++++++ pages/generative-apis/quickstart.mdx | 4 ---- 4 files changed, 26 insertions(+), 7 deletions(-) diff --git a/pages/generative-apis/faq.mdx b/pages/generative-apis/faq.mdx index 99cd33b139..84010a2d6f 100644 --- a/pages/generative-apis/faq.mdx +++ b/pages/generative-apis/faq.mdx @@ -40,7 +40,7 @@ No, you cannot increase maximum output tokens above [limits for each models](/ge These limits are in place to protect you against: - Long generation which may be ended by an HTTP timeout. Limits are designed to ensure a model will send its HTTP response in less than 5 minutes. - Uncontrolled billing, as several models are known to be able to enter infinite generation loops (specific prompts can make the model generate the same sentence over and over, without stopping at all). -If you require higher maximum output tokens, you can use [Managed Inference](https://console.scaleway.com/inference/deployments) where these limts do not apply (as your bill will be limited by the size of your deployment). +If you require higher maximum output tokens, you can use [Managed Inference](https://console.scaleway.com/inference/deployments) where these limits do not apply (as your bill will be limited by the size of your deployment). ### Can I use OpenAI libraries and APIs with Scaleway's Generative APIs? Yes, Scaleway's Generative APIs are designed to be compatible with OpenAI libraries and SDKs, including the OpenAI Python client library and LangChain SDKs. This allows for seamless integration with existing workflows. diff --git a/pages/generative-apis/how-to/query-language-models.mdx b/pages/generative-apis/how-to/query-language-models.mdx index 72c31e7e73..bbdc2b0b16 100644 --- a/pages/generative-apis/how-to/query-language-models.mdx +++ b/pages/generative-apis/how-to/query-language-models.mdx @@ -13,7 +13,7 @@ Scaleway's Generative APIs service allows users to interact with powerful langua There are several ways to interact with language models: - The Scaleway [console](https://console.scaleway.com) provides complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), aiming to test models, adapt parameters, and observe how these changes affect the output in real-time. -- Via the [Chat API](/generative-apis/how-to/query-language-models/#querying-language-models-via-api) +- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) @@ -39,7 +39,22 @@ The web playground displays. ## Querying language models via API -The [Chat API](/generative-apis/api-cli/using-chat-api/) is an OpenAI-compatible REST API for generating and manipulating conversations. +Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are an OpenAI-compatible REST APIs for generating and manipulating conversations. + +### Chat Completions API or Responses API? 
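Before the detailed comparison, here is a minimal sketch of the call shape of each API using the OpenAI Python SDK pointed at Scaleway's endpoint. The client configuration, model names, and output-access patterns follow the examples later in this page; this is an orientation sketch, not a complete program.

```python
from openai import OpenAI

# Scaleway's Generative APIs expose an OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # Scaleway's Generative APIs service URL
    api_key="",  # Your unique API key from Scaleway
)

# Chat Completions API: the conversation is passed as `messages`
chat = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(chat.choices[0].message.content)

# Responses API (beta): inputs are passed as `input`; the last output item
# holds the final text (earlier items may hold reasoning content)
resp = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "Hello!"}],
)
print(resp.output[-1].content[0].text)
```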
The table below compares the Chat Completions API with the Responses API.

| Aspect | Responses API | Chat Completions API |
|-------------------------|------------------------------------------------|-----------------------------------------|
| **Description** | Unified API for model responses (successor to Chat + Assistants). Offers tool-calling by built-in tools (e.g. web or file search) while the model generates a response, though currently only `function` tools are supported by Scaleway. | Older API for chat-style completions. Offers only `function` tool-calling. |
| **Status** | Beta | GA |
| **Endpoint** | `/v1/{project_id}/responses` | `/v1/{project_id}/chat/completions` |
| **Use cases** | Agentic apps, tool-augmented workflows, multi-step tasks, future-proof apps | Simple chatbots, Q&A, summarization, stateless interactions |
| **Features** | - Plain chat completions<br> - Tool calling (Code Interpreter, image gen, file search, web search)<br> - MCP tool integration<br> - Background mode<br> - Reasoning summaries | - Plain chat completions only<br> - Function calling (basic tool use)<br>
- No MCP, no background mode, no reasoning summaries | + +| **Long-term support** | Future standard API (will replace Chat & Assistants) | Maintained for now, EOL expected mid-2026 | +| **Complexity** | More powerful but requires newer SDK methods (`client.responses.create`) | Simpler, lighter (`client.chat.completions.create`) | You can query the models programmatically using your favorite tools or languages. In the following example, we will use the OpenAI Python client. diff --git a/pages/generative-apis/index.mdx b/pages/generative-apis/index.mdx index 48e9e96d47..a48aad7e31 100644 --- a/pages/generative-apis/index.mdx +++ b/pages/generative-apis/index.mdx @@ -44,6 +44,14 @@ description: Dive into Scaleway Generative APIs with our quickstart guides, how- /> + + ## Changelog - This service is free while in beta. [Specific terms and conditions](https://www.scaleway.com/en/contracts/) apply. - - - A Scaleway account logged into the [console](https://console.scaleway.com) From 78ef48cea85eb26cde6e7612290bc103663f4bf0 Mon Sep 17 00:00:00 2001 From: Rowena Date: Fri, 22 Aug 2025 17:53:18 +0200 Subject: [PATCH 2/6] feat(ai): continue to integrate responses api --- macros/ai/chat-comp-vs-responses-api.mdx | 11 + .../how-to/query-language-models.mdx | 219 ++++++---- .../how-to/query-vision-models.mdx | 261 +++++++++--- .../how-to/use-function-calling.mdx | 233 +++++++--- .../how-to/use-structured-outputs.mdx | 402 ++++++++++++------ 5 files changed, 792 insertions(+), 334 deletions(-) create mode 100644 macros/ai/chat-comp-vs-responses-api.mdx diff --git a/macros/ai/chat-comp-vs-responses-api.mdx b/macros/ai/chat-comp-vs-responses-api.mdx new file mode 100644 index 0000000000..817a59935b --- /dev/null +++ b/macros/ai/chat-comp-vs-responses-api.mdx @@ -0,0 +1,11 @@ +--- +macro: chat-comp-vs-responses-api +--- + +Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are OpenAI-compatible REST APIs that can be used for generating and manipulating conversations. The Chat Completions API is focused on generating conversational responses, while the Responses API is a more general REST API for chat, structured outputs, tool use, and multimodal inputs. + +The **Chat Completions** API was released in 2023, and is an industry standard for building AI applications, being specifically designed for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. It supports `function` tool-calling, where the developer defines a set of functions, which the model can decide whether to call when generating a response. If it decides to call one of these functions, it returns the function name and arguments, and the developer's own code must actually execute the function and feed the result back into the conversation for use by the model. + +The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with the ability to do more agentic tasks and reasoning. It supports statefulness, being able to maintain context without needing to resend the entire conversation history. It offers tool-calling by built-in tools (e.g. web or file search) that the model is able to execute itself while generating a response, though currently only `function` tools are supported by Scaleway. 
Overall, Scaleway's support for the Responses API is currently at beta stage. All supported Generative API models can be used with the Responses API, and note that for the `gpt-oss-120b` model, only the Responses API will allow you to access all of its features.

For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
\ No newline at end of file
diff --git a/pages/generative-apis/how-to/query-language-models.mdx b/pages/generative-apis/how-to/query-language-models.mdx
index bbdc2b0b16..7dd88aae02 100644
--- a/pages/generative-apis/how-to/query-language-models.mdx
+++ b/pages/generative-apis/how-to/query-language-models.mdx
@@ -3,11 +3,11 @@ title: How to query language models
 description: Learn how to interact with powerful language models using Scaleway's Generative APIs service.
 tags: generative-apis ai-data language-models chat-completions-api
 dates:
-  validation: 2025-05-12
+  validation: 2025-08-22
   posted: 2024-08-28
 ---
 import Requirements from '@macros/iam/requirements.mdx'
-
+import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'
 
 Scaleway's Generative APIs service allows users to interact with powerful language models hosted on the platform.
 
@@ -39,25 +39,12 @@ The web playground displays.
 
 ## Querying language models via API
 
-Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are OpenAI-compatible REST APIs for generating and manipulating conversations.
+You can query the models programmatically using your favorite tools or languages.
+In the example that follows, we will use the OpenAI Python client.
 
 ### Chat Completions API or Responses API?
 
-The table below compares the Chat Completions API with the Responses API.
-
-| Aspect | Responses API | Chat Completions API |
-|-------------------------|------------------------------------------------|-----------------------------------------|
-| **Description** | Unified API for model responses (successor to Chat + Assistants). Offers tool-calling by built-in tools (e.g. web or file search) while the model generates a response, though currently only `function` tools are supported by Scaleway. | Older API for chat-style completions. Offers only `function` tool-calling. |
-| **Status** | Beta | GA |
-| **Endpoint** | `/v1/{project_id}/responses` | `/v1/{project_id}/chat/completions` |
-| **Use cases** | Agentic apps, tool-augmented workflows, multi-step tasks, future-proof apps | Simple chatbots, Q&A, summarization, stateless interactions |
-| **Features** | - Plain chat completions<br>
- Tool calling (Code Interpreter, image gen, file search, web search)<br> - MCP tool integration<br> - Background mode<br> - Reasoning summaries | - Plain chat completions only<br> - Function calling (basic tool use)<br>
- No MCP, no background mode, no reasoning summaries | - -| **Long-term support** | Future standard API (will replace Chat & Assistants) | Maintained for now, EOL expected mid-2026 | -| **Complexity** | More powerful but requires newer SDK methods (`client.responses.create`) | Simpler, lighter (`client.chat.completions.create`) | - -You can query the models programmatically using your favorite tools or languages. -In the following example, we will use the OpenAI Python client. + ### Installing the OpenAI SDK @@ -83,48 +70,95 @@ client = OpenAI( ### Generating a chat completion -You can now create a chat completion, for example with the `llama-3.1-8b-instruct` model: +You can now create a chat completion using either the Chat Completions or Responses API, as shown in the following examples: , -```python -# Create a chat completion using the 'llama-3.1-8b-instruct' model -response = client.chat.completions.create( - model="llama-3.1-8b-instruct", - messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}], - temperature=0.2, # Adjusts creativity - max_tokens=100, # Limits the length of the output - top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature. -) + -# Print the generated response -print(response.choices[0].message.content) -``` + + + ```python + # Create a chat completion using the 'llama-3.1-8b-instruct' model + response = client.chat.completions.create( + model="llama-3.1-8b-instruct", + messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}], + temperature=0.2, # Adjusts creativity + max_tokens=100, # Limits the length of the output + top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature. + ) + + # Print the generated response + print(response.choices[0].message.content) + ``` + + This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively. + + + + + + ```python + # Create a chat completion using the 'gpt-oss-120b' model + response = client.responses.create( + model="gpt-oss-120b", + input=[{"role": "user", "content": "Briefly describe a futuristic city with advanced technology and green energy solutions."}], + temperature=0.2, # Adjusts creativity + max_output_tokens=100, # Limits the length of the output + top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature. + + ) + # Print the generated response. Here, the last output message will contain the final content. + # Previous outputs will contain reasoning content. + print(response.output[-1].content[0].text) + ``` + + -This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively. A conversation style may include a default system prompt. You may set this prompt by setting the first message with the role system. For example: -```python -[ - { - "role": "system", - "content": "You are Xavier Niel." - }, - { - "role": "user", - "content": "Hello, what is your name?" - } -] -``` + ```python + [ + { + "role": "system", + "content": "You are Xavier Niel." + }, + { + "role": "user", + "content": "Hello, what is your name?" 
+ } + ] + ``` ### Model parameters and their effects The following parameters will influence the output of the model: -- **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`. -- **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative. -- **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output. -- **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`. -- **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output. + + + + + - **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`. + - **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative. + - **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output. + - **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`. + - **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output. + + See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) for a full list of all available parameters. + + + + + + - **`input`**: A single text string, or an array of string/multi-modal inputs to provide to the model to generate a response. When using the array option, you can define a `role` and list of `content` inputs of different types (texts, files, images etc.) + - **`max_output_tokens`**: A maximum number of output tokens that can be generated for a completion. Different default maximum values +are enforced for each model, to avoid edge cases where tokens are generated indefinitely. + - **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative. + - **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`. + + See the [dedicated API documentation](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) for a full list of all available parameters. + + + If you encounter an error such as "Forbidden 403" refer to the [API documentation](/generative-apis/api-cli/understanding-errors) for troubleshooting tips. 
@@ -133,7 +167,8 @@ The following parameters will influence the output of the model: ## Streaming By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced. -Following is an example using the chat completions API: + +Following is an example using the Chat Completions API, but the `stream` parameter can be set in the same way with the Responses API. ```python from openai import OpenAI @@ -160,28 +195,62 @@ for chunk in response: The service also supports asynchronous mode for any chat completion. -```python - -import asyncio -from openai import AsyncOpenAI - -client = AsyncOpenAI( - base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL - api_key="" # Your unique API key from Scaleway -) - -async def main(): - stream = await client.chat.completions.create( - model="llama-3.1-8b-instruct", - messages=[{ - "role": "user", - "content": "Sing me a song", - }], - stream=True, - ) - async for chunk in stream: - if chunk.choices and chunk.choices[0].delta.content: - print(chunk.choices[0].delta.content, end="") - -asyncio.run(main()) -``` + + + + + ```python + + import asyncio + from openai import AsyncOpenAI + + client = AsyncOpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API key from Scaleway + ) + + async def main(): + stream = await client.chat.completions.create( + model="llama-3.1-8b-instruct", + messages=[{ + "role": "user", + "content": "Sing me a song", + }], + stream=True, + ) + async for chunk in stream: + if chunk.choices and chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end="") + + asyncio.run(main()) + ``` + + + ```python + import asyncio + from openai import AsyncOpenAI + + client = AsyncOpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API key from Scaleway + ) + + async def main(): + stream = await client.responses.create( + model="llama-3.1-8b-instruct", + input=[{ + "role": "user", + "content": "Sing me a song" + }], + stream=True, + ) + async for event in stream: + if event.type == "response.output_text.delta": + print(event.delta, end="") + elif event.type == "response.completed": + break + + asyncio.run(main()) + ``` + + diff --git a/pages/generative-apis/how-to/query-vision-models.mdx b/pages/generative-apis/how-to/query-vision-models.mdx index cc7b1ef08e..e15c7f59f0 100644 --- a/pages/generative-apis/how-to/query-vision-models.mdx +++ b/pages/generative-apis/how-to/query-vision-models.mdx @@ -7,6 +7,7 @@ dates: posted: 2024-10-30 --- import Requirements from '@macros/iam/requirements.mdx' +import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx' Scaleway's Generative APIs service allows users to interact with powerful vision models hosted on the platform. @@ -17,7 +18,7 @@ Scaleway's Generative APIs service allows users to interact with powerful vision There are several ways to interact with vision models: - The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-vision-models/#accessing-the-playground), aiming to test models, adapt parameters, and observe how these changes affect the output in real-time. 
-- Via the [Chat API](/generative-apis/how-to/query-vision-models/#querying-vision-models-via-the-api) +- [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) @@ -43,16 +44,19 @@ The web playground displays. ## Querying vision models via the API -The [Chat API](/generative-apis/api-cli/using-chat-api/) is an OpenAI-compatible REST API for generating and manipulating conversations. +You can query vision models programmatically using your favorite tools or languages. -You can query the vision models programmatically using your favorite tools or languages. Vision models take both text and images as inputs. +In the example that follows,we will use the OpenAI Python client. + Unlike traditional language models, vision models will take a content array for the user role, structuring text and images as inputs. -In the following example, we will use the OpenAI Python client. +### Chat Completions API or Responses API? + + ### Installing the OpenAI SDK @@ -78,30 +82,68 @@ client = OpenAI( ### Generating a chat completion -You can now create a chat completion, for example with the `pixtral-12b-2409` model: +You can now create a chat completion: -```python -# Create a chat completion using the 'pixtral-12b-2409' model -response = client.chat.completions.create( - model="pixtral-12b-2409", - messages=[ - { - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - ] # Vision models will take a content array with text and image_url objects. - - } - ], - temperature=0.7, # Adjusts creativity - max_tokens=2048, # Limits the length of the output - top_p=0.9 # Controls diversity through nucleus sampling. You usually only need to use temperature. -) + + + ```python + # Create a chat completion using the 'pixtral-12b-2409' model + response = client.chat.completions.create( + model="pixtral-12b-2409", + messages=[ + { + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, + ] # Vision models will take a content array with text and image_url objects. -# Print the generated response -print(response.choices[0].message.content) -``` + } + ], + temperature=0.7, # Adjusts creativity + max_tokens=2048, # Limits the length of the output + top_p=0.9 # Controls diversity through nucleus sampling. You usually only need to use temperature. + ) + + # Print the generated response + print(response.choices[0].message.content) + ``` + + + ```python + from openai import OpenAI + + # Initialize the client with your base URL and API key + client = OpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API secret key from Scaleway + ) + # Create a chat completion using the 'mistral-small-3.2-24b-instruct-2506' model + response = client.responses.create( + model="mistral-small-3.2-24b-instruct-2506", + input=[ + { + "role": "user", + "content": [ + {"type": "input_text", "text": "What is this image?"}, + {"type": "input_image", + "image_url": "https://picsum.photos/id/32/512/512", + "detail": "auto"} + ] # Vision models will take a content array with text and image_url objects. 
+ + } + ], + temperature=0.7, # Adjusts creativity + max_output_tokens=2048, # Limits the length of the output + top_p=0.9 # Controls diversity through nucleus sampling. You usually only need to use temperature. + ) + + # Print the generated response. Here, the last output message will contain the final content. + # Previous outputs will contain reasoning content. + print(response.output[-1].content[0].text) + ``` + + This code sends messages, prompts and images, to the vision model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively. @@ -182,64 +224,143 @@ The following parameters will influence the output of the model: ## Streaming By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced. -The following example shows how to use the chat completion API: -```python -from openai import OpenAI +Examples are provided below: -client = OpenAI( - base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL - api_key="" # Your unique API key from Scaleway -) -response = client.chat.completions.create( - model="pixtral-12b-2409", - messages=[{ - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - ] - }], - stream=True, -) + + + ```python + from openai import OpenAI + + client = OpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API key from Scaleway + ) + response = client.chat.completions.create( + model="pixtral-12b-2409", + messages=[{ + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, + ] + }], + stream=True, + ) -for chunk in response: - if chunk.choices and chunk.choices[0].delta.content: - print(chunk.choices[0].delta.content, end="") + for chunk in response: + if chunk.choices and chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end="") + ``` + + + + ```python + from openai import OpenAI + + client = OpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API key from Scaleway + ) + + # Stream a response from the vision model + with client.responses.stream( + model="pixtral-12b-2409", + input=[ + { + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, + ] + } + ] + ) as stream: + for event in stream: + # Print incremental text as it arrives + if event.type == "response.output_text.delta": + print(event.delta, end="") + + # Optionally, get the final aggregated response + final_response = stream.get_final_response() + print("\nFinal output:\n", final_response.output_text) ``` + + + ## Async The service also supports asynchronous mode for any chat completion. 
-```python + + + ```python -import asyncio -from openai import AsyncOpenAI + import asyncio + from openai import AsyncOpenAI -client = AsyncOpenAI( - base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL - api_key="" # Your unique API key from Scaleway -) + client = AsyncOpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API key from Scaleway + ) -async def main(): - stream = await client.chat.completions.create( - model="pixtral-12b-2409", - messages=[{ - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - ] - }], - stream=True, + async def main(): + stream = await client.chat.completions.create( + model="pixtral-12b-2409", + messages=[{ + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, + ] + }], + stream=True, + ) + async for chunk in stream: + if chunk.choices and chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end="") + + asyncio.run(main()) + ``` + + + ```python + import asyncio + from openai import AsyncOpenAI + + client = AsyncOpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API key from Scaleway ) - async for chunk in stream: - if chunk.choices and chunk.choices[0].delta.content: - print(chunk.choices[0].delta.content, end="") -asyncio.run(main()) -``` + async def main(): + # Stream a response from the vision model + async with client.responses.stream( + model="pixtral-12b-2409", + input=[ + { + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, + ] + } + ] + ) as stream: + async for event in stream: + # Print incremental text as it arrives + if event.type == "response.output_text.delta": + print(event.delta, end="") + + # Optionally, get the final aggregated response + final_response = await stream.get_final_response() + print("\nFinal output:\n", final_response.output_text) + + asyncio.run(main()) + ``` + + ## Frequently Asked Questions diff --git a/pages/generative-apis/how-to/use-function-calling.mdx b/pages/generative-apis/how-to/use-function-calling.mdx index 2bb3bb6397..48ed2d6936 100644 --- a/pages/generative-apis/how-to/use-function-calling.mdx +++ b/pages/generative-apis/how-to/use-function-calling.mdx @@ -3,13 +3,13 @@ title: How to use function calling description: Learn how to implement function calling capabilities using Scaleway's Chat Completions API service. tags: chat-completions-api dates: - validation: 2025-05-26 + validation: 2025-08-22 posted: 2024-09-24 --- import Requirements from '@macros/iam/requirements.mdx' -Scaleway's Chat Completions API supports function calling as introduced by OpenAI. +Scaleway's Chat Completions API supports function calling as introduced by OpenAI. The Responses API allows not only function calling, but also direct tool-calling by the model, e.g. web and file search. However currently only function calling is supported by Scaleway, as our support of Responses API is at beta stage. [Read more about Chat Completions vs Responses API](/generative-apis/how-to/query-language-models/#chat-completions-api-or-responses-api?). ## What is function calling? 
@@ -110,85 +110,184 @@ tools = [{ To implement a basic function call, add the following code: -```python -# Initialize the OpenAI client -client = OpenAI( - base_url="https://api.scaleway.ai/v1", - api_key="" -) + + + ```python + # Initialize the OpenAI client + client = OpenAI( + base_url="https://api.scaleway.ai/v1", + api_key="" + ) + + # Create a simple query + messages = [ + { + "role": "system", + "content": "You are a helpful flight assistant." + }, + { + "role": "user", + "content": "What flights are available from CDG to LHR on November 1st, 2024?" + } + ] -# Create a simple query -messages = [ - { - "role": "system", - "content": "You are a helpful flight assistant." - }, - { - "role": "user", - "content": "What flights are available from CDG to LHR on November 1st, 2024?" - } -] + # Make the API call + response = client.chat.completions.create( + model="llama-3.1-70b-instruct", + messages=messages, + tools=tools, + tool_choice="auto" + ) -# Make the API call -response = client.chat.completions.create( - model="llama-3.1-70b-instruct", - messages=messages, - tools=tools, - tool_choice="auto" -) + print(response.choices[0].message.tool_calls) + ``` -print(response.choices[0].message.tool_calls) -``` + As the model detects properly that a tool call is required to answer the question, the output should be a list of tool calls specifying function names and parameter properties: + ```bash + [ChatCompletionMessageToolCall(id='chatcmpl-tool-81e63f4f496d429ba9ec6efcff6a86e1', function=Function(arguments='{"departure_airport": "CDG", "destination_airport": "LHR", "departure_date": "2024-11-01"}', name='get_flight_schedule'), type='function')] + ``` -As the model detects properly that a tool call is required to answer the question, the output should be a list of tool calls specifying function names and parameter properties: -```bash -[ChatCompletionMessageToolCall(id='chatcmpl-tool-81e63f4f496d429ba9ec6efcff6a86e1', function=Function(arguments='{"departure_airport": "CDG", "destination_airport": "LHR", "departure_date": "2024-11-01"}', name='get_flight_schedule'), type='function')] -``` + + The model automatically decides which functions to call. However, you can specify a particular function by using the `tool_choice` parameter. In the example above, you can replace `tool_choice=auto` with `tool_choice={"type": "function", "function": {"name": "get_flight_schedule"}}` to explicitly call the desired function. + - - The model automatically decides which functions to call. However, you can specify a particular function by using the `tool_choice` parameter. In the example above, you can replace `tool_choice=auto` with `tool_choice={"type": "function", "function": {"name": "get_flight_schedule"}}` to explicitly call the desired function. - + + Some models must be told they can use external functions in the system prompt. If you do not provide a system prompt when using tools, Scaleway will automatically add one that works best for that specific model. + + - - Some models must be told they can use external functions in the system prompt. If you do not provide a system prompt when using tools, Scaleway will automatically add one that works best for that specific model. 
- + -### Call the tool and provide a final answer + ```python + from openai import OpenAI -To provide the answer, or for more complex interactions, you will need to handle multiple turns of conversation: + # Initialize the OpenAI client + client = OpenAI( + base_url="https://api.scaleway.ai/v1", + api_key="" + ) -```python -# Process the tool call -if response.choices[0].message.tool_calls: - tool_call = response.choices[0].message.tool_calls[0] - - # Execute the function - if tool_call.function.name == "get_flight_schedule": - function_args = json.loads(tool_call.function.arguments) - function_response = get_flight_schedule(**function_args) - - # Add results to the conversation - messages.extend([ + # Create a simple query + response = client.responses.create( + model="gpt-oss-120b", + input=[ { - "role": "assistant", - "content": None, - "tool_calls": [tool_call] + "role": "system", + "content": "You are a helpful flight assistant." }, { - "role": "tool", - "name": tool_call.function.name, - "content": json.dumps(function_response), - "tool_call_id": tool_call.id + "role": "user", + "content": "What flights are available from CDG to LHR on November 1st, 2024?" } - ]) - - # Get final response - final_response = client.chat.completions.create( - model="llama-3.1-70b-instruct", - messages=messages + ], + tools=tools, + tool_choice="auto" + ) + + # Inspect tool calls + print(response.output[0].content[0].tool_calls) + ``` + + As the model detects properly that a tool call is required to answer the question, the output should be a list of tool calls specifying function names and parameter properties: + + ```bash + [ToolCall( + id='resp-tool-81e63f4f496d429ba9ec6efcff6a86e1', + type='function', + function=ToolCallFunction( + name='get_flight_schedule', + arguments='{"departure_airport": "CDG", "destination_airport": "LHR", "departure_date": "2024-11-01"}' ) - print(final_response.choices[0].message.content) -``` + )] + ``` + + + +### Call the tool and provide a final answer + +To provide the answer, or for more complex interactions, you will need to handle multiple turns of conversation: + + + + ```python + # Process the tool call + if response.choices[0].message.tool_calls: + tool_call = response.choices[0].message.tool_calls[0] + + # Execute the function + if tool_call.function.name == "get_flight_schedule": + function_args = json.loads(tool_call.function.arguments) + function_response = get_flight_schedule(**function_args) + + # Add results to the conversation + messages.extend([ + { + "role": "assistant", + "content": None, + "tool_calls": [tool_call] + }, + { + "role": "tool", + "name": tool_call.function.name, + "content": json.dumps(function_response), + "tool_call_id": tool_call.id + } + ]) + + # Get final response + final_response = client.chat.completions.create( + model="llama-3.1-70b-instruct", + messages=messages + ) + print(final_response.choices[0].message.content) + ``` + + + ```python + import json + from openai import OpenAI + + client = OpenAI() + + messages = [ + {"role": "user", "content": "What time is the next flight to Paris?"} + ] + + # First request + response = client.responses.create( + model="gpt-4.1", + input=messages + ) + + # Look for tool calls + tool_calls = [item for item in response.output if item.type == "tool_call"] + + if tool_calls: + tool_call = tool_calls[0] + + # Execute the function + if tool_call.function.name == "get_flight_schedule": + function_args = json.loads(tool_call.function.arguments) + function_response = 
get_flight_schedule(**function_args)

            # Provide the tool output back to the model and get the final response.
            # In the Responses API, a tool result is passed back as a
            # `function_call_output` item referencing the original call ID,
            # rather than as a "tool" role message.
            final_response = client.responses.create(
                model="gpt-oss-120b",  # a Scaleway-hosted model that supports the Responses API
                input=messages + response.output + [
                    {
                        "type": "function_call_output",
                        "call_id": tool_call.id,
                        "output": json.dumps(function_response)
                    }
                ]
            )

            print(final_response.output_text)
    ```
  

### Parallel function calling

diff --git a/pages/generative-apis/how-to/use-structured-outputs.mdx b/pages/generative-apis/how-to/use-structured-outputs.mdx
index c1c05fd6bf..e003df983f 100644
--- a/pages/generative-apis/how-to/use-structured-outputs.mdx
+++ b/pages/generative-apis/how-to/use-structured-outputs.mdx
@@ -3,22 +3,22 @@ title: How to use structured outputs
 description: Learn how to get consistent JSON format responses using Scaleway's Chat Completions API service.
 tags: chat-completions-api
 dates:
-  validation: 2025-05-12
+  validation: 2025-08-22
   posted: 2024-09-17
 ---
 import Requirements from '@macros/iam/requirements.mdx'
-
+import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'
 
 Structured outputs allow users to get consistent, machine-readable JSON format responses from language models. JSON, as a widely-used format, enables seamless integration with a variety of platforms and applications. Its interoperability is crucial for developers aiming to incorporate AI functionality into their current systems with minimal adjustments.
 
-By specifying a response format when using the [Chat Completions API](/generative-apis/api-cli/using-chat-api/), you can ensure that responses are returned in a JSON structure.
+By specifying a response format when using the Chat Completions API or Responses API, you can ensure that responses are returned in a JSON structure.
 There are two main modes for generating JSON: **Object Mode** (schemaless) and **Schema Mode** (deterministic, structured output).
 
 There are several ways to interact with language models:
 - The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-language-models/#accessing-the-playground), aiming to test models, adapt parameters, and observe how these changes affect the output in real-time.
-- Via the [Chat API](/generative-apis/how-to/query-language-models/#querying-language-models-via-api)
+- Via the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response)


 - JSON mode is older and has been used by developers since early API implementations, but lacks reliability in response formats.

-  - All LLMs in the Scaleway library support **Structured outputs** and **JSON mode**. However, a schemaless **JSON mode** will produce lower quality results and is not recommended.
+  - All LLMs in the Scaleway library support **Structured outputs** and **JSON mode**. However, a schemaless **JSON mode** will produce lower quality results and is not recommended. Note that structured output is more reliably validated and more richly parsed with the Responses API.

## Code examples


The following Python examples demonstrate how to use **Structured outputs** to generate structured responses. 
-We are using the base code below to send our LLM a voice note transcript to structure: - -```python -import json -from openai import OpenAI -from pydantic import BaseModel, Field - -# Set your preferred model -MODEL = "llama-3.1-8b-instruct" +We using the base code below to send our LLM a voice note transcript to structure: -# Set your API key -API_KEY = "" +### Defining the voice note and transcript -client = OpenAI( - base_url="https://api.scaleway.ai/v1", - api_key=API_KEY, -) - -# Define the schema for the output using Pydantic -class VoiceNote(BaseModel): - title: str = Field(description="A title for the voice note") - summary: str = Field(description="A short one sentence summary of the voice note.") - actionItems: list[str] = Field(description="A list of action items from the voice note") + +If you are going to use this base code with the Responses API, note that for now you must use `gpt-oss-120b` as the only supported model. + -# Transcript to use for the output -TRANSCRIPT = ( - "Good evening! It's 6:30 PM, and I'm just getting home from work. I have a few things to do " - "before I can relax. First, I'll need to water the plants in the garden since they've been in the sun all day. " - "Then, I'll start preparing dinner. I think a simple pasta dish with some garlic bread should be good. " - "While that's cooking, I'll catch up on a couple of phone calls I missed earlier." -) -``` + ```python + import json + from openai import OpenAI + from pydantic import BaseModel, Field + + # Set your preferred model + MODEL = "llama-3.1-8b-instruct" + + # Set your API key + API_KEY = "" + + client = OpenAI( + base_url="https://api.scaleway.ai/v1", + api_key=API_KEY, + ) + + # Define the schema for the output using Pydantic + class VoiceNote(BaseModel): + title: str = Field(description="A title for the voice note") + summary: str = Field(description="A short one sentence summary of the voice note.") + actionItems: list[str] = Field(description="A list of action items from the voice note") + + # Transcript to use for the output + TRANSCRIPT = ( + "Good evening! It's 6:30 PM, and I'm just getting home from work. I have a few things to do " + "before I can relax. First, I'll need to water the plants in the garden since they've been in the sun all day. " + "Then, I'll start preparing dinner. I think a simple pasta dish with some garlic bread should be good. " + "While that's cooking, I'll catch up on a couple of phone calls I missed earlier." + ) + ``` ### Using structured outputs with JSON schema (Pydantic) Using [Pydantic](https://docs.pydantic.dev/latest/concepts/models/), users can define the schema as a Python class and enforce the model to return results adhering to this schema. -```python -extract = client.chat.completions.create( - messages=[ + + + + ```python + extract = client.chat.completions.create( + messages=[ + { + "role": "system", + "content": "The following is a voice message transcript. 
Only answer in JSON using '{' as the first character.", + }, + { + "role": "user", + "content": TRANSCRIPT, + }, + ], + model=MODEL, + response_format={ + "type": "json_schema", + "json_schema": { + "name": "VoiceNote", + "schema": VoiceNote.model_json_schema(), + } + }, + ) + output = json.loads(extract.choices[0].message.content) + print(json.dumps(output, indent=2)) + ``` + + Output example: + ```json + { + "title": "To-Do List", + "summary": "Returning from work, need to complete tasks before relaxing", + "actionItems": [ + "Water garden", + "Prepare dinner: pasta dish with garlic bread", + "Catch up on missed phone calls" + ] + } + ``` + + + ```python + extract = client.responses.create( + input=[ { "role": "system", "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.", @@ -102,33 +150,39 @@ extract = client.chat.completions.create( { "role": "user", "content": TRANSCRIPT, - }, + } ], model=MODEL, - response_format={ - "type": "json_schema", - "json_schema": { + text={ + "format": { + "type": "json_schema", "name": "VoiceNote", - "schema": VoiceNote.model_json_schema(), + "schema": VoiceNote.model_json_schema() } - }, -) -output = json.loads(extract.choices[0].message.content) -print(json.dumps(output, indent=2)) -``` - -Output example: -```json -{ - "title": "To-Do List", - "summary": "Returning from work, need to complete tasks before relaxing", - "actionItems": [ - "Water garden", - "Prepare dinner: pasta dish with garlic bread", - "Catch up on missed phone calls" - ] -} -``` + } + ) + + # Print the generated response. Here, the last output message will contain the final content. + # Previous outputs will contain reasoning content. + output = json.loads(extract.output[-1].content[0].text) + print(json.dumps(output, indent=2)) + ``` + + Output example: + ```json + { + "title": "To-Do List", + "summary": "Returning from work, need to complete tasks before relaxing", + "actionItems": [ + "Water garden", + "Prepare dinner: pasta dish with garlic bread", + "Catch up on missed phone calls" + ] + } + ``` + + + Structured outputs accuracy may vary between models. For instance, with Llama models, we suggest adding a description of the field looked for in `response_format` and in `system` or `user` messages. In our example this would mean adding a system prompt similar to: @@ -142,23 +196,76 @@ Output example: Alternatively, users can manually define the JSON schema inline when calling the model. -```python -extract = client.chat.completions.create( - messages=[ + + + ```python + extract = client.chat.completions.create( + messages=[ + { + "role": "system", + "content": "The following is a voice message transcript. 
Only answer in JSON using '{' as the first character.", + }, + { + "role": "user", + "content": TRANSCRIPT, + }, + ], + model=MODEL, + response_format={ + "type": "json_schema", + "json_schema": { + "name": "VoiceNote", + "schema": { + "type": "object", + "properties": { + "title": {"type": "string"}, + "summary": {"type": "string"}, + "actionItems": { + "type": "array", + "items": {"type": "string"} + } + }, + "additionalProperties": False, + "required": ["title", "summary", "actionItems"] + } + } + } + ) + output = json.loads(extract.choices[0].message.content) + print(json.dumps(output, indent=2)) + ``` + + Output example: + ```json + { + "title": "Evening Routine", + "actionItems": [ + "Water the plants", + "Cook dinner (pasta and garlic bread)", + "Make phone calls" + ], + "summary": "Made a list of tasks to accomplish before relaxing tonight" + } + ``` + + + ```python + extract = client.responses.create( + model=MODEL, + input=[ { "role": "system", - "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.", + "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character." }, { "role": "user", - "content": TRANSCRIPT, - }, + "content": TRANSCRIPT + } ], - model=MODEL, response_format={ "type": "json_schema", "json_schema": { - "name": "VoiceNote", + "name": "VoiceNote", "schema": { "type": "object", "properties": { @@ -171,26 +278,30 @@ extract = client.chat.completions.create( }, "additionalProperties": False, "required": ["title", "summary", "actionItems"] + } } } + + ) + output = json.loads(extract.choices[0].message.content) + print(json.dumps(output, indent=2)) + ``` + + Output example: + ```json + { + "title": "Evening Routine", + "actionItems": [ + "Water the plants", + "Cook dinner (pasta and garlic bread)", + "Make phone calls" + ], + "summary": "Made a list of tasks to accomplish before relaxing tonight" } -) -output = json.loads(extract.choices[0].message.content) -print(json.dumps(output, indent=2)) -``` - -Output example: -```json -{ - "title": "Evening Routine", - "actionItems": [ - "Water the plants", - "Cook dinner (pasta and garlic bread)", - "Make phone calls" - ], - "summary": "Made a list of tasks to accomplish before relaxing tonight" -} -``` + ``` + + + When using the OpenAI SDKs like in the examples above, you are expected to set `additionalProperties` to false, and to specify all your properties as required. @@ -199,53 +310,100 @@ Output example: ### Using JSON mode (schemaless, Legacy method) - - When using the OpenAI SDKs as in the examples above, you are expected to set `additionalProperties` to false, and to specify all your properties as required. - - JSON mode: It is important to explicitly ask the model to generate a JSON output either in the system prompt or user prompt. To prevent infinite generations, model providers most often encourage users to ask the model for short JSON objects. Prompt example: `Only answer in JSON using '{' as the first character.`. + JSON mode: It is important to explicitly ask the model to generate a JSON output either in the system prompt or user prompt. To prevent infinite generations, model providers most often encourage users to ask the model for short JSON objects. Prompt example: `Only answer in JSON using '{' as the first character.`. In JSON mode, you can prompt the model to output a JSON object without enforcing a strict schema. 
-```python -extract = client.chat.completions.create( - messages=[ + + + ```python + extract = client.chat.completions.create( + messages=[ + { + "role": "system", + "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.", + }, + { + "role": "user", + "content": TRANSCRIPT, + }, + ], + model=MODEL, + response_format={ + "type": "json_object", + }, + ) + output = json.loads(extract.choices[0].message.content) + print(json.dumps(output, indent=2)) + ``` + + Output example: + ```json + { + "current_time": "6:30 PM", + "tasks": [ { - "role": "system", - "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.", + "task": "water the plants in the garden", + "priority": "high" }, { - "role": "user", - "content": TRANSCRIPT, + "task": "prepare dinner (pasta with garlic bread)", + "priority": "high" }, - ], - model=MODEL, - response_format={ - "type": "json_object", - }, -) -output = json.loads(extract.choices[0].message.content) -print(json.dumps(output, indent=2)) -``` - -Output example: -```json -{ - "current_time": "6:30 PM", - "tasks": [ - { - "task": "water the plants in the garden", - "priority": "high" - }, - { - "task": "prepare dinner (pasta with garlic bread)", - "priority": "high" - }, + { + "task": "catch up on phone calls", + "priority": "medium" + } + ] + } + ``` + + + ```python + extract = client.responses.create( + model=MODEL, + input=[ + { + "role": "system", + "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character." + }, + { + "role": "user", + "content": TRANSCRIPT + } + ], + response_format={ + "type": "json_object", + }, + ) + + output = json.loads(extract.choices[0].message.content) + print(json.dumps(output, indent=2)) + ``` + + Output example: + ```json { - "task": "catch up on phone calls", - "priority": "medium" + "current_time": "6:30 PM", + "tasks": [ + { + "task": "water the plants in the garden", + "priority": "high" + }, + { + "task": "prepare dinner (pasta with garlic bread)", + "priority": "high" + }, + { + "task": "catch up on phone calls", + "priority": "medium" + } + ] } - ] -} -``` + ``` + + ## Conclusion From 7bdddeb8a4edb9de4f2be6be7f6848f9a8c9080c Mon Sep 17 00:00:00 2001 From: Rowena Date: Mon, 25 Aug 2025 16:47:39 +0200 Subject: [PATCH 3/6] fix(ai): finish updating for responses --- macros/ai/chat-comp-vs-responses-api.mdx | 2 +- .../how-to/query-language-models.mdx | 12 +- .../how-to/query-vision-models.mdx | 180 +++-------- .../how-to/use-function-calling.mdx | 245 +++++---------- .../how-to/use-structured-outputs.mdx | 285 ++++++------------ .../openai-compatibility.mdx | 6 +- 6 files changed, 226 insertions(+), 504 deletions(-) diff --git a/macros/ai/chat-comp-vs-responses-api.mdx b/macros/ai/chat-comp-vs-responses-api.mdx index 817a59935b..1507a02487 100644 --- a/macros/ai/chat-comp-vs-responses-api.mdx +++ b/macros/ai/chat-comp-vs-responses-api.mdx @@ -6,6 +6,6 @@ Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/gener The **Chat Completions** API was released in 2023, and is an industry standard for building AI applications, being specifically designed for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. 
It supports `function` tool-calling, where the developer defines a set of functions, which the model can decide whether to call when generating a response. If it decides to call one of these functions, it returns the function name and arguments, and the developer's own code must actually execute the function and feed the result back into the conversation for use by the model.

The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with the ability to do more agentic tasks and reasoning. It supports statefulness, being able to maintain context without needing to resend the entire conversation history. It offers tool-calling by built-in tools (e.g. web or file search) that the model is able to execute itself while generating a response, though currently only `function` tools are supported by Scaleway. Overall, **Scaleway's support for the Responses API is currently at beta stage**. All supported Generative API models can be used with the Responses API, and note that for the `gpt-oss-120b` model, only the Responses API will allow you to access all of its features.

For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
\ No newline at end of file
diff --git a/pages/generative-apis/how-to/query-language-models.mdx b/pages/generative-apis/how-to/query-language-models.mdx
index 7dd88aae02..ecea416dc2 100644
--- a/pages/generative-apis/how-to/query-language-models.mdx
+++ b/pages/generative-apis/how-to/query-language-models.mdx
@@ -82,7 +82,7 @@ You can now create a chat completion using either the Chat Completions or Respon
          model="llama-3.1-8b-instruct",
          messages=[{"role": "user", "content": "Describe a futuristic city with advanced technology and green energy solutions."}],
          temperature=0.2, # Adjusts creativity
-         max_tokens=100, # Limits the length of the output
+         max_completion_tokens=100, # Limits the length of the output
          top_p=0.7 # Controls diversity through nucleus sampling. You usually only need to use temperature.
      )
 
@@ -90,7 +90,7 @@ You can now create a chat completion using either the Chat Completions or Respon
      # Print the generated response
      print(response.choices[0].message.content)
      ```
 
-    This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
+    This code sends a message to the model and returns an answer based on your input. The `temperature`, `max_completion_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.
 
@@ -129,6 +129,8 @@ A conversation style may include a default system prompt. 
You may set this promp
 ]
 ```
 
+Adding such a system prompt can also help resolve issues if you receive responses such as `I'm not sure what tools are available to me. Can you please provide a library of tools that I can use to generate a response?`.
+
 ### Model parameters and their effects
 
 The following parameters will influence the output of the model:
@@ -139,7 +141,7 @@ The following parameters will influence the output of the model:
 
   - **`messages`**: A list of message objects that represent the conversation history. Each message should have a `role` (e.g., "system", "user", "assistant") and `content`.
   - **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative.
-  - **`max_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
+  - **`max_completion_tokens`**: The maximum number of tokens (words or parts of words) in the generated output.
   - **`top_p`**: Recommended for advanced use cases only. You usually only need to use temperature. `top_p` controls the diversity of the output, using nucleus sampling, where the model considers the tokens with top probabilities until the cumulative probability reaches `top_p`.
   - **`stop`**: A string or list of strings where the model will stop generating further tokens. This is useful for controlling the end of the output.

@@ -237,7 +239,7 @@ The service also supports asynchronous mode for any chat completion.

    async def main():
        stream = await client.responses.create(
-            model="llama-3.1-8b-instruct",
+            model="gpt-oss-120b",
            input=[{
                "role": "user",
                "content": "Sing me a song"
diff --git a/pages/generative-apis/how-to/query-vision-models.mdx b/pages/generative-apis/how-to/query-vision-models.mdx
index e15c7f59f0..7a88339b03 100644
--- a/pages/generative-apis/how-to/query-vision-models.mdx
+++ b/pages/generative-apis/how-to/query-vision-models.mdx
@@ -109,15 +109,10 @@ You can now create a chat completion:
        print(response.choices[0].message.content)
    ```

-  
+  
    ```python
    from openai import OpenAI

-    # Initialize the client with your base URL and API key
-    client = OpenAI(
-        base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL
-        api_key="" # Your unique API secret key from Scaleway
-    )
    # Create a chat completion using the 'mistral-small-3.2-24b-instruct-2506' model
    response = client.responses.create(
        model="mistral-small-3.2-24b-instruct-2506",
@@ -169,7 +164,7 @@ To encode Base64 images in Python, you first need to install `Pillow` library:
 pip install pillow
 ```
 
-Then, the following Python code sample shows you how to encode an image in Base64 format and pass it to your request payload:
+Then, the following Python code sample shows you how to encode an image in Base64 format and pass it to a request payload for the Chat Completions API:
 
 ```python
 import base64
@@ -207,9 +202,9 @@ payload = {
 ```
 
-### Model parameters and their effects
+### Model parameters and their effects
 
-The following parameters will influence the output of the model:
+When using the Chat Completions API, the following parameters will influence the output of the model:
 
 - **`messages`**: A list of message objects that represent the conversation history.
Each message should have a `role` (e.g., "system", "user", "assistant") and `content`. The content is an array that can contain text and/or image objects. - **`temperature`**: Controls the output's randomness. Lower values (e.g., 0.2) make the output more deterministic, while higher values (e.g., 0.8) make it more creative. @@ -225,142 +220,65 @@ The following parameters will influence the output of the model: By default, the outputs are returned to the client only after the generation process is complete. However, a common alternative is to stream the results back to the client as they are generated. This is particularly useful in chat applications, where it allows the client to view the results incrementally as each token is produced. -Examples are provided below: - - - - ```python - from openai import OpenAI - - client = OpenAI( - base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL - api_key="" # Your unique API key from Scaleway - ) - response = client.chat.completions.create( - model="pixtral-12b-2409", - messages=[{ - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - ] - }], - stream=True, - ) - - for chunk in response: - if chunk.choices and chunk.choices[0].delta.content: - print(chunk.choices[0].delta.content, end="") - ``` - - +An example for the Chat Completions API is provided below: - ```python - from openai import OpenAI +```python +from openai import OpenAI - client = OpenAI( - base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL - api_key="" # Your unique API key from Scaleway - ) +client = OpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API key from Scaleway +) +response = client.chat.completions.create( +model="pixtral-12b-2409", +messages=[{ + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, + ] +}], +stream=True, +) - # Stream a response from the vision model - with client.responses.stream( - model="pixtral-12b-2409", - input=[ - { - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - ] - } - ] - ) as stream: - for event in stream: - # Print incremental text as it arrives - if event.type == "response.output_text.delta": - print(event.delta, end="") - - # Optionally, get the final aggregated response - final_response = stream.get_final_response() - print("\nFinal output:\n", final_response.output_text) +for chunk in response: + if chunk.choices and chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end="") ``` - - ## Async -The service also supports asynchronous mode for any chat completion. - - - - ```python +The service also supports asynchronous mode for any chat completion. 
An example for the Chat Completions API is provided below: - import asyncio - from openai import AsyncOpenAI - - client = AsyncOpenAI( - base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL - api_key="" # Your unique API key from Scaleway - ) +```python - async def main(): - stream = await client.chat.completions.create( - model="pixtral-12b-2409", - messages=[{ - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - ] - }], - stream=True, - ) - async for chunk in stream: - if chunk.choices and chunk.choices[0].delta.content: - print(chunk.choices[0].delta.content, end="") - - asyncio.run(main()) - ``` - - - ```python - import asyncio - from openai import AsyncOpenAI +import asyncio +from openai import AsyncOpenAI - client = AsyncOpenAI( - base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL - api_key="" # Your unique API key from Scaleway - ) +client = AsyncOpenAI( + base_url="https://api.scaleway.ai/v1", # Scaleway's Generative APIs service URL + api_key="" # Your unique API key from Scaleway +) - async def main(): - # Stream a response from the vision model - async with client.responses.stream( - model="pixtral-12b-2409", - input=[ - { - "role": "user", - "content": [ - {"type": "text", "text": "What is this image?"}, - {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, - ] - } +async def main(): + stream = await client.chat.completions.create( + model="pixtral-12b-2409", + messages=[{ + "role": "user", + "content": [ + {"type": "text", "text": "What is this image?"}, + {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}}, ] - ) as stream: - async for event in stream: - # Print incremental text as it arrives - if event.type == "response.output_text.delta": - print(event.delta, end="") - - # Optionally, get the final aggregated response - final_response = await stream.get_final_response() - print("\nFinal output:\n", final_response.output_text) + }], + stream=True, + ) + async for chunk in stream: + if chunk.choices and chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end="") - asyncio.run(main()) - ``` - - +asyncio.run(main()) +``` ## Frequently Asked Questions diff --git a/pages/generative-apis/how-to/use-function-calling.mdx b/pages/generative-apis/how-to/use-function-calling.mdx index 48ed2d6936..7a8e6b10dc 100644 --- a/pages/generative-apis/how-to/use-function-calling.mdx +++ b/pages/generative-apis/how-to/use-function-calling.mdx @@ -39,7 +39,7 @@ The workflow typically follows these steps: 4. Execute selected functions 5. Return results to model for final response -## Code examples +## Code example for Chat Completions API Before diving into the code examples, ensure you have the necessary libraries installed: @@ -48,7 +48,7 @@ The workflow typically follows these steps: ``` -We will demonstrate function calling using a flight scheduling system that allows users to check available flights between European airports. +We will demonstrate function calling with the Chat Completions API using a flight scheduling system that allows users to check available flights between European airports. 
### Basic function definition @@ -110,184 +110,86 @@ tools = [{ To implement a basic function call, add the following code: - - - ```python - # Initialize the OpenAI client - client = OpenAI( - base_url="https://api.scaleway.ai/v1", - api_key="" - ) - - # Create a simple query - messages = [ - { - "role": "system", - "content": "You are a helpful flight assistant." - }, - { - "role": "user", - "content": "What flights are available from CDG to LHR on November 1st, 2024?" - } - ] - - # Make the API call - response = client.chat.completions.create( - model="llama-3.1-70b-instruct", - messages=messages, - tools=tools, - tool_choice="auto" - ) - - print(response.choices[0].message.tool_calls) - ``` - - As the model detects properly that a tool call is required to answer the question, the output should be a list of tool calls specifying function names and parameter properties: - ```bash - [ChatCompletionMessageToolCall(id='chatcmpl-tool-81e63f4f496d429ba9ec6efcff6a86e1', function=Function(arguments='{"departure_airport": "CDG", "destination_airport": "LHR", "departure_date": "2024-11-01"}', name='get_flight_schedule'), type='function')] - ``` - - - The model automatically decides which functions to call. However, you can specify a particular function by using the `tool_choice` parameter. In the example above, you can replace `tool_choice=auto` with `tool_choice={"type": "function", "function": {"name": "get_flight_schedule"}}` to explicitly call the desired function. - - - - Some models must be told they can use external functions in the system prompt. If you do not provide a system prompt when using tools, Scaleway will automatically add one that works best for that specific model. - - - - - - ```python - from openai import OpenAI +```python +# Initialize the OpenAI client +client = OpenAI( + base_url="https://api.scaleway.ai/v1", + api_key="" +) - # Initialize the OpenAI client - client = OpenAI( - base_url="https://api.scaleway.ai/v1", - api_key="" - ) +# Create a simple query +messages = [ + { + "role": "system", + "content": "You are a helpful flight assistant." + }, + { + "role": "user", + "content": "What flights are available from CDG to LHR on November 1st, 2024?" + } +] - # Create a simple query - response = client.responses.create( - model="gpt-oss-120b", - input=[ - { - "role": "system", - "content": "You are a helpful flight assistant." - }, - { - "role": "user", - "content": "What flights are available from CDG to LHR on November 1st, 2024?" 
- } - ], - tools=tools, - tool_choice="auto" - ) +# Make the API call +response = client.chat.completions.create( + model="llama-3.1-70b-instruct", + messages=messages, + tools=tools, + tool_choice="auto" +) - # Inspect tool calls - print(response.output[0].content[0].tool_calls) - ``` +print(response.choices[0].message.tool_calls) +``` - As the model detects properly that a tool call is required to answer the question, the output should be a list of tool calls specifying function names and parameter properties: +As the model detects properly that a tool call is required to answer the question, the output should be a list of tool calls specifying function names and parameter properties: +```bash +[ChatCompletionMessageToolCall(id='chatcmpl-tool-81e63f4f496d429ba9ec6efcff6a86e1', function=Function(arguments='{"departure_airport": "CDG", "destination_airport": "LHR", "departure_date": "2024-11-01"}', name='get_flight_schedule'), type='function')] +``` - ```bash - [ToolCall( - id='resp-tool-81e63f4f496d429ba9ec6efcff6a86e1', - type='function', - function=ToolCallFunction( - name='get_flight_schedule', - arguments='{"departure_airport": "CDG", "destination_airport": "LHR", "departure_date": "2024-11-01"}' - ) - )] - ``` - - + + The model automatically decides which functions to call. However, you can specify a particular function by using the `tool_choice` parameter. In the example above, you can replace `tool_choice=auto` with `tool_choice={"type": "function", "function": {"name": "get_flight_schedule"}}` to explicitly call the desired function. + + + Some models must be told they can use external functions in the system prompt. If you do not provide a system prompt when using tools, Scaleway will automatically add one that works best for that specific model. 
+
+

### Call the tool and provide a final answer

To provide the answer, or for more complex interactions, you will need to handle multiple turns of conversation:

-
-
-    ```python
-    # Process the tool call
-    if response.choices[0].message.tool_calls:
-        tool_call = response.choices[0].message.tool_calls[0]
-
-        # Execute the function
-        if tool_call.function.name == "get_flight_schedule":
-            function_args = json.loads(tool_call.function.arguments)
-            function_response = get_flight_schedule(**function_args)
-
-            # Add results to the conversation
-            messages.extend([
-                {
-                    "role": "assistant",
-                    "content": None,
-                    "tool_calls": [tool_call]
-                },
-                {
-                    "role": "tool",
-                    "name": tool_call.function.name,
-                    "content": json.dumps(function_response),
-                    "tool_call_id": tool_call.id
-                }
-            ])
-
-            # Get final response
-            final_response = client.chat.completions.create(
-                model="llama-3.1-70b-instruct",
-                messages=messages
-            )
-            print(final_response.choices[0].message.content)
-    ```
-  
-  
-    ```python
-    import json
-    from openai import OpenAI
-
-    client = OpenAI()
-
-    messages = [
-        {"role": "user", "content": "What time is the next flight to Paris?"}
-    ]
-
-    # First request
-    response = client.responses.create(
-        model="gpt-4.1",
-        input=messages
-    )
-
-    # Look for tool calls
-    tool_calls = [item for item in response.output if item.type == "tool_call"]
-
-    if tool_calls:
-        tool_call = tool_calls[0]
-
-        # Execute the function
-        if tool_call.function.name == "get_flight_schedule":
-            function_args = json.loads(tool_call.function.arguments)
-            function_response = get_flight_schedule(**function_args)
-
-            # Provide tool output and get final model response
-            final_response = client.responses.create(
-                model="gpt-4.1",
-                input=messages + [
-                    response, # include assistant's tool request
-                    {
-                        "role": "tool",
-                        "tool_call_id": tool_call.id,
-                        "name": tool_call.function.name,
-                        "content": json.dumps(function_response)
-                    }
-                ]
-            )
-            print(final_response.output_text)
-    ```
-  
-

+```python
+# Process the tool call
+if response.choices[0].message.tool_calls:
+    tool_call = response.choices[0].message.tool_calls[0]
+
+    # Execute the function
+    if tool_call.function.name == "get_flight_schedule":
+        function_args = json.loads(tool_call.function.arguments)
+        function_response = get_flight_schedule(**function_args)
+
+        # Add results to the conversation
+        messages.extend([
+            {
+                "role": "assistant",
+                "content": None,
+                "tool_calls": [tool_call]
+            },
+            {
+                "role": "tool",
+                "name": tool_call.function.name,
+                "content": json.dumps(function_response),
+                "tool_call_id": tool_call.id
+            }
+        ])
+
+        # Get final response
+        final_response = client.chat.completions.create(
+            model="llama-3.1-70b-instruct",
+            messages=messages
+        )
+        print(final_response.choices[0].message.content)
+```

### Parallel function calling

@@ -393,6 +295,11 @@ messages = [
 ]
 ```
 
+## Code example for Responses API
+
+See the OpenAI documentation for a fully worked example on [function calling using the Responses API](https://platform.openai.com/docs/guides/function-calling#function-tool-example).
+ + ## Best practices When implementing function calling, follow these guidelines for optimal results: diff --git a/pages/generative-apis/how-to/use-structured-outputs.mdx b/pages/generative-apis/how-to/use-structured-outputs.mdx index e003df983f..ee3b642e96 100644 --- a/pages/generative-apis/how-to/use-structured-outputs.mdx +++ b/pages/generative-apis/how-to/use-structured-outputs.mdx @@ -59,17 +59,13 @@ We using the base code below to send our LLM a voice note transcript to structur ### Defining the voice note and transcript - -If you are going to use this base code with the Responses API, note that for now you must use `gpt-oss-120b` as the only supported model. - - ```python import json from openai import OpenAI from pydantic import BaseModel, Field # Set your preferred model - MODEL = "llama-3.1-8b-instruct" + MODEL = "llama-3.1-8b-instruct" ## or "gpt-oss-120b" for the Responses API # Set your API key API_KEY = "" @@ -141,25 +137,26 @@ Using [Pydantic](https://docs.pydantic.dev/latest/concepts/models/), users can d ```python + extract = client.responses.create( - input=[ - { - "role": "system", - "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.", - }, - { - "role": "user", - "content": TRANSCRIPT, - } - ], - model=MODEL, - text={ - "format": { - "type": "json_schema", - "name": "VoiceNote", - "schema": VoiceNote.model_json_schema() + input=[ + { + "role": "system", + "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.", + }, + { + "role": "user", + "content": TRANSCRIPT, + } + ], + model=MODEL, + text={ + "format": { + "type": "json_schema", + "name": "VoiceNote", + "schema": VoiceNote.model_json_schema() + } } - } ) # Print the generated response. Here, the last output message will contain the final content. @@ -169,15 +166,16 @@ Using [Pydantic](https://docs.pydantic.dev/latest/concepts/models/), users can d ``` Output example: + ```json { - "title": "To-Do List", - "summary": "Returning from work, need to complete tasks before relaxing", - "actionItems": [ - "Water garden", - "Prepare dinner: pasta dish with garlic bread", - "Catch up on missed phone calls" - ] + "actionItems": [ + "Water the plants in the garden", + "Prepare a simple pasta dish with garlic bread", + "Catch up on missed phone calls while dinner is cooking" + ], + "summary": "The user plans to water plants, cook dinner, and make phone calls after arriving home at 6:30\u202fPM.", + "title": "Evening Tasks" } ``` @@ -194,78 +192,25 @@ Using [Pydantic](https://docs.pydantic.dev/latest/concepts/models/), users can d ### Using structured outputs with JSON schema (manual definition) -Alternatively, users can manually define the JSON schema inline when calling the model. - - - - ```python - extract = client.chat.completions.create( - messages=[ - { - "role": "system", - "content": "The following is a voice message transcript. 
Only answer in JSON using '{' as the first character.", - }, - { - "role": "user", - "content": TRANSCRIPT, - }, - ], - model=MODEL, - response_format={ - "type": "json_schema", - "json_schema": { - "name": "VoiceNote", - "schema": { - "type": "object", - "properties": { - "title": {"type": "string"}, - "summary": {"type": "string"}, - "actionItems": { - "type": "array", - "items": {"type": "string"} - } - }, - "additionalProperties": False, - "required": ["title", "summary", "actionItems"] - } - } - } - ) - output = json.loads(extract.choices[0].message.content) - print(json.dumps(output, indent=2)) - ``` +Alternatively, users can manually define the JSON schema inline when calling the model. See below an example for doing this with the Chat Completions API: - Output example: - ```json - { - "title": "Evening Routine", - "actionItems": [ - "Water the plants", - "Cook dinner (pasta and garlic bread)", - "Make phone calls" - ], - "summary": "Made a list of tasks to accomplish before relaxing tonight" - } - ``` - - - ```python - extract = client.responses.create( - model=MODEL, - input=[ +```python +extract = client.chat.completions.create( + messages=[ { "role": "system", - "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character." + "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.", }, { "role": "user", - "content": TRANSCRIPT - } + "content": TRANSCRIPT, + }, ], + model=MODEL, response_format={ "type": "json_schema", "json_schema": { - "name": "VoiceNote", + "name": "VoiceNote", "schema": { "type": "object", "properties": { @@ -278,30 +223,26 @@ Alternatively, users can manually define the JSON schema inline when calling the }, "additionalProperties": False, "required": ["title", "summary", "actionItems"] - } } } - - ) - output = json.loads(extract.choices[0].message.content) - print(json.dumps(output, indent=2)) - ``` - - Output example: - ```json - { - "title": "Evening Routine", - "actionItems": [ - "Water the plants", - "Cook dinner (pasta and garlic bread)", - "Make phone calls" - ], - "summary": "Made a list of tasks to accomplish before relaxing tonight" } - ``` - - - +) +output = json.loads(extract.choices[0].message.content) +print(json.dumps(output, indent=2)) +``` + +Output example: +```json +{ +"title": "Evening Routine", +"actionItems": [ + "Water the plants", + "Cook dinner (pasta and garlic bread)", + "Make phone calls" +], +"summary": "Made a list of tasks to accomplish before relaxing tonight" +} +``` When using the OpenAI SDKs like in the examples above, you are expected to set `additionalProperties` to false, and to specify all your properties as required. @@ -313,97 +254,49 @@ Alternatively, users can manually define the JSON schema inline when calling the JSON mode: It is important to explicitly ask the model to generate a JSON output either in the system prompt or user prompt. To prevent infinite generations, model providers most often encourage users to ask the model for short JSON objects. Prompt example: `Only answer in JSON using '{' as the first character.`. -In JSON mode, you can prompt the model to output a JSON object without enforcing a strict schema. +In JSON mode, you can prompt the model to output a JSON object without enforcing a strict schema. 
See below an example for the Chat Completions API: - - - ```python - extract = client.chat.completions.create( - messages=[ - { - "role": "system", - "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.", - }, - { - "role": "user", - "content": TRANSCRIPT, - }, - ], - model=MODEL, - response_format={ - "type": "json_object", - }, - ) - output = json.loads(extract.choices[0].message.content) - print(json.dumps(output, indent=2)) - ``` - - Output example: - ```json - { - "current_time": "6:30 PM", - "tasks": [ - { - "task": "water the plants in the garden", - "priority": "high" - }, +```python +extract = client.chat.completions.create( + messages=[ { - "task": "prepare dinner (pasta with garlic bread)", - "priority": "high" + "role": "system", + "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character.", }, { - "task": "catch up on phone calls", - "priority": "medium" - } - ] - } - ``` - - - ```python - extract = client.responses.create( - model=MODEL, - input=[ - { - "role": "system", - "content": "The following is a voice message transcript. Only answer in JSON using '{' as the first character." - }, - { - "role": "user", - "content": TRANSCRIPT - } - ], - response_format={ - "type": "json_object", + "role": "user", + "content": TRANSCRIPT, }, - ) - - output = json.loads(extract.choices[0].message.content) - print(json.dumps(output, indent=2)) - ``` - - Output example: - ```json + ], + model=MODEL, + response_format={ + "type": "json_object", + }, +) +output = json.loads(extract.choices[0].message.content) +print(json.dumps(output, indent=2)) +``` + +Output example: +```json +{ +"current_time": "6:30 PM", +"tasks": [ { - "current_time": "6:30 PM", - "tasks": [ - { - "task": "water the plants in the garden", - "priority": "high" - }, - { - "task": "prepare dinner (pasta with garlic bread)", - "priority": "high" - }, - { - "task": "catch up on phone calls", - "priority": "medium" - } - ] + "task": "water the plants in the garden", + "priority": "high" + }, + { + "task": "prepare dinner (pasta with garlic bread)", + "priority": "high" + }, + { + "task": "catch up on phone calls", + "priority": "medium" } - ``` - - +] +} +``` ## Conclusion diff --git a/pages/managed-inference/reference-content/openai-compatibility.mdx b/pages/managed-inference/reference-content/openai-compatibility.mdx index 95a572c3a2..d2a8b3e7b1 100644 --- a/pages/managed-inference/reference-content/openai-compatibility.mdx +++ b/pages/managed-inference/reference-content/openai-compatibility.mdx @@ -7,12 +7,14 @@ dates: posted: 2024-05-06 --- +import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx' + You can use any of the OpenAI [official libraries](https://platform.openai.com/docs/libraries/), for example, the [OpenAI Python client library](https://github.com/openai/openai-python) to interact with your Scaleway Managed Inference deployment. This feature is especially beneficial for those looking to seamlessly transition applications already utilizing OpenAI. -## Chat Completions API +### Chat Completions API or Responses API? -The Chat Completions API is designed for models fine-tuned for conversational tasks (such as X-chat and X-instruct variants). 
+ ### CURL From 3325935db689f88c21f2142aff81e84219e59b19 Mon Sep 17 00:00:00 2001 From: Rowena Jones <36301604+RoRoJ@users.noreply.github.com> Date: Wed, 27 Aug 2025 16:51:16 +0200 Subject: [PATCH 4/6] Apply suggestions from code review Co-authored-by: Guillaume Calmettes Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> Co-authored-by: Benedikt Rollik --- macros/ai/chat-comp-vs-responses-api.mdx | 4 ++-- pages/generative-apis/how-to/query-language-models.mdx | 2 +- pages/generative-apis/how-to/query-vision-models.mdx | 2 +- pages/generative-apis/how-to/use-structured-outputs.mdx | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/macros/ai/chat-comp-vs-responses-api.mdx b/macros/ai/chat-comp-vs-responses-api.mdx index 1507a02487..09998c21fe 100644 --- a/macros/ai/chat-comp-vs-responses-api.mdx +++ b/macros/ai/chat-comp-vs-responses-api.mdx @@ -4,8 +4,8 @@ macro: chat-comp-vs-responses-api Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are OpenAI-compatible REST APIs that can be used for generating and manipulating conversations. The Chat Completions API is focused on generating conversational responses, while the Responses API is a more general REST API for chat, structured outputs, tool use, and multimodal inputs. -The **Chat Completions** API was released in 2023, and is an industry standard for building AI applications, being specifically designed for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. It supports `function` tool-calling, where the developer defines a set of functions, which the model can decide whether to call when generating a response. If it decides to call one of these functions, it returns the function name and arguments, and the developer's own code must actually execute the function and feed the result back into the conversation for use by the model. +The **Chat Completions** API was released in 2023, and is an industry standard for building AI applications, being specifically designed for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. Messages in the conversation can include text, images and audio extracts. The API also supports `function` tool-calling, where the developer defines a set of functions, which the model can decide whether to call when generating a response. If it decides to call one of these functions, it returns the function name and arguments, and the developer's own code must actually execute the function and feed the result back into the conversation for use by the model. -The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with the ability to do more agentic tasks and reasoning. It supports statefulness, being able to maintain context without needing to resend the entire conversation history. It offers tool-calling by built-in tools (e.g. web or file search) that the model is able to execute itself while generating a response, though currently only `function` tools are supported by Scaleway. Overall, **Scaleway's support for the Responses API is currently at beta stage**. 
All supported Generative API models can be used with Responses API, and note that for the `gtp-oss-120b` model, only the Responses API will allow you to access all of its features. +The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with the ability to do more agentic tasks and reasoning. It supports statefulness, being able to maintain context without needing to resend the entire conversation history. It offers tool-calling by built-in tools (e.g. web or file search) that the model is able to execute itself while generating a response, though currently only `function` tools are supported by Scaleway. Overall, **Scaleway's support for the Responses API is currently at beta stage and the support of the full features set will be incremental**. Most of the supported Generative API models can be used with Responses API, and note that for the **`gtp-oss-120b` model, the use of the Responses API is recommended** as it will allow you to access all of its features, especially tools calling. For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses). \ No newline at end of file diff --git a/pages/generative-apis/how-to/query-language-models.mdx b/pages/generative-apis/how-to/query-language-models.mdx index ecea416dc2..ea5239835e 100644 --- a/pages/generative-apis/how-to/query-language-models.mdx +++ b/pages/generative-apis/how-to/query-language-models.mdx @@ -70,7 +70,7 @@ client = OpenAI( ### Generating a chat completion -You can now create a chat completion using either the Chat Completions or Responses API, as shown in the following examples: , +You can now create a chat completion using either the Chat Completions or Responses API, as shown in the following examples: diff --git a/pages/generative-apis/how-to/query-vision-models.mdx b/pages/generative-apis/how-to/query-vision-models.mdx index 7a88339b03..2dfbd0f266 100644 --- a/pages/generative-apis/how-to/query-vision-models.mdx +++ b/pages/generative-apis/how-to/query-vision-models.mdx @@ -48,7 +48,7 @@ You can query vision models programmatically using your favorite tools or langua Vision models take both text and images as inputs. -In the example that follows,we will use the OpenAI Python client. +In the example that follows, we will use the OpenAI Python client. Unlike traditional language models, vision models will take a content array for the user role, structuring text and images as inputs. diff --git a/pages/generative-apis/how-to/use-structured-outputs.mdx b/pages/generative-apis/how-to/use-structured-outputs.mdx index ee3b642e96..a716fa7292 100644 --- a/pages/generative-apis/how-to/use-structured-outputs.mdx +++ b/pages/generative-apis/how-to/use-structured-outputs.mdx @@ -41,7 +41,7 @@ There are several ways to interact with language models: - JSON mode is older and has been used by developers since early API implementations, but lacks reliability in response formats. - - All LLMs in the Scaleway library support **Structured outputs** and **JSON mode**. However, a schemaless **JSON mode** will produce lower quality results and is not recommended. Note that structured output is more reliably validated and more richly parsed OKwith the Responses API. + - All LLMs in the Scaleway library support **Structured outputs** and **JSON mode**. However, a schemaless **JSON mode** will produce lower quality results and is not recommended. 
Note that structured output is more reliably validated and more richly parsed with the Responses API. ## Code examples From 1eed4506922078c883fc396107c69979d7eb660c Mon Sep 17 00:00:00 2001 From: Rowena Date: Wed, 27 Aug 2025 17:18:53 +0200 Subject: [PATCH 5/6] fix(responses): remove from vision --- macros/ai/chat-comp-vs-responses-api.mdx | 10 ++- .../how-to/query-vision-models.mdx | 83 +++++-------------- 2 files changed, 30 insertions(+), 63 deletions(-) diff --git a/macros/ai/chat-comp-vs-responses-api.mdx b/macros/ai/chat-comp-vs-responses-api.mdx index 09998c21fe..01c1cd7837 100644 --- a/macros/ai/chat-comp-vs-responses-api.mdx +++ b/macros/ai/chat-comp-vs-responses-api.mdx @@ -4,8 +4,14 @@ macro: chat-comp-vs-responses-api Both the [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) and the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response) are OpenAI-compatible REST APIs that can be used for generating and manipulating conversations. The Chat Completions API is focused on generating conversational responses, while the Responses API is a more general REST API for chat, structured outputs, tool use, and multimodal inputs. -The **Chat Completions** API was released in 2023, and is an industry standard for building AI applications, being specifically designed for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. Messages in the conversation can include text, images and audio extracts. The API also supports `function` tool-calling, where the developer defines a set of functions, which the model can decide whether to call when generating a response. If it decides to call one of these functions, it returns the function name and arguments, and the developer's own code must actually execute the function and feed the result back into the conversation for use by the model. +The **Chat Completions** API was released in 2023, and is an industry standard for building AI applications, being specifically designed for handling multi-turn conversations. It is stateless, but allows users to manage conversation history by appending each new message to the ongoing conversation. Messages in the conversation can include text, images and audio extracts. The API supports `function` tool-calling, allowing developers to define functions that the model can choose to call. If it does so, it returns the function name and arguments, which the developer's code must execute and feed back into the conversation. -The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with the ability to do more agentic tasks and reasoning. It supports statefulness, being able to maintain context without needing to resend the entire conversation history. It offers tool-calling by built-in tools (e.g. web or file search) that the model is able to execute itself while generating a response, though currently only `function` tools are supported by Scaleway. Overall, **Scaleway's support for the Responses API is currently at beta stage and the support of the full features set will be incremental**. Most of the supported Generative API models can be used with Responses API, and note that for the **`gtp-oss-120b` model, the use of the Responses API is recommended** as it will allow you to access all of its features, especially tools calling. 
+The **Responses** API was released in 2025, and is designed to combine the simplicity of Chat Completions with the ability to do more agentic tasks and reasoning. It supports statefulness, being able to maintain context without needing to resend the entire conversation history. It offers tool-calling by built-in tools (e.g. web or file search) that the model is able to execute itself while generating a response.
+
+
+Scaleway's support for the Responses API is currently at beta stage. Support of the full feature set will be incremental: currently statefulness and tools other than `function` calling are not supported.
+
+
+Most supported Generative API models can be used with both Chat Completions and Responses API. For the **`gpt-oss-120b`** model, use of the Responses API is recommended, as it will allow you to access all of its features, especially tool-calling.
 
 For full details on the differences between these APIs, see the [official OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
\ No newline at end of file
diff --git a/pages/generative-apis/how-to/query-vision-models.mdx b/pages/generative-apis/how-to/query-vision-models.mdx
index 2dfbd0f266..1bac3e7330 100644
--- a/pages/generative-apis/how-to/query-vision-models.mdx
+++ b/pages/generative-apis/how-to/query-vision-models.mdx
@@ -7,8 +7,6 @@ dates:
   posted: 2024-10-30
 ---
 import Requirements from '@macros/iam/requirements.mdx'
-import ChatCompVsResponsesApi from '@macros/ai/chat-comp-vs-responses-api.mdx'
-
 
 Scaleway's Generative APIs service allows users to interact with powerful vision models hosted on the platform.
@@ -18,7 +16,7 @@ Scaleway's Generative APIs service allows users to interact with powerful vision
 There are several ways to interact with vision models:
 - The Scaleway [console](https://console.scaleway.com) provides a complete [playground](/generative-apis/how-to/query-vision-models/#accessing-the-playground), aiming to test models, adapt parameters, and observe how these changes affect the output in real-time.
-- [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion) or the [Responses API](https://www.scaleway.com/en/developers/api/generative-apis/#path-responses-beta-create-a-response)
+- The [Chat Completions API](https://www.scaleway.com/en/developers/api/generative-apis/#path-chat-completions-create-a-chat-completion).
@@ -54,10 +52,6 @@ In the example that follows, we will use the OpenAI Python client.
 Unlike traditional language models, vision models will take a content array for the user role, structuring text and images as inputs.
-### Chat Completions API or Responses API?
-
-
-
 ### Installing the OpenAI SDK
 Install the OpenAI SDK using pip:
@@ -84,61 +78,28 @@ client = OpenAI(
 You can now create a chat completion:
 
-
-
-  ```python
-  # Create a chat completion using the 'pixtral-12b-2409' model
-  response = client.chat.completions.create(
-    model="pixtral-12b-2409",
-    messages=[
-      {
-        "role": "user",
-        "content": [
-          {"type": "text", "text": "What is this image?"},
-          {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
-        ] # Vision models will take a content array with text and image_url objects.
-
-      }
-    ],
-    temperature=0.7, # Adjusts creativity
-    max_tokens=2048, # Limits the length of the output
-    top_p=0.9 # Controls diversity through nucleus sampling. You usually only need to use temperature.
-  )
-
-  # Print the generated response
-  print(response.choices[0].message.content)
-  ```
-  
-  
-  ```python
-  from openai import OpenAI
-
-  # Create a chat completion using the 'mistral-small-3.2-24b-instruct-2506' model
-  response = client.responses.create(
-    model="mistral-small-3.2-24b-instruct-2506",
-    input=[
-      {
-        "role": "user",
-        "content": [
-          {"type": "input_text", "text": "What is this image?"},
-          {"type": "input_image",
-          "image_url": "https://picsum.photos/id/32/512/512",
-          "detail": "auto"}
-        ] # Vision models will take a content array with text and image_url objects.
-
-      }
-    ],
-    temperature=0.7, # Adjusts creativity
-    max_output_tokens=2048, # Limits the length of the output
-    top_p=0.9 # Controls diversity through nucleus sampling. You usually only need to use temperature.
-  )
-
-  # Print the generated response. Here, the last output message will contain the final content.
-  # Previous outputs will contain reasoning content.
-  print(response.output[-1].content[0].text)
-  ```
-  
-
+```python
+# Create a chat completion using the 'pixtral-12b-2409' model
+response = client.chat.completions.create(
+    model="pixtral-12b-2409",
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": "What is this image?"},
+                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
+            ] # Vision models will take a content array with text and image_url objects.
+
+        }
+    ],
+    temperature=0.7, # Adjusts creativity
+    max_tokens=2048, # Limits the length of the output
+    top_p=0.9 # Controls diversity through nucleus sampling. You usually only need to use temperature.
+)
+
+# Print the generated response
+print(response.choices[0].message.content)
+```

This code sends messages, prompts and images, to the vision model and returns an answer based on your input. The `temperature`, `max_tokens`, and `top_p` parameters control the response's creativity, length, and diversity, respectively.

From fe9ab33cecc16b16ac51f982b42780c18366c9b6 Mon Sep 17 00:00:00 2001
From: Rowena 
Date: Wed, 27 Aug 2025 17:53:35 +0200
Subject: [PATCH 6/6] fix(ai): final corrections

---
 pages/generative-apis/how-to/use-function-calling.mdx   | 2 +-
 pages/generative-apis/how-to/use-structured-outputs.mdx | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/pages/generative-apis/how-to/use-function-calling.mdx b/pages/generative-apis/how-to/use-function-calling.mdx
index 7a8e6b10dc..f62b7ce712 100644
--- a/pages/generative-apis/how-to/use-function-calling.mdx
+++ b/pages/generative-apis/how-to/use-function-calling.mdx
@@ -297,7 +297,7 @@ messages = [
 
 ## Code example for Responses API
 
-See the OpenAI documentation for a fully worked example on [function calling using the Responses API](https://platform.openai.com/docs/guides/function-calling#function-tool-example).
+See the OpenAI documentation for a fully worked example on [function calling using the Responses API](https://platform.openai.com/docs/guides/function-calling#function-tool-example). Note that Scaleway's support of the Responses API is currently at beta stage - [find out more](/generative-apis/how-to/query-language-models/#chat-completions-api-or-responses-api).
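+
+In the meantime, the minimal sketch below illustrates what this flow could look like. It is an untested illustration based on the tool-calling flow in OpenAI's Responses documentation, in which tool definitions are flat (`name`, `description`, and `parameters` at the top level, rather than nested under a `function` key) and tool results are sent back as `function_call_output` items. It reuses the `get_flight_schedule` helper and the API key placeholder from the Chat Completions example above.
+
+```python
+import json
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="https://api.scaleway.ai/v1",
+    api_key=""
+)
+
+# Responses API tool definitions are flat, unlike the nested Chat Completions format
+tools = [{
+    "type": "function",
+    "name": "get_flight_schedule",
+    "description": "Get the flight schedule between two airports on a specific date.",
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "departure_airport": {"type": "string", "description": "IATA code of the departure airport"},
+            "destination_airport": {"type": "string", "description": "IATA code of the destination airport"},
+            "departure_date": {"type": "string", "description": "Departure date in YYYY-MM-DD format"}
+        },
+        "required": ["departure_airport", "destination_airport", "departure_date"]
+    }
+}]
+
+input_items = [
+    {"role": "system", "content": "You are a helpful flight assistant."},
+    {"role": "user", "content": "What flights are available from CDG to LHR on November 1st, 2024?"}
+]
+
+# First request: the model should reply with a function_call output item
+response = client.responses.create(
+    model="gpt-oss-120b",
+    input=input_items,
+    tools=tools
+)
+
+# Execute each requested function and send the result back as a function_call_output item
+for item in response.output:
+    if item.type == "function_call" and item.name == "get_flight_schedule":
+        function_args = json.loads(item.arguments)
+        function_response = get_flight_schedule(**function_args)  # helper defined earlier on this page
+        input_items.append(item)  # keep the model's tool request in the conversation
+        input_items.append({
+            "type": "function_call_output",
+            "call_id": item.call_id,
+            "output": json.dumps(function_response)
+        })
+
+# Second request: the model turns the tool result into a final answer
+final_response = client.responses.create(
+    model="gpt-oss-120b",
+    input=input_items,
+    tools=tools
+)
+print(final_response.output_text)
+```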
## Best practices diff --git a/pages/generative-apis/how-to/use-structured-outputs.mdx b/pages/generative-apis/how-to/use-structured-outputs.mdx index a716fa7292..7868f3d858 100644 --- a/pages/generative-apis/how-to/use-structured-outputs.mdx +++ b/pages/generative-apis/how-to/use-structured-outputs.mdx @@ -44,6 +44,10 @@ There are several ways to interact with language models: - All LLMs in the Scaleway library support **Structured outputs** and **JSON mode**. However, a schemaless **JSON mode** will produce lower quality results and is not recommended. Note that structured output is more reliably validated and more richly parsed with the Responses API. +## Chat Completions API or Responses API? + + + ## Code examples