
Commit 277b35c

feat(ai): add structured outputs (#3713)
1 parent 1190529 commit 277b35c

10 files changed, +282 −6 lines


ai-data/generative-apis/api-cli/using-chat-api.mdx

Lines changed: 1 addition & 1 deletion

@@ -68,13 +68,13 @@ Our chat API is OpenAI compatible. Use OpenAI’s [API reference](https://platfo
  - max_tokens
  - stream
  - presence_penalty
+ - response_format
  - logprobs
  - stop
  - seed

  ### Unsupported parameters

- - response_format
  - frequency_penalty
  - n
  - top_logprobs

ai-data/generative-apis/api-cli/using-generative-apis.mdx

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@ dates:
  ## Access

- - Access to this service is restricted while in beta. You can request access to the product by filling out a form on the Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-api).
+ - Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
  - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) is needed.

  ## Authentication

ai-data/generative-apis/concepts.mdx

Lines changed: 12 additions & 0 deletions

@@ -36,6 +36,12 @@ Parameters are settings that control the behavior and performance of generative
  The inter-token latency (ITL) corresponds to the average time elapsed between two generated tokens. It is usually expressed in milliseconds.

+ ## JSON mode
+
+ JSON mode allows you to guide the language model in outputting well-structured JSON data.
+ To activate JSON mode, provide the `response_format` parameter with `{"type": "json_object"}`.
+ JSON mode is useful for applications like chatbots or APIs, where a machine-readable format is essential for easy processing.
+
  ## Prompt Engineering

  Prompt engineering involves crafting specific and well-structured inputs (prompts) to guide the model towards generating the desired output. Effective prompt design is crucial for generating relevant responses, particularly in complex or creative tasks. It often requires experimentation to find the right balance between specificity and flexibility.

@@ -52,6 +58,12 @@ Stop words are a parameter set to tell the model to stop generating further toke
  Streaming is a parameter allowing responses to be delivered in real-time, showing parts of the output as they are generated rather than waiting for the full response. Scaleway is following the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) standard. This behavior usually enhances user experience by providing immediate feedback and a more interactive conversation.

+ ## Structured outputs
+
+ Structured outputs enable you to format the model's responses to suit specific use cases. To activate structured outputs, provide the `response_format` parameter with `"type": "json_schema"` and define its `"json_schema": {}`.
+ By customizing the structure, such as using lists, tables, or key-value pairs, you ensure that the data returned is in a form that is easy to extract and process.
+ By specifying the expected response format through the API, you can make the model consistently deliver the output your system requires.
+
  ## Temperature

  Temperature is a parameter that controls the randomness of the model's output during text generation. A higher temperature produces more creative and diverse outputs, while a lower temperature makes the model's responses more deterministic and focused. Adjusting the temperature allows users to balance creativity with coherence in the generated text.
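The two concepts added above (JSON mode and structured outputs) differ only in the `response_format` payload sent with the request. A minimal sketch of both payload shapes, following the OpenAI-compatible request body; the example schema here is purely illustrative:

```python
# JSON mode (schemaless): the model is asked to return some JSON object,
# with no validation of its structure.
json_mode = {"type": "json_object"}

# Structured outputs (schema mode): the model's output must match the
# JSON schema nested under "json_schema".
schema_mode = {
    "type": "json_schema",
    "json_schema": {
        "schema": {
            "type": "object",
            "properties": {"title": {"type": "string"}},
            "required": ["title"],
        }
    },
}

print(json_mode["type"], schema_mode["type"])
```

Either dictionary is passed as the `response_format` argument of a chat completion request, as shown in the how-to guide below.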

ai-data/generative-apis/how-to/query-embedding-models.mdx

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ The embedding service is OpenAI compatible. Refer to OpenAI's [embedding documen
  <Macro id="requirements" />

- - Access to this service is restricted while in beta. You can request access to the product by filling out a form on the Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-api).
+ - Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
  - A Scaleway account logged into the [console](https://console.scaleway.com)
  - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
  - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication

ai-data/generative-apis/how-to/query-text-models.mdx

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@ There are several ways to interact with text models:
  <Macro id="requirements" />

- - Access to this service is restricted while in beta. You can request access to the product by filling out a form on the Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-api).
+ - Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
  - A Scaleway account logged into the [console](https://console.scaleway.com)
  - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
  - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication
Lines changed: 246 additions & 0 deletions

@@ -0,0 +1,246 @@
---
meta:
  title: How to use structured outputs
  description: Learn how to interact with structured outputs using Scaleway's Chat Completions API service.
content:
  h1: How to use structured outputs
  paragraph: Learn how to generate structured outputs using Scaleway's Chat Completions API service.
tags: chat-completions-api
dates:
  validation: 2024-09-17
  posted: 2024-09-17
---
Structured outputs allow users to get consistent, machine-readable responses in JSON format from language models.
JSON, as a widely used format, enables seamless integration with a variety of platforms and applications. Its interoperability is crucial for developers aiming to incorporate AI functionality into their current systems with minimal adjustments.

By specifying a response format when using the [Chat Completions API](/ai-data/generative-apis/api-cli/using-chat-api/), you can ensure that responses are returned in a JSON structure.
There are two main modes for generating JSON: **Object Mode** (schemaless) and **Schema Mode** (deterministic, structured output).

You can interact with text models in several ways:
- Via the Scaleway [console](https://console.scaleway.com), which will soon provide a complete [playground](/ai-data/generative-apis/how-to/query-text-models/#accessing-the-playground) for testing models, adjusting parameters, and observing how these changes affect the output in real time.
- Via the [Chat API](/ai-data/generative-apis/how-to/query-text-models/#querying-text-models-via-api)

<Macro id="requirements" />

- Access to Generative APIs.
  While in beta, the service is restricted to invited users. You can request access by filling out a form on Scaleway's [Betas page](https://www.scaleway.com/en/betas/#generative-apis).
- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication
- Python 3.7+ installed on your system
## Types of structured outputs

- **JSON mode** (schemaless):
  - Type: `{"type": "json_object"}`
  - This mode is non-deterministic and allows the model to output a JSON object without strict validation.
  - Useful for flexible outputs when you expect the model to infer a reasonable structure based on your prompt.
  - JSON mode is older and has been used by developers since early API implementations.

- **Structured outputs (schema mode)** (deterministic/structured):
  - Type: `{"type": "json_schema"}`
  - This mode enforces a strict schema format, where the output adheres to the predefined structure.
  - Supports complex types and validation mechanisms as per the [JSON schema specification](https://json-schema.org/specification/).
  - Structured outputs are a newer feature, implemented by OpenAI in 2024 to enable stricter, schema-based response formatting.

<Message type="note">
  - All LLMs in the Scaleway library support **JSON mode** and **Structured outputs**; however, the quality of results will vary in the schemaless JSON mode.
  - JSON mode: it is important to explicitly ask the model to generate JSON output, either in the system prompt or the user prompt. To prevent infinite generations, model providers most often encourage asking the model for short JSON objects.
  - Structured outputs: Scaleway supports the [JSON schema specification](https://json-schema.org/specification/), including nested schema composition (`anyOf`, `allOf`, `oneOf`, etc.), `$ref`, all types, and regular expressions.
</Message>
## Code examples

<Message type="tip">
  Before diving into the code examples, ensure you have the necessary libraries installed:
  ```bash
  pip install openai pydantic
  ```
</Message>

The following Python examples demonstrate how to use both **JSON mode** and **Structured outputs** to generate structured responses.

We will send a voice note transcript to the LLM and ask it to structure the content.
Below is our base code:

```python
import json
from openai import OpenAI
from pydantic import BaseModel, Field

# Set your preferred model
MODEL = "llama-3.1-8b-instruct"

# Set your API key
API_KEY = "<SCW_API_KEY>"

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",
    api_key=API_KEY,
)

# Define the schema for the output using Pydantic
class VoiceNote(BaseModel):
    title: str = Field(description="A title for the voice note")
    summary: str = Field(description="A short one-sentence summary of the voice note.")
    actionItems: list[str] = Field(description="A list of action items from the voice note")

# Transcript to use for the output
TRANSCRIPT = (
    "Good evening! It's 6:30 PM, and I'm just getting home from work. I have a few things to do "
    "before I can relax. First, I'll need to water the plants in the garden since they've been in the sun all day. "
    "Then, I'll start preparing dinner. I think a simple pasta dish with some garlic bread should be good. "
    "While that's cooking, I'll catch up on a couple of phone calls I missed earlier."
)
```
### Using JSON mode (schemaless)

In JSON mode, you can prompt the model to output a JSON object without enforcing a strict schema.

```python
extract = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "The following is a voice message transcript. Only answer in JSON.",
        },
        {
            "role": "user",
            "content": TRANSCRIPT,
        },
    ],
    model=MODEL,
    response_format={
        "type": "json_object",
    },
)
output = json.loads(extract.choices[0].message.content)
print(json.dumps(output, indent=2))
```

Output example:
```json
{
  "current_time": "6:30 PM",
  "tasks": [
    {
      "task": "water the plants in the garden",
      "priority": "high"
    },
    {
      "task": "prepare dinner (pasta with garlic bread)",
      "priority": "high"
    },
    {
      "task": "catch up on phone calls",
      "priority": "medium"
    }
  ]
}
```
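Because JSON mode is schemaless, the keys in the output above are model-chosen and may change between runs, so it is safer to parse the result defensively. A minimal sketch, assuming a response shaped like the sample output (the `tasks` and `task` keys are not guaranteed by the API):

```python
import json

# Raw model content, as returned in extract.choices[0].message.content;
# hard-coded here from the sample output above for illustration.
raw = (
    '{"current_time": "6:30 PM", "tasks": ['
    '{"task": "water the plants in the garden", "priority": "high"}]}'
)

output = json.loads(raw)

# In schemaless JSON mode, never assume a key exists: use .get() with defaults.
tasks = output.get("tasks", [])
names = [t.get("task", "") for t in tasks]
print(names)  # ['water the plants in the garden']
```

If your application needs guaranteed keys, prefer the schema mode shown in the next sections instead of patching over missing fields.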
### Using structured outputs with JSON schema (Pydantic)

Using [Pydantic](https://docs.pydantic.dev/latest/concepts/models/), users can define the schema as a Python class and enforce the model to return results adhering to this schema.

```python
extract = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "The following is a voice message transcript. Only answer in JSON.",
        },
        {
            "role": "user",
            "content": TRANSCRIPT,
        },
    ],
    model=MODEL,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "schema": VoiceNote.model_json_schema(),
        }
    },
)
output = json.loads(extract.choices[0].message.content)
print(json.dumps(output, indent=2))
```

Output example:
```json
{
  "title": "To-Do List",
  "summary": "Returning from work, need to complete tasks before relaxing",
  "actionItems": [
    "Water garden",
    "Prepare dinner: pasta dish with garlic bread",
    "Catch up on missed phone calls"
  ]
}
```
### Using structured outputs with JSON schema (manual definition)

Alternatively, users can manually define the JSON schema inline when calling the model.

```python
extract = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "The following is a voice message transcript. Only answer in JSON.",
        },
        {
            "role": "user",
            "content": TRANSCRIPT,
        },
    ],
    model=MODEL,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "summary": {"type": "string"},
                    "actionItems": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["title", "summary", "actionItems"]
            }
        }
    }
)
output = json.loads(extract.choices[0].message.content)
print(json.dumps(output, indent=2))
```

Output example:
```json
{
  "title": "Evening Routine",
  "actionItems": [
    "Water the plants",
    "Cook dinner (pasta and garlic bread)",
    "Make phone calls"
  ],
  "summary": "Made a list of tasks to accomplish before relaxing tonight"
}
```
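Even with schema mode enforced server-side, it can be useful to sanity-check a parsed response locally before handing it to downstream code. A minimal stdlib-only sketch that checks the `required` keys of the manual schema above; for full JSON Schema validation you would typically reach for a dedicated library such as `jsonschema` (an assumption, not something this guide requires):

```python
import json

# The same schema passed to the API in the example above.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "actionItems": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "summary", "actionItems"],
}

# A response body, hard-coded here from the sample output for illustration.
response = (
    '{"title": "Evening Routine", "actionItems": ["Water the plants"], '
    '"summary": "Made a list of tasks to accomplish before relaxing tonight"}'
)
output = json.loads(response)

# Minimal check: every required key must be present in the parsed output.
missing = [key for key in schema["required"] if key not in output]
assert not missing, f"missing keys: {missing}"
print("response satisfies required keys")
```

This kind of check is cheap insurance against client-side bugs (for example, accidentally sending the request without the `response_format` parameter).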
## Conclusion

Using structured outputs with LLMs can significantly enhance data handling in your applications.
By choosing between JSON mode and Structured outputs with JSON schema, you control the consistency and structure of the model's responses to suit your specific needs.

- **JSON mode** is flexible but less predictable.
- **Structured outputs** provide strict adherence to a predefined schema, ensuring consistency.

Experiment with both methods to determine which best fits your application's requirements.

ai-data/generative-apis/quickstart.mdx

Lines changed: 1 addition & 1 deletion

@@ -24,7 +24,7 @@ Hosted in European data centers and priced competitively per million tokens used
  <Macro id="requirements" />

- - Access to this service is restricted while in beta. You can request access to the product by filling out a form on the Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-api).
+ - Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
  - A Scaleway account logged into the [console](https://console.scaleway.com)
  - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
  - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/)

ai-data/managed-inference/concepts.mdx

Lines changed: 14 additions & 1 deletion

@@ -51,6 +51,12 @@ Hallucinations in LLMs refer to instances where generative AI models generate re
  Inference is the process of deriving logical conclusions or predictions from available data. This concept involves using statistical methods, machine learning algorithms, and reasoning techniques to make decisions or draw insights based on observed patterns or evidence.
  Inference is fundamental in various AI applications, including natural language processing, image recognition, and autonomous systems.

+ ## JSON mode
+
+ JSON mode allows you to guide the language model in outputting well-structured JSON data.
+ To activate JSON mode, provide the `response_format` parameter with `{"type": "json_object"}`.
+ JSON mode is useful for applications like chatbots or APIs, where a machine-readable format is essential for easy processing.
+
  ## Large Language Model Applications

  LLM Applications are applications or software tools that leverage the capabilities of LLMs for various tasks, such as text generation, summarization, or translation. These apps provide user-friendly interfaces for interacting with the models and accessing their functionalities.

@@ -74,4 +80,11 @@ LLMs provided for deployment are named with suffixes that denote their quantizat
  ## Retrieval Augmented Generation (RAG)

- RAG is an architecture combining information retrieval elements with language generation to enhance the capabilities of LLMs. It involves retrieving relevant context or knowledge from external sources and incorporating it into the generation process to produce more informative and contextually grounded outputs.
+ RAG is an architecture combining information retrieval elements with language generation to enhance the capabilities of LLMs. It involves retrieving relevant context or knowledge from external sources and incorporating it into the generation process to produce more informative and contextually grounded outputs.
+
+ ## Structured outputs
+
+ Structured outputs enable you to format the model's responses to suit specific use cases. To activate structured outputs, provide the `response_format` parameter with `"type": "json_schema"` and define its `"json_schema": {}`.
+ By customizing the structure, such as using lists, tables, or key-value pairs, you ensure that the data returned is in a form that is easy to extract and process.
+ By specifying the expected response format through the API, you can make the model consistently deliver the output your system requires.

ai-data/managed-inference/reference-content/openai-compatibility.mdx

Lines changed: 1 addition & 0 deletions

@@ -66,6 +66,7 @@ print(chat_completion.choices[0].message.content)
  - `temperature` (default 0.7)
  - `top_p` (default 1)
  - `presence_penalty`
+ - `response_format`
  - `logprobs`
  - `stop`
  - `seed`

menu/navigation.json

Lines changed: 4 additions & 0 deletions

@@ -658,6 +658,10 @@
   {
     "label": "Query embedding models",
     "slug": "query-embedding-models"
+  },
+  {
+    "label": "Use structured outputs",
+    "slug": "use-structured-outputs"
   }
 ],
 "label": "How to",
