
Commit 277b35c

feat(ai): add structured outputs (#3713)
1 parent 1190529 commit 277b35c

10 files changed, +282 −6 lines


ai-data/generative-apis/api-cli/using-chat-api.mdx

Lines changed: 1 addition & 1 deletion

@@ -68,13 +68,13 @@ Our chat API is OpenAI compatible. Use OpenAI’s [API reference](https://platfo
  - max_tokens
  - stream
  - presence_penalty
+ - response_format
  - logprobs
  - stop
  - seed

  ### Unsupported parameters

- - response_format
  - frequency_penalty
  - n
  - top_logprobs

ai-data/generative-apis/api-cli/using-generative-apis.mdx

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@ dates:
  ## Access

- - Access to this service is restricted while in beta. You can request access to the product by filling out a form on the Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-api).
+ - Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
  - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) is needed.

  ## Authentication

ai-data/generative-apis/concepts.mdx

Lines changed: 12 additions & 0 deletions

@@ -36,6 +36,12 @@ Parameters are settings that control the behavior and performance of generative
  The inter-token latency (ITL) corresponds to the average time elapsed between two generated tokens. It is usually expressed in milliseconds.

+ ## JSON mode
+
+ JSON mode allows you to guide the language model in outputting well-structured JSON data.
+ To activate JSON mode, provide the `response_format` parameter with `{"type": "json_object"}`.
+ JSON mode is useful for applications like chatbots or APIs, where a machine-readable format is essential for easy processing.
+
  ## Prompt Engineering

  Prompt engineering involves crafting specific and well-structured inputs (prompts) to guide the model towards generating the desired output. Effective prompt design is crucial for generating relevant responses, particularly in complex or creative tasks. It often requires experimentation to find the right balance between specificity and flexibility.

@@ -52,6 +58,12 @@ Stop words are a parameter set to tell the model to stop generating further toke
  Streaming is a parameter allowing responses to be delivered in real-time, showing parts of the output as they are generated rather than waiting for the full response. Scaleway is following the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) standard. This behavior usually enhances user experience by providing immediate feedback and a more interactive conversation.

+ ## Structured outputs
+
+ Structured outputs enable you to format the model's responses to suit specific use cases. To activate structured outputs, provide the `response_format` parameter with `"type": "json_schema"` and define its `"json_schema": {}`.
+ By customizing the structure, such as using lists, tables, or key-value pairs, you ensure that the data returned is in a form that is easy to extract and process.
+ By specifying the expected response format through the API, you can make the model consistently deliver the output your system requires.
+
  ## Temperature

  Temperature is a parameter that controls the randomness of the model's output during text generation. A higher temperature produces more creative and diverse outputs, while a lower temperature makes the model's responses more deterministic and focused. Adjusting the temperature allows users to balance creativity with coherence in the generated text.
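The two concepts added above (JSON mode and structured outputs) differ only in the `response_format` payload sent with the request. A minimal sketch of both payload shapes, following the OpenAI-compatible request body; the example schema here is purely illustrative:

```python
# JSON mode (schemaless): the model is asked to return some JSON object,
# with no validation of its structure.
json_mode = {"type": "json_object"}

# Structured outputs (schema mode): the model's output must match the
# JSON schema nested under "json_schema".
schema_mode = {
    "type": "json_schema",
    "json_schema": {
        "schema": {
            "type": "object",
            "properties": {"title": {"type": "string"}},
            "required": ["title"],
        }
    },
}

print(json_mode["type"], schema_mode["type"])
```

Either dictionary is passed as the `response_format` argument of a chat completion request, as shown in the how-to guide below.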

ai-data/generative-apis/how-to/query-embedding-models.mdx

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ The embedding service is OpenAI compatible. Refer to OpenAI's [embedding documen
  <Macro id="requirements" />

- - Access to this service is restricted while in beta. You can request access to the product by filling out a form on the Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-api).
+ - Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
  - A Scaleway account logged into the [console](https://console.scaleway.com)
  - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
  - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication

ai-data/generative-apis/how-to/query-text-models.mdx

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@ There are several ways to interact with text models:
  <Macro id="requirements" />

- - Access to this service is restricted while in beta. You can request access to the product by filling out a form on the Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-api).
+ - Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
  - A Scaleway account logged into the [console](https://console.scaleway.com)
  - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
  - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication
Lines changed: 246 additions & 0 deletions

@@ -0,0 +1,246 @@
---
meta:
  title: How to use structured outputs
  description: Learn how to interact with structured outputs using Scaleway's Chat Completions API service.
content:
  h1: How to use structured outputs
  paragraph: Learn how to generate structured outputs using Scaleway's Chat Completions API service.
tags: chat-completions-api
dates:
  validation: 2024-09-17
  posted: 2024-09-17
---
Structured outputs allow users to get consistent, machine-readable responses in JSON format from language models.
JSON, as a widely used format, enables seamless integration with a variety of platforms and applications. Its interoperability is crucial for developers aiming to incorporate AI functionality into their current systems with minimal adjustments.

By specifying a response format when using the [Chat Completions API](/ai-data/generative-apis/api-cli/using-chat-api/), you can ensure that responses are returned in a JSON structure.
There are two main modes for generating JSON: **Object Mode** (schemaless) and **Schema Mode** (deterministic, structured output).

You can interact with text models in several ways:
- Via the Scaleway [console](https://console.scaleway.com), which will soon provide a complete [playground](/ai-data/generative-apis/how-to/query-text-models/#accessing-the-playground) for testing models, adjusting parameters, and observing how these changes affect the output in real time.
- Via the [Chat API](/ai-data/generative-apis/how-to/query-text-models/#querying-text-models-via-api)

<Macro id="requirements" />

- Access to Generative APIs.
  While in beta, the service is restricted to invited users. You can request access by filling out a form on Scaleway's [Betas page](https://www.scaleway.com/en/betas/#generative-apis).
- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) for API authentication
- Python 3.7+ installed on your system
## Types of structured outputs

- **JSON mode** (schemaless):
  - Type: `{"type": "json_object"}`
  - This mode is non-deterministic and allows the model to output a JSON object without strict validation.
  - Useful for flexible outputs when you expect the model to infer a reasonable structure based on your prompt.
  - JSON mode is older and has been used by developers since early API implementations.

- **Structured outputs (schema mode)** (deterministic/structured):
  - Type: `{"type": "json_schema"}`
  - This mode enforces a strict schema format, where the output adheres to the predefined structure.
  - Supports complex types and validation mechanisms as per the [JSON schema specification](https://json-schema.org/specification/).
  - Structured outputs are a newer feature, implemented by OpenAI in 2024 to enable stricter, schema-based response formatting.

<Message type="note">
  - All LLMs in the Scaleway library support **JSON mode** and **Structured outputs**; however, the quality of results will vary in the schemaless JSON mode.
  - JSON mode: it is important to explicitly ask the model to generate JSON output, either in the system prompt or the user prompt. To prevent infinite generations, model providers most often encourage asking the model for short JSON objects.
  - Structured outputs: Scaleway supports the [JSON schema specification](https://json-schema.org/specification/), including nested schema composition (`anyOf`, `allOf`, `oneOf`, etc.), `$ref`, all types, and regular expressions.
</Message>
## Code examples

<Message type="tip">
  Before diving into the code examples, ensure you have the necessary libraries installed:
  ```bash
  pip install openai pydantic
  ```
</Message>

The following Python examples demonstrate how to use both **JSON mode** and **Structured outputs** to generate structured responses.

We will send a voice note transcript to the LLM and ask it to structure the content.
Below is our base code:

```python
import json
from openai import OpenAI
from pydantic import BaseModel, Field

# Set your preferred model
MODEL = "llama-3.1-8b-instruct"

# Set your API key
API_KEY = "<SCW_API_KEY>"

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",
    api_key=API_KEY,
)

# Define the schema for the output using Pydantic
class VoiceNote(BaseModel):
    title: str = Field(description="A title for the voice note")
    summary: str = Field(description="A short one-sentence summary of the voice note.")
    actionItems: list[str] = Field(description="A list of action items from the voice note")

# Transcript to use for the output
TRANSCRIPT = (
    "Good evening! It's 6:30 PM, and I'm just getting home from work. I have a few things to do "
    "before I can relax. First, I'll need to water the plants in the garden since they've been in the sun all day. "
    "Then, I'll start preparing dinner. I think a simple pasta dish with some garlic bread should be good. "
    "While that's cooking, I'll catch up on a couple of phone calls I missed earlier."
)
```
### Using JSON mode (schemaless)

In JSON mode, you can prompt the model to output a JSON object without enforcing a strict schema.

```python
extract = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "The following is a voice message transcript. Only answer in JSON.",
        },
        {
            "role": "user",
            "content": TRANSCRIPT,
        },
    ],
    model=MODEL,
    response_format={
        "type": "json_object",
    },
)
output = json.loads(extract.choices[0].message.content)
print(json.dumps(output, indent=2))
```

Output example:
```json
{
  "current_time": "6:30 PM",
  "tasks": [
    {
      "task": "water the plants in the garden",
      "priority": "high"
    },
    {
      "task": "prepare dinner (pasta with garlic bread)",
      "priority": "high"
    },
    {
      "task": "catch up on phone calls",
      "priority": "medium"
    }
  ]
}
```
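Because JSON mode is schemaless, the keys in the output above are model-chosen and may change between runs, so it is safer to parse the result defensively. A minimal sketch, assuming a response shaped like the sample output (the `tasks` and `task` keys are not guaranteed by the API):

```python
import json

# Raw model content, as returned in extract.choices[0].message.content;
# hard-coded here from the sample output above for illustration.
raw = (
    '{"current_time": "6:30 PM", "tasks": ['
    '{"task": "water the plants in the garden", "priority": "high"}]}'
)

output = json.loads(raw)

# In schemaless JSON mode, never assume a key exists: use .get() with defaults.
tasks = output.get("tasks", [])
names = [t.get("task", "") for t in tasks]
print(names)  # ['water the plants in the garden']
```

If your application needs guaranteed keys, prefer the schema mode shown in the next sections instead of patching over missing fields.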
### Using structured outputs with JSON schema (Pydantic)

Using [Pydantic](https://docs.pydantic.dev/latest/concepts/models/), users can define the schema as a Python class and enforce the model to return results adhering to this schema.

```python
extract = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "The following is a voice message transcript. Only answer in JSON.",
        },
        {
            "role": "user",
            "content": TRANSCRIPT,
        },
    ],
    model=MODEL,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "schema": VoiceNote.model_json_schema(),
        }
    },
)
output = json.loads(extract.choices[0].message.content)
print(json.dumps(output, indent=2))
```

Output example:
```json
{
  "title": "To-Do List",
  "summary": "Returning from work, need to complete tasks before relaxing",
  "actionItems": [
    "Water garden",
    "Prepare dinner: pasta dish with garlic bread",
    "Catch up on missed phone calls"
  ]
}
```
### Using structured outputs with JSON schema (manual definition)

Alternatively, users can manually define the JSON schema inline when calling the model.

```python
extract = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "The following is a voice message transcript. Only answer in JSON.",
        },
        {
            "role": "user",
            "content": TRANSCRIPT,
        },
    ],
    model=MODEL,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "summary": {"type": "string"},
                    "actionItems": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["title", "summary", "actionItems"]
            }
        }
    }
)
output = json.loads(extract.choices[0].message.content)
print(json.dumps(output, indent=2))
```

Output example:
```json
{
  "title": "Evening Routine",
  "actionItems": [
    "Water the plants",
    "Cook dinner (pasta and garlic bread)",
    "Make phone calls"
  ],
  "summary": "Made a list of tasks to accomplish before relaxing tonight"
}
```
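Even with schema mode enforced server-side, it can be useful to sanity-check a parsed response locally before handing it to downstream code. A minimal stdlib-only sketch that checks the `required` keys of the manual schema above; for full JSON Schema validation you would typically reach for a dedicated library such as `jsonschema` (an assumption, not something this guide requires):

```python
import json

# The same schema passed to the API in the example above.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "actionItems": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "summary", "actionItems"],
}

# A response body, hard-coded here from the sample output for illustration.
response = (
    '{"title": "Evening Routine", "actionItems": ["Water the plants"], '
    '"summary": "Made a list of tasks to accomplish before relaxing tonight"}'
)
output = json.loads(response)

# Minimal check: every required key must be present in the parsed output.
missing = [key for key in schema["required"] if key not in output]
assert not missing, f"missing keys: {missing}"
print("response satisfies required keys")
```

This kind of check is cheap insurance against client-side bugs (for example, accidentally sending the request without the `response_format` parameter).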
## Conclusion

Using structured outputs with LLMs can significantly enhance data handling in your applications.
By choosing between JSON mode and Structured outputs with JSON schema, you control the consistency and structure of the model's responses to suit your specific needs.

- **JSON mode** is flexible but less predictable.
- **Structured outputs** provide strict adherence to a predefined schema, ensuring consistency.

Experiment with both methods to determine which best fits your application's requirements.

ai-data/generative-apis/quickstart.mdx

Lines changed: 1 addition & 1 deletion

@@ -24,7 +24,7 @@ Hosted in European data centers and priced competitively per million tokens used
  <Macro id="requirements" />

- - Access to this service is restricted while in beta. You can request access to the product by filling out a form on the Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-api).
+ - Access to this service is restricted while in beta. You can request access to the product by filling out a form on Scaleway's [betas page](https://www.scaleway.com/en/betas/#generative-apis).
  - A Scaleway account logged into the [console](https://console.scaleway.com)
  - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
  - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/)

ai-data/managed-inference/concepts.mdx

Lines changed: 14 additions & 1 deletion

@@ -51,6 +51,12 @@ Hallucinations in LLMs refer to instances where generative AI models generate re
  Inference is the process of deriving logical conclusions or predictions from available data. This concept involves using statistical methods, machine learning algorithms, and reasoning techniques to make decisions or draw insights based on observed patterns or evidence.
  Inference is fundamental in various AI applications, including natural language processing, image recognition, and autonomous systems.

+ ## JSON mode
+
+ JSON mode allows you to guide the language model in outputting well-structured JSON data.
+ To activate JSON mode, provide the `response_format` parameter with `{"type": "json_object"}`.
+ JSON mode is useful for applications like chatbots or APIs, where a machine-readable format is essential for easy processing.
+
  ## Large Language Model Applications

  LLM Applications are applications or software tools that leverage the capabilities of LLMs for various tasks, such as text generation, summarization, or translation. These apps provide user-friendly interfaces for interacting with the models and accessing their functionalities.

@@ -74,4 +80,11 @@ LLMs provided for deployment are named with suffixes that denote their quantizat
  ## Retrieval Augmented Generation (RAG)

- RAG is an architecture combining information retrieval elements with language generation to enhance the capabilities of LLMs. It involves retrieving relevant context or knowledge from external sources and incorporating it into the generation process to produce more informative and contextually grounded outputs.
+ RAG is an architecture combining information retrieval elements with language generation to enhance the capabilities of LLMs. It involves retrieving relevant context or knowledge from external sources and incorporating it into the generation process to produce more informative and contextually grounded outputs.
+
+ ## Structured outputs
+
+ Structured outputs enable you to format the model's responses to suit specific use cases. To activate structured outputs, provide the `response_format` parameter with `"type": "json_schema"` and define its `"json_schema": {}`.
+ By customizing the structure, such as using lists, tables, or key-value pairs, you ensure that the data returned is in a form that is easy to extract and process.
+ By specifying the expected response format through the API, you can make the model consistently deliver the output your system requires.

ai-data/managed-inference/reference-content/openai-compatibility.mdx

Lines changed: 1 addition & 0 deletions

@@ -66,6 +66,7 @@ print(chat_completion.choices[0].message.content)
  - `temperature` (default 0.7)
  - `top_p` (default 1)
  - `presence_penalty`
+ - `response_format`
  - `logprobs`
  - `stop`
  - `seed`

menu/navigation.json

Lines changed: 4 additions & 0 deletions

@@ -658,6 +658,10 @@
   {
     "label": "Query embedding models",
     "slug": "query-embedding-models"
+  },
+  {
+    "label": "Use structured outputs",
+    "slug": "use-structured-outputs"
   }
 ],
 "label": "How to",
