Skip to content

Commit e096047

Browse files
committed
out of band responses
1 parent 04203cc commit e096047

File tree

2 files changed

+50
-1
lines changed

2 files changed

+50
-1
lines changed

articles/ai-services/openai/how-to/realtime-audio.md

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -145,6 +145,33 @@ An example `session.update` that configures several aspects of the session, incl
145145

146146
The server responds with a [`session.updated`](../realtime-audio-reference.md#realtimeservereventsessionupdated) event to confirm the session configuration.
147147

148+
## Out of band responses
149+
150+
By default, responses generated during a session are added to the default conversation state. In some cases, you might want to generate responses outside the default conversation. This can be useful for generating multiple responses concurrently or for generating responses that don't affect the default conversation state. For example, you can limit the number of turns considered by the model when generating a response.
151+
152+
You can create responses outside the default conversation by setting the [`response.conversation`](../realtime-audio-reference.md#realtimeresponseoptions) field to the string `none` when creating a response with the [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) client event.
153+
154+
In the same [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) client event, you can also set the [`response.metadata`](../realtime-audio-reference.md#realtimeresponseoptions) field to help you identify which response is being generated for this client-sent event.
155+
156+
```json
157+
{
158+
"type": "response.create",
159+
"response": {
160+
"conversation": "none",
161+
"metadata": {
162+
"topic": "world_capitals"
163+
},
164+
"modalities": ["text"],
165+
"prompt": "What is the capital of France?"
166+
}
167+
}
168+
```
169+
170+
When the server responds with a [`response.done`](../realtime-audio-reference.md#realtimeservereventresponsecreated) event, the response will contain the metadata you provided. You can identify the corresponding response for the client-sent event via the `response.metadata` field.
171+
172+
> [!IMPORTANT]
173+
> If you create any responses outside the default conversation, be sure to always check the `response.metadata` field to help you identify the corresponding response for the client-sent event. You should even check the `response.metadata` field for responses that are part of the default conversation. That way, you can ensure that you're handling the correct response for the client-sent event.
174+
148175
## Voice activity detection (VAD) and the audio buffer
149176

150177
The server maintains an input audio buffer containing client-provided audio that has not yet been committed to the conversation state.

articles/ai-services/openai/realtime-audio-reference.md

Lines changed: 23 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1100,7 +1100,11 @@ The server `session.updated` event is returned when a session is updated by the
11001100

11011101
| Field | Type | Description |
11021102
|-------|------|-------------|
1103-
| type | [RealtimeContentPartType](#realtimecontentparttype) | The type of the content part. |
1103+
| type | [RealtimeContentPartType](#realtimecontentparttype) | The content type (`input_text`, `input_audio`, `item_reference`, `text`).<br><br>A property of the `function` object. |
1104+
| text | string | The text content, used for `input_text` and `text` content types. |
1105+
| id | string | ID of a previous conversation item to reference (for `item_reference` content types in `response.create` events). These can reference both client and server created items. |
1106+
| audio | string | Base64-encoded audio bytes, used for `input_audio` content type. |
1107+
| transcript | string | The transcript of the audio, used for `input_audio` content type. |
11041108

11051109
### RealtimeContentPartType
11061110

@@ -1115,6 +1119,21 @@ The server `session.updated` event is returned when a session is updated by the
11151119

11161120
The item to add to the conversation.
11171121

1122+
This table describes all `RealtimeConversationItem` propertiest. The properties that are applicable per event depend on the [RealtimeItemType](#realtimeitemtype).
1123+
1124+
| Field | Type | Description |
1125+
|-------|------|-------------|
1126+
| id | string | The unique ID of the item. The ID can be specified by the client to help manage server-side context. If the client doesn't provide an ID, the server generates one. |
1127+
| type | [RealtimeItemType](#realtimeitemtype) | The type of the item.<br><br>Allowed values: `message`, `function_call`, `function_call_output` |
1128+
| object | string | Identifier for the API object being returned - always `realtime.item`. |
1129+
| status | [RealtimeItemStatus](#realtimeitemstatus) | The status of the item (`completed`, `incomplete`). These have no effect on the conversation, but are accepted for consistency with the `conversation.item.created` event. |
1130+
| role | [RealtimeMessageRole](#realtimemessagerole) | The role of the message sender (`user`, `assistant`, `system`), only applicable for `message` items. |
1131+
| content | array of [RealtimeContentPart](#realtimecontentpart) | The content of the message, applicable for `message` items.<br><br>- Message items of role `system` support only `input_text` content<br>- Message items of role `user` support `input_text` and `input_audio` content<br>- Message items of role `assistant` support `text` content. |
1132+
| call_id | string | The ID of the function call (for `function_call` and `function_call_output` items). If passed on a `function_call_output` item, the server will check that a `function_call` item with the same ID exists in the conversation history. |
1133+
| name | string | The name of the function being called (for `function_call` items). |
1134+
| arguments | string | The arguments of the function call (for `function_call` items). |
1135+
| output | string | The output of the function call (for `function_call_output` items). |
1136+
11181137
### RealtimeConversationRequestItem
11191138

11201139
You use the `RealtimeConversationRequestItem` object to create a new item in the conversation via the [conversation.item.create](#realtimeclienteventconversationitemcreate) event.
@@ -1333,6 +1352,9 @@ The response resource.
13331352
| tool_choice | [RealtimeToolChoice](#realtimetoolchoice) | The tool choice for the session. |
13341353
| temperature | number | The sampling temperature for the model. The allowed temperature values are limited to [0.6, 1.2]. Defaults to 0.8. |
13351354
| max__output_tokens | integer or "inf" | The maximum number of output tokens per assistant response, inclusive of tool calls.<br><br>Specify an integer between 1 and 4096 to limit the output tokens. Otherwise, set the value to "inf" to allow the maximum number of tokens.<br><br>For example, to limit the output tokens to 1000, set `"max_response_output_tokens": 1000`. To allow the maximum number of tokens, set `"max_response_output_tokens": "inf"`.<br><br>Defaults to `"inf"`. |
1355+
| conversation | string | Controls which conversation the response is added to. Currently supports auto and none, with auto as the default value. The auto value means that the contents of the response will be added to the default conversation. Set this to none to create an out-of-band response which will not add items to the default conversation. |
1356+
| metadata | map | Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.<br/><br/>For example: `metadata: { topic: "classification" }` |
1357+
| input | array | Input items to include in the prompt for the model. Creates a new context for this response, without including the default conversation. Can include references to items from the default conversation.<br><br>Array items: [RealtimeConversationItemBase](#realtimeconversationitembase) |
13361358

13371359
### RealtimeResponseSession
13381360

0 commit comments

Comments
 (0)