articles/ai-services/openai/assistants-reference-runs.md (10 additions, 2 deletions)
@@ -606,7 +606,7 @@ Represents an execution run on a thread.
 |`max_prompt_tokens`| integer or null | The maximum number of prompt tokens specified to have been used over the course of the run. |
 |`max_completion_tokens`| integer or null | The maximum number of completion tokens specified to have been used over the course of the run. |
 |`usage`| object or null | Usage statistics related to the run. This value is null if the run isn't in a terminal state (for example `in_progress`, `queued`). |
-| `truncation_strategy | object | Controls for how a thread will be truncated prior to the run. |
+|`truncation_strategy`| object | Controls how a thread is truncated prior to the run. |
 |`response_format`| string | The format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since `gpt-3.5-turbo-1106`. |
 |`tool_choice`| string | Controls which (if any) tool is called by the model. `none` means the model won't call any tools and instead generates a message. `auto` is the default value and means the model can pick between generating a message or calling a tool. |
@@ -685,6 +685,14 @@ with client.beta.threads.runs.stream(
     stream.until_done()
 ```
 
+## Truncation object
+
+Controls how a thread is truncated prior to the run. Use this to control the initial context window of the run.
+
+| Name | Type | Description | Required |
+|--- |--- |--- |--- |
+|`type`| string | The truncation strategy to use for the thread. The default is `auto`. If set to `last_messages`, the thread is truncated to the n most recent messages. When set to `auto`, messages in the middle of the thread are dropped to fit the context length of the model, `max_prompt_tokens`. | Yes |
+|`last_messages`| integer | The number of most recent messages from the thread to use when constructing the context for the run. | No |
 
 ## Message delta object
@@ -742,4 +750,4 @@ Events are emitted whenever a new object is created, transitions to a new state,
 |`thread.message.completed`|`data` is a message. | Occurs when a message is completed. |
 |`thread.message.incomplete`|`data` is a message. | Occurs when a message ends before it is completed. |
 |`error`|`data` is an error. | Occurs when an error is encountered. This can happen due to an internal server error or a timeout. |
-|`done`|`data` is `[DONE]`| Occurs when a stream ends. |
+|`done`|`data` is `[DONE]`| Occurs when a stream ends. |
articles/ai-services/openai/assistants-reference.md (4 additions, 5 deletions)
@@ -311,8 +311,7 @@ Assistants use the [same API for file upload as fine-tuning](/rest/api/azureopen
 |`instructions`| string or null | The system instructions that the assistant uses. The maximum length is 32768 characters.|
 |`tools`| array | A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter or function. A `function` description can be a maximum of 1,024 characters.|
 |`metadata`| map | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.|
-
-|`temperature`| number or null | Optional | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
-|`top_p`| number or null | Optional | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
-|`response_format`| string or object | Optional | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if you use `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
-|`tool_resources`| object | Optional | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. |
+|`temperature`| number or null | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
+|`top_p`| number or null | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
+|`response_format`| string or object | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if you use `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
+|`tool_resources`| object | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. |
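The `response_format` caveat above (JSON mode must be paired with instructions that actually ask for JSON) can be sketched as a payload builder. The helper is purely illustrative, not an SDK call; it only assembles the parameter dict described in the table:

```python
# Illustrative only: builds assistant parameters per the table above.
def assistant_params(instructions: str, json_mode: bool = False,
                     temperature: float = 1.0, top_p: float = 1.0) -> dict:
    """Assemble a parameter dict; enforces the JSON-mode instruction caveat."""
    if not (0 <= temperature <= 2):
        raise ValueError("temperature must be between 0 and 2")
    params = {"instructions": instructions, "temperature": temperature, "top_p": top_p}
    if json_mode:
        # JSON mode alone is not enough: without instructions asking for JSON,
        # the model may emit whitespace until it reaches the token limit.
        if "json" not in instructions.lower():
            raise ValueError("instruct the model to produce JSON when enabling JSON mode")
        params["response_format"] = {"type": "json_object"}
    return params

print(assistant_params("Reply only in valid JSON.", json_mode=True))
```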