
Commit ded628a

fixes

1 parent 016eedc commit ded628a

2 files changed, +14 -7 lines changed


articles/ai-services/openai/assistants-reference-runs.md

Lines changed: 10 additions & 2 deletions
@@ -606,7 +606,7 @@ Represents an execution run on a thread.
 | `max_prompt_tokens` | integer or null | The maximum number of prompt tokens specified to have been used over the course of the run. |
 | `max_completion_tokens` | integer or null | The maximum number of completion tokens specified to have been used over the course of the run. |
 | `usage` | object or null | Usage statistics related to the run. This value will be null if the run is not in a terminal state (for example `in_progress`, `queued`). |
-| `truncation_strategy | object | Controls for how a thread will be truncated prior to the run. |
+| `truncation_strategy` | object | Controls for how a thread will be truncated prior to the run. |
 | `response_format` | string | The format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since `gpt-3.5-turbo-1106`. |
 | `tool_choice` | string | Controls which (if any) tool is called by the model. `none` means the model won't call any tools and instead generates a message. `auto` is the default value and means the model can pick between generating a message or calling a tool. |
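For orientation, here is a minimal sketch of creating a run that sets the fields this hunk documents, assuming the `openai` Python SDK v1.x configured for Azure OpenAI; the endpoint, API version, and IDs are placeholders, not values from this commit:

```python
import os

from openai import AzureOpenAI

# Placeholder client setup; endpoint, key, and API version are assumptions.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-05-01-preview",
)

run = client.beta.threads.runs.create(
    thread_id="thread_abc123",    # placeholder thread ID
    assistant_id="asst_abc123",   # placeholder assistant ID
    tool_choice="auto",           # default: model picks between a message and a tool call
    max_prompt_tokens=2048,       # cap on prompt tokens used over the course of the run
    max_completion_tokens=512,    # cap on completion tokens used over the course of the run
)
print(run.status)  # runs start in a non-terminal state such as "queued"
```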

@@ -685,6 +685,14 @@ with client.beta.threads.runs.stream(
     stream.until_done()
 ```
 
+## Truncation object
+
+Controls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.
+
+| Name | Type | Description | Required |
+|--- |--- |--- |--- |
+| `type` | string | The truncation strategy to use for the thread. The default is `auto`. If set to `last_messages`, the thread will be truncated to the n most recent messages in the thread. When set to `auto`, messages in the middle of the thread will be dropped to fit the context length of the model, `max_prompt_tokens`. | Yes |
+| `last_messages` | integer | The number of most recent messages from the thread to use when constructing the context for the run. | No |
 
 ## Message delta object
@@ -742,4 +750,4 @@ Events are emitted whenever a new object is created, transitions to a new state,
 | `thread.message.completed` | `data` is a message. | Occurs when a message is completed. |
 | `thread.message.incomplete` | `data` is a message. | Occurs when a message ends before it is completed. |
 | `error` | `data` is an error. | Occurs when an error occurs. This can happen due to an internal server error or a timeout. |
-| `done` | `data` is `[DONE]` | Occurs when a stream ends. |
+| `done` | `data` is `[DONE]` | Occurs when a stream ends. |
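For the events table, a sketch of consuming the stream with the same helper the file's existing example uses (`client.beta.threads.runs.stream`); the exact shape of `event.data` here is an assumption based on the table:

```python
with client.beta.threads.runs.stream(
    thread_id="thread_abc123",   # placeholder thread ID
    assistant_id="asst_abc123",  # placeholder assistant ID
) as stream:
    for event in stream:
        # `event.event` carries the names from the table above.
        if event.event == "thread.message.completed":
            # `data` is a message; print its first content block, assumed to be text.
            print(event.data.content[0].text.value)
        elif event.event == "error":
            print("stream error:", event.data)
# On the wire, the SSE stream terminates with a final `done` event whose data is `[DONE]`.
```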

articles/ai-services/openai/assistants-reference.md

Lines changed: 4 additions & 5 deletions
@@ -311,8 +311,7 @@ Assistants use the [same API for file upload as fine-tuning](/rest/api/azureopen
 | `instructions` | string or null | The system instructions that the assistant uses. The maximum length is 32768 characters.|
 | `tools` | array | A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types `code_interpreter` or `function`. A `function` description can be a maximum of 1,024 characters.|
 | `metadata` | map | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.|
-
-| `temperature` | number or null | Optional | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
-| `top_p` | number or null | Optional | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
-| `response_format` | string or object | Optional | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if you use `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
-| `tool_resources` | object | Optional | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. |
+| `temperature` | number or null | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
+| `top_p` | number or null | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
+| `response_format` | string or object | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if you use `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
+| `tool_resources` | object | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. |
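A minimal sketch of creating an assistant with the request fields documented above, reusing the placeholder `client` from earlier; the model value is a placeholder deployment name:

```python
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",  # placeholder Azure deployment name
    # JSON mode still requires instructing the model to produce JSON:
    instructions="You are a data assistant. Always respond with valid JSON.",
    temperature=0.2,      # lower values give more focused, deterministic output
    top_p=1,              # alter this or temperature, but not both
    response_format={"type": "json_object"},   # enable JSON mode
    tools=[{"type": "code_interpreter"}],
    tool_resources={
        "code_interpreter": {"file_ids": []}   # code_interpreter takes a list of file IDs
    },
)
```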
