articles/ai-services/openai/assistants-reference-runs.md (10 additions, 2 deletions)
@@ -606,7 +606,7 @@ Represents an execution run on a thread.
 |`max_prompt_tokens`| integer or null | The maximum number of prompt tokens specified to have been used over the course of the run. |
 |`max_completion_tokens`| integer or null | The maximum number of completion tokens specified to have been used over the course of the run. |
 |`usage`| object or null | Usage statistics related to the run. This value is null if the run isn't in a terminal state (for example `in_progress`, `queued`). |
-| `truncation_strategy | object | Controls for how a thread will be truncated prior to the run. |
+|`truncation_strategy`| object | Controls how a thread is truncated prior to the run. |
 |`response_format`| string | The format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since `gpt-3.5-turbo-1106`. |
 |`tool_choice`| string | Controls which (if any) tool is called by the model. `none` means the model won't call any tools and instead generates a message. `auto` is the default value and means the model can pick between generating a message or calling a tool. |
@@ -685,6 +685,14 @@ with client.beta.threads.runs.stream(
     stream.until_done()
 ```
 
+## Truncation object
+
+Controls how a thread is truncated prior to the run. Use this to control the initial context window of the run.
+
+| Name | Type | Description | Required |
+|--- |--- |--- |--- |
+|`type`| string | The truncation strategy to use for the thread. The default is `auto`. If set to `last_messages`, the thread is truncated to the n most recent messages. When set to `auto`, messages in the middle of the thread are dropped to fit the context length of the model, `max_prompt_tokens`. | Yes |
+|`last_messages`| integer | The number of most recent messages from the thread to use when constructing the context for the run. | No |
 
 ## Message delta object
@@ -742,4 +750,4 @@ Events are emitted whenever a new object is created, transitions to a new state,
 |`thread.message.completed`|`data` is a message. | Occurs when a message is completed. |
 |`thread.message.incomplete`|`data` is a message. | Occurs when a message ends before it is completed. |
 |`error`|`data` is an error. | Occurs when an error is encountered. This can happen due to an internal server error or a timeout. |
-|`done`|`data` is `[DONE]`| Occurs when a stream ends. |
+|`done`|`data` is `[DONE]`| Occurs when a stream ends. |
articles/ai-services/openai/assistants-reference.md (4 additions, 5 deletions)
@@ -311,8 +311,7 @@ Assistants use the [same API for file upload as fine-tuning](/rest/api/azureopen
 |`instructions`| string or null | The system instructions that the assistant uses. The maximum length is 32768 characters.|
 |`tools`| array | A list of tools enabled on the assistant. There can be a maximum of 128 tools per assistant. Tools can be of types code_interpreter or function. A `function` description can be a maximum of 1,024 characters.|
 |`metadata`| map | Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.|
-
-|`temperature`| number or null | Optional | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
-|`top_p`| number or null | Optional | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
-|`response_format`| string or object | Optional | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if you use `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
-|`tool_resources`| object | Optional | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. |
+|`temperature`| number or null | Defaults to 1. Determines what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
+|`top_p`| number or null | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
+|`response_format`| string or object | Specifies the format that the model must output. Compatible with GPT-4 Turbo and all GPT-3.5 Turbo models since gpt-3.5-turbo-1106. Setting this parameter to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON. Importantly, when using JSON mode, you must also instruct the model to produce JSON yourself using a system or user message. Without this instruction, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Additionally, the message content may be partially cut off if you use `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length. |
+|`tool_resources`| object | A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the `code_interpreter` tool requires a list of file IDs, while the `file_search` tool requires a list of vector store IDs. |
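The `response_format` caveat above (JSON mode must be paired with instructions that actually ask for JSON) can be sketched as a payload builder. The helper is purely illustrative, not an SDK call; it only assembles the parameter dict described in the table:

```python
# Illustrative only: builds assistant parameters per the table above.
def assistant_params(instructions: str, json_mode: bool = False,
                     temperature: float = 1.0, top_p: float = 1.0) -> dict:
    """Assemble a parameter dict; enforces the JSON-mode instruction caveat."""
    if not (0 <= temperature <= 2):
        raise ValueError("temperature must be between 0 and 2")
    params = {"instructions": instructions, "temperature": temperature, "top_p": top_p}
    if json_mode:
        # JSON mode alone is not enough: without instructions asking for JSON,
        # the model may emit whitespace until it reaches the token limit.
        if "json" not in instructions.lower():
            raise ValueError("instruct the model to produce JSON when enabling JSON mode")
        params["response_format"] = {"type": "json_object"}
    return params

print(assistant_params("Reply only in valid JSON.", json_mode=True))
```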