articles/ai-services/openai/includes/api-versions/latest-inference-preview.md (+26 −26)
@@ -295,7 +295,7 @@ Creates a completion for the chat message
 | top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both.<br> | No | 1 |
 | stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message.<br> | No | False |
 | stop | string or array | Up to four sequences where the API will stop generating further tokens.<br> | No ||
-| max_tokens | integer | The maximum number of [tokens](/tokenizer) that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. <br> | No ||
+| max_tokens | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. <br> | No ||
 | max_completion_tokens | integer | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. This is only supported in o1 series models. Will expand the support to other models in future API release. | No ||
 | presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
 | frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
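The request-body parameters in the hunk above can be sketched as plain JSON assembly. This is a minimal illustration, not the SDK's API: the helper name `build_chat_request` is hypothetical, and sending the payload to an actual endpoint is left out.

```python
import json


def build_chat_request(messages, max_tokens=None, top_p=1.0, stream=False,
                       stop=None, presence_penalty=0.0, frequency_penalty=0.0):
    """Assemble a chat-completions request body from the documented parameters."""
    body = {
        "messages": messages,
        "top_p": top_p,                          # nucleus sampling; tune this or temperature, not both
        "stream": stream,                        # True -> data-only server-sent events
        "presence_penalty": presence_penalty,    # -2.0 to 2.0
        "frequency_penalty": frequency_penalty,  # -2.0 to 2.0
    }
    if max_tokens is not None:
        body["max_tokens"] = max_tokens          # input + output still bounded by context length
    if stop is not None:
        body["stop"] = stop                      # up to four stop sequences
    return json.dumps(body)


payload = build_chat_request(
    [{"role": "user", "content": "Hello"}],
    max_tokens=128,
    stop=["\n\n"],
)
```

Note that `temperature` is deliberately omitted when `top_p` is set, matching the table's recommendation to alter one or the other but not both.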
@@ -2199,7 +2199,7 @@ Create a message.
 
 |**Content-Type**|**Type**|**Description**|
 |:---|:---|:---|
-|application/json |[messageObject](#messageobject)| Represents a message within a threads.|
+|application/json |[messageObject](#messageobject)| Represents a message within a thread.|
 
 ### Examples
 
@@ -2275,7 +2275,7 @@ Retrieve a message.
 
 |**Content-Type**|**Type**|**Description**|
 |:---|:---|:---|
-|application/json |[messageObject](#messageobject)| Represents a message within a threads.|
+|application/json |[messageObject](#messageobject)| Represents a message within a thread.|
 
 ### Examples
 
@@ -2354,7 +2354,7 @@ Modifies a message.
 
 |**Content-Type**|**Type**|**Description**|
 |:---|:---|:---|
-|application/json |[messageObject](#messageobject)| Represents a message within a threads.|
+|application/json |[messageObject](#messageobject)| Represents a message within a thread.|
 
 ### Examples
 
@@ -2472,7 +2472,7 @@ Create a thread and run it in one request.
 
 |**Content-Type**|**Type**|**Description**|
 |:---|:---|:---|
-|application/json |[runObject](#runobject)| Represents an execution run on a threads.|
+|application/json |[runObject](#runobject)| Represents an execution run on a thread.|
 
 ### Examples
 
@@ -2743,7 +2743,7 @@ Create a run.
 
 |**Content-Type**|**Type**|**Description**|
 |:---|:---|:---|
-|application/json |[runObject](#runobject)| Represents an execution run on a threads.|
+|application/json |[runObject](#runobject)| Represents an execution run on a thread.|
 
 ### Examples
 
@@ -2832,7 +2832,7 @@ Retrieves a run.
 
 |**Content-Type**|**Type**|**Description**|
 |:---|:---|:---|
-|application/json |[runObject](#runobject)| Represents an execution run on a threads.|
+|application/json |[runObject](#runobject)| Represents an execution run on a thread.|
 
 ### Examples
 
@@ -2910,7 +2910,7 @@ Modifies a run.
 
 |**Content-Type**|**Type**|**Description**|
 |:---|:---|:---|
-|application/json |[runObject](#runobject)| Represents an execution run on a threads.|
+|application/json |[runObject](#runobject)| Represents an execution run on a thread.|
 
 ### Examples
 
@@ -3025,7 +3025,7 @@ When a run has the `status: "requires_action"` and `required_action.type` is `su
 
 |**Content-Type**|**Type**|**Description**|
 |:---|:---|:---|
-|application/json |[runObject](#runobject)| Represents an execution run on a threads.|
+|application/json |[runObject](#runobject)| Represents an execution run on a thread.|
 
 ### Examples
 
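The `requires_action` / `submit_tool_outputs` flow touched by the hunk above can be sketched as follows. This is a rough illustration under assumptions: the nested field names are patterned on the run object, and `dispatch` stands in for whatever executes your tools; `build_tool_outputs` is a hypothetical helper, not part of the API.

```python
def build_tool_outputs(run, dispatch):
    """Return a submit_tool_outputs payload for a run awaiting tool results,
    or None when the run does not require action."""
    if run.get("status") != "requires_action":
        return None
    action = run["required_action"]
    if action["type"] != "submit_tool_outputs":
        return None
    outputs = []
    for call in action["submit_tool_outputs"]["tool_calls"]:
        # Run the requested tool and pair its result with the tool_call id.
        result = dispatch(call["function"]["name"], call["function"]["arguments"])
        outputs.append({"tool_call_id": call["id"], "output": result})
    return {"tool_outputs": outputs}


# Illustrative run object (field values are made up for the example).
run = {
    "status": "requires_action",
    "required_action": {
        "type": "submit_tool_outputs",
        "submit_tool_outputs": {
            "tool_calls": [
                {"id": "call_1", "type": "function",
                 "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}},
            ]
        },
    },
}
submission = build_tool_outputs(run, lambda name, args: "sunny")
```

The resulting payload would then be POSTed back to the run's submit-tool-outputs endpoint so the run can continue.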
@@ -3142,7 +3142,7 @@ Cancels a run that is `in_progress`.
 
 |**Content-Type**|**Type**|**Description**|
 |:---|:---|:---|
-|application/json |[runObject](#runobject)| Represents an execution run on a threads.|
+|application/json |[runObject](#runobject)| Represents an execution run on a thread.|
 
 ### Examples
 
@@ -4609,7 +4609,7 @@ Represents a completion response from the API. Note: both the streamed and non-s
 | top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both.<br> | No | 1 |
 | stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message.<br> | No | False |
 | stop | string or array | Up to 4 sequences where the API will stop generating further tokens.<br> | No ||
-| max_tokens | integer | The maximum number of [tokens](/tokenizer) that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. <br> | No ||
+| max_tokens | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length. <br> | No ||
 | max_completion_tokens | integer | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. This is only supported in o1 series models. Will expand the support to other models in future API release. | No ||
 | presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
 | frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
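The `stream` behavior documented above (data-only server-sent events terminated by `data: [DONE]`) can be consumed line by line. A minimal sketch, assuming you already have the raw SSE lines; the chunk payloads below are illustrative, not captured output:

```python
import json


def iter_stream_deltas(lines):
    """Yield parsed JSON chunks from SSE lines, stopping at the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # stream terminator per the table above
        yield json.loads(data)


# Illustrative stream: two content deltas, then the terminator.
raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in iter_stream_deltas(raw))
```

Real streamed chunks carry more fields (ids, finish reasons, sometimes empty deltas), so production code should guard each key access rather than index directly as this sketch does.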
@@ -6403,7 +6403,7 @@ Specifies a tool the model should use. Use to force the model to call a specific
 
 ### runObject
 
-Represents an execution run on a threads.
+Represents an execution run on a thread.
 
 | Name | Type | Description | Required | Default |