articles/ai-services/openai/includes/api-versions/latest-inference.md
Lines changed: 18 additions & 8 deletions
@@ -8,7 +8,7 @@ ms.topic: include
ms.date: 11/01/2024
---

-## Completions - Create
+## Completions

```HTTP
POST https://{endpoint}/openai/deployments/{deployment-id}/completions?api-version=2024-10-21
@@ -42,7 +42,7 @@ Creates a completion for the provided prompt, parameters, and chosen model.
| frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
| logit_bias | object | Modify the likelihood of specified tokens appearing in the completion.<br><br>Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.<br><br>As an example, you can pass `{"50256": -100}` to prevent the <|endoftext|> token from being generated.<br> | No | None |
| logprobs | integer | Include the log probabilities on the `logprobs` most likely output tokens, as well the chosen tokens. For example, if `logprobs` is 5, the API will return a list of the five most likely tokens. The API will always return the `logprob` of the sampled token, so there may be up to `logprobs+1` elements in the response.<br><br>The maximum value for `logprobs` is 5.<br> | No | None |
-| max_tokens | integer | The maximum number of [tokens](https://platform.openai.com/tokenizer) that can be generated in the completion.<br><br>The token count of your prompt plus `max_tokens` can't exceed the model's context length. | No | 16 |
+| max_tokens | integer | The maximum number of tokens that can be generated in the completion.<br><br>The token count of your prompt plus `max_tokens` can't exceed the model's context length. | No | 16 |
| n | integer | How many completions to generate for each prompt.<br><br>**Note:** Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for `max_tokens` and `stop`.<br> | No | 1 |
| presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
| seed | integer | If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.<br><br>Determinism isn't guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.<br> | No ||
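For context, a completions request that exercises several of the parameters in this hunk might look like the following minimal sketch. The deployment name `my-completions-deployment` and the `AZURE_OPENAI_ENDPOINT` / `AZURE_OPENAI_API_KEY` environment variables are illustrative assumptions, not values from the spec.

```python
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]   # for example, https://my-resource.openai.azure.com
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-completions-deployment"          # assumed deployment name

body = {
    "prompt": "Write a one-line tagline for a coffee shop.",
    "max_tokens": 16,               # prompt tokens + max_tokens must fit within the model's context length
    "n": 2,                         # generate two candidate completions
    "seed": 42,                     # best-effort determinism; compare system_fingerprint across calls
    "logit_bias": {"50256": -100},  # effectively ban the <|endoftext|> token
}

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/completions",
    params={"api-version": "2024-10-21"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=body,
)
response.raise_for_status()
for choice in response.json()["choices"]:
    print(choice["text"])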
@@ -116,7 +116,7 @@ Status Code: 200
}
```

-## Embeddings - Create
+## Embeddings

```HTTP
POST https://{endpoint}/openai/deployments/{deployment-id}/embeddings?api-version=2024-10-21
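A minimal call against the embeddings endpoint above might look like the sketch below, again assuming the same environment variables and an assumed deployment name `my-embeddings-deployment`.

```python
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-embeddings-deployment"   # assumed deployment name

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/embeddings",
    params={"api-version": "2024-10-21"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json={"input": "The quick brown fox jumped over the lazy dog."},
)
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # vector length depends on the embeddings model behind the deployment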
@@ -262,7 +262,7 @@ Status Code: 200
}
```

-## Chat completions - Create
+## Chat completions

```HTTP
POST https://{endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=2024-10-21
@@ -294,7 +294,7 @@ Creates a completion for the chat message
| top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both.<br> | No | 1 |
| stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. | No | False |
| stop | string or array | Up to four sequences where the API will stop generating further tokens.<br> | No ||
-| max_tokens | integer | The maximum number of [tokens](/tokenizer) that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length.| No ||
+| max_tokens | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length.| No ||
| max_completion_tokens | integer | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. | No ||
| presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
| frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
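A chat completions request using the parameters in this hunk might look like the following sketch. The deployment name `my-chat-deployment`, the message contents, and the environment variable names are assumptions for illustration.

```python
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-chat-deployment"   # assumed deployment name

body = {
    "messages": [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Name three uses for embeddings."},
    ],
    "max_tokens": 100,    # cap on generated tokens; input plus output must fit the context length
    "stop": ["\n\n"],     # up to four stop sequences
    "stream": False,      # set True to receive data-only server-sent events instead
    "temperature": 0.2,
}

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/chat/completions",
    params={"api-version": "2024-10-21"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json=body,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])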
@@ -805,7 +805,7 @@ Status Code: 200
}
```

-## Image generations - Create
+## Image generation

```HTTP
POST https://{endpoint}/openai/deployments/{deployment-id}/images/generations?api-version=2024-10-21
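An image generation request against the endpoint above could look like this sketch; the deployment name `my-dalle-deployment` and the prompt are assumptions, and the response fields shown (`url`, `revised_prompt`) follow the image result schema described later in this file.

```python
import os
import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "my-dalle-deployment"   # assumed deployment name

response = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/images/generations",
    params={"api-version": "2024-10-21"},
    headers={"api-key": api_key, "Content-Type": "application/json"},
    json={
        "prompt": "A watercolor painting of a lighthouse at dawn",
        "n": 1,
        "size": "1024x1024",
    },
)
response.raise_for_status()
result = response.json()["data"][0]
print(result.get("url"))              # location of the generated image
print(result.get("revised_prompt"))   # present if the service revised the prompt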
@@ -1235,7 +1235,7 @@ Represents a completion response from the API. Note: both the streamed and nonst
| top_p | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.<br><br>We generally recommend altering this or `temperature` but not both.<br> | No | 1 |
| stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only [server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#Event_stream_format) as they become available, with the stream terminated by a `data: [DONE]` message. | No | False |
| stop | string or array | Up to four sequences where the API will stop generating further tokens.<br> | No ||
-| max_tokens | integer | The maximum number of [tokens](/tokenizer) that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length.| No ||
+| max_tokens | integer | The maximum number of tokens that can be generated in the chat completion.<br><br>The total length of input tokens and generated tokens is limited by the model's context length.| No ||
| max_completion_tokens | integer | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. | No ||
| presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.<br> | No | 0 |
| frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.<br> | No | 0 |
@@ -2033,7 +2033,7 @@ No properties defined for this component.
| description | string | A description of what the function does, used by the model to choose when and how to call the function. | No ||
| name | string | The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64. | Yes ||
| parameters |[FunctionParameters](#functionparameters)| The parameters the functions accepts, described as a JSON Schema object. See the guide](/azure/ai-services/openai/how-to/function-calling) for examples, and the [JSON Schema reference](https://json-schema.org/understanding-json-schema/) for documentation about the format. <br><br>Omitting `parameters` defines a function with an empty parameter list. | No ||
-| strict | boolean | Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the `parameters` field. Only a subset of JSON Schema is supported when `strict` is `true`. Learn more about Structured Outputs in the [function calling guide](docs/guides/function-calling). | No | False |
+| strict | boolean | Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the `parameters` field. Only a subset of JSON Schema is supported when `strict` is `true`. | No | False |
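As an illustration of the `strict` flag described in this table, a function definition with strict schema adherence might look like the sketch below. The `get_weather` function, its parameters, and the surrounding request body are hypothetical; in strict mode, typically every property must appear in `required` and `additionalProperties` must be `false`.

```python
# Hypothetical tool definition using strict schema adherence.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],       # all properties listed as required
            "additionalProperties": False,      # no extra keys allowed in the arguments
        },
    },
}

# The definition is passed in the "tools" array of a chat completions request body:
body = {
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [weather_tool],
}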


### ResponseFormatText
@@ -2310,4 +2310,14 @@ The image url or encoded image if successful, and an error otherwise.
| revised_prompt | string | The prompt that was used to generate the image, if there was any revision to the prompt. | No ||
| prompt_filter_results |[dalleFilterResults](#dallefilterresults)| Information about the content filtering category (hate, sexual, violence, self_harm), if it has been detected, as well as the severity level (very_low, low, medium, high-scale that determines the intensity and risk level of harmful content) and if it has been filtered or not. Information about jailbreak content and profanity, if it has been detected, and if it has been filtered or not. And information about customer blocklist, if it has been filtered and its id. | No ||

+### Completions extensions
+
+Completions extensions aren't part of the latest GA version of the Azure OpenAI data plane inference spec.
+
+### Chatmessage
+
+The Chat message object isn't part of the latest GA version of the Azure OpenAI data plane inference spec.
+
+### Text to speech
+
+Is not currently part of the latest Azure OpenAI GA version of the Azure OpenAI data plane inference spec. Refer to the latest [preview](../../reference-preview.md) version for this capability.