articles/ai-services/openai/includes/chat-completion.md
> [!NOTE]
> The version `gpt-35-turbo` is equivalent to the `gpt-3.5-turbo` model from OpenAI.
Unlike previous GPT-3 and GPT-3.5 models, the `gpt-35-turbo` model and the `gpt-4` and `gpt-4-32k` models will continue to be updated. When you create a [deployment](../how-to/create-resource.md#deploy-a-model) of these models, you also need to specify a model version.
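
For example, you can pin a model version when you create the deployment with the Azure CLI. This is a sketch; the resource group, resource name, model version, and capacity values are placeholders that you replace with your own:

```azurecli
az cognitiveservices account deployment create \
  --resource-group <resource-group> \
  --name <azure-openai-resource> \
  --deployment-name gpt-35-turbo \
  --model-name gpt-35-turbo \
  --model-version "0125" \
  --model-format OpenAI \
  --sku-capacity 1 \
  --sku-name "Standard"
```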
You can find the model retirement dates for these models on the [models](../concepts/models.md) page.
- Continuously takes console input and properly formats it as part of the messages list as user role content.
- Outputs responses that are printed to the console and formatted and added to the messages list as assistant role content.
Every time a new question is asked, a running transcript of the conversation so far is sent along with the latest question. Because the model has no memory, you need to send an updated transcript with each new question or the model will lose the context of the previous questions and answers.
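
To make the transcript handling concrete, here's a minimal sketch of one conversational turn. The `send` parameter is a stand-in for the actual Chat Completions call (for example, a wrapper around `client.chat.completions.create` in the `openai` 1.x package), so the transcript logic is shown without a live endpoint:

```python
def chat_turn(messages, user_input, send):
    """Run one turn: append the question, send the full transcript, record the reply.

    send: any callable that takes the messages list and returns the
    assistant's reply as a string.
    """
    messages.append({"role": "user", "content": user_input})
    # The entire running transcript goes with every call; the model keeps no memory.
    reply = send(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply
```

Each call grows the messages list by two entries (one user, one assistant), which is why managing the transcript's token count becomes necessary in longer conversations.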
# [OpenAI Python 1.x](#tab/python-new)
---
In this example, after the token count is reached, the oldest messages in the conversation transcript are removed. For efficiency, `del` is used instead of `pop()`. We start at index 1 to always preserve the system message and only remove user or assistant messages. Over time, this method of managing the conversation can cause the conversation quality to degrade as the model gradually loses the context of the earlier portions of the conversation.
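
The trimming logic can be sketched as a standalone function. The `num_tokens` helper here is a hypothetical stand-in; a real implementation would count tokens with a tokenizer such as `tiktoken`:

```python
def num_tokens(messages):
    # Hypothetical stand-in for real token counting: assume roughly
    # four characters per token. Use a real tokenizer in practice.
    return sum(len(m["content"]) // 4 for m in messages)


def trim_transcript(messages, max_tokens):
    """Delete the oldest user/assistant messages until under the token limit.

    Index 0 holds the system message, so deletion starts at index 1 and the
    system message is always preserved. del is used rather than pop() because
    the removed message isn't needed.
    """
    while num_tokens(messages) > max_tokens and len(messages) > 1:
        del messages[1]
    return messages
```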
An alternative approach is to limit the conversation duration to the maximum token length or a specific number of turns. After the maximum token limit is reached, the model would lose context if you were to allow the conversation to continue. You can prompt the user to begin a new conversation and clear the messages list to start a new conversation with the full token limit available.
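
A sketch of that reset, keeping only the system message so the next conversation starts with the full token budget (the function name is illustrative):

```python
def reset_conversation(messages):
    """Clear the transcript in place, preserving only the system message."""
    system_message = messages[0]
    messages.clear()
    messages.append(system_message)
    return messages
```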
The token counting portion of the code demonstrated previously is a simplified version of one of [OpenAI's cookbook examples](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb).

Here's a troubleshooting tip.

### Don't use ChatML syntax with the chat completion endpoint
Some customers try to use the [legacy ChatML syntax](../how-to/chat-markup-language.md) with the chat completion endpoints and newer models. ChatML was a preview capability that only worked with the legacy completions endpoint with the `gpt-35-turbo` version 0301 model. This model is [slated for retirement](../concepts/model-retirements.md). If you attempt to use ChatML syntax with newer models and the chat completion endpoint, it can result in errors and unexpected model response behavior. We don't recommend this use.
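
For contrast, this sketch shows the legacy ChatML serialization next to the structured messages list that the chat completion endpoint expects. The ChatML string is for illustration only; don't send it to the chat completion endpoint:

```python
# Legacy ChatML: the whole conversation serialized into one prompt string
# with special tokens. Only the legacy completions endpoint with the
# gpt-35-turbo version 0301 model understood this format.
chatml_prompt = (
    "<|im_start|>system\nYou are a helpful assistant.\n<|im_end|>\n"
    "<|im_start|>user\nHello\n<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Chat completion endpoint: pass the conversation as a structured list instead.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
```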