> Using GPT-3.5-Turbo models with the completion endpoint as described in this article remains in preview and is only possible with `gpt-35-turbo` version (0301), which is [slated for retirement as early as June 13th, 2024](../concepts/model-retirements.md#current-models). We strongly recommend using the [GA Chat Completion API/endpoint](./chatgpt.md). The Chat Completion API is the recommended method of interacting with the GPT-3.5-Turbo models, and it's also the only way to access the GPT-4 models.
The following code snippet shows the most basic way to use the GPT-3.5-Turbo models with ChatML. If this is your first time using these models programmatically, we recommend starting with our [GPT-3.5-Turbo & GPT-4 Quickstart](../chatgpt-quickstart.md).
> [!NOTE]
> In the Azure OpenAI documentation, we refer to GPT-3.5-Turbo and GPT-35-Turbo interchangeably. The official name of the model on OpenAI is `gpt-3.5-turbo`, but for Azure OpenAI, due to Azure-specific character constraints, the underlying model name is `gpt-35-turbo`.
>
> The following parameters aren't available with the `gpt-35-turbo` model: `logprobs`, `best_of`, and `echo`. If you set any of these parameters, you'll get an error.
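
Because these three parameters cause an error, a client can catch them before a request is ever sent. The following is a minimal sketch, not part of any Azure OpenAI SDK: `UNSUPPORTED_PARAMS` and `validate_completion_params` are hypothetical names, and the request body is a plain dictionary rather than an SDK call.

```python
# Hypothetical client-side guard: parameters the note says gpt-35-turbo rejects.
UNSUPPORTED_PARAMS = {"logprobs", "best_of", "echo"}


def validate_completion_params(params: dict) -> dict:
    """Raise ValueError if any unsupported parameter is present; otherwise return params unchanged."""
    bad = sorted(UNSUPPORTED_PARAMS.intersection(params))
    if bad:
        raise ValueError(f"Unsupported parameter(s) for gpt-35-turbo: {', '.join(bad)}")
    return params


# Example: this request body contains no unsupported parameters, so it passes.
request = {"prompt": "<|im_start|>user\nHello\n<|im_end|>\n<|im_start|>assistant\n", "max_tokens": 300}
validate_completion_params(request)
```

Failing fast on the client gives a clearer message than the service-side error, at the cost of keeping the list in sync with the service.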
The `<|im_end|>` token indicates the end of a message. When using ChatML, we recommend including the `<|im_end|>` token as a stop sequence so that the model stops generating text when it reaches the end of the message.
Consider setting `max_tokens` to a slightly higher value than normal, such as 300 or 500. This helps ensure that the model doesn't stop generating text before it reaches the end of the message.
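
The two recommendations above can be sketched as a request body for the completions endpoint. This is an illustration under stated assumptions: `build_chatml_request` and `trim_at_stop` are hypothetical helpers, the prompt is a hand-written ChatML string, and the commented-out call assumes the `openai` Python 1.x client, where `model` is your deployment name.

```python
STOP_TOKEN = "<|im_end|>"

# A hand-written ChatML prompt: a system message, a user message,
# and an open assistant turn for the model to complete.
CHATML_PROMPT = (
    "<|im_start|>system\n"
    "You are a helpful assistant.\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "What is Azure OpenAI?\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def build_chatml_request(prompt: str, max_tokens: int = 500) -> dict:
    """Assemble completion parameters with <|im_end|> as a stop sequence."""
    return {
        "prompt": prompt,
        "stop": [STOP_TOKEN],      # stop at the end of the assistant message
        "max_tokens": max_tokens,  # generous budget so replies aren't cut short
    }


def trim_at_stop(text: str) -> str:
    """Defensively drop anything from the first <|im_end|> onward and strip whitespace."""
    return text.split(STOP_TOKEN, 1)[0].strip()


request = build_chatml_request(CHATML_PROMPT)
# With the openai Python 1.x package, this could be sent as, for example:
# client.completions.create(model="<your-deployment-name>", **request)
```

`trim_at_stop` is a belt-and-braces step; with the stop sequence set, the service normally ends the text before the token.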
Unlike previous GPT-3 and GPT-3.5 models, the `gpt-35-turbo` model as well as …
You can find the model retirement dates for these models on our [models](../concepts/models.md) page.
`articles/ai-services/openai/how-to/chatgpt.md`
ms.author: mbullwin #delegenz
ms.service: azure-ai-openai
ms.custom: build-2023, build-2023-dataai
ms.topic: how-to
ms.date: 04/05/2024
manager: nitinme
keywords: ChatGPT
zone_pivot_groups: openai-chat
---
# Learn how to work with the GPT-3.5-Turbo and GPT-4 models
The GPT-3.5-Turbo and GPT-4 models are language models that are optimized for conversational interfaces. The models behave differently than the older GPT-3 models. Previous models were text-in and text-out, meaning they accepted a prompt string and returned a completion to append to the prompt. However, the GPT-3.5-Turbo and GPT-4 models are conversation-in and message-out. The models expect input formatted in a specific chat-like transcript format and return a completion that represents a model-written message in the chat. While this format was designed specifically for multi-turn conversations, it can also work well for non-chat scenarios.
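
To make the conversation-in, message-out distinction concrete, the sketch below builds a messages list and pulls the reply out of a response. The response is a hand-written dictionary that mimics the JSON shape of a Chat Completion response (`choices[0].message`); it isn't real model output, and `extract_reply` is a hypothetical helper. With the `openai` Python 1.x package, the same fields are exposed as attributes, for example `response.choices[0].message.content`.

```python
# Conversation in: an ordered list of role/content messages.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a chat model is in one sentence."},
]

# Message out: the reply is a message object inside choices[0].
# This dict only mimics that shape for illustration.
sample_response = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "A chat model maps a conversation to its next message."},
        }
    ]
}


def extract_reply(response: dict) -> dict:
    """Return the assistant message from the first choice of a response-shaped dict."""
    return response["choices"][0]["message"]


reply = extract_reply(sample_response)
# Appending the reply keeps the transcript ready for the next turn.
messages.append(reply)
```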
This article walks you through getting started with the GPT-3.5-Turbo and GPT-4 models. It's important to use the techniques described here to get the best results. If you try to interact with the models the same way you did with the older model series, the models will often be verbose and provide less useful responses.

In Azure OpenAI there are two different options for interacting with these types of models:

- Chat Completion API.
- Completion API with Chat Markup Language (ChatML).

The Chat Completion API is a new dedicated API for interacting with the GPT-3.5-Turbo and GPT-4 models. This API is the preferred method for accessing these models. **It is also the only way to access the new GPT-4 models**.

ChatML uses the same [completion API](../reference.md#completions) that you use for other models like `text-davinci-002`, but it requires a unique token-based prompt format known as Chat Markup Language (ChatML). This provides lower-level access than the dedicated Chat Completion API, but it also requires additional input validation, only supports `gpt-35-turbo` models, and **the underlying format is more likely to change over time**.
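
As a sketch of what the token-based ChatML format looks like, and of the extra input validation it calls for, the hypothetical helper below renders a messages list in the Chat Completion format as a ChatML prompt. It assumes plain role lines (no extra attributes after `<|im_start|>`) and simply rejects content that already contains a ChatML token rather than attempting to escape it.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def messages_to_chatml(messages: list) -> str:
    """Render role/content messages as a ChatML prompt ending with an open assistant turn."""
    parts = []
    for msg in messages:
        content = msg["content"]
        # Input validation: message text must not smuggle in ChatML control tokens.
        if IM_START in content or IM_END in content:
            raise ValueError("message content contains a ChatML special token")
        parts.append(f"{IM_START}{msg['role']}\n{content}\n{IM_END}\n")
    # Leave the assistant turn open so the model writes the next message.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


print(messages_to_chatml([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]))
```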
::: zone pivot="programming-language-chat-completions"
The following code snippet shows the most basic way to use the GPT-3.5-Turbo and GPT-4 models with the Chat Completion API. If this is your first time using these models programmatically, we recommend starting with our [GPT-3.5-Turbo & GPT-4 Quickstart](../chatgpt-quickstart.md).
> [!NOTE]
> In the Azure OpenAI documentation, we refer to GPT-3.5-Turbo and GPT-35-Turbo interchangeably. The official name of the model on OpenAI is `gpt-3.5-turbo`, but for Azure OpenAI, due to Azure-specific character constraints, the underlying model name is `gpt-35-turbo`.
# [OpenAI Python 1.x](#tab/python-new)
```python
# …
```

Every response includes a `finish_reason`. The possible values for `finish_reason` are:

* **stop**: API returned complete model output.
* **length**: Incomplete model output due to the `max_tokens` parameter or token limit.
* **content_filter**: Omitted content due to a flag from our content filters.
* **null**: API response still in progress or incomplete.

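One way to act on these values is a small lookup keyed on `finish_reason`. In Python, the JSON `null` value arrives as `None`. The function name and the suggested actions are illustrative assumptions, not behavior defined by the API.

```python
def describe_finish_reason(finish_reason):
    """Map a finish_reason value to a suggested follow-up action (illustrative wording)."""
    actions = {
        "stop": "Complete output; use it as-is.",
        "length": "Output was cut off; consider raising max_tokens or shortening the prompt.",
        "content_filter": "Content was omitted by the content filters; review the prompt.",
        None: "Response still in progress or incomplete; wait for more data.",
    }
    # Fall back to a generic message for values not listed above.
    return actions.get(finish_reason, f"Unrecognized finish_reason: {finish_reason!r}")


print(describe_finish_reason("length"))
```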
Consider setting `max_tokens` to a slightly higher value than normal, such as 300 or 500. This helps ensure that the model doesn't stop generating text before it reaches the end of the message.
---
When you run the code above, you'll get a blank console window. Enter your first question in the window and then press Enter. Once the response is returned, you can repeat the process and keep asking questions.
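
The following is a hedged sketch of the pattern behind that loop, not the article's code: each question and answer is appended to a running conversation so the model sees the full history on every turn. `chat_session` and `echo_reply` are hypothetical; `echo_reply` stands in for the actual Chat Completion call so the example runs offline.

```python
from typing import Callable, Iterable


def chat_session(user_inputs: Iterable[str], get_reply: Callable[[list], str]) -> list:
    """Keep a running transcript: each user turn and each reply are appended in order."""
    conversation = [{"role": "system", "content": "You are a helpful assistant."}]
    for text in user_inputs:
        conversation.append({"role": "user", "content": text})
        # get_reply stands in for the API call; it sees the full history each turn.
        reply = get_reply(conversation)
        conversation.append({"role": "assistant", "content": reply})
    return conversation


def echo_reply(history: list) -> str:
    """Stub reply function; a real one would call the Chat Completion API."""
    return f"You said: {history[-1]['content']}"


# In an interactive script, iter(input, "") would read lines until a blank entry.
transcript = chat_session(["Hello", "What's new?"], echo_reply)
```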
## Managing conversations
An alternative approach is to limit the conversation duration to the max token …

The token counting portion of the code demonstrated previously is a simplified version of one of [OpenAI's cookbook examples](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb).
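
A rough sketch of the trimming idea: drop the oldest non-system messages until the conversation fits a token budget. The `approx_tokens` counter is a deliberately crude assumption (about one token per word plus a fixed overhead of 4 per message); for real budgets, count tokens with a tokenizer such as `tiktoken`, as the cookbook example does. Both function names are hypothetical.

```python
def approx_tokens(message: dict) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word plus 4 overhead."""
    return len(message["content"].split()) + 4


def trim_conversation(conversation: list, max_tokens: int, count=approx_tokens) -> list:
    """Drop the oldest non-system messages until the estimated total fits in max_tokens."""
    trimmed = list(conversation)  # work on a copy; the caller's list is untouched
    while sum(count(m) for m in trimmed) > max_tokens:
        # Remove the oldest message that isn't a system message.
        for i, m in enumerate(trimmed):
            if m["role"] != "system":
                del trimmed[i]
                break
        else:
            break  # only system messages remain; nothing left to drop
    return trimmed
```

Keeping the system message pinned preserves the model's instructions while older turns fall away.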
## Troubleshooting
### Don't use ChatML syntax with the Chat Completions endpoint
We have found that some customers try to use the [legacy ChatML syntax](../how-to/chat-markup-language.md) with the chat completion endpoints and newer models. ChatML was a preview capability that only worked with the legacy completions endpoint with the `gpt-35-turbo` version 0301 model, which is [slated for retirement](../concepts/model-retirements.md). Attempting to use ChatML syntax with newer models and the chat completions endpoint can result in errors and unexpected model response behavior, and is not recommended.

| Error | Cause | Solution |
|---|---|---|
| 400 - *Failed to generate output due to special tokens in the input.* | Your prompt contains legacy ChatML tokens that the model or endpoint doesn't recognize or support. | Ensure that your prompt or messages array doesn't contain any legacy ChatML tokens. If you're upgrading from a legacy model, remove all special tokens before submitting an API request to the model. |
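
When upgrading from a legacy ChatML prompt, one approach is to convert each complete ChatML block into a message object so that no special tokens reach the chat completions endpoint. This is a minimal sketch, assuming prompts use only `<|im_start|>role` and `<|im_end|>` markers with plain role names; `chatml_to_messages` and `contains_chatml_tokens` are hypothetical helpers. The trailing open `<|im_start|>assistant` turn is intentionally dropped, because the chat completions endpoint generates the assistant message itself.

```python
import re

# One complete ChatML block: <|im_start|>role\n ... <|im_end|>
CHATML_BLOCK = re.compile(r"<\|im_start\|>(\w+)\n(.*?)\n?<\|im_end\|>", re.DOTALL)


def chatml_to_messages(prompt: str) -> list:
    """Convert complete ChatML blocks into role/content messages without special tokens."""
    return [
        {"role": role, "content": content.strip()}
        for role, content in CHATML_BLOCK.findall(prompt)
    ]


def contains_chatml_tokens(messages: list) -> bool:
    """Return True if any message content still includes a legacy ChatML token."""
    return any("<|im_start|>" in m["content"] or "<|im_end|>" in m["content"] for m in messages)


legacy = "<|im_start|>system\nBe brief.\n<|im_end|>\n<|im_start|>user\nHi\n<|im_end|>\n<|im_start|>assistant\n"
messages = chatml_to_messages(legacy)
```

Running `contains_chatml_tokens` as a final check before each request catches tokens that slipped in from other sources, such as stored prompt templates.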