
Commit 997bacd

committed: update based on feedback
1 parent ac43be9 commit 997bacd

File tree: 3 files changed (+32 −34 lines changed)


articles/cognitive-services/openai/concepts/models.md

Lines changed: 12 additions & 12 deletions
@@ -19,8 +19,8 @@ Azure OpenAI provides access to many different models, grouped by family and cap
 | Model family | Description |
 |--|--|
-| [GPT-4](#gpt-4-models) | A set of models that improve on GPT-3.5 and can understand as well as generate natural language and code.|
-| [GPT-3](#gpt-3-models) | A series of models that can understand and generate natural language. This includes the new [ChatGPT model](#chatgpt-gpt-35-turbo). |
+| [GPT-4](#gpt-4-models) | A set of models that improve on GPT-3.5 and can understand as well as generate natural language and code. These models are currently in preview.|
+| [GPT-3](#gpt-3-models) | A series of models that can understand and generate natural language. This includes the new [ChatGPT model (preview)](#chatgpt-gpt-35-turbo). |
 | [Codex](#codex-models) | A series of models that can understand and generate code, including translating natural language to code. |
 | [Embeddings](#embeddings-models) | A set of models that can understand and use embeddings. An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Currently, we offer three families of Embeddings models for different functionalities: similarity, text search, and code search. |

@@ -56,15 +56,15 @@ You can get a list of models that are available for both inference and fine-tuni

 We recommend starting with the most capable model in a model family to confirm whether the model capabilities meet your requirements. Then you can stay with that model or move to a model with lower capability and cost, optimizing around that model's capabilities.

-## GPT-4 models (limited preview)
+## GPT-4 models (preview)

-GPT-4 is a large multimodal model meaning while it currently accepts text inputs and emits text outputs. It will eventually be able to accept image inputs as well. GPT-4 can solve difficult problems with greater accuracy than any of OpenAI's previous models. Like gpt-35-turbo, GPT-4 is optimized for chat but works well for traditional completions tasks.
+GPT-4 can solve difficult problems with greater accuracy than any of OpenAI's previous models. Like gpt-35-turbo, GPT-4 is optimized for chat but works well for traditional completions tasks.

-These models are currently in limited preview. For access, existing Azure OpenAI customers can [apply by filling out this form](https://aka.ms/oai/get-gpt4).
+These models are currently in preview. For access, existing Azure OpenAI customers can [apply by filling out this form](https://aka.ms/oai/get-gpt4).
 - `gpt-4`
 - `gpt-4-32k`

-The `gpt-4` supports 8192 max input tokens and the `gpt-4-32k` supports up to 32,768. The full name of the model will also indicate version so the first set of models are named `gpt-4-0314`, and `gpt-4-32k-0314`.
+The `gpt-4` model supports 8,192 max input tokens and the `gpt-4-32k` model supports up to 32,768 tokens.

 ## GPT-3 models

@@ -103,7 +103,7 @@ Ada is usually the fastest model and can perform tasks like parsing text, addres

 **Use for**: Parsing text, simple classification, address correction, keywords

-### ChatGPT (gpt-35-turbo)
+### ChatGPT (gpt-35-turbo) (preview)

 The ChatGPT model (gpt-35-turbo) is a language model designed for conversational interfaces and the model behaves differently than previous GPT-3 models. Previous models were text-in and text-out, meaning they accepted a prompt string and returned a completion to append to the prompt. However, the ChatGPT model is conversation-in and message-out. The model expects a prompt string formatted in a specific chat-like transcript format, and returns a completion that represents a model-written message in the chat.

@@ -186,20 +186,20 @@ When using our embeddings models, keep in mind their limitations and risks.
 | text-davinci-002 | Yes | No | East US, South Central US, West Europe | N/A | 4,097 | Jun 2021 |
 | text-davinci-003 | Yes | No | East US, West Europe | N/A | 4,097 | Jun 2021 |
 | text-davinci-fine-tune-002<sup>1</sup> | Yes | No | N/A | East US, West Europe<sup>2</sup> | | |
-| gpt-35-turbo<sup>3</sup> (ChatGPT) | Yes | No | East US, South Central US | N/A | 4,096 | Sep 2021 |
+| gpt-35-turbo<sup>3</sup> (ChatGPT) (preview) | Yes | No | East US, South Central US | N/A | 4,096 | Sep 2021 |

 <sup>1</sup> The model is available by request only. Currently we aren't accepting new requests to use the model.
 <br><sup>2</sup> East US and West Europe are currently unavailable for new customers to fine-tune due to high demand. Please use US South Central region for fine-tuning.
-<br><sup>3</sup> Currently, only version `"0301"` of this model is available. This version of the model will be deprecated on 8/1/2023 in favor of newer version of the gpt-35-model. See [ChatGPT model versioning](../how-to/chatgpt.md#model-versioning) for more details.
+<br><sup>3</sup> Currently, only version `0301` of this model is available. This version of the model will be deprecated on 8/1/2023 in favor of a newer version of the gpt-35-turbo model. See [ChatGPT model versioning](../how-to/chatgpt.md#model-versioning) for more details.

 ### GPT-4 Models

 | Model ID | Supports Completions | Supports Embeddings | Base model Regions | Fine-Tuning Regions | Max Request (tokens) | Training Data (up to) |
 | ----------------------- | -------------------- | ------------------- | ------------------------- | ------------------- | -------------------- | ---------------------- |
-| `gpt-4` <sup>1,</sup><sup>2</sup> | Yes | No | East US, South Central US | N/A | 8,192 | September 2021 |
-| `gpt-4-32k` <sup>1,</sup><sup>2</sup> | Yes | No | East US, South Central US | N/A | 32,768 | September 2021 |
+| `gpt-4` <sup>1,</sup><sup>2</sup> (preview) | Yes | No | East US, South Central US | N/A | 8,192 | September 2021 |
+| `gpt-4-32k` <sup>1,</sup><sup>2</sup> (preview) | Yes | No | East US, South Central US | N/A | 32,768 | September 2021 |

-<sup>1</sup> The model is in limited preview and only available by request.<br>
+<sup>1</sup> The model is in preview and only available by request.<br>
 <sup>2</sup> Currently, only version `0314` of this model is available.

 ### Codex Models

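As a quick illustration of the context windows listed in the tables above, a small client-side check can catch over-limit requests before they're sent. This is a hypothetical helper, not part of any Azure OpenAI SDK; the limits come from this article's model tables:

```python
# Hypothetical client-side helper; limits taken from the model tables above.
MAX_TOKENS_BY_MODEL = {
    "gpt-35-turbo": 4096,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}

def fits_in_context(model_id: str, prompt_tokens: int, max_tokens: int) -> bool:
    """Return True if the prompt tokens plus the requested completion budget
    stay within the model's context window."""
    limit = MAX_TOKENS_BY_MODEL[model_id]
    return prompt_tokens + max_tokens <= limit

# A 3,000-token prompt with max_tokens=500 fits gpt-35-turbo's 4,096 window...
print(fits_in_context("gpt-35-turbo", 3000, 500))  # True
# ...but a 4,000-token prompt with the same budget does not.
print(fits_in_context("gpt-35-turbo", 4000, 500))  # False
```

Counting the prompt's tokens is a separate problem; a tokenizer such as tiktoken is typically used for that.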
articles/cognitive-services/openai/includes/chat-completion.md

Lines changed: 10 additions & 10 deletions
@@ -12,11 +12,11 @@ keywords: ChatGPT

 ---

-## Working with the ChatGPT and GPT-4 models
+## Working with the ChatGPT and GPT-4 models (preview)

-The following code snippet shows the most basic way to use the ChatGPT and GPT-4 models with the ChatCompletion API. If this is your first time using these models programmatically, we recommend starting with our [ChatGPT & GPT-4 Quickstart](../chatgpt-quickstart.md).
+The following code snippet shows the most basic way to use the ChatGPT and GPT-4 models with the Chat Completion API. If this is your first time using these models programmatically, we recommend starting with our [ChatGPT & GPT-4 Quickstart](../chatgpt-quickstart.md).

-**GPT-4 models are currently in limited preview.** Existing Azure OpenAI customers can [apply for access by filling out this form](https://aka.ms/oai/get-gpt4).
+**GPT-4 models are currently in preview.** Existing Azure OpenAI customers can [apply for access by filling out this form](https://aka.ms/oai/get-gpt4).

 ```python
 import os
@@ -86,13 +86,13 @@ Consider setting `max_tokens` to a slightly higher value than normal such as 300

 Unlike previous GPT-3 and GPT-3.5 models, the `gpt-35-turbo` model as well as the `gpt-4` and `gpt-4-32k` models will continue to be updated. When creating a [deployment](../how-to/create-resource.md#deploy-a-model) of these models, you'll also need to specify a model version.

-Currently, only version `"0301"` is available for ChatGPT and `0314` for GPT-4 models. We'll continue to make updated versions available in the future. You can find model deprecation times on our [models](../concepts/models.md) page.
+Currently, only version `0301` is available for ChatGPT and `0314` for GPT-4 models. We'll continue to make updated versions available in the future. You can find model deprecation times on our [models](../concepts/models.md) page.

-## Working with the ChatCompletion API
+## Working with the Chat Completion API

 OpenAI trained the ChatGPT and GPT-4 models to accept input formatted as a conversation. The messages parameter takes an array of dictionaries with a conversation organized by role.

-The format of a basic ChatCompletion is as follows:
+The format of a basic Chat Completion is as follows:

 ```
 {"role": "system", "content": "Provide some context and/or instructions to the model"},
@@ -169,7 +169,7 @@ Context:
 {"role": "user", "content": "What is Azure OpenAI Service?"}
 ```

-#### Few shot learning with ChatCompletion
+#### Few shot learning with Chat Completion

 You can also give few shot examples to the model. The approach for few shot learning has changed slightly because of the new prompt format. You can now include a series of messages between the user and the assistant in the prompt as few shot examples. These examples can be used to seed answers to common questions to prime the model or teach particular behaviors to the model.

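The few shot pattern described above can be sketched as a complete messages array: seeded user/assistant pairs come before the live question. The Q&A content below is illustrative, echoing the tax-assistant example used in this article:

```python
# A few shot messages array: primed user/assistant pairs precede the real question.
few_shot_messages = [
    {"role": "system", "content": "You are a helpful tax-assistant bot."},
    # Few shot examples that seed answers and teach the desired behavior.
    {"role": "user", "content": "When do I need to file my taxes by?"},
    {"role": "assistant", "content": "In 2023, you will need to file your taxes by April 18th."},
    # The actual question from the end user goes last.
    {"role": "user", "content": "How can I check the status of my tax refund?"},
]

# Every message carries a role and content, and the final turn is the user's.
assert all({"role", "content"} <= set(m) for m in few_shot_messages)
print(few_shot_messages[-1]["role"])  # user
```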
@@ -183,9 +183,9 @@ This is only one example of how you can use few shot learning with ChatGPT and G
 {"role": "assistant", "content": "You can check the status of your tax refund by visiting https://www.irs.gov/refunds"},
 ```

-#### Using ChatCompletion for non-chat scenarios
+#### Using Chat Completion for non-chat scenarios

-The ChatCompletion API is designed to work with multi-turn conversations, but it also works well for non-chat scenarios.
+The Chat Completion API is designed to work with multi-turn conversations, but it also works well for non-chat scenarios.

 For example, for an entity extraction scenario, you might use the following prompt:

@@ -201,7 +201,7 @@ For example, for an entity extraction scenario, you might use the following prom

 ## Creating a basic conversation loop

-The examples so far have shown you the basic mechanics of interacting with the ChatCompletion API. This example shows you how to create a conversation loop that performs the following actions:
+The examples so far have shown you the basic mechanics of interacting with the Chat Completion API. This example shows you how to create a conversation loop that performs the following actions:

 - Continuously takes console input, and properly formats it as part of the messages array as user role content.
 - Outputs responses that are printed to the console and formatted and added to the messages array as assistant role content.
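Those two bookkeeping steps can be sketched without the service call itself. Here the model response is stubbed with a placeholder function so the message-array handling is visible; in a real loop, the Chat Completion API would be called where `get_reply` stands:

```python
from typing import Callable, Dict, List

def run_turn(messages: List[Dict[str, str]],
             user_input: str,
             get_reply: Callable[[List[Dict[str, str]]], str]) -> str:
    """Append the user's input, fetch a reply, and record it as assistant content."""
    messages.append({"role": "user", "content": user_input})
    reply = get_reply(messages)  # stand-in for the Chat Completion call
    messages.append({"role": "assistant", "content": reply})
    return reply

# Stub responder so the sketch runs offline; a real loop would call the API here.
echo = lambda msgs: f"You said: {msgs[-1]['content']}"

conversation = [{"role": "system", "content": "You are a helpful assistant."}]
print(run_turn(conversation, "Hello!", echo))  # You said: Hello!
print(len(conversation))                       # 3 (system + user + assistant)
```

Wrapping `run_turn` in a `while` loop over console input gives the continuous conversation loop the bullets describe.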

articles/cognitive-services/openai/includes/chat-markup-language.md

Lines changed: 10 additions & 12 deletions
@@ -11,14 +11,12 @@ manager: nitinme
 keywords: ChatGPT
 ---

-## Working with the ChatGPT and GPT-4 models
+## Working with the ChatGPT models (preview)

 > [!NOTE]
-> The ChatCompletion API is the recommended method of interacting with the ChatGPT and GPT-4 models.
+> The Chat Completion API is the recommended method of interacting with the ChatGPT (gpt-35-turbo) models.

-The following code snippet shows the most basic way to use the ChatGPT and GPT-4 models with ChatML. If this is your first time using these models programmatically we recommend starting with our [ChatGPT & GPT-4 Quickstart](../chatgpt-quickstart.md).
-
-**GPT-4 models are currently in limited preview.** Existing Azure OpenAI customers can [apply for access by filling out this form](https://aka.ms/oai/get-gpt4).
+The following code snippet shows the most basic way to use the ChatGPT models with ChatML. If this is your first time using these models programmatically, we recommend starting with our [ChatGPT & GPT-4 Quickstart](../chatgpt-quickstart.md).

 ```python
 import os
@@ -29,7 +27,7 @@ openai.api_version = "2022-12-01"
 openai.api_key = os.getenv("OPENAI_API_KEY")

 response = openai.Completion.create(
-  engine="gpt-35-turbo", # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
+  engine="gpt-35-turbo", # The deployment name you chose when you deployed the ChatGPT model
   prompt="<|im_start|>system\nAssistant is a large language model trained by OpenAI.\n<|im_end|>\n<|im_start|>user\nWhat's the difference between garbanzo beans and chickpeas?\n<|im_end|>\n<|im_start|>assistant\n",
   temperature=0,
   max_tokens=500,
@@ -53,16 +51,16 @@ Consider setting `max_tokens` to a slightly higher value than normal such as 300

 Unlike previous GPT-3 and GPT-3.5 models, the `gpt-35-turbo` model as well as the `gpt-4` and `gpt-4-32k` models will continue to be updated. When creating a [deployment](../how-to/create-resource.md#deploy-a-model) of these models, you'll also need to specify a model version.

-Currently, only version `"0301"` is available for ChatGPT and `0314` for GPT-4 models. We'll continue to make updated versions available in the future. You can find model deprecation times on our [models](../concepts/models.md) page.
+Currently, only version `0301` is available for ChatGPT. We'll continue to make updated versions available in the future. You can find model deprecation times on our [models](../concepts/models.md) page.

 <a id="chatml"></a>

 ## Working with Chat Markup Language (ChatML)

 > [!NOTE]
-> OpenAI continues to improve the ChatGPT and GPT-4 models and the Chat Markup Language used with the models will continue to evolve in the future. We'll keep this document updated with the latest information.
+> OpenAI continues to improve the ChatGPT models, and the Chat Markup Language used with the models will continue to evolve in the future. We'll keep this document updated with the latest information.

-OpenAI trained the ChatGPT and GPT-4 models on special tokens that delineate the different parts of the prompt. The prompt starts with a system message that is used to prime the model followed by a series of messages between the user and the assistant.
+OpenAI trained the ChatGPT models on special tokens that delineate the different parts of the prompt. The prompt starts with a system message that is used to prime the model followed by a series of messages between the user and the assistant.

 The format of a basic ChatML prompt is as follows:

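Assembling that format by hand is mechanical, so a small helper is common. This sketch is consistent with the `create_prompt(system_message, messages)` call that appears later in this file, assuming each message dictionary carries `sender` and `text` keys:

```python
def create_prompt(system_message: str, messages: list) -> str:
    """Assemble a ChatML prompt: the system message, then each turn wrapped in
    <|im_start|>/<|im_end|> tokens, ending with an open assistant turn."""
    prompt = system_message
    for message in messages:
        prompt += f"\n<|im_start|>{message['sender']}\n{message['text']}\n<|im_end|>"
    prompt += "\n<|im_start|>assistant\n"  # the model completes from here
    return prompt

system_message = "<|im_start|>system\nAssistant is a large language model trained by OpenAI.\n<|im_end|>"
messages = [{"sender": "user", "text": "What's the difference between garbanzo beans and chickpeas?"}]
print(create_prompt(system_message, messages))
```

Leaving the final assistant turn open is what cues the model to generate its reply there.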
@@ -211,7 +209,7 @@ You can also provide instructions in the system message to guide the model on ho

 ## Managing conversations

-The token limit for `gpt-35-turbo` is 4096 tokens, whereas the token limits for `gpt-4` and `gpt-4-32k` are 8192 and 32768 respectively. This limit includes the token count from both the prompt and completion. The number of tokens in the prompt combined with the value of the `max_tokens` parameter must stay under 4096 or you'll receive an error.
+The token limit for `gpt-35-turbo` is 4096 tokens. This limit includes the token count from both the prompt and completion. The number of tokens in the prompt combined with the value of the `max_tokens` parameter must stay under 4096 or you'll receive an error.

 It’s your responsibility to ensure the prompt and completion falls within the token limit. This means that for longer conversations, you need to keep track of the token count and only send the model a prompt that falls within the token limit.

@@ -241,7 +239,7 @@ system_message = f"<|im_start|>system\n{'<your system message>'}\n<|im_end|>"
 messages = [{"sender": "user", "text": user_input}]

 response = openai.Completion.create(
-  engine="gpt-35-turbo", # The deployment name you chose when you deployed the ChatGPT or GPT-4 model.
+  engine="gpt-35-turbo", # The deployment name you chose when you deployed the ChatGPT model.
   prompt=create_prompt(system_message, messages),
   temperature=0.5,
   max_tokens=250,
@@ -257,7 +255,7 @@ print(response['choices'][0]['text'])

 ## Staying under the token limit

-The simplest approach to staying under the token limit is to truncate the oldest messages in the conversation when you reach the token limit.
+The simplest approach to staying under the token limit is to remove the oldest messages in the conversation when you reach the token limit.

 You can choose to always include as many tokens as possible while staying under the limit or you could always include a set number of previous messages assuming those messages stay within the limit. It's important to keep in mind that longer prompts take longer to generate a response and incur a higher cost than shorter prompts.

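That oldest-first removal strategy can be sketched with a stand-in token counter. Real code would count tokens with a tokenizer such as tiktoken; `count_tokens` here is a rough word-count placeholder so the sketch runs on its own:

```python
def count_tokens(text: str) -> int:
    # Rough placeholder: real code would use a tokenizer (e.g., tiktoken).
    return len(text.split())

def trim_conversation(messages, token_limit: int):
    """Drop the oldest non-system messages until the conversation fits the limit."""
    trimmed = list(messages)  # work on a copy; caller's history is untouched

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    # Keep the system message (index 0) and remove the oldest turns after it.
    while total(trimmed) > token_limit and len(trimmed) > 1:
        del trimmed[1]
    return trimmed

history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "first question with several extra words here"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "latest question"},
]
print([m["content"] for m in trim_conversation(history, 10)])
# ['You are a helpful assistant', 'first answer', 'latest question']
```

Pinning the system message while trimming, as above, preserves the model's priming even in long conversations.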