Commit 152967d

Merge pull request #230017 from dereklegenzoff/delegenz-chatgpt2

Post-release updates for ChatGPT docs

2 parents fb910f2 + f69cf5f

7 files changed: +12 −13 lines changed

articles/cognitive-services/openai/concepts/models.md

Lines changed: 2 additions & 1 deletion

````diff
@@ -174,10 +174,11 @@ When using our embeddings models, keep in mind their limitations and risks.
 | text-davinci-002 | Yes | No | East US, South Central US, West Europe | N/A |
 | text-davinci-003 | Yes | No | East US | N/A |
 | text-davinci-fine-tune-002<sup>1</sup> | Yes | No | N/A | East US, West Europe |
-| gpt-35-turbo (ChatGPT) | Yes | No | N/A | East US, South Central US |
+| gpt-35-turbo<sup>3</sup> (ChatGPT) | Yes | No | N/A | East US, South Central US |
 
 <sup>1</sup> The model is available by request only. Currently, we aren't accepting new requests to use the model.
 <br><sup>2</sup> East US is currently unavailable for new customers to fine-tune due to high demand. Please use the South Central US region for US-based training.
+<br><sup>3</sup> Currently, only version `"0301"` of this model is available. This version of the model will be deprecated on 8/1/2023 in favor of a newer version of the gpt-35-turbo model. See [ChatGPT model versioning](../how-to/chatgpt.md#model-versioning) for more details.
 
 ### Codex Models
 | Model ID | Supports Completions | Supports Embeddings | Base model Regions | Fine-Tuning Regions |
````

articles/cognitive-services/openai/how-to/chatgpt.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -40,7 +40,7 @@ response = openai.Completion.create(
 print(response['choices'][0]['text'])
 ```
 > [!NOTE]
-> The following parameters aren't available with the gpt-35-turbo model: `logprobs`, `best_of`, and `echo`. If you set any of these parameters to a value other than their default, you'll get an error.
+> The following parameters aren't available with the gpt-35-turbo model: `logprobs`, `best_of`, and `echo`. If you set any of these parameters, you'll get an error.
 
 The `<|im_end|>` token indicates the end of a message. We recommend including the `<|im_end|>` token as a stop sequence to ensure that the model stops generating text when it reaches the end of the message. You can read more about the special tokens in the [Chat Markup Language (ChatML)](#chatml) section.
````
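The note in chatgpt.md states that `logprobs`, `best_of`, and `echo` aren't available with gpt-35-turbo and that setting them produces an error. As a minimal sketch (the helper name is hypothetical, not part of the docs or SDK), a caller can strip those keys before building the request:

```python
# Hypothetical helper (not from the docs): gpt-35-turbo rejects these
# completion parameters, so drop them before calling the API.
UNSUPPORTED_GPT_35_TURBO_PARAMS = {"logprobs", "best_of", "echo"}

def sanitize_completion_kwargs(kwargs):
    """Return a copy of kwargs without the parameters gpt-35-turbo rejects."""
    return {k: v for k, v in kwargs.items()
            if k not in UNSUPPORTED_GPT_35_TURBO_PARAMS}
```

For example, `sanitize_completion_kwargs({"temperature": 1, "best_of": 3})` keeps only `temperature`.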

articles/cognitive-services/openai/includes/chatgpt-python.md

Lines changed: 1 addition & 2 deletions

````diff
@@ -99,14 +99,13 @@ echo export OPENAI_API_BASE="REPLACE_WITH_YOUR_ENDPOINT_HERE" >> /etc/environmen
 openai.api_key = os.getenv("OPENAI_API_KEY")
 
 response = openai.Completion.create(
-  engine="text-chat-davinci-002-test",
+  engine="gpt-35-turbo",
   prompt="<|im_start|>system\nThe system is an AI assistant that helps people find information.\n<|im_end|>\n<|im_start|>user\nDoes Azure OpenAI support customer managed keys?\n<|im_end|>\n<|im_start|>assistant",
   temperature=1,
   max_tokens=800,
   top_p=0.95,
   frequency_penalty=0,
   presence_penalty=0,
-  best_of=1,
   stop=["<|im_end|>"])
 
 print(response['choices'][0]['text'])
````
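The hard-coded `prompt` string in this quickstart follows the ChatML layout: `<|im_start|>role`, the message body, then `<|im_end|>`, ending with an open assistant turn. A small sketch (the helper name is illustrative, not part of the Azure OpenAI SDK) shows how such a prompt can be assembled from message turns instead of a literal string:

```python
def build_chatml_prompt(system_message, turns):
    """Assemble a ChatML prompt like the hard-coded string above.

    `turns` is a list of (role, text) pairs, e.g. [("user", "...")].
    The prompt ends with an open assistant turn so the model replies.
    (Helper name is illustrative, not part of the Azure OpenAI SDK.)
    """
    parts = ["<|im_start|>system\n{}\n<|im_end|>".format(system_message)]
    for role, text in turns:
        parts.append("<|im_start|>{}\n{}\n<|im_end|>".format(role, text))
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)
```

With the system message and user question from the quickstart, this reproduces the prompt string passed to `openai.Completion.create` above.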

articles/cognitive-services/openai/includes/chatgpt-rest.md

Lines changed: 2 additions & 3 deletions

````diff
@@ -77,7 +77,7 @@ echo export OPENAI_API_BASE="REPLACE_WITH_YOUR_ENDPOINT_HERE" >> /etc/environmen
 In a bash shell run the following:
 
 ```bash
-curl https://$OPENAI_API_BASE/openai/deployments/docs-test/completions?api-version=2022-12-01 \
+curl https://$OPENAI_API_BASE/openai/deployments/gpt-35-turbo/completions?api-version=2022-12-01 \
 -H "Content-Type: application/json" \
 -H "api-key: $OPENAI_API_KEY" \
 -d '{
@@ -86,7 +86,6 @@ curl https://$OPENAI_API_BASE/openai/deployments/docs-test/completions?api-versi
 "temperature": 1,
 "frequency_penalty": 0,
 "presence_penalty": 0,
-"best_of": 1,
 "top_p": 0.95,
 "stop": ["<|im_end|>"]
 }'
@@ -98,7 +97,7 @@ curl https://$OPENAI_API_BASE/openai/deployments/docs-test/completions?api-versi
 {"id":"cmpl-6mZPEDkBPasCTxueCy9iVRMY4ZGD4",
 "object":"text_completion",
 "created":1677033864,
-"model":"text-chat-davinci-002",
+"model":"gpt-35-turbo",
 "choices":
 [{"text":"\nYes, Azure OpenAI supports customer managed keys. These keys allow customers to manage their own encryption keys for the OpenAI services, rather than relying on Azure's managed keys. This provides an additional layer of security for customers' data and models.","index":0,"logprobs":null,"finish_reason":"stop"}],
 "usage":{"prompt_tokens":66,"completion_tokens":52,"total_tokens":118}}
````

articles/cognitive-services/openai/includes/chatgpt-studio.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -89,7 +89,7 @@ openai.api_version = "2022-12-01"
 openai.api_key = os.getenv("OPENAI_API_KEY")
 
 response = openai.Completion.create(
-  engine="text-chat-davinci-002-test",
+  engine="gpt-35-turbo",
   prompt="<|im_start|>system\nYou are an Xbox customer support agent whose primary goal is to help users with issues they are experiencing with their Xbox devices. You are friendly and concise. You only provide factual answers to queries, and do not provide answers that are not related to Xbox.\n<|im_end|>\n<|im_start|>user\nI am interested in buying an Xbox?\n<|im_end|>\n<|im_start|>assistant\nGreat! Xbox is a popular gaming console that offers a wide variety of games and features. You can purchase an Xbox from various retailers such as Amazon, Best Buy, and the Microsoft Store. Additionally, you can also purchase an Xbox bundle that includes games and accessories. If you have any specific questions about the Xbox or the purchasing process, please let me know and I'll be happy to assist you.\n\n<|im_end|>\n<|im_start|>user\nWhat models are available?\n<|im_end|>\n<|im_start|>assistant\nCurrently, there are three Xbox models available for purchase: Xbox Series X, Xbox Series S, and Xbox One X. \n\nThe Xbox Series X is the most powerful Xbox console to date, with a custom-designed processor, 12 teraflops of processing power, and 16GB of memory. It also supports 4K gaming and has a built-in 4K Ultra HD Blu-ray player.\n\nThe Xbox Series S is a more affordable option, with a custom-\n<|im_end|>\n",
   temperature=0,
   max_tokens=100,
````

articles/cognitive-services/openai/quotas-limits.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -23,8 +23,8 @@ The following sections provide you with a quick guide to the quotas and limits t
 | Limit Name | Limit Value |
 |--|--|
 | OpenAI resources per region | 2 |
-| Requests per minute per model* | ChatGPT & Davinci-models (002 and later): 120 <br> All other models: 300 |
-| Tokens per minute per model* | ChatGPT & Davinci-models (002 and later): 40,000 <br> All other models: 120,000 |
+| Requests per minute per model* | Davinci-models (002 and later): 120 <br> ChatGPT model: 300 <br> All other models: 300 |
+| Tokens per minute per model* | Davinci-models (002 and later): 40,000 <br> ChatGPT model: 120,000 <br> All other models: 120,000 |
 | Max fine-tuned model deployments* | 2 |
 | Ability to deploy same model to multiple deployments | Not allowed |
 | Total number of training jobs per resource | 100 |
````
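Requests that exceed the per-minute request or token limits in this table are rejected with HTTP 429, so callers typically retry with backoff. A minimal sketch, with illustrative names and a retry policy that is an assumption rather than an official SDK feature:

```python
import time

def call_with_backoff(send_request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry a request that may hit the per-minute quotas above.

    `send_request` is any zero-argument callable returning an object with
    a `status_code` attribute; 429 means a quota was exceeded, so wait
    with exponential backoff and try again. (Illustrative helper, not an
    official Azure OpenAI SDK feature.)
    """
    response = send_request()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
        response = send_request()
    return response
```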

articles/cognitive-services/openai/reference.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -76,12 +76,12 @@ POST https://{your-resource-name}.openai.azure.com/openai/deployments/{deploymen
 | ```top_p``` | number | Optional | 1 | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both. |
 | ```n``` | integer | Optional | 1 | How many completions to generate for each prompt. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. |
 | ```stream``` | boolean | Optional | False | Whether to stream back partial progress. If set, tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.|
-| ```logprobs``` | integer | Optional | null | Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response. |
-| ```echo``` | boolean | Optional | False | Echo back the prompt in addition to the completion. |
+| ```logprobs``` | integer | Optional | null | Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. For example, if logprobs is 10, the API will return a list of the 10 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response. This parameter cannot be used with `gpt-35-turbo`. |
+| ```echo``` | boolean | Optional | False | Echo back the prompt in addition to the completion. This parameter cannot be used with `gpt-35-turbo`. |
 | ```stop``` | string or array | Optional | null | Up to four sequences where the API will stop generating further tokens. The returned text won't contain the stop sequence. |
 | ```presence_penalty``` | number | Optional | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
 | ```frequency_penalty``` | number | Optional | 0 | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
-| ```best_of``` | integer | Optional | 1 | Generates best_of completions server-side and returns the "best" (the one with the lowest log probability per token). Results can't be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. |
+| ```best_of``` | integer | Optional | 1 | Generates best_of completions server-side and returns the "best" (the one with the lowest log probability per token). Results can't be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n. Note: Because this parameter generates many completions, it can quickly consume your token quota. Use carefully and ensure that you have reasonable settings for max_tokens and stop. This parameter cannot be used with `gpt-35-turbo`. |
 | ```logit_bias``` | map | Optional | null | Modify the likelihood of specified tokens appearing in the completion. Accepts a json object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. You can use this tokenizer tool (which works for both GPT-2 and GPT-3) to convert text to token IDs. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. As an example, you can pass {"50256": -100} to prevent the <\|endoftext\|> token from being generated. |
 
 #### Example request
````
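As the `logit_bias` row above describes, mapping token ID 50256 (`<|endoftext|>`) to -100 effectively bans that token. A small sketch of building such a request body (the helper name is illustrative, not from the reference):

```python
def completion_body(prompt, banned_token_ids=(50256,), bias=-100):
    """Build a Completions request body that bans the given token IDs.

    Token 50256 is <|endoftext|> in the GPT-2/GPT-3 tokenizer; a bias of
    -100 effectively prevents it from being generated. (Illustrative
    helper, not part of the API surface.)
    """
    return {
        "prompt": prompt,
        "logit_bias": {str(token_id): bias for token_id in banned_token_ids},
    }
```

Note the API expects token IDs as JSON object keys, hence the `str()` conversion.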
