Commit f4f2f9f

Merge pull request #257056 from mrbullwinkle/mrb_11_1_2023_python_updates

[Azure OpenAI] Python v1.0 tabs

2 parents b185c5e + c552eaf

3 files changed: +353 −65 lines changed

articles/ai-services/openai/includes/chat-completion.md

Lines changed: 229 additions & 15 deletions
@@ -6,26 +6,28 @@ author: mrbullwinkle #dereklegenzoff
ms.author: mbullwin #delegenz
ms.service: azure-ai-openai
ms.topic: include
-ms.date: 05/31/2023
+ms.date: 11/02/2023
manager: nitinme
keywords: ChatGPT

---

-## Working with the GPT-35-Turbo and GPT-4 models
+## Working with the GPT-3.5-Turbo and GPT-4 models

-The following code snippet shows the most basic way to use the GPT-35-Turbo and GPT-4 models with the Chat Completion API. If this is your first time using these models programmatically, we recommend starting with our [GPT-35-Turbo & GPT-4 Quickstart](../chatgpt-quickstart.md).
+The following code snippet shows the most basic way to use the GPT-3.5-Turbo and GPT-4 models with the Chat Completion API. If this is your first time using these models programmatically, we recommend starting with our [GPT-3.5-Turbo & GPT-4 Quickstart](../chatgpt-quickstart.md).
+
+# [OpenAI Python 0.28.1](#tab/python)

```python
import os
import openai
openai.api_type = "azure"
openai.api_version = "2023-05-15"
-openai.api_base = os.getenv("OPENAI_API_BASE") # Your Azure OpenAI resource's endpoint value.
-openai.api_key = os.getenv("OPENAI_API_KEY")
+openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT") # Your Azure OpenAI resource's endpoint value.
+openai.api_key = os.getenv("AZURE_OPENAI_KEY")

response = openai.ChatCompletion.create(
-    engine="gpt-35-turbo", # The deployment name you chose when you deployed the GPT-35-Turbo or GPT-4 model.
+    engine="gpt-35-turbo", # The deployment name you chose when you deployed the GPT-3.5-Turbo or GPT-4 model.
    messages=[
        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
        {"role": "user", "content": "Who were the founders of Microsoft?"}
@@ -34,12 +36,15 @@ response = openai.ChatCompletion.create(

print(response)

-print(response['choices'][0]['message']['content'])
+# To print only the response content text:
+# print(response['choices'][0]['message']['content'])
```

### Output

-```
+JSON formatting added artificially for ease of reading.
+
+```json
{
  "choices": [
    {
@@ -64,6 +69,100 @@ print(response['choices'][0]['message']['content'])

```

+# [OpenAI Python 1.0](#tab/python-new)
+
+```python
+import os
+from openai import AzureOpenAI
+
+client = AzureOpenAI(
+    api_key = os.getenv("AZURE_OPENAI_KEY"),
+    api_version = "2023-05-15",
+    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
+)
+
+response = client.chat.completions.create(
+    model="gpt-35-turbo", # model = "deployment_name".
+    messages=[
+        {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
+        {"role": "user", "content": "Who were the founders of Microsoft?"}
+    ]
+)
+
+#print(response)
+print(response.model_dump_json(indent=2))
+print(response.choices[0].message.content)
+```
+
+```output
+{
+  "id": "chatcmpl-8GHoQAJ3zN2DJYqOFiVysrMQJfe1P",
+  "choices": [
+    {
+      "finish_reason": "stop",
+      "index": 0,
+      "message": {
+        "content": "Microsoft was founded by Bill Gates and Paul Allen. They established the company on April 4, 1975. Bill Gates served as the CEO of Microsoft until 2000 and later as Chairman and Chief Software Architect until his retirement in 2008, while Paul Allen left the company in 1983 but remained on the board of directors until 2000.",
+        "role": "assistant",
+        "function_call": null
+      },
+      "content_filter_results": {
+        "hate": {
+          "filtered": false,
+          "severity": "safe"
+        },
+        "self_harm": {
+          "filtered": false,
+          "severity": "safe"
+        },
+        "sexual": {
+          "filtered": false,
+          "severity": "safe"
+        },
+        "violence": {
+          "filtered": false,
+          "severity": "safe"
+        }
+      }
+    }
+  ],
+  "created": 1698892410,
+  "model": "gpt-35-turbo",
+  "object": "chat.completion",
+  "usage": {
+    "completion_tokens": 73,
+    "prompt_tokens": 29,
+    "total_tokens": 102
+  },
+  "prompt_filter_results": [
+    {
+      "prompt_index": 0,
+      "content_filter_results": {
+        "hate": {
+          "filtered": false,
+          "severity": "safe"
+        },
+        "self_harm": {
+          "filtered": false,
+          "severity": "safe"
+        },
+        "sexual": {
+          "filtered": false,
+          "severity": "safe"
+        },
+        "violence": {
+          "filtered": false,
+          "severity": "safe"
+        }
+      }
+    }
+  ]
+}
+Microsoft was founded by Bill Gates and Paul Allen. They established the company on April 4, 1975. Bill Gates served as the CEO of Microsoft until 2000 and later as Chairman and Chief Software Architect until his retirement in 2008, while Paul Allen left the company in 1983 but remained on the board of directors until 2000.
+```
+
+---
+
> [!NOTE]
> The following parameters aren't available with the new GPT-35-Turbo and GPT-4 models: `logprobs`, `best_of`, and `echo`. If you set any of these parameters, you'll get an error.
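All of the updated samples in this commit read the resource endpoint and key from the `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_KEY` environment variables. A minimal pre-flight check, sketched here for illustration rather than taken from the commit, can fail fast when either variable is missing:

```python
import os

# Sketch: verify the environment variables the samples above rely on.
# The variable names match those used throughout this commit.
missing = [name for name in ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY") if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```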
@@ -205,13 +304,16 @@ The examples so far have shown you the basic mechanics of interacting with the C

This means that every time a new question is asked, a running transcript of the conversation so far is sent along with the latest question. Since the model has no memory, you need to send an updated transcript with each new question or the model will lose context of the previous questions and answers.

-```Python
+
+# [OpenAI Python 0.28.1](#tab/python)
+
+```python
import os
import openai
openai.api_type = "azure"
openai.api_version = "2023-05-15"
-openai.api_base = os.getenv("OPENAI_API_BASE") # Your Azure OpenAI resource's endpoint value .
-openai.api_key = os.getenv("OPENAI_API_KEY")
+openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT") # Your Azure OpenAI resource's endpoint value.
+openai.api_key = os.getenv("AZURE_OPENAI_KEY")

conversation=[{"role": "system", "content": "You are a helpful assistant."}]

@@ -220,14 +322,43 @@ while True:
    conversation.append({"role": "user", "content": user_input})

    response = openai.ChatCompletion.create(
-        engine="gpt-3.5-turbo", # The deployment name you chose when you deployed the GPT-35-turbo or GPT-4 model.
+        engine="gpt-35-turbo", # The deployment name you chose when you deployed the GPT-35-turbo or GPT-4 model.
        messages=conversation
    )

    conversation.append({"role": "assistant", "content": response["choices"][0]["message"]["content"]})
    print("\n" + response['choices'][0]['message']['content'] + "\n")
```

+# [OpenAI Python 1.0](#tab/python-new)
+
+```python
+import os
+from openai import AzureOpenAI
+
+client = AzureOpenAI(
+    api_key = os.getenv("AZURE_OPENAI_KEY"),
+    api_version = "2023-05-15",
+    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") # Your Azure OpenAI resource's endpoint value.
+)
+
+conversation=[{"role": "system", "content": "You are a helpful assistant."}]
+
+while True:
+    user_input = input("Q:")
+    conversation.append({"role": "user", "content": user_input})
+
+    response = client.chat.completions.create(
+        model="gpt-35-turbo", # model = "deployment_name".
+        messages=conversation
+    )
+
+    conversation.append({"role": "assistant", "content": response.choices[0].message.content})
+    print("\n" + response.choices[0].message.content + "\n")
+```
+
+---
+
When you run the code above, you will get a blank console window. Enter your first question in the window and then hit Enter. Once the response is returned, you can repeat the process and keep asking questions.

## Managing conversations
@@ -241,7 +372,9 @@ It's your responsibility to ensure the prompt and completion falls within the to

The following code sample shows a simple chat loop example with a technique for handling a 4096 token count using OpenAI's tiktoken library.

-The code requires tiktoken `0.3.0`. If you have an older version run `pip install tiktoken --upgrade`.
+The code uses tiktoken `0.5.1`. If you have an older version, run `pip install tiktoken --upgrade`.
+
+# [OpenAI Python 0.28.1](#tab/python)

```python
import tiktoken
@@ -250,8 +383,8 @@ import os

openai.api_type = "azure"
openai.api_version = "2023-05-15"
-openai.api_base = os.getenv("OPENAI_API_BASE") # Your Azure OpenAI resource's endpoint value.
-openai.api_key = os.getenv("OPENAI_API_KEY")
+openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT") # Your Azure OpenAI resource's endpoint value.
+openai.api_key = os.getenv("AZURE_OPENAI_KEY")

system_message = {"role": "system", "content": "You are a helpful assistant."}
max_response_tokens = 250
@@ -319,6 +452,87 @@ while True:
    print("\n" + response['choices'][0]['message']['content'] + "\n")
```

+# [OpenAI Python 1.0](#tab/python-new)
+
+```python
+import tiktoken
+import os
+from openai import AzureOpenAI
+
+client = AzureOpenAI(
+    api_key = os.getenv("AZURE_OPENAI_KEY"),
+    api_version = "2023-05-15",
+    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") # Your Azure OpenAI resource's endpoint value.
+)
+
+system_message = {"role": "system", "content": "You are a helpful assistant."}
+max_response_tokens = 250
+token_limit = 4096
+conversation = []
+conversation.append(system_message)
+
+def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
+    """Return the number of tokens used by a list of messages."""
+    try:
+        encoding = tiktoken.encoding_for_model(model)
+    except KeyError:
+        print("Warning: model not found. Using cl100k_base encoding.")
+        encoding = tiktoken.get_encoding("cl100k_base")
+    if model in {
+        "gpt-3.5-turbo-0613",
+        "gpt-3.5-turbo-16k-0613",
+        "gpt-4-0314",
+        "gpt-4-32k-0314",
+        "gpt-4-0613",
+        "gpt-4-32k-0613",
+    }:
+        tokens_per_message = 3
+        tokens_per_name = 1
+    elif model == "gpt-3.5-turbo-0301":
+        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
+        tokens_per_name = -1  # if there's a name, the role is omitted
+    elif "gpt-3.5-turbo" in model:
+        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
+        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
+    elif "gpt-4" in model:
+        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
+        return num_tokens_from_messages(messages, model="gpt-4-0613")
+    else:
+        raise NotImplementedError(
+            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
+        )
+    num_tokens = 0
+    for message in messages:
+        num_tokens += tokens_per_message
+        for key, value in message.items():
+            num_tokens += len(encoding.encode(value))
+            if key == "name":
+                num_tokens += tokens_per_name
+    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
+    return num_tokens
+
+while True:
+    user_input = input("Q:")
+    conversation.append({"role": "user", "content": user_input})
+    conv_history_tokens = num_tokens_from_messages(conversation)
+
+    while conv_history_tokens + max_response_tokens >= token_limit:
+        del conversation[1]
+        conv_history_tokens = num_tokens_from_messages(conversation)
+
+    response = client.chat.completions.create(
+        model="gpt-35-turbo", # model = "deployment_name".
+        messages=conversation,
+        temperature=0.7,
+        max_tokens=max_response_tokens
+    )
+
+    conversation.append({"role": "assistant", "content": response.choices[0].message.content})
+    print("\n" + response.choices[0].message.content + "\n")
+```
+
+---
+
In this example, once the token count is reached, the oldest messages in the conversation transcript will be removed. `del` is used instead of `pop()` for efficiency, and we start at index 1 so as to always preserve the system message and only remove user/assistant messages. Over time, this method of managing the conversation can cause the conversation quality to degrade as the model will gradually lose context of the earlier portions of the conversation.

An alternative approach is to limit the conversation duration to the max token length or a certain number of turns. Once the max token limit is reached and the model would lose context if you were to allow the conversation to continue, you can prompt the user that they need to begin a new conversation and clear the messages list to start a brand new conversation with the full token limit available.
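A minimal sketch of that alternative approach, shown here for illustration rather than taken from the commit, assuming the same `client`, `system_message`, `num_tokens_from_messages`, `max_response_tokens`, and `token_limit` as in the OpenAI Python 1.0 sample above:

```python
# Sketch: instead of trimming old turns, prompt the user and start a fresh
# conversation once the next reply would no longer fit in the token budget.
# Assumes client, system_message, num_tokens_from_messages, max_response_tokens,
# and token_limit are defined as in the OpenAI Python 1.0 sample above.
conversation = [system_message]

while True:
    user_input = input("Q:")
    conversation.append({"role": "user", "content": user_input})

    if num_tokens_from_messages(conversation) + max_response_tokens >= token_limit:
        print("Token limit reached. Starting a new conversation.")
        conversation = [system_message, {"role": "user", "content": user_input}]

    response = client.chat.completions.create(
        model="gpt-35-turbo",  # model = "deployment_name".
        messages=conversation,
        max_tokens=max_response_tokens
    )

    conversation.append({"role": "assistant", "content": response.choices[0].message.content})
    print("\n" + response.choices[0].message.content + "\n")
```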
