
Commit 00929aa

Merge pull request #228928 from MicrosoftDocs/release-azure-openai-chatgpt
Release azure openai chatgpt--scheduled release at 8AM of 3/09
2 parents 6f5de08 + 7a8aff5 commit 00929aa

17 files changed: +779 −13 lines
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
---
title: 'Quickstart - Using the ChatGPT API'
titleSuffix: Azure OpenAI Service
description: Walkthrough on how to get started with Azure OpenAI Service ChatGPT API.
services: cognitive-services
manager: nitinme
ms.service: cognitive-services
ms.subservice: openai
ms.topic: quickstart
author: mrbullwinkle
ms.author: mbullwin
ms.date: 02/07/2023
zone_pivot_groups: openai-quickstart
recommendations: false
---

# Quickstart: Get started using ChatGPT with Azure OpenAI Service

Use this article to get started using Azure OpenAI.

::: zone pivot="programming-language-studio"

[!INCLUDE [Studio quickstart](includes/chatgpt-studio.md)]

::: zone-end

::: zone pivot="programming-language-python"

[!INCLUDE [Python SDK quickstart](includes/chatgpt-python.md)]

::: zone-end

::: zone pivot="rest-api"

[!INCLUDE [REST API quickstart](includes/chatgpt-rest.md)]

::: zone-end

articles/cognitive-services/openai/concepts/models.md

Lines changed: 9 additions & 2 deletions
@@ -4,7 +4,7 @@ titleSuffix: Azure OpenAI
 description: Learn about the different model capabilities that are available with Azure OpenAI.
 ms.service: cognitive-services
 ms.topic: conceptual
-ms.date: 02/13/2023
+ms.date: 03/01/2023
 ms.custom: event-tier1-build-2022, references_regions
 manager: nitinme
 author: ChrisHMSFT
@@ -19,7 +19,7 @@ Azure OpenAI provides access to many different models, grouped by family and cap

 | Model family | Description |
 |--|--|
-| [GPT-3](#gpt-3-models) | A series of models that can understand and generate natural language. |
+| [GPT-3](#gpt-3-models) | A series of models that can understand and generate natural language. This includes the new [ChatGPT model](#chatgpt-gpt-35-turbo). |
 | [Codex](#codex-models) | A series of models that can understand and generate code, including translating natural language to code. |
 | [Embeddings](#embeddings-models) | A set of models that can understand and use embeddings. An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Currently, we offer three families of Embeddings models for different functionalities: similarity, text search, and code search. |
@@ -92,6 +92,12 @@ Ada is usually the fastest model and can perform tasks like parsing text, addres

 **Use for**: Parsing text, simple classification, address correction, keywords

+### ChatGPT (gpt-35-turbo)
+
+The ChatGPT model (gpt-35-turbo) is a language model designed for conversational interfaces that behaves differently than previous GPT-3 models. Previous models were text-in and text-out: they accepted a prompt string and returned a completion to append to the prompt. The ChatGPT model, by contrast, is conversation-in and message-out. It expects a prompt string formatted in a specific chat-like transcript format, and returns a completion that represents a model-written message in the chat.
+
+The ChatGPT model uses the same completion API that you use for other models like text-davinci-002, but it requires a unique prompt format. It's important to use the new prompt format to get the best results. Without the right prompts, the model tends to be verbose and provide less useful responses. To learn more, check out our [in-depth how-to](../how-to/chatgpt.md).
+
 ## Codex models

 The Codex models are descendants of our base GPT-3 models that can understand and generate code. Their training data contains both natural language and billions of lines of public code from GitHub.
@@ -168,6 +174,7 @@ When using our embeddings models, keep in mind their limitations and risks.
 | text-davinci-002 | Yes | No | East US, South Central US, West Europe | N/A |
 | text-davinci-003 | Yes | No | East US | N/A |
 | text-davinci-fine-tune-002<sup>1</sup> | Yes | No | N/A | East US, West Europe |
+| gpt-35-turbo (ChatGPT) | Yes | No | N/A | East US, South Central US |

 <sup>1</sup> The model is available by request only. Currently we aren't accepting new requests to use the model.
 <br><sup>2</sup> East US is currently unavailable for new customers to fine-tune due to high demand. Please use the South Central US region for US-based training.

articles/cognitive-services/openai/concepts/understand-embeddings.md

Lines changed: 1 addition & 1 deletion
@@ -34,4 +34,4 @@ Azure OpenAI embeddings rely on cosine similarity to compute similarity between

 ## Next steps

-Learn more about using Azure OpenAI and embeddings to perform document search with our [embeddings tutorial](../tutorials/embeddings.md).
+Learn more about using Azure OpenAI and embeddings to perform document search with our [embeddings tutorial](../tutorials/embeddings.md).
Lines changed: 296 additions & 0 deletions
@@ -0,0 +1,296 @@
---
title: How to work with the Chat Markup Language (preview)
titleSuffix: Azure OpenAI
description: Learn how to work with Chat Markup Language (preview)
author: dereklegenzoff
ms.author: delegenz
ms.service: cognitive-services
ms.topic: conceptual
ms.date: 03/09/2023
manager: nitinme
keywords: ChatGPT
---

# Learn how to work with Chat Markup Language (preview)

The ChatGPT model (`gpt-35-turbo`) is a language model designed for conversational interfaces that behaves differently than previous GPT-3 models. Previous models were text-in and text-out: they accepted a prompt string and returned a completion to append to the prompt. The ChatGPT model, by contrast, is conversation-in and message-out. It expects a prompt string formatted in a specific chat-like transcript format, and returns a completion that represents a model-written message in the chat. While the prompt format was designed specifically for multi-turn conversations, you'll find it can also work well for non-chat scenarios.

The ChatGPT model can be used with the same [completion API](/azure/cognitive-services/openai/reference#completions) that you use for other models like text-davinci-002, but it requires a unique prompt format known as Chat Markup Language (ChatML). It's important to use the new prompt format to get the best results. Without the right prompts, the model tends to be verbose and provide less useful responses.

## Working with the ChatGPT model

The following code snippet shows the most basic way to use the ChatGPT model. We also have a UI-driven experience that you can learn about in the [ChatGPT Quickstart](../chatgpt-quickstart.md).

```python
import os
import openai
openai.api_type = "azure"
openai.api_base = "https://{your-resource-name}.openai.azure.com/"
openai.api_version = "2022-12-01"
openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
    engine="gpt-35-turbo",
    prompt="<|im_start|>system\nAssistant is a large language model trained by OpenAI.\n<|im_end|>\n<|im_start|>user\nWhat's the difference between garbanzo beans and chickpeas?\n<|im_end|>\n<|im_start|>assistant\n",
    temperature=0,
    max_tokens=500,
    top_p=0.5,
    stop=["<|im_end|>"])

print(response['choices'][0]['text'])
```

> [!NOTE]
> The following parameters aren't available with the gpt-35-turbo model: `logprobs`, `best_of`, and `echo`. If you set any of these parameters to a value other than their default, you'll get an error.

The `<|im_end|>` token indicates the end of a message. We recommend including the `<|im_end|>` token as a stop sequence to ensure that the model stops generating text when it reaches the end of the message. You can read more about the special tokens in the [Chat Markup Language (ChatML)](#chatml) section.

Consider setting `max_tokens` to a slightly higher value than normal, such as 300 or 500. This ensures that the model doesn't stop generating text before it reaches the end of the message.

## Model versioning

> [!NOTE]
> `gpt-35-turbo` is equivalent to the `gpt-3.5-turbo` model from OpenAI.

Unlike previous GPT-3 and GPT-3.5 models, the `gpt-35-turbo` model will continue to be updated. When creating a [deployment](./create-resource.md#deploy-a-model) of `gpt-35-turbo`, you'll also need to specify a model version.

Currently, only version `"0301"` is available. This is equivalent to the `gpt-3.5-turbo-0301` model from OpenAI. We'll continue to make updated versions available in the future. You can find model deprecation times on our [models](../concepts/models.md) page.

It's important to note that Chat Markup Language (ChatML) will continue to evolve with new versions of the model. You may need to update your prompts when you upgrade to a new version of the model.

<a id="chatml"></a>

## Working with Chat Markup Language (ChatML)

> [!NOTE]
> OpenAI continues to improve the `gpt-35-turbo` model, and the Chat Markup Language used with the model will continue to evolve in the future. We'll keep this document updated with the latest information.

OpenAI trained the gpt-35-turbo model on special tokens that delineate the different parts of the prompt. The prompt starts with a system message that is used to prime the model, followed by a series of messages between the user and the assistant.

The format of a basic ChatML prompt is as follows:

```
<|im_start|>system
Provide some context and/or instructions to the model.
<|im_end|>
<|im_start|>user
The user's message goes here
<|im_end|>
<|im_start|>assistant
```

### System message

The system message is included at the beginning of the prompt between the `<|im_start|>system` and `<|im_end|>` tokens. This message provides the initial instructions to the model. You can provide various information in the system message including:

* A brief description of the assistant
* Personality traits of the assistant
* Instructions or rules you would like the assistant to follow
* Data or information needed for the model, such as relevant questions from an FAQ

You can customize the system message for your use case or just include a basic system message. The system message is optional, but it's recommended to at least include a basic one to get the best results.

### Messages

After the system message, you can include a series of messages between the **user** and the **assistant**. Each message should begin with the `<|im_start|>` token followed by the role (`user` or `assistant`) and end with the `<|im_end|>` token.

```
<|im_start|>user
What is thermodynamics?
<|im_end|>
```

To trigger a response from the model, the prompt should end with the `<|im_start|>assistant` token, indicating that it's the assistant's turn to respond. You can also include messages between the user and the assistant in the prompt as a way to do few shot learning.

### Prompt examples

The following section shows examples of different styles of prompts that you could use with the ChatGPT model. These examples are just a starting point, and you can experiment with different prompts to customize the behavior for your own use cases.

#### Basic example

If you want the ChatGPT model to behave similarly to [chat.openai.com](https://chat.openai.com/), you can use a basic system message like "Assistant is a large language model trained by OpenAI."

```
<|im_start|>system
Assistant is a large language model trained by OpenAI.
<|im_end|>
<|im_start|>user
What's the difference between garbanzo beans and chickpeas?
<|im_end|>
<|im_start|>assistant
```

#### Example with instructions

For some scenarios, you may want to give additional instructions to the model to define guardrails for what the model is able to do.

```
<|im_start|>system
Assistant is an intelligent chatbot designed to help users answer their tax related questions.

Instructions:
- Only answer questions related to taxes.
- If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users go to the IRS website for more information.
<|im_end|>
<|im_start|>user
When are my taxes due?
<|im_end|>
<|im_start|>assistant
```

#### Using data for grounding

You can also include relevant data or information in the system message to give the model extra context for the conversation. If you only need to include a small amount of information, you can hard code it in the system message. If you have a large amount of data that the model should be aware of, you can use [embeddings](/azure/cognitive-services/openai/tutorials/embeddings?tabs=command-line) or a product like [Azure Cognitive Search](https://azure.microsoft.com/services/search/) to retrieve the most relevant information at query time.

```
<|im_start|>system
Assistant is an intelligent chatbot designed to help users answer technical questions about Azure OpenAI Service. Only answer questions using the context below and if you're not sure of an answer, you can say "I don't know".

Context:
- Azure OpenAI Service provides REST API access to OpenAI's powerful language models including the GPT-3, Codex and Embeddings model series.
- Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other.
- At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes requiring applicants to show well-defined use cases, incorporating Microsoft's principles for responsible AI use.
<|im_end|>
<|im_start|>user
What is Azure OpenAI Service?
<|im_end|>
<|im_start|>assistant
```
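The "retrieve the most relevant information at query time" step above can be sketched with a simple similarity ranking. This is a hypothetical illustration, not part of the article's sample code: the snippet texts and embedding vectors below are stand-ins, and in practice you would obtain the vectors from an embeddings model before placing the top matches in the system message's Context section.

```python
# Hypothetical sketch: rank precomputed context snippets by cosine
# similarity to the query embedding, then keep the top k for grounding.
# The embeddings here are toy 2-dimensional stand-ins for illustration.
from math import sqrt

def cosine_similarity(a, b):
    # Dot product divided by the product of vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_context(query_embedding, snippets, k=2):
    """snippets: list of (text, embedding) pairs; returns the k best texts."""
    ranked = sorted(
        snippets,
        key=lambda s: cosine_similarity(query_embedding, s[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

The selected texts would then be joined into the `Context:` block of the system message before sending the prompt.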

#### Few shot learning with ChatML

You can also give few shot examples to the model. The approach for few shot learning has changed slightly because of the new prompt format. You can now include a series of messages between the user and the assistant in the prompt as few shot examples. These examples can be used to seed answers to common questions, prime the model, or teach particular behaviors to the model.

This is only one example of how you can use few shot learning with ChatGPT. You can experiment with different approaches to see what works best for your use case.

```
<|im_start|>system
Assistant is an intelligent chatbot designed to help users answer their tax related questions.
<|im_end|>
<|im_start|>user
When do I need to file my taxes by?
<|im_end|>
<|im_start|>assistant
In 2023, you will need to file your taxes by April 18th. The date falls after the usual April 15th deadline because April 15th falls on a Saturday in 2023. For more details, see https://www.irs.gov/filing/individuals/when-to-file
<|im_end|>
<|im_start|>user
How can I check the status of my tax refund?
<|im_end|>
<|im_start|>assistant
You can check the status of your tax refund by visiting https://www.irs.gov/refunds
<|im_end|>
```

#### Using Chat Markup Language for non-chat scenarios

ChatML is designed to make multi-turn conversations easier to manage, but it also works well for non-chat scenarios.

For example, for an entity extraction scenario, you might use the following prompt:

```
<|im_start|>system
You are an assistant designed to extract entities from text. Users will paste in a string of text and you will respond with entities you've extracted from the text as a JSON object. Here's an example of your output format:
{
   "name": "",
   "company": "",
   "phone_number": ""
}
<|im_end|>
<|im_start|>user
Hello. My name is Robert Smith. I'm calling from Contoso Insurance, Delaware. My colleague mentioned that you are interested in learning about our comprehensive benefits policy. Could you give me a call back at (555) 346-9322 when you get a chance so we can go over the benefits?
<|im_end|>
<|im_start|>assistant
```

## Preventing unsafe user inputs

It's important to add mitigations into your application to ensure safe use of the Chat Markup Language.

We recommend that you prevent end users from being able to include special tokens in their input, such as `<|im_start|>` and `<|im_end|>`. We also recommend that you include additional validation to ensure the prompts you're sending to the model are well formed and follow the Chat Markup Language format described in this document.

You can also provide instructions in the system message to guide the model on how to respond to certain types of user inputs. For example, you can instruct the model to only reply to messages about a certain subject. You can also reinforce this behavior with few shot examples.
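The input mitigations described above can be sketched in a few lines. These are hypothetical helpers, not official Azure OpenAI utilities; the function names are invented for illustration, and the special tokens come from the ChatML format described in this article.

```python
# Hypothetical sketch: strip ChatML special tokens from user input and do a
# basic structural check on the assembled prompt before sending it.
SPECIAL_TOKENS = ["<|im_start|>", "<|im_end|>"]

def sanitize_user_input(text: str) -> str:
    """Remove ChatML special tokens so users can't inject fake turns."""
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    return text

def is_well_formed(prompt: str) -> bool:
    """Check that every <|im_start|> has a matching <|im_end|>, except the
    trailing <|im_start|>assistant that triggers the model's response."""
    starts = prompt.count("<|im_start|>")
    ends = prompt.count("<|im_end|>")
    return starts == ends + 1 and prompt.rstrip().endswith("<|im_start|>assistant")
```

Run user text through the sanitizer before appending it to the conversation, and validate the final prompt as a last check before calling the completion API.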

## Managing conversations with ChatGPT

The token limit for `gpt-35-turbo` is 4096 tokens. This limit includes the token count from both the prompt and completion. The number of tokens in the prompt combined with the value of the `max_tokens` parameter must stay under 4096 or you'll receive an error.

It's your responsibility to ensure the prompt and completion fall within the token limit. This means that for longer conversations, you need to keep track of the token count and only send the model a prompt that falls within the token limit.

The following code sample shows a simple example of how you could keep track of the separate messages in the conversation.

```python
import os
import openai
openai.api_type = "azure"
openai.api_base = "https://{your-resource-name}.openai.azure.com/"
openai.api_version = "2022-12-01"
openai.api_key = os.getenv("OPENAI_API_KEY")

# Create the full ChatML prompt from the system message and the
# conversation messages, ending with the assistant turn marker.
def create_prompt(system_message, messages):
    prompt = system_message
    for message in messages:
        prompt += f"\n<|im_start|>{message['sender']}\n{message['text']}\n<|im_end|>"
    prompt += "\n<|im_start|>assistant\n"
    return prompt

# Define the user input and the system message
user_input = "<your user input>"
system_message = f"<|im_start|>system\n{'<your system message>'}\n<|im_end|>"

# Create a list of messages to track the conversation
messages = [{"sender": "user", "text": user_input}]

response = openai.Completion.create(
    engine="gpt-35-turbo",
    prompt=create_prompt(system_message, messages),
    temperature=0.5,
    max_tokens=250,
    top_p=0.9,
    frequency_penalty=0,
    presence_penalty=0,
    stop=['<|im_end|>']
)

messages.append({"sender": "assistant", "text": response['choices'][0]['text']})
print(response['choices'][0]['text'])
```

## Staying under the token limit

The simplest approach to staying under the token limit is to truncate the oldest messages in the conversation when you reach the token limit.

You can choose to always include as many tokens as possible while staying under the limit, or you can always include a set number of previous messages, assuming those messages stay within the limit. Keep in mind that longer prompts take longer to generate a response and incur a higher cost than shorter prompts.

You can estimate the number of tokens in a string by using the [tiktoken](https://github.com/openai/tiktoken) Python library as shown below.

```python
import tiktoken

cl100k_base = tiktoken.get_encoding("cl100k_base")

enc = tiktoken.Encoding(
    name="gpt-35-turbo",
    pat_str=cl100k_base._pat_str,
    mergeable_ranks=cl100k_base._mergeable_ranks,
    special_tokens={
        **cl100k_base._special_tokens,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265
    }
)

tokens = enc.encode(
    "<|im_start|>user\nHello<|im_end|><|im_start|>assistant",
    allowed_special={"<|im_start|>", "<|im_end|>"}
)

assert len(tokens) == 7
assert tokens == [100264, 882, 198, 9906, 100265, 100264, 78191]
```
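Putting the truncation strategy into code, here's a minimal sketch (not an official SDK helper) of dropping the oldest messages until the conversation fits a token budget. It uses the same `{"sender": ..., "text": ...}` message dictionaries as the conversation-tracking sample; the `count_tokens` callback is a stand-in, and in practice you would count tokens with the tiktoken encoder shown above.

```python
# Hypothetical sketch: drop the oldest messages until the remaining ones
# fit within token_budget (e.g. 4096 minus max_tokens minus the tokens
# used by the system message).
def trim_messages(messages, count_tokens, token_budget):
    """Return a copy of messages with the oldest entries removed until the
    total token count is within token_budget."""
    trimmed = list(messages)  # copy so the caller's history is untouched
    while trimmed and sum(count_tokens(m["text"]) for m in trimmed) > token_budget:
        trimmed.pop(0)  # discard the oldest message first
    return trimmed
```

Call this before `create_prompt` on each turn so the prompt you send always fits alongside the completion you've requested.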

## Next steps

* [Learn more about Azure OpenAI](../overview.md)
* Get started with the ChatGPT model with [the ChatGPT quickstart](../chatgpt-quickstart.md)
