articles/ai-services/openai/includes/fine-tuning-openai-in-ai-studio.md (0 additions & 54 deletions)
@@ -28,8 +28,6 @@ ms.custom: include, build-2024
 
 The following models support fine-tuning:
 
-- `babbage-002`
-- `davinci-002`
 - `gpt-35-turbo` (0613)
 - `gpt-35-turbo` (1106)
 - `gpt-35-turbo` (0125)
@@ -64,10 +62,6 @@ Take a moment to review the fine-tuning workflow for using Azure AI Foundry:
 
 Your training data and validation data sets consist of input and output examples for how you would like the model to perform.
 
-Different model types require a different format of training data.
-
-# [chat completion models](#tab/turbo)
-
 The training and validation data you use **must** be formatted as a JSON Lines (JSONL) document. For `gpt-35-turbo-0613` the fine-tuning dataset must be formatted in the conversational format that is used by the [Chat completions](../how-to/chatgpt.md) API.
 
 If you would like a step-by-step walk-through of fine-tuning a `gpt-35-turbo-0613` model please refer to the [Azure OpenAI fine-tuning tutorial](../tutorials/fine-tune.md).
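For reference, a single training example in the conversational JSONL format described above looks like the following. Each line of the file is one complete JSON object; the system, user, and assistant messages here are placeholder content rather than text from the article.

```json
{"messages": [{"role": "system", "content": "You are an assistant that answers questions about Contoso products."}, {"role": "user", "content": "What colors does the widget come in?"}, {"role": "assistant", "content": "The widget is available in red, blue, and green."}]}
```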
@@ -104,54 +98,6 @@ The more training examples you have, the better. Fine tuning jobs will not proceed
 
 In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data, without first pruning the dataset for only the highest quality examples you could end up with a model that performs much worse than expected.
-The training and validation data you use **must** be formatted as a JSON Lines (JSONL) document in which each line represents a single prompt-completion pair. The OpenAI command-line interface (CLI) includes [a data preparation tool](#openai-cli-data-preparation-tool) that validates, gives suggestions, and reformats your training data into a JSONL file ready for fine-tuning.
-In addition to the JSONL format, training and validation data files must be encoded in UTF-8 and include a byte-order mark (BOM). The file must be less than 512 MB in size.
-
-### Create your training and validation datasets
-
-Designing your prompts and completions for fine-tuning is different from designing your prompts for use with any of [our GPT-3 base models](../concepts/legacy-models.md#gpt-3-models). Prompts for completion calls often use either detailed instructions or few-shot learning techniques, and consist of multiple examples. For fine-tuning, each training example should consist of a single input prompt and its desired completion output. You don't need to give detailed instructions or multiple completion examples for the same prompt.
-
-The more training examples you have, the better. The minimum number of training examples is 10, but such a small number of examples is often not enough to noticeably influence model responses. OpenAI states it's best practice to have at least 50 high quality training examples. However, it is entirely possible to have a use case that might require 1,000's of high quality training examples to be successful.
-
-In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data, without first pruning the dataset for only the highest quality examples you could end up with a model that performs much worse than expected.
-
-### OpenAI CLI data preparation tool
-
-OpenAI's CLI data preparation tool was developed for the previous generation of fine-tuning models to assist with many of the data preparation steps. This tool will only work for data preparation for models that work with the completion API like `babbage-002` and `davinci-002`. The tool validates, gives suggestions, and reformats your data into a JSONL file ready for fine-tuning.
-
-To install the OpenAI CLI, run the following Python command:
-
-```console
-pip install openai==0.28.1
-```
-
-To analyze your training data with the data preparation tool, run the following Python command. Replace the _\<LOCAL_FILE>_ argument with the full path and file name of the training data file to analyze:
-This tool accepts files in the following data formats, if they contain a prompt and a completion column/key:
-
-- Comma-separated values (CSV)
-- Tab-separated values (TSV)
-- Microsoft Excel workbook (XLSX)
-- JavaScript Object Notation (JSON)
-- JSON Lines (JSONL)
-
-After it guides you through the process of implementing suggested changes, the tool reformats your training data and saves output into a JSONL file ready for fine-tuning.
-
----
-
 
 ## Create your fine-tuned model
 
 To fine-tune an Azure OpenAI model in an existing Azure AI Foundry project, follow these steps:
articles/ai-services/openai/includes/fine-tuning-python.md (3 additions & 57 deletions)
@@ -26,8 +26,6 @@ ms.author: mbullwin
 
 The following models support fine-tuning:
 
-- `babbage-002`
-- `davinci-002`
 - `gpt-35-turbo` (0613)
 - `gpt-35-turbo` (1106)
 - `gpt-35-turbo` (0125)
@@ -60,10 +58,6 @@ Take a moment to review the fine-tuning workflow for using the Python SDK with Azure OpenAI:
 
 Your training data and validation data sets consist of input and output examples for how you would like the model to perform.
 
-Different model types require a different format of training data.
-
-# [chat completion models](#tab/turbo)
-
 The training and validation data you use **must** be formatted as a JSON Lines (JSONL) document. For `gpt-35-turbo-0613` the fine-tuning dataset must be formatted in the conversational format that is used by the [Chat completions](../how-to/chatgpt.md) API.
 
 If you would like a step-by-step walk-through of fine-tuning a `gpt-35-turbo-0613` please refer to the [Azure OpenAI fine-tuning tutorial](../tutorials/fine-tune.md)
@@ -100,54 +94,6 @@ The more training examples you have, the better. Fine tuning jobs will not proceed
 
 In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data, without first pruning the dataset for only the highest quality examples you could end up with a model that performs much worse than expected.
-The training and validation data you use **must** be formatted as a JSON Lines (JSONL) document in which each line represents a single prompt-completion pair. The OpenAI command-line interface (CLI) includes [a data preparation tool](#openai-cli-data-preparation-tool) that validates, gives suggestions, and reformats your training data into a JSONL file ready for fine-tuning.
-In addition to the JSONL format, training and validation data files must be encoded in UTF-8 and include a byte-order mark (BOM). The file must be less than 512 MB in size.
-
-### Create your training and validation datasets
-
-Designing your prompts and completions for fine-tuning is different from designing your prompts for use with any of [our GPT-3 base models](../concepts/legacy-models.md#gpt-3-models). Prompts for completion calls often use either detailed instructions or few-shot learning techniques, and consist of multiple examples. For fine-tuning, each training example should consist of a single input prompt and its desired completion output. You don't need to give detailed instructions or multiple completion examples for the same prompt.
-
-The more training examples you have, the better. Fine tuning jobs will not proceed without at least 10 training examples, but such a small number is not enough to noticeably influence model responses. It is best practice to provide hundreds, if not thousands, of training examples to be successful.
-
-In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data, without first pruning the dataset for only the highest quality examples you could end up with a model that performs much worse than expected.
-
-### OpenAI CLI data preparation tool
-
-OpenAI's CLI data preparation tool was developed for the previous generation of fine-tuning models to assist with many of the data preparation steps. This tool will only work for data preparation for models that work with the completion API like `babbage-002` and `davinci-002`. The tool validates, gives suggestions, and reformats your data into a JSONL file ready for fine-tuning.
-
-To install the OpenAI CLI, run the following Python command:
-
-```console
-pip install openai==0.28.1
-```
-
-To analyze your training data with the data preparation tool, run the following Python command. Replace the _\<LOCAL_FILE>_ argument with the full path and file name of the training data file to analyze:
-This tool accepts files in the following data formats, if they contain a prompt and a completion column/key:
-
-- Comma-separated values (CSV)
-- Tab-separated values (TSV)
-- Microsoft Excel workbook (XLSX)
-- JavaScript Object Notation (JSON)
-- JSON Lines (JSONL)
-
-After it guides you through the process of implementing suggested changes, the tool reformats your training data and saves output into a JSONL file ready for fine-tuning.
-
----
-
 
 ## Upload your training data
 
 The next step is to either choose existing prepared training data or upload new prepared training data to use when customizing your model. After you prepare your training data, you can upload your files to the service. There are two ways to upload training data:
api_version="2024-02-01"# This API version or later is required to access fine-tuning for turbo/babbage-002/davinci-002
251
+
api_version="2024-02-01"# This API version or later is required
306
252
)
307
253
308
254
client.fine_tuning.jobs.create(
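To show how the snippet above fits into a full job submission, here is a minimal sketch using the `openai` Python SDK v1 client. The endpoint and key are read from assumed environment variables, and the JSONL file names and base model are placeholders, not values taken from this pull request.

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-01",  # This API version or later is required
)

# Upload the training and validation data; "fine-tune" is the required purpose.
training_file = client.files.create(
    file=open("training_set.jsonl", "rb"), purpose="fine-tune"
)
validation_file = client.files.create(
    file=open("validation_set.jsonl", "rb"), purpose="fine-tune"
)

# Create the fine-tuning job against a supported base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    validation_file=validation_file.id,
    model="gpt-35-turbo-0613",
)
print(job.id, job.status)
```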
@@ -580,7 +526,7 @@ az cognitiveservices account deployment create
 
 ## Use a deployed customized model
 
-After your custom model deploys, you can use it like any other deployed model. You can use the **Playgrounds** in [Azure AI Foundry](https://ai.azure.com) to experiment with your new deployment. You can continue to use the same parameters with your custom model, such as `temperature` and `max_tokens`, as you can with other deployed models. For fine-tuned `babbage-002` and `davinci-002` models you will use the Completions playground and the Completions API. For fine-tuned `gpt-35-turbo-0613` models you will use the Chat playground and the Chat completion API.
+After your custom model deploys, you can use it like any other deployed model. You can use the **Chat Playground** in [Azure AI Foundry](https://ai.azure.com) to experiment with your new deployment. You can continue to use the same parameters with your custom model, such as `temperature` and `max_tokens`, as you can with other deployed models.
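A minimal sketch of calling the deployed custom model through the same SDK, assuming the deployment was named `gpt-35-turbo-ft` (substitute your own deployment name) and the same environment variables as above:

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo-ft",  # deployment name of your fine-tuned model (placeholder)
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does the widget come in blue?"},
    ],
    temperature=0.7,
    max_tokens=100,
)
print(response.choices[0].message.content)
```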
articles/ai-services/openai/includes/fine-tuning-rest.md (1 addition & 66 deletions)
@@ -25,8 +25,6 @@ ms.author: mbullwin
 
 The following models support fine-tuning:
 
-- `babbage-002`
-- `davinci-002`
 - `gpt-35-turbo` (0613)
 - `gpt-35-turbo` (1106)
 - `gpt-35-turbo` (0125)
@@ -59,10 +57,6 @@ Take a moment to review the fine-tuning workflow for using the REST APIs and Python SDK with Azure OpenAI:
 
 Your training data and validation data sets consist of input and output examples for how you would like the model to perform.
 
-Different model types require a different format of training data.
-
-# [chat completion models](#tab/turbo)
-
 The training and validation data you use **must** be formatted as a JSON Lines (JSONL) document. For `gpt-35-turbo-0613` and other related models, the fine-tuning dataset must be formatted in the conversational format that is used by the [Chat completions](../how-to/chatgpt.md) API.
 
 If you would like a step-by-step walk-through of fine-tuning a `gpt-35-turbo-0613` please refer to the [Azure OpenAI fine-tuning tutorial](../tutorials/fine-tune.md).
@@ -100,71 +94,12 @@ The more training examples you have, the better. Fine tuning jobs will not proceed
 
 In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data without first pruning the dataset for only the highest quality examples, you could end up with a model that performs much worse than expected.
-The training and validation data you use **must** be formatted as a JSON Lines (JSONL) document in which each line represents a single prompt-completion pair. The OpenAI command-line interface (CLI) includes [a data preparation tool](#openai-cli-data-preparation-tool) that validates, gives suggestions, and reformats your training data into a JSONL file ready for fine-tuning.
-In addition to the JSONL format, training and validation data files must be encoded in UTF-8 and include a byte-order mark (BOM). The file must be less than 512 MB in size.
-
-### Create your training and validation datasets
-
-Designing your prompts and completions for fine-tuning is different from designing your prompts for use with any of [our GPT-3 base models](../concepts/legacy-models.md#gpt-3-models). Prompts for completion calls often use either detailed instructions or few-shot learning techniques, and consist of multiple examples. For fine-tuning, each training example should consist of a single input prompt and its desired completion output. You don't need to give detailed instructions or multiple completion examples for the same prompt.
-
-The more training examples you have, the better. Fine tuning jobs will not proceed without at least 10 training examples, but such a small number is not enough to noticeably influence model responses. It is best practice to provide hundreds, if not thousands, of training examples to be successful.
-
-In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data without first pruning the dataset for only the highest quality examples, you could end up with a model that performs much worse than expected.
-
-### OpenAI CLI data preparation tool
-
-OpenAI's CLI data preparation tool was developed for the previous generation of fine-tuning models to assist with many of the data preparation steps. This tool will only work for data preparation for models that work with the completion API like `babbage-002` and `davinci-002`. The tool validates, gives suggestions, and reformats your data into a JSONL file ready for fine-tuning.
-
-To install the OpenAI CLI, run the following Python command:
-
-```console
-pip install openai==0.28.1
-```
-
-To analyze your training data with the data preparation tool, run the following Python command. Replace the _\<LOCAL_FILE>_ argument with the full path and file name of the training data file to analyze:
-This tool accepts files in the following data formats, if they contain a prompt and a completion column/key:
-
-- Comma-separated values (CSV)
-- Tab-separated values (TSV)
-- Microsoft Excel workbook (XLSX)
-- JavaScript Object Notation (JSON)
-- JSON Lines (JSONL)
-
-After it guides you through the process of implementing suggested changes, the tool reformats your training data and saves output into a JSONL file ready for fine-tuning.
-
----
-
 
 ### Select the base model
 
 The first step in creating a custom model is to choose a base model. The **Base model** pane lets you choose a base model to use for your custom model. Your choice influences both the performance and the cost of your model.
 
 Select the base model from the **Base model type** dropdown, and then select **Next** to continue.
 
-You can create a custom model from one of the following available base models:
-
-- `babbage-002`
-- `davinci-002`
-- `gpt-35-turbo` (0613)
-- `gpt-35-turbo` (1106)
-- `gpt-35-turbo` (0125)
-- `gpt-4` (0613)
-- `gpt-4o` (2024-08-06)
-- `gpt-4o-mini` (2023-07-18)
-
 Or you can fine tune a previously fine-tuned model, formatted as base-model.ft-{jobid}.
 
 :::image type="content" source="../media/fine-tuning/models.png" alt-text="Screenshot of model options with a custom fine-tuned model." lightbox="../media/fine-tuning/models.png":::
@@ -373,7 +308,7 @@ az cognitiveservices account deployment create
 
 ## Use a deployed customized model
 
-After your custom model deploys, you can use it like any other deployed model. You can use the **Playgrounds** in [Azure AI Foundry](https://ai.azure.com) to experiment with your new deployment. You can continue to use the same parameters with your custom model, such as `temperature` and `max_tokens`, as you can with other deployed models. For fine-tuned `babbage-002` and `davinci-002` models you'll use the Completions playground and the Completions API. For fine-tuned `gpt-35-turbo-0613` models you'll use the Chat playground and the Chat completion API.
+After your custom model deploys, you can use it like any other deployed model. You can use the **Chat Playgrounds** in [Azure AI Foundry](https://ai.azure.com) to experiment with your new deployment. You can continue to use the same parameters with your custom model, such as `temperature` and `max_tokens`, as you can with other deployed models.
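As a rough REST equivalent of the paragraph above, the request below exercises a deployed custom model's chat completions endpoint with `temperature` and `max_tokens`. The resource name, deployment name, message content, and the `AZURE_OPENAI_API_KEY` environment variable are placeholders, not values from this change.

```bash
curl "https://<your-resource-name>.openai.azure.com/openai/deployments/<your-deployment-name>/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Does the widget come in blue?"}
        ],
        "temperature": 0.7,
        "max_tokens": 100
      }'
```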