In addition to the JSONL format, training and validation data files must be encoded in UTF-8 and include a byte-order mark (BOM). The file must be less than 200 MB in size. For more information about formatting your training data, see [Learn how to prepare your dataset for fine-tuning](../how-to/prepare-dataset.md).
### Create your training and validation datasets
Designing your prompts and completions for fine-tuning is different from designing your prompts for use with any of [our GPT-3 base models](../concepts/legacy-models.md#gpt-3-models). Prompts for completion calls often use either detailed instructions or few-shot learning techniques, and consist of multiple examples. For fine-tuning, each training example should consist of a single input prompt and its desired completion output. You don't need to give detailed instructions or multiple completion examples for the same prompt.
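For example, each line of a JSONL training file holds one prompt-completion pair. The pairs below are purely illustrative placeholders:

```jsonl
{"prompt": "<prompt text>", "completion": "<ideal completion text>"}
{"prompt": "<prompt text>", "completion": "<ideal completion text>"}
```

If you generate the file from Python, opening it with `encoding='utf-8-sig'` writes the UTF-8 byte-order mark (BOM) that the service expects.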
The more training examples you have, the better. It's a best practice to have at least 200 training examples. In general, doubling the dataset size leads to a linear increase in model quality.
For more information about preparing training data for various tasks, see [Learn how to prepare your dataset for fine-tuning](../how-to/prepare-dataset.md).
### Use the OpenAI CLI data preparation tool
We recommend that you use OpenAI's CLI to assist with many of the data preparation steps. OpenAI has developed a tool that validates, gives suggestions, and reformats your data into a JSONL file ready for fine-tuning.
To install the OpenAI CLI, run the following Python command:
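A minimal sketch of that install step, assuming the standard `openai` package from PyPI (which includes the CLI):

```console
pip install --upgrade openai
```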
The tool guides you through suggested changes for your training data. It reformats your data into a JSONL file ready for fine-tuning.
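For illustration, the tool is typically run against a local data file like this, assuming the `fine_tunes.prepare_data` command that ships with the openai 0.x CLI (`<LOCAL_FILE>` is a placeholder for your file):

```console
openai tools fine_tunes.prepare_data -f <LOCAL_FILE>
```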
## Select a base model
The first step in creating a customized model is to choose a base model. The choice influences both the performance and the cost of your model.
You can create a customized model from one of the following available base models:
- `ada`
- `babbage`
- `curie`
- `code-cushman-001` (Currently unavailable for new customers)
- `davinci` (Currently unavailable for new customers)
You can use the [Models API](/rest/api/cognitiveservices/azureopenaistable/models/list) to identify which models are fine-tunable. For more information about our base models, see [Azure OpenAI Service models](../concepts/models.md).
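As a hedged sketch of that check with the openai Python SDK (0.x), assuming your Azure OpenAI endpoint, key, and an API version such as `2022-12-01`, and assuming each model entry reports a `fine_tune` capability flag:

```python
import openai

# Azure OpenAI connection settings; replace the placeholders with your values.
openai.api_type = "azure"
openai.api_base = "https://<your-resource-name>.openai.azure.com/"
openai.api_version = "2022-12-01"
openai.api_key = "<your-api-key>"

# List the available models and keep the ones that report fine-tune capability.
models = openai.Model.list()
fine_tunable = [m["id"] for m in models["data"]
                if m.get("capabilities", {}).get("fine_tune")]
print(fine_tunable)
```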
## Upload your training data
The next step is to either choose existing prepared training data or upload new prepared training data to use when customizing your model. After you prepare your training data, you can upload your files to the service. There are two ways to upload training data:
- [From a local file](/rest/api/cognitiveservices/azureopenaistable/files/upload)
- [Import from an Azure Blob store or other web location](/rest/api/cognitiveservices/azureopenaistable/files/import)
For large data files, we recommend that you import from an Azure Blob store. Large files can become unstable when uploaded through multipart forms because the requests are atomic and can't be retried or resumed. For more information about Azure Blob storage, see [What is Azure Blob storage?](../../../storage/blobs/storage-blobs-overview.md)
> [!NOTE]
> Training data files must be formatted as JSONL files, encoded in UTF-8 with a byte-order mark (BOM). The file must be less than 200 MB in size.
The following Python example creates sample training and validation dataset files locally, uploads them by using the Python SDK, and retrieves the returned file IDs. Make sure to save the IDs returned by the example, because you need them when you create the fine-tune training job.
```python
# ...
with open(training_file_name, 'w') as training_file:
    # ...

# Copy the validation dataset file from the training dataset file.
# Typically, your training data and validation data should be mutually exclusive.
# For the purposes of this example, you use the same data.
print(f'Copying the training file to the validation file')
# ...
print(f'Fine-tuning model with job ID: {job_id}.')
```
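As a sketch of the upload step itself, assuming the openai 0.x Python SDK is configured for your Azure OpenAI resource, uploading the two files and capturing their IDs can look like this:

```python
# Upload the training and validation dataset files and capture the file IDs.
training_id = openai.File.create(
    file=open(training_file_name, 'rb'), purpose='fine-tune')['id']
validation_id = openai.File.create(
    file=open(validation_file_name, 'rb'), purpose='fine-tune')['id']
print(f'Training file ID: {training_id}')
print(f'Validation file ID: {validation_id}')
```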
You can either use default values for the hyperparameters of the fine-tune job, or you can adjust those hyperparameters for your customization needs. In this example, you set the `n_epochs` hyperparameter to 1, indicating that you want just one full cycle through the training data. For more information about these hyperparameters, see [Create a Fine tune job](/rest/api/cognitiveservices/azureopenaistable/fine-tunes/create).
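For illustration, a minimal sketch of such a job-creation call with the openai 0.x Python SDK; `training_id` and `validation_id` are the file IDs from the upload step, and `curie` is only a placeholder base model:

```python
# Create the fine-tune job from the uploaded files.
create_args = {
    "training_file": training_id,
    "validation_file": validation_id,
    "model": "curie",  # placeholder; use any fine-tunable base model
    "n_epochs": 1,     # one full cycle through the training data
}
resp = openai.FineTune.create(**create_args)
job_id = resp["id"]
print(f'Fine-tuning model with job ID: {job_id}.')
```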
## Check the status of your customized model
After you start a fine-tune job, it can take some time to complete. Your job might be queued behind other jobs on the system, and training your model can take minutes or hours, depending on the model and dataset size. The following Python example checks the status of your fine-tune job by retrieving information about the job, using the job ID returned from the previous example:
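A hedged sketch of that status check with the openai 0.x Python SDK, assuming `job_id` holds the ID returned when you created the job:

```python
# Retrieve the fine-tune job and report its current status.
resp = openai.FineTune.retrieve(id=job_id)
print(f'Job status: {resp["status"]}')
print(f'Fine-tuned model name: {resp.get("fine_tuned_model")}')
```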
When the fine-tune job succeeds, the value of the `fine_tuned_model` variable in the response body of the `FineTune.retrieve()` method is set to the name of your customized model. Your model is now also available for discovery from the [list Models API](/rest/api/cognitiveservices/azureopenaistable/models/list). However, you can't issue completion calls to your customized model until your customized model is deployed. You must deploy your customized model to make it available for use with completion calls.
[!INCLUDE [Fine-tuning deletion](fine-tune.md)]
> [!NOTE]
> As with all applications, Microsoft requires a review process for your custom model before it's available for live use.
You can use either [Azure OpenAI](#deploy-a-model-with-azure-openai) or the [Azure CLI](#deploy-a-model-with-azure-cli) to deploy your customized model.
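As a rough sketch of the SDK route, assuming the `Deployment` resource that the openai 0.x library exposes for Azure and a standard scale type; `fine_tuned_model` is the model name returned for your completed job:

```python
# Deploy the customized model and capture the deployment ID.
result = openai.Deployment.create(
    model=fine_tuned_model,
    scale_settings={"scale_type": "standard"},  # assumed scale settings
)
deployment_id = result["id"]
print(f'Deployment ID: {deployment_id}')
```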
### Deploy a model with Azure CLI
The following example shows how to use the Azure CLI to deploy your customized model. With the Azure CLI, you must specify a name for the deployment of your customized model. For more information about how to use the Azure CLI to deploy customized models, see [az cognitiveservices account deployment](/cli/azure/cognitiveservices/account/deployment).
To run this Azure CLI command in a console window, you must replace the following _\<placeholders>_ with the corresponding values for your customized model:
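For illustration only, the command might look like the sketch below. The angle-bracket values are placeholders, and the exact flags (particularly the scale or SKU settings) vary by Azure CLI version, so treat this as an assumption rather than the canonical command:

```azurecli
az cognitiveservices account deployment create \
    --resource-group <resource_group_name> \
    --name <azure_openai_resource_name> \
    --deployment-name <deployment_name> \
    --model-name <fine_tuned_model_name> \
    --model-version "1" \
    --model-format OpenAI \
    --scale-settings-scale-type "Standard"
```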
Azure OpenAI attaches a result file named _results.csv_ to each fine-tune job after it completes. You can use the result file to analyze the training and validation performance of your customized model. The file ID for the result file is listed for each customized model, and you can use the Python SDK to retrieve the file ID and download the result file for analysis.
The following Python example retrieves the file ID of the first result file attached to the fine-tune job for your customized model, and then uses the Python SDK to download the file to your working directory for analysis.
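A hedged sketch of that retrieval with the openai 0.x Python SDK; `File.download` is assumed here (older SDK versions may expose the content through a different helper), and `job_id` is the fine-tune job ID from earlier:

```python
# Get the file ID of the first result file attached to the fine-tune job.
resp = openai.FineTune.retrieve(id=job_id)
result_file_id = resp["result_files"][0]["id"]

# Download the result file to the working directory for analysis.
content = openai.File.download(result_file_id)  # assumed helper
with open('results.csv', 'wb') as f:
    f.write(content)
```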