
Commit 3f6c056

rebrand from Azure OpenAI Studio
1 parent 67f6947 commit 3f6c056

File tree: 1 file changed (+15, -15 lines)

articles/ai-services/openai/tutorials/fine-tune.md

Lines changed: 15 additions & 15 deletions
@@ -35,18 +35,18 @@ In this tutorial you learn how to:
 - [Jupyter Notebooks](https://jupyter.org/)
 - An Azure OpenAI resource in a [region where `gpt-4o-mini-2024-07-18` fine-tuning is available](../concepts/models.md). If you don't have a resource the process of creating one is documented in our resource [deployment guide](../how-to/create-resource.md).
 - Fine-tuning access requires **Cognitive Services OpenAI Contributor**.
-- If you do not already have access to view quota, and deploy models in Azure AI Foundry portal you need [additional permissions](../how-to/role-based-access-control.md).
+- If you don't already have access to view quota and deploy models in Azure AI Foundry portal, then you need [more permissions](../how-to/role-based-access-control.md).

 > [!IMPORTANT]
-> We recommend reviewing the [pricing information](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/#pricing) for fine-tuning to familiarize yourself with the associated costs. In testing, this tutorial resulted in 48,000 tokens being billed (4,800 training tokens * 10 epochs of training). Training costs are in addition to the costs that are associated with fine-tuning inference, and the hourly hosting costs of having a fine-tuned model deployed. Once you have completed the tutorial, you should delete your fine-tuned model deployment otherwise you will continue to incur the hourly hosting cost.
+> We recommend reviewing the [pricing information](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/#pricing) for fine-tuning to familiarize yourself with the associated costs. Testing of this tutorial resulted in 48,000 tokens being billed (4,800 training tokens * 10 epochs of training). Training costs are in addition to the costs that are associated with fine-tuning inference, and the hourly hosting costs of having a fine-tuned model deployed. Once you have completed the tutorial, you should delete your fine-tuned model deployment; otherwise you continue to incur the hourly hosting cost.

 ## Set up

 ### Python libraries

 # [OpenAI Python 1.x](#tab/python-new)

-This tutorial provides examples of some of the latest OpenAI features include seed/events/checkpoints. In order to take advantage of these features you may need to run `pip install openai --upgrade` to upgrade to the latest release.
+This tutorial provides examples of some of the latest OpenAI features, including seed, events, and checkpoints. To take advantage of these features, you might need to run `pip install openai --upgrade` to upgrade to the latest release.

 ```cmd
 pip install openai requests tiktoken numpy
@@ -113,7 +113,7 @@ Fine-tuning `gpt-4o-mini-2024-07-18` requires a specially formatted JSONL traini
 {"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
 ```

-For this example we'll modify this slightly by changing to:
+For this example we modify this slightly by changing to:

 ```json
 {"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
@@ -123,7 +123,7 @@ For this example we'll modify this slightly by changing to:

 While these three examples are helpful to give you the general format, if you want to steer your custom fine-tuned model to respond in a similar way you would need more examples. Generally you want **at least 50 high quality examples** to start out. However, it's entirely possible to have a use case that might require 1,000's of high quality training examples to be successful.

-In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively impact performance. If you train the model on a large amount of internal data, without first pruning the dataset for only the highest quality examples, you could end up with a model that performs much worse than expected.
+In general, doubling the dataset size can lead to a linear increase in model quality. But keep in mind, low quality examples can negatively affect performance. If you train the model on a large amount of internal data without first pruning the dataset for only the highest quality examples, you could end up with a model that performs worse than expected.

 You'll need to create two files `training_set.jsonl` and `validation_set.jsonl`.

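The hunk above mentions creating `training_set.jsonl` and `validation_set.jsonl`. As a minimal sketch of what that step can look like, assuming the Clippy-style examples shown earlier (the example lists here are placeholders, not the tutorial's full datasets):

```python
import json

# Placeholder examples in the Clippy format used by this tutorial; the real
# training and validation sets each contain their own question/answer pairs.
training_examples = [
    {"messages": [
        {"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."},
        {"role": "user", "content": "What's the capital of France?"},
        {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."},
    ]},
    # ... more examples ...
]

validation_examples = [
    {"messages": [
        {"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."},
        {"role": "user", "content": "What's the capital of Australia?"},
        {"role": "assistant", "content": "It's Canberra, not Sydney. Shocking, I know!"},
    ]},
    # ... more examples ...
]

# Write one JSON object per line (JSONL), which is the format fine-tuning expects.
with open("training_set.jsonl", "w", encoding="utf-8") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")

with open("validation_set.jsonl", "w", encoding="utf-8") as f:
    for example in validation_examples:
        f.write(json.dumps(example) + "\n")
```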
@@ -203,12 +203,12 @@ First example in validation set:
 {'role': 'assistant', 'content': "It's Canberra, not Sydney. Shocking, I know!"}
 ```

-In this case we only have 10 training and 10 validation examples so while this will demonstrate the basic mechanics of fine-tuning a model this in unlikely to be a large enough number of examples to produce a consistently noticeable impact.
+In this case we only have 10 training and 10 validation examples, so while this demonstrates the basic mechanics of fine-tuning a model, it's unlikely to be a large enough number of examples to produce a consistently noticeable effect.

-Now you can then run some additional code from OpenAI using the tiktoken library to validate the token counts. Token counting using this method is not going to give you the exact token counts that will be used for fine-tuning, but should provide a good estimate.
+Now you can use the tiktoken library to validate the token counts. Token counting using this method isn't going to give you the exact token counts that are used for fine-tuning, but should provide a good estimate.

 > [!NOTE]
-> Individual examples need to remain under the `gpt-4o-mini-2024-07-18` model's current training example context legnth of: 64,536 tokens. The model's input token limit remains 128,000 tokens.
+> Individual examples need to remain under the `gpt-4o-mini-2024-07-18` model's current training example context length of 64,536 tokens. The model's input token limit remains 128,000 tokens.

 ```python
 # Validate token counts
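# A minimal sketch of the token-count validation step, assuming tiktoken's
# "o200k_base" encoding and a rough per-message overhead of 3 tokens; this is
# an estimate, not exact billing arithmetic.
import json
import tiktoken

encoding = tiktoken.get_encoding("o200k_base")

def num_tokens_from_messages(messages, tokens_per_message=3):
    # Per-message overhead plus the encoded length of each field value.
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(encoding.encode(value))
    return num_tokens + 3  # priming for the assistant reply

for file in ("training_set.jsonl", "validation_set.jsonl"):
    with open(file, "r", encoding="utf-8") as f:
        totals = [num_tokens_from_messages(json.loads(line)["messages"]) for line in f]
    print(file, "min:", min(totals), "max:", max(totals), "total:", sum(totals))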
@@ -371,11 +371,11 @@ Validation file ID: file-8556c3bb41b7416bb7519b47fcd1dd6b

 ## Begin fine-tuning

-Now that the fine-tuning files have been successfully uploaded you can submit your fine-tuning training job:
+Now that the fine-tuning files are successfully uploaded, you can submit your fine-tuning training job:

 # [OpenAI Python 1.x](#tab/python-new)

-In this example we're also passing the seed parameter. The seed controls the reproducibility of the job. Passing in the same seed and job parameters should produce the same results, but can differ in rare cases. If a seed isn't specified, one will be generated for you.
+In this example, we're also passing the seed parameter. The seed controls the reproducibility of the job. Passing in the same seed and job parameters should produce the same results, but can differ in rare cases. If a seed isn't specified, one is generated for you.

 ```python
 # Submit fine-tuning training job
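# A minimal sketch, assuming `client` is the AzureOpenAI client configured earlier
# and `training_file_id` / `validation_file_id` come from the file upload step;
# the seed value is arbitrary.
response = client.fine_tuning.jobs.create(
    training_file=training_file_id,
    validation_file=validation_file_id,
    model="gpt-4o-mini-2024-07-18",
    seed=105,  # controls reproducibility; omit to have one generated for you
)
job_id = response.id
print("Job ID:", job_id)
print("Status:", response.status)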
@@ -556,7 +556,7 @@ Status: pending
 }
 ```

-It isn't unusual for training to take more than an hour to complete. Once training is completed the output message will change to something like:
+It isn't unusual for training to take more than an hour to complete. Once training is completed, the output message changes to something like:

 ```output
 Fine-tuning job ftjob-900fcfc7ea1d4360a9f0cb1697b4eaa6 finished with status: succeeded
@@ -568,7 +568,7 @@ Found 4 fine-tune jobs.

 API version: `2024-08-01-preview` or later is required for this command.

-While not necessary to complete fine-tuning it can be helpful to examine the individual fine-tuning events that were generated during training. The full training results can also be examined after training is complete in the [training results file](../how-to/fine-tuning.md#analyze-your-customized-model).
+While not necessary to complete fine-tuning, it can be helpful to examine the individual fine-tuning events that were generated during training. The full training results can also be examined after training is complete in the [training results file](../how-to/fine-tuning.md#analyze-your-customized-model).

 # [OpenAI Python 1.x](#tab/python-new)

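As a minimal sketch of listing fine-tuning events with the 1.x client, assuming `client` and `job_id` from the earlier steps:

```python
# List the most recent events generated for the fine-tuning job.
response = client.fine_tuning.jobs.list_events(
    fine_tuning_job_id=job_id,
    limit=10,  # number of events to return
)
print(response.model_dump_json(indent=2))
```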
@@ -732,7 +732,7 @@ This command isn't available in the 0.28.1 OpenAI Python library. Upgrade to the

 API version: `2024-08-01-preview` or later is required for this command.

-When each training epoch completes a checkpoint is generated. A checkpoint is a fully functional version of a model which can both be deployed and used as the target model for subsequent fine-tuning jobs. Checkpoints can be particularly useful, as they can provide a snapshot of your model prior to overfitting having occurred. When a fine-tuning job completes you will have the three most recent versions of the model available to deploy. The final epoch will be represented by your fine-tuned model, the previous two epochs will be available as checkpoints.
+When each training epoch completes, a checkpoint is generated. A checkpoint is a fully functional version of a model which can both be deployed and used as the target model for subsequent fine-tuning jobs. Checkpoints can be useful, as they can provide a snapshot of your model prior to overfitting having occurred. When a fine-tuning job completes, you have the three most recent versions of the model available to deploy. The final epoch is represented by your fine-tuned model, and the previous two epochs are available as checkpoints.

 # [OpenAI Python 1.x](#tab/python-new)

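A minimal sketch of listing the checkpoints for a completed job with the 1.x client, assuming `client` and `job_id` from the earlier steps (per the note above, this call requires API version `2024-08-01-preview` or later):

```python
# List checkpoints for the fine-tuning job; each checkpoint can be deployed or
# used as the base model for a subsequent fine-tuning job.
response = client.fine_tuning.jobs.checkpoints.list(job_id)
print(response.model_dump_json(indent=2))
```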
@@ -849,8 +849,8 @@ Alternatively, you can deploy your fine-tuned model using any of the other commo
 | subscription | The subscription ID for the associated Azure OpenAI resource |
 | resource_group | The resource group name for your Azure OpenAI resource |
 | resource_name | The Azure OpenAI resource name |
-| model_deployment_name | The custom name for your new fine-tuned model deployment. This is the name that will be referenced in your code when making chat completion calls. |
-| fine_tuned_model | Retrieve this value from your fine-tuning job results in the previous step. It will look like `gpt-4o-mini-2024-07-18.ft-0e208cf33a6a466994aff31a08aba678`. You'll need to add that value to the deploy_data json. |
+| model_deployment_name | The custom name for your new fine-tuned model deployment. This is the name that is referenced in your code when making chat completion calls. |
+| fine_tuned_model | Retrieve this value from your fine-tuning job results in the previous step. It looks like `gpt-4o-mini-2024-07-18.ft-0e208cf33a6a466994aff31a08aba678`. You need to add that value to the deploy_data json. |

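A minimal sketch of creating the deployment through the Azure control plane with `requests`, using the values from the table above. The management token environment variable, `api-version`, and `sku` shown here are assumptions to verify against the current deployment guidance:

```python
import json
import os

import requests

token = os.environ["TEMP_AUTH_TOKEN"]          # management token, e.g. from `az account get-access-token` (name assumed)
subscription = "<YOUR_SUBSCRIPTION_ID>"
resource_group = "<YOUR_RESOURCE_GROUP_NAME>"
resource_name = "<YOUR_AZURE_OPENAI_RESOURCE_NAME>"
model_deployment_name = "gpt-4o-mini-2024-07-18-ft"  # custom deployment name
fine_tuned_model = "<YOUR_FINE_TUNED_MODEL>"   # e.g. gpt-4o-mini-2024-07-18.ft-0e208cf33a6a466994aff31a08aba678

deploy_params = {"api-version": "2023-05-01"}  # assumed control-plane API version
deploy_headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
deploy_data = json.dumps({
    "sku": {"name": "standard", "capacity": 1},
    "properties": {"model": {"format": "OpenAI", "name": fine_tuned_model, "version": "1"}},
})

request_url = (
    f"https://management.azure.com/subscriptions/{subscription}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.CognitiveServices/accounts/{resource_name}"
    f"/deployments/{model_deployment_name}"
)

print("Creating a new deployment...")
r = requests.put(request_url, params=deploy_params, headers=deploy_headers, data=deploy_data)
print(r.status_code, r.reason)
print(r.json())
```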
 [!INCLUDE [Fine-tuning deletion](../includes/fine-tune.md)]
