Skip to content

Commit 63c46c0

Browse files
committed
update
1 parent 2d27e3e commit 63c46c0

File tree

2 files changed

+25
-623
lines changed

2 files changed

+25
-623
lines changed

articles/ai-services/openai/tutorials/fine-tune.md

Lines changed: 25 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
---
2-
title: Azure OpenAI Service fine-tuning gpt-3.5-turbo
2+
title: Azure OpenAI Service fine-tuning gpt-4o-mini
33
titleSuffix: Azure OpenAI
4-
description: Learn how to use Azure OpenAI's latest fine-tuning capabilities with gpt-3.5-turbo.
4+
description: Learn how to use Azure OpenAI's latest fine-tuning capabilities with gpt-4o-mini-2024-07-18
55
#services: cognitive-services
66
manager: nitinme
77
ms.service: azure-ai-openai
88
ms.topic: tutorial
9-
ms.date: 05/15/2024
9+
ms.date: 09/09/2024
1010
author: mrbullwinkle
1111
ms.author: mbullwin
1212
recommendations: false
@@ -206,7 +206,10 @@ First example in validation set:
206206

207207
In this case we only have 10 training and 10 validation examples so while this will demonstrate the basic mechanics of fine-tuning a model this in unlikely to be a large enough number of examples to produce a consistently noticeable impact.
208208

209-
Now you can then run some additional code from OpenAI using the tiktoken library to validate the token counts. Individual examples need to remain under the `gpt-4o-mini-2024-07-18` model's input token limit of 4096 tokens.
209+
Now you can then run some additional code from OpenAI using the tiktoken library to validate the token counts. Token counting using this method is not going to give you the exact token counts that will be used for fine-tuning, but should provide a good estimate.
210+
211+
> [!NOTE]
212+
> Individual examples need to remain under the `gpt-4o-mini-2024-07-18` model's current training example context legnth of: 64,536 tokens. The model's input token limit remains 128,000 tokens.
210213
211214
```python
212215
# Validate token counts
@@ -216,7 +219,7 @@ import tiktoken
216219
import numpy as np
217220
from collections import defaultdict
218221

219-
encoding = tiktoken.get_encoding("cl100k_base") # default encoding used by gpt-4, turbo, and text-embedding-ada-002 models
222+
encoding = tiktoken.get_encoding("o200k_base") # default encoding for gpt-4o models. This requires the latest version of tiktoken to be installed.
220223

221224
def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
222225
num_tokens = 0
@@ -268,27 +271,27 @@ for file in files:
268271
Processing file: training_set.jsonl
269272
270273
#### Distribution of total tokens:
271-
min / max: 47, 62
272-
mean / median: 52.1, 50.5
273-
p5 / p95: 47.9, 57.5
274+
min / max: 46, 59
275+
mean / median: 49.8, 48.5
276+
p5 / p95: 46.0, 53.599999999999994
274277
275278
#### Distribution of assistant tokens:
276-
min / max: 13, 30
277-
mean / median: 17.6, 15.5
278-
p5 / p95: 13.0, 21.9
279+
min / max: 13, 28
280+
mean / median: 16.5, 14.0
281+
p5 / p95: 13.0, 19.9
279282
**************************************************
280283
Processing file: validation_set.jsonl
281284
282285
#### Distribution of total tokens:
283-
min / max: 43, 65
284-
mean / median: 51.4, 49.0
285-
p5 / p95: 45.7, 56.9
286+
min / max: 41, 64
287+
mean / median: 48.9, 47.0
288+
p5 / p95: 43.7, 54.099999999999994
286289
287290
#### Distribution of assistant tokens:
288291
min / max: 8, 29
289-
mean / median: 15.9, 13.5
290-
p5 / p95: 11.6, 20.9
291-
**************************************************
292+
mean / median: 15.0, 12.5
293+
p5 / p95: 10.7, 19.999999999999996
294+
****************************
292295
```
293296

294297
## Upload fine-tuning files
@@ -304,7 +307,7 @@ from openai import AzureOpenAI
304307
client = AzureOpenAI(
305308
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
306309
api_key = os.getenv("AZURE_OPENAI_API_KEY"),
307-
api_version = "2024-05-01-preview" # This API version or later is required to access seed/events/checkpoint features
310+
api_version = "2024-08-01-preview" # This API version or later is required to access seed/events/checkpoint features
308311
)
309312

310313
training_file_name = 'training_set.jsonl'
@@ -564,7 +567,7 @@ Found 4 fine-tune jobs.
564567

565568
## List fine-tuning events
566569

567-
API version: `2024-05-01-preview` or later is required for this command.
570+
API version: `2024-08-01-preview` or later is required for this command.
568571

569572
While not necessary to complete fine-tuning it can be helpful to examine the individual fine-tuning events that were generated during training. The full training results can also be examined after training is complete in the [training results file](../how-to/fine-tuning.md#analyze-your-customized-model).
570573

@@ -728,7 +731,7 @@ This command isn't available in the 0.28.1 OpenAI Python library. Upgrade to the
728731

729732
## List checkpoints
730733

731-
API version: `2024-05-01-preview` or later is required for this command.
734+
API version: `2024-08-01-preview` or later is required for this command.
732735

733736
When each training epoch completes a checkpoint is generated. A checkpoint is a fully functional version of a model which can both be deployed and used as the target model for subsequent fine-tuning jobs. Checkpoints can be particularly useful, as they can provide a snapshot of your model prior to overfitting having occurred. When a fine-tuning job completes you will have the three most recent versions of the model available to deploy. The final epoch will be represented by your fine-tuned model, the previous two epochs will be available as checkpoints.
734737

@@ -911,7 +914,7 @@ from openai import AzureOpenAI
911914
client = AzureOpenAI(
912915
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
913916
api_key = os.getenv("AZURE_OPENAI_API_KEY"),
914-
api_version = "2024-02-01"
917+
api_version = "2024-06-01"
915918
)
916919

917920
response = client.chat.completions.create(
@@ -937,7 +940,7 @@ import openai
937940

938941
openai.api_type = "azure"
939942
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
940-
openai.api_version = "2024-02-01"
943+
openai.api_version = "2024-06-01"
941944
openai.api_key = os.getenv("AZURE_OPENAI_API_KEY")
942945

943946
response = openai.ChatCompletion.create(

0 commit comments

Comments
 (0)