Commit 6b1d789

Update prepare-dataset.md
Added a data prep explanation in the item list that describes the dataset size limit on fine-tuning jobs.
1 parent a48a0f0 commit 6b1d789

1 file changed: +1 -0 lines changed

articles/cognitive-services/openai/how-to/prepare-dataset.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ The first step of customizing your model is to prepare a high quality dataset. T
 - Each completion should start with a whitespace due to our tokenization, which tokenizes most words with a preceding whitespace.
 - Each completion should end with a fixed stop sequence to inform the model when the completion ends. A stop sequence could be `\n`, `###`, or any other token that doesn't appear in any completion.
 - For inference, you should format your prompts in the same way as you did when creating the training dataset, including the same separator. Also specify the same stop sequence to properly truncate the completion.
+- The dataset cannot exceed 100 Mb in total file size.
 
 ## Best practices
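
The completion rules and the size limit in this hunk can be checked before a dataset is uploaded. The following is a minimal sketch, not part of the commit. It assumes the dataset is JSONL text with a `completion` field per record, that `###` is the chosen stop sequence, and that "100 Mb" means 100 megabytes (read here as 100 * 1024 * 1024 bytes). The `check_dataset` helper and its parameters are hypothetical names introduced for illustration.

```python
import json

# Assumption: the "100 Mb" limit is 100 megabytes, taken as 100 * 1024 * 1024 bytes.
MAX_DATASET_BYTES = 100 * 1024 * 1024


def check_dataset(jsonl_text, stop_sequence="###", max_bytes=MAX_DATASET_BYTES):
    """Return a list of problems found in a JSONL dataset string (hypothetical helper)."""
    problems = []

    # Rule: the dataset must not exceed the total file size limit.
    size = len(jsonl_text.encode("utf-8"))
    if size > max_bytes:
        problems.append(f"dataset is {size} bytes, over the {max_bytes}-byte limit")

    for number, line in enumerate(jsonl_text.split("\n"), start=1):
        if not line.strip():
            continue  # skip blank lines, such as a trailing newline
        record = json.loads(line)
        completion = record.get("completion", "")

        # Rule: each completion should start with a whitespace character.
        if not completion[:1].isspace():
            problems.append(f"line {number}: completion does not start with whitespace")

        # Rule: each completion should end with the fixed stop sequence.
        if not completion.endswith(stop_sequence):
            problems.append(f"line {number}: completion does not end with {stop_sequence!r}")

    return problems
```

An empty list means none of the three checks failed. The sketch does not verify the prompt separator or that the stop sequence is absent from the rest of each completion.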
