Merge pull request #106430 from nicholasdbrady/patch-1

Court72 · web-flow · commit 4ed54c8dc931 · 2023-03-13T08:36:28.000-06:00
Update prepare-dataset.md
diff --git a/articles/cognitive-services/openai/how-to/prepare-dataset.md b/articles/cognitive-services/openai/how-to/prepare-dataset.md
@@ -24,6 +24,7 @@ The first step of customizing your model is to prepare a high quality dataset. T
 - Each completion should start with a whitespace due to our tokenization, which tokenizes most words with a preceding whitespace.
 - Each completion should end with a fixed stop sequence to inform the model when the completion ends. A stop sequence could be `\n`, `###`, or any other token that doesn't appear in any completion.
 - For inference, you should format your prompts in the same way as you did when creating the training dataset, including the same separator. Also specify the same stop sequence to properly truncate the completion.
+- The dataset cannot exceed 100 MB in total file size.
 
 ## Best practices
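
Note on the hunk above: the formatting rules it lists (leading whitespace on each completion, a fixed stop sequence, and the size cap added by this commit) can be checked mechanically before upload. The following is a minimal sketch, not part of the commit. It assumes the dataset is JSON Lines with a `completion` field per record, reads the size limit as 100 * 1024 * 1024 bytes, and uses a hypothetical helper name, `check_records`.

```python
import json

# Assumption: the size cap is interpreted as 100 * 1024 * 1024 bytes.
MAX_BYTES = 100 * 1024 * 1024


def check_records(jsonl_text, stop_sequence="\n", max_bytes=MAX_BYTES):
    """Return (line_number, problem) tuples; line 0 means a whole-file problem.

    Hypothetical helper: assumes one JSON object per line, each with a
    "completion" string field.
    """
    problems = []

    # Whole-file check: total encoded size against the limit.
    size = len(jsonl_text.encode("utf-8"))
    if size > max_bytes:
        problems.append((0, f"dataset is {size} bytes, limit is {max_bytes}"))

    # Per-record checks: leading whitespace and trailing stop sequence.
    for number, line in enumerate(jsonl_text.split("\n"), start=1):
        if not line.strip():
            continue  # skip blank lines
        completion = json.loads(line).get("completion", "")
        if not completion.startswith(" "):
            problems.append((number, "completion does not start with a whitespace"))
        if not completion.endswith(stop_sequence):
            problems.append((number, "completion does not end with the stop sequence"))
    return problems
```

For example, `check_records(open("train.jsonl", encoding="utf-8").read(), stop_sequence="###")` returns an empty list when every record passes; `train.jsonl` is a placeholder file name.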