README.md: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ The central configuration that is used across data preprocessing, training, and
## Data Preprocessing
-We provide a script for data preprocessing [`preprocess_data.py`](preprocess_data.py). This converts a text dataset into the same format used for model pretraining. We refrain from providing a script that prepares an instruction finetuning dataset due to different models requiring unique formatting. We also provide options for packing datasets. For more information, please consult the config documentation under [`docs/config.md`](docs/config.md).
+We provide a script for data preprocessing [`preprocess_data.py`](preprocess_data.py). This converts a text dataset into the same format used for model pretraining (causal language modeling). We refrain from providing a script that prepares an instruction finetuning dataset due to different models requiring unique formatting. We also provide options for packing datasets. For more information, please consult the config documentation under [`docs/config.md`](docs/config.md).
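To illustrate what "causal language modeling format" with packing typically means, here is a minimal sketch of one common approach. It is not taken from `preprocess_data.py`, and its actual behavior and options may differ (see `docs/config.md`). The helper `pack_sequences` is hypothetical. The sketch assumes documents are already tokenized into integer IDs and that an EOS token separates documents. It concatenates all documents into one stream, cuts the stream into fixed-size blocks, drops the incomplete tail, and copies `input_ids` into `labels`, on the assumption that the model or loss performs the one-position shift for next-token prediction.

```python
from typing import Dict, List


def pack_sequences(
    tokenized_docs: List[List[int]],
    block_size: int,
    eos_token_id: int,
) -> List[Dict[str, List[int]]]:
    """Hypothetical helper: generic causal-LM packing sketch.

    Not the implementation used by preprocess_data.py.
    """
    # Join all documents into one token stream, ending each document with EOS.
    stream: List[int] = []
    for doc in tokenized_docs:
        stream.extend(doc)
        stream.append(eos_token_id)

    # Keep only as many tokens as fill whole blocks; the remainder is dropped.
    usable = (len(stream) // block_size) * block_size

    blocks: List[Dict[str, List[int]]] = []
    for start in range(0, usable, block_size):
        input_ids = stream[start:start + block_size]
        # Labels are a copy of input_ids; the shift by one position
        # (each token predicts the next) is assumed to happen in the model/loss.
        blocks.append({"input_ids": input_ids, "labels": list(input_ids)})
    return blocks


if __name__ == "__main__":
    docs = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
    for block in pack_sequences(docs, block_size=4, eos_token_id=0):
        print(block["input_ids"])
```

Packing this way avoids padding, because every block is exactly `block_size` tokens long. The trade-off is that a block can span a document boundary (here `[8, 9, 0, 10]`), and the EOS token is the only signal of where one document ends.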