docs/ept.md: 3 additions & 3 deletions
@@ -43,7 +43,7 @@ datasets:
And the commandline passed to the library should include following.

```
- --data_config <path to the data config> --packing=True --max_seq_len 8192
+ --data_config_path <path to the data config> --packing=True --max_seq_len 8192
```

Please note that for non tokenized dataset our code adds `EOS_TOKEN` to the lines, for e.g. `Tweet` column before passing that as a dataset.
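The `EOS_TOKEN` note above can be illustrated with a minimal sketch. The helper name `append_eos`, the sample row, and the `</s>` token string are all hypothetical and chosen only for illustration; the library's actual preprocessing may differ.

```python
from typing import Dict


def append_eos(example: Dict[str, str], text_field: str, eos_token: str) -> Dict[str, str]:
    """Return a copy of `example` with `eos_token` appended to `text_field`.

    Hypothetical helper: it only mirrors the behaviour described in the docs,
    not the library's real implementation.
    """
    # Build a new dict so the caller's row is left untouched.
    return {**example, text_field: example[text_field] + eos_token}


# Assumed sample row with a `Tweet` text column; "</s>" is an assumed EOS string.
row = {"Tweet": "@user great service!", "label": "no complaint"}
processed = append_eos(row, text_field="Tweet", eos_token="</s>")
```

In practice the token string would come from the tokenizer in use rather than being hard-coded.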
@@ -102,7 +102,7 @@ NOTE: More in-depth documentation of `sampling_stopping_strategy` and how to spe
Here also the command line arguments would be

```
- --data_config <path to the data config> --packing=True --max_seq_len 8192
+ --data_config_path <path to the data config> --packing=True --max_seq_len 8192
```

The code again would add `EOS_TOKEN` to the non tokenized data before using it and also note that the `dataset_text_field` is assumed to be same across all datasets for now.
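Launch scripts that still pass the old flag need the same rename as the docs. A minimal sketch of that rewrite follows, assuming the arguments are available as a list of strings. The helper `rename_data_config_flag` is hypothetical and not part of the library.

```python
from typing import List

OLD_FLAG = "--data_config"
NEW_FLAG = "--data_config_path"


def rename_data_config_flag(argv: List[str]) -> List[str]:
    """Return a copy of `argv` with the old flag replaced by the new one.

    Hypothetical helper: handles both `--data_config value` and
    `--data_config=value`, and leaves every other argument unchanged.
    """
    updated = []
    for arg in argv:
        if arg == OLD_FLAG:
            updated.append(NEW_FLAG)
        elif arg.startswith(OLD_FLAG + "="):
            # Keep the "=value" suffix and swap only the flag name.
            updated.append(NEW_FLAG + arg[len(OLD_FLAG):])
        else:
            updated.append(arg)
    return updated


old_args = ["--data_config", "cfg.yaml", "--packing=True", "--max_seq_len", "8192"]
new_args = rename_data_config_flag(old_args)
```

Arguments that already use `--data_config_path` pass through untouched, because the match is on the exact flag name or the flag followed by `=`.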
@@ -131,7 +131,7 @@ datasets:
The command-line arguments passed to the library should include the following:

```
- --data_config <path to the data config> --packing=True --max_seq_len 8192 --max_steps <num training steps>
+ --data_config_path <path to the data config> --packing=True --max_seq_len 8192 --max_steps <num training steps>
```

Please note when using streaming, user must pass `max_steps` instead of `num_train_epochs`. See advanced data preprocessing [document](./advanced-data-preprocessing.md#data-streaming) for more info.
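The streaming requirement above can be sketched as a small pre-flight check. A streamed dataset generally has no known length, so an epoch count cannot be converted into a number of steps. The function `check_streaming_args` and its parameters are hypothetical and are not the library's own validation code.

```python
from typing import Optional


def check_streaming_args(streaming: bool, max_steps: Optional[int]) -> None:
    """Raise ValueError if streaming is enabled without a positive `max_steps`.

    Hypothetical check: it illustrates the rule stated in the docs only.
    """
    if streaming and (max_steps is None or max_steps <= 0):
        raise ValueError(
            "Streaming datasets require --max_steps; num_train_epochs cannot be used."
        )


# Streaming with an explicit step budget passes silently.
check_streaming_args(streaming=True, max_steps=1000)
# Non-streaming runs are not constrained by this check.
check_streaming_args(streaming=False, max_steps=None)
```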