Commit f22e243

docs: add note to note that file extension is required in training data path (#447)
* docs: add note to note that file extension is required in training data path

  Signed-off-by: Will Johnson <[email protected]>

* docs: clarify what is being checked

  Signed-off-by: Will Johnson <[email protected]>

* docs: make note, clarification

  Signed-off-by: Will Johnson <[email protected]>

* docs: clarify specifically what code does

  Signed-off-by: Will Johnson <[email protected]>

---------

Signed-off-by: Will Johnson <[email protected]>
1 parent c0362ad commit f22e243

1 file changed (+5, -0 lines changed)

README.md

Lines changed: 5 additions & 0 deletions
@@ -80,6 +80,11 @@ ARROW | ✅

 As iterated above, we also support passing a HF dataset ID directly via `--training_data_path` argument.

+**NOTE**: Due to the variety of supported data formats and file types, `--training_data_path` is handled as follows:
+- If `--training_data_path` ends in a valid file extension (e.g., .json, .csv), it is treated as a file.
+- If `--training_data_path` points to a valid folder, it is treated as a folder.
+- If neither of these are true, the data preprocessor tries to load `--training_data_path` as a Hugging Face (HF) dataset ID.
+
 ## Use cases supported with `training_data_path` argument

 ### 1. Data formats with a single sequence and a specified response_template to use for masking on completion.
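
The three rules in the NOTE added by this commit can be sketched as a small dispatch function. This is only an illustration of the described order of checks, not the project's actual code: the function name `classify_training_data_path` and the extension set `ASSUMED_FILE_EXTENSIONS` are hypothetical, and the real list of supported extensions may differ.

```python
import os

# Hypothetical extension set for illustration; the project's real list may differ.
ASSUMED_FILE_EXTENSIONS = {".json", ".jsonl", ".csv", ".parquet", ".arrow"}


def classify_training_data_path(path: str) -> str:
    """Return 'file', 'folder', or 'hf_dataset_id' per the three rules in the NOTE."""
    # Rule 1: a recognized file extension means the path is treated as a file.
    # Only the extension is inspected here, not whether the file exists.
    _, ext = os.path.splitext(path)
    if ext.lower() in ASSUMED_FILE_EXTENSIONS:
        return "file"
    # Rule 2: an existing directory is treated as a folder.
    if os.path.isdir(path):
        return "folder"
    # Rule 3: anything else falls through to a Hugging Face dataset ID lookup.
    return "hf_dataset_id"
```

Under these assumptions, a local file passed without its extension (for example `data/train` instead of `data/train.json`) would skip the first two rules and be tried as an HF dataset ID, which is why the commit notes that the file extension is required.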

0 commit comments