Skip to content

Commit 9858e4f

Browse files
committed
Fix windows (#1096)
1 parent fea0809 commit 9858e4f

File tree

4 files changed

+8
-7
lines changed

4 files changed

+8
-7
lines changed

docs/source/LLM/自定义与拓展.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
## 自定义数据集
88
我们支持三种**自定义数据集**的方法.
99

10-
1. 【推荐】直接命令行传参的方式,指定`--dataset xxx.json yyy.jsonl zzz.csv`, **更加方便支持自定义数据集**, 支持五种数据集格式(即使用`SmartPreprocessor`,支持的数据集格式见下方), 支持`dataset_id``dataset_path`.
10+
1. 【推荐】直接命令行传参的方式,指定`--dataset xxx.json yyy.jsonl zzz.csv`, **更加方便支持自定义数据集**, 支持五种数据集格式(即使用`SmartPreprocessor`,支持的数据集格式见下方), 支持`dataset_id``dataset_path`. 不需要修改`dataset_info.json`文件.
1111
2. 添加数据集到`dataset_info.json`中, 比第一种方式更灵活但繁琐, 支持对数据集使用两种预处理器并指定其参数: `RenameColumnsPreprocessor`, `ConversationsPreprocessor`(默认使用`SmartPreprocessor`). 支持直接修改swift内置的`dataset_info.json`, 或者通过`--custom_dataset_info xxx.json`的方式传入外置的json文件(方便pip install而非git clone的用户拓展数据集).
1212
3. **注册数据集**的方式: 比第1、2种方式更加灵活但繁琐, 支持使用函数对数据集进行预处理. 方法1、2在实现上借助了方法3. 可以直接修改源码进行拓展, 或者通过`--custom_register_path xxx.py`的方式传入, 脚本会对py文件进行解析(方便pip install的用户).
1313

docs/source_en/LLM/Customization.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88

99
We support three methods for **customizing datasets**.
1010

11-
1. \[Recommended] Use the command line argument directly to specify `--dataset xxx.json yyy.jsonl zzz.csv`, which is more convenient for supporting custom datasets. It supports five data formats (using `SmartPreprocessor`, supported dataset formats are listed below) and supports `dataset_id` and `dataset_path`.
11+
1. \[Recommended] Use the command line argument directly to specify `--dataset xxx.json yyy.jsonl zzz.csv`, which is more convenient for supporting custom datasets. It supports five data formats (using `SmartPreprocessor`, supported dataset formats are listed below) and supports `dataset_id` and `dataset_path`. No need to modify the `dataset_info.json` file.
1212
2. Adding datasets to `dataset_info.json` is more flexible but cumbersome compared to the first method, and supports using two preprocessors and specifying their parameters: `RenameColumnsPreprocessor`, `ConversationsPreprocessor` (default is to use `SmartPreprocessor`). You can directly modify the built-in `dataset_info.json` in Swift, or pass in an external json file using `--custom_dataset_info xxx.json` (for users who prefer pip install over git clone to expand datasets).
1313
3. Registering datasets: More flexible but cumbersome compared to the first and second methods, it supports using functions to preprocess datasets. Methods 1 and 2 are implemented by leveraging method 3. You can directly modify the source code for expansion, or pass in a custom registration path using `--custom_register_path xxx.py`, where the script will parse the py file (for pip install users).
1414

tests/llm/data/swift_#:#.jsonl

Lines changed: 0 additions & 3 deletions
This file was deleted.

tests/llm/test_run.py

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -169,8 +169,12 @@ def test_custom_dataset(self):
169169
'swift_multi.json', 'sharegpt.jsonl'
170170
]
171171
val_dataset_fnames = [
172-
'alpaca.jsonl', 'alpaca2.csv', 'conversations.jsonl', 'swift_pre.csv', 'swift_single.jsonl',
173-
'swift_#:#.jsonl'
172+
'alpaca.jsonl',
173+
'alpaca2.csv',
174+
'conversations.jsonl',
175+
'swift_pre.csv',
176+
'swift_single.jsonl',
177+
# 'swift_#:#.jsonl'
174178
]
175179
folder = os.path.join(os.path.dirname(__file__), 'data')
176180
resume_from_checkpoint = None

0 commit comments

Comments
 (0)