Commit e97cb71

fix bugs (#3026)
1 parent 58fd32f commit e97cb71

12 files changed: +159 -79 lines changed

docs/source/Instruction/命令行参数.md

Lines changed: 4 additions & 2 deletions
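@@ -46,8 +46,8 @@
 - download_mode: Dataset download mode, including `reuse_dataset_if_exists` and `force_redownload`. Default is reuse_dataset_if_exists.
 - columns: Used for column mapping of the dataset so that it conforms to a format AutoPreprocessor can handle. For details, see [here](../Customization/自定义数据集.md). You can pass in a JSON string, for example: `'{"text1": "query", "text2": "response"}'`. Default is None.
 - strict: If True, an error is raised as soon as any row of the dataset has a problem; otherwise the faulty samples are discarded. Default is False.
-- 🔥model_name: Used only for self-cognition tasks. Pass in the model's Chinese and English names, separated by a space, for example: `--model_name 小黄 'Xiao Huang'`. Default is None.
-- 🔥model_author: Used only for self-cognition tasks. Pass in the model author's Chinese and English names, separated by a space, for example: `--model_author '魔搭' 'ModelScope'`. Default is None.
+- 🔥model_name: Used only for self-cognition tasks and effective only on the `swift/self-cognition` dataset, where it replaces the `{{NAME}}` placeholder. Pass in the model's Chinese and English names, separated by a space, for example: `--model_name 小黄 'Xiao Huang'`. Default is None.
+- 🔥model_author: Used only for self-cognition tasks and effective only on the `swift/self-cognition` dataset, where it replaces the `{{AUTHOR}}` placeholder. Pass in the model author's Chinese and English names, separated by a space, for example: `--model_author '魔搭' 'ModelScope'`. Default is None.
 - custom_dataset_info: Path to the JSON file for custom dataset registration; see [Custom Dataset](../Customization/自定义数据集.md). Default is `[]`.

 ### Template Arguments
@@ -113,6 +113,7 @@
 - remove_unused_columns: Whether to remove unused columns from the dataset. Default is False.
 - logging_first_step: Whether to log the first step. Default is True.
 - logging_steps: Logging interval. Default is 5.
+- predict_with_generate: Whether to use generation during validation. Default is False.
 - metric_for_best_model: Default is None, meaning it is set to 'loss' when `predict_with_generate` is False and to 'rouge-l' otherwise.
 - greater_is_better: Default is None, meaning it is set to False when `metric_for_best_model` contains 'loss' and to True otherwise.

@@ -330,6 +331,7 @@ RLHF arguments inherit from the [training arguments](#训练参数)
 - simpo_gamma: The reward margin term in the SimPO algorithm; the paper suggests 0.5-1.5. Default is `1.`
 - desirable_weight: Loss weight $\lambda_D$ for desirable responses in the KTO algorithm. Default is `1.`
 - undesirable_weight: Loss weight $\lambda_U$ for undesirable responses in the KTO algorithm. Default is `1.`
+- loss_scale: Overrides the template argument. Default is 'last_round'.

 #### PPO Arguments
 - reward_model: Default is None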

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 4 additions & 2 deletions
@@ -47,8 +47,8 @@ Hints:
 - download_mode: Dataset download mode, including `reuse_dataset_if_exists` and `force_redownload`, default is reuse_dataset_if_exists.
 - columns: Used for column mapping of the dataset to ensure that the dataset conforms to the format that AutoPreprocessor can handle. For more details, see [here](../Customization/Custom-dataset.md). You can pass in a JSON string, for example: `'{"text1": "query", "text2": "response"}'`, with the default being None.
 - strict: If set to True, any row with an issue in the dataset will throw an error immediately, otherwise, erroneous data samples will be discarded. Default is False.
-- 🔥model_name: Used only for self-awareness tasks, pass in the Chinese and English names of the model, separated by a space, e.g., `--model_name Xiao Huang 'Xiao Huang'`. Default is None.
-- 🔥model_author: Used only for self-awareness tasks, pass in the Chinese and English names of the model author, separated by a space, e.g., `--model_author '魔搭' 'ModelScope'`. Default is None.
+- 🔥model_name: Only applicable to the self-cognition task and effective only on the `swift/self-cognition` dataset. It replaces the `{{NAME}}` placeholder in the dataset. Input the model's name in both Chinese and English, separated by a space, for example: `--model_name 小黄 'Xiao Huang'`. Default is None.
+- 🔥model_author: Only applicable to the self-cognition task and effective only on the `swift/self-cognition` dataset. It replaces the `{{AUTHOR}}` placeholder in the dataset. Input the model author's name in both Chinese and English, separated by a space, for example: `--model_author '魔搭' 'ModelScope'`. Default is None.
 - custom_dataset_info: The path to the JSON file for custom dataset registration. Refer to [Custom Dataset](../Customization/Custom-dataset.md). Default is `[]`.

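The `{{NAME}}` and `{{AUTHOR}}` substitution described in the `model_name` and `model_author` entries above can be pictured roughly as follows. This is a minimal sketch and not swift's implementation: the function name `fill_self_cognition` and the `lang` argument used to choose between the Chinese and English names are assumptions made for illustration.

```python
from typing import Sequence


def fill_self_cognition(text: str,
                        model_name: Sequence[str],
                        model_author: Sequence[str],
                        lang: str = 'en') -> str:
    """Replace the {{NAME}} and {{AUTHOR}} placeholders in one sample.

    model_name and model_author are (Chinese, English) pairs, mirroring
    `--model_name 小黄 'Xiao Huang'`. Selecting by `lang` is an assumption.
    """
    idx = 0 if lang == 'zh' else 1
    text = text.replace('{{NAME}}', model_name[idx])
    return text.replace('{{AUTHOR}}', model_author[idx])


sample = 'I am {{NAME}}, an assistant developed by {{AUTHOR}}.'
print(fill_self_cognition(sample, ['小黄', 'Xiao Huang'], ['魔搭', 'ModelScope']))
```

A dataset without these placeholders passes through unchanged, which is consistent with the note that the two arguments only take effect on `swift/self-cognition`.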

@@ -117,6 +117,7 @@ This parameter list inherits from transformers `Seq2SeqTrainingArguments`, with
 - remove_unused_columns: Whether to remove unused columns in the dataset, defaults to False.
 - logging_first_step: Whether to log the first step, defaults to True.
 - logging_steps: Interval for logging, defaults to 5.
+- predict_with_generate: Whether to use generative method during validation, default is False.
 - metric_for_best_model: Defaults to None, which sets it to 'loss' when `predict_with_generate` is False, otherwise sets it to 'rouge-l'.
 - greater_is_better: Defaults to None, which sets it to False when `metric_for_best_model` contains 'loss', otherwise sets to True.
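The way the two `None` defaults above depend on `predict_with_generate` can be sketched as a small function. This only restates the documented rules; the function name `resolve_metric_defaults` is hypothetical and the real resolution inside swift may be organised differently.

```python
from typing import Optional, Tuple


def resolve_metric_defaults(predict_with_generate: bool = False,
                            metric_for_best_model: Optional[str] = None,
                            greater_is_better: Optional[bool] = None) -> Tuple[str, bool]:
    """Fill in the documented defaults for the two checkpoint-selection options."""
    if metric_for_best_model is None:
        # 'loss' for teacher-forced evaluation, 'rouge-l' for generative evaluation.
        metric_for_best_model = 'rouge-l' if predict_with_generate else 'loss'
    if greater_is_better is None:
        # Any metric whose name contains 'loss' is minimised.
        greater_is_better = 'loss' not in metric_for_best_model
    return metric_for_best_model, greater_is_better


print(resolve_metric_defaults())                            # ('loss', False)
print(resolve_metric_defaults(predict_with_generate=True))  # ('rouge-l', True)
```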

@@ -339,6 +340,7 @@ RLHF arguments inherit from the [training arguments](#training-arguments).
 - simpo_gamma: Reward margin term in the SimPO algorithm, with a paper-suggested setting of 0.5-1.5, default is `1.`.
 - desirable_weight: Loss weight $\lambda_D$ for desirable response in the KTO algorithm, default is `1.`.
 - undesirable_weight: Loss weight $\lambda_U$ for undesirable response in the KTO algorithm, default is `1.`.
+- loss_scale: Override template arguments, default is 'last_round'.

 #### PPO Arguments

Lines changed: 14 additions & 9 deletions
@@ -1,19 +1,24 @@
+# https://help.aliyun.com/zh/pai/user-guide/general-environment-variables
 NNODES=$WORLD_SIZE \
 NODE_RANK=$RANK \
 swift sft \
     --model Qwen/Qwen2.5-7B-Instruct \
-    --train_type lora \
-    --dataset 'swift/self-cognition#1000' \
+    --train_type full \
+    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#20000' \
+              'AI-ModelScope/alpaca-gpt4-data-en#20000' \
+    --torch_dtype bfloat16 \
     --num_train_epochs 1 \
     --per_device_train_batch_size 1 \
-    --lora_rank 8 \
-    --lora_alpha 32 \
-    --learning_rate 1e-4 \
-    --gradient_accumulation_steps 16 \
+    --per_device_eval_batch_size 1 \
+    --learning_rate 1e-5 \
+    --gradient_accumulation_steps 4 \
     --eval_steps 100 \
     --save_steps 100 \
     --save_total_limit 2 \
     --logging_steps 5 \
-    --deepspeed zero3 \
-    --model_author swift \
-    --model_name swift-robot
+    --max_length 8192 \
+    --output_dir output \
+    --system 'You are a helpful assistant.' \
+    --warmup_ratio 0.05 \
+    --dataloader_num_workers 4 \
+    --deepspeed zero2
Lines changed: 26 additions & 18 deletions
@@ -1,22 +1,30 @@
+nnodes=2
+nproc_per_node=4
+
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-NNODES=2 \
+NNODES=$nnodes \
 NODE_RANK=0 \
 MASTER_ADDR=127.0.0.1 \
-NPROC_PER_NODE=4 \
+MASTER_PORT=29500 \
+NPROC_PER_NODE=$nproc_per_node \
 swift sft \
-    --model Qwen/Qwen2.5-7B-Instruct \
-    --train_type lora \
-    --torch_dtype bfloat16 \
-    --dataset 'swift/self-cognition#1000' \
-    --num_train_epochs 1 \
-    --lora_rank 8 \
-    --lora_alpha 32 \
-    --learning_rate 1e-4 \
-    --gradient_accumulation_steps 16 \
-    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
-    --eval_steps 100 \
-    --save_steps 100 \
-    --save_total_limit 2 \
-    --logging_steps 5 \
-    --model_author swift \
-    --model_name swift-robot
+    --model Qwen/Qwen2.5-7B-Instruct \
+    --train_type full \
+    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#20000' \
+              'AI-ModelScope/alpaca-gpt4-data-en#20000' \
+    --torch_dtype bfloat16 \
+    --num_train_epochs 1 \
+    --per_device_train_batch_size 1 \
+    --per_device_eval_batch_size 1 \
+    --learning_rate 1e-5 \
+    --gradient_accumulation_steps $(expr 32 / $nproc_per_node / $nnodes) \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 5 \
+    --max_length 8192 \
+    --output_dir output \
+    --system 'You are a helpful assistant.' \
+    --warmup_ratio 0.05 \
+    --dataloader_num_workers 4 \
+    --deepspeed zero2
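The `$(expr 32 / $nproc_per_node / $nnodes)` expression in the script above appears to hold the effective global batch size at 32 samples per optimizer step while the number of processes varies. That reading is an interpretation of the script, not something the commit states. A minimal sketch of the same arithmetic, with a hypothetical helper name:

```python
def grad_accum_steps(global_batch_size: int,
                     nproc_per_node: int,
                     nnodes: int,
                     per_device_batch_size: int = 1) -> int:
    """Accumulation steps needed to reach a target global batch size.

    global_batch_size = per_device_batch_size * nproc_per_node * nnodes * steps
    """
    samples_per_step = per_device_batch_size * nproc_per_node * nnodes
    if global_batch_size % samples_per_step != 0:
        # `expr` would silently truncate here; this sketch raises instead.
        raise ValueError('global batch size is not divisible by the world size')
    return global_batch_size // samples_per_step


# Two nodes with four processes each, as in the script above.
print(grad_accum_steps(32, nproc_per_node=4, nnodes=2))  # 4
```

With `nnodes=2` and `nproc_per_node=4` the result is 4, which matches the fixed `--gradient_accumulation_steps 4` in the first script of this commit.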
Lines changed: 26 additions & 18 deletions
@@ -1,22 +1,30 @@
+nnodes=2
+nproc_per_node=4
+
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-NNODES=2 \
+NNODES=$nnodes \
 NODE_RANK=1 \
 MASTER_ADDR=xxx.xxx.xxx.xxx \
-NPROC_PER_NODE=4 \
+MASTER_PORT=29500 \
+NPROC_PER_NODE=$nproc_per_node \
 swift sft \
-    --model Qwen/Qwen2.5-7B-Instruct \
-    --train_type lora \
-    --torch_dtype bfloat16 \
-    --dataset 'swift/self-cognition#1000' \
-    --num_train_epochs 1 \
-    --lora_rank 8 \
-    --lora_alpha 32 \
-    --learning_rate 1e-4 \
-    --gradient_accumulation_steps 16 \
-    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
-    --eval_steps 100 \
-    --save_steps 100 \
-    --save_total_limit 2 \
-    --logging_steps 5 \
-    --model_author swift \
-    --model_name swift-robot
+    --model Qwen/Qwen2.5-7B-Instruct \
+    --train_type full \
+    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#20000' \
+              'AI-ModelScope/alpaca-gpt4-data-en#20000' \
+    --torch_dtype bfloat16 \
+    --num_train_epochs 1 \
+    --per_device_train_batch_size 1 \
+    --per_device_eval_batch_size 1 \
+    --learning_rate 1e-5 \
+    --gradient_accumulation_steps $(expr 32 / $nproc_per_node / $nnodes) \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 5 \
+    --max_length 8192 \
+    --output_dir output \
+    --system 'You are a helpful assistant.' \
+    --warmup_ratio 0.05 \
+    --dataloader_num_workers 4 \
+    --deepspeed zero2
Lines changed: 22 additions & 10 deletions
@@ -1,19 +1,31 @@
+nnodes=2
+nproc_per_node=4
+
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-torchrun --master_port 29500 --nproc_per_node=4 --nnodes=2 --node_rank=0 --master_addr=127.0.0.1 \
+torchrun \
+    --master_port 29500 \
+    --nproc_per_node=$nproc_per_node \
+    --nnodes=$nnodes \
+    --node_rank=0 \
+    --master_addr=127.0.0.1 \
     swift/cli/sft.py \
     --model Qwen/Qwen2.5-7B-Instruct \
-    --train_type lora \
+    --train_type full \
+    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#20000' \
+              'AI-ModelScope/alpaca-gpt4-data-en#20000' \
     --torch_dtype bfloat16 \
-    --dataset 'swift/self-cognition#1000' \
     --num_train_epochs 1 \
-    --lora_rank 8 \
-    --lora_alpha 32 \
-    --learning_rate 1e-4 \
-    --gradient_accumulation_steps 16 \
-    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
+    --per_device_train_batch_size 1 \
+    --per_device_eval_batch_size 1 \
+    --learning_rate 1e-5 \
+    --gradient_accumulation_steps $(expr 32 / $nproc_per_node / $nnodes) \
     --eval_steps 100 \
     --save_steps 100 \
     --save_total_limit 2 \
     --logging_steps 5 \
-    --model_author swift \
-    --model_name swift-robot
+    --max_length 8192 \
+    --output_dir output \
+    --system 'You are a helpful assistant.' \
+    --warmup_ratio 0.05 \
+    --dataloader_num_workers 4 \
+    --deepspeed zero2
Lines changed: 22 additions & 10 deletions
@@ -1,19 +1,31 @@
+nnodes=2
+nproc_per_node=4
+
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-torchrun --master_port 29500 --nproc_per_node=4 --nnodes=2 --node_rank=1 --master_addr=xxx.xxx.xxx.xxx \
+torchrun \
+    --master_port 29500 \
+    --nproc_per_node=$nproc_per_node \
+    --nnodes=$nnodes \
+    --node_rank=1 \
+    --master_addr=xxx.xxx.xxx.xxx \
     swift/cli/sft.py \
     --model Qwen/Qwen2.5-7B-Instruct \
-    --train_type lora \
+    --train_type full \
+    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#20000' \
+              'AI-ModelScope/alpaca-gpt4-data-en#20000' \
     --torch_dtype bfloat16 \
-    --dataset 'swift/self-cognition#1000' \
     --num_train_epochs 1 \
-    --lora_rank 8 \
-    --lora_alpha 32 \
-    --learning_rate 1e-4 \
-    --gradient_accumulation_steps 16 \
-    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
+    --per_device_train_batch_size 1 \
+    --per_device_eval_batch_size 1 \
+    --learning_rate 1e-5 \
+    --gradient_accumulation_steps $(expr 32 / $nproc_per_node / $nnodes) \
     --eval_steps 100 \
     --save_steps 100 \
     --save_total_limit 2 \
     --logging_steps 5 \
-    --model_author swift \
-    --model_name swift-robot
+    --max_length 8192 \
+    --output_dir output \
+    --system 'You are a helpful assistant.' \
+    --warmup_ratio 0.05 \
+    --dataloader_num_workers 4 \
+    --deepspeed zero2
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# 20GiB
+CUDA_VISIBLE_DEVICES=0 \
+MAX_PIXELS=1003520 \
+swift sft \
+    --model Qwen/Qwen2.5-VL-7B-Instruct \
+    --dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite#20000' \
+    --train_type lora \
+    --torch_dtype bfloat16 \
+    --num_train_epochs 1 \
+    --per_device_train_batch_size 1 \
+    --per_device_eval_batch_size 2 \
+    --learning_rate 1e-4 \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --target_modules all-linear \
+    --freeze_vit true \
+    --gradient_accumulation_steps 16 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 5 \
+    --logging_steps 5 \
+    --max_length 2048 \
+    --output_dir output \
+    --warmup_ratio 0.05 \
+    --dataloader_num_workers 4 \
+    --predict_with_generate true \
+    --metric_for_best_model rouge-l \
+    --greater_is_better true

swift/llm/argument/rlhf_args.py

Lines changed: 8 additions & 2 deletions
@@ -68,8 +68,7 @@ class RLHFArguments(PPOArguments, TrainArguments):
     desirable_weight: float = 1.0
     undesirable_weight: float = 1.0

-    # Use last_round by default
-    loss_scale: str = 'last_round'
+    loss_scale: Optional[str] = None

     def __post_init__(self):
         self._init_rm()
@@ -78,6 +77,13 @@ def __post_init__(self):
         super().__post_init__()
         self._init_ppo()

+        if self.loss_scale is None:
+            if self.rlhf_type == 'orpo' and not self.model_meta.is_multimodal:
+                # Avoid padding labels during the model's forward pass in multimodal models.
+                # Some multimodal models do not expand the image pad token.
+                self.loss_scale = 'default'
+            else:
+                self.loss_scale = 'last_round'
         if self.rlhf_type in ['dpo', 'kto', 'ppo'] and self.train_type == 'full':
             self.ref_model = self.ref_model or self.model
             self.ref_model_type = self.ref_model_type or self.model_type
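The new default resolution for `loss_scale` can be read in isolation as a pure function. The sketch below mirrors the branch added in `__post_init__` above; the function name and its flat arguments are hypothetical stand-ins for the dataclass fields.

```python
from typing import Optional


def resolve_loss_scale(loss_scale: Optional[str],
                       rlhf_type: str,
                       is_multimodal: bool) -> str:
    """Mirror the loss_scale fallback shown in the hunk above."""
    if loss_scale is not None:
        # An explicit user value is left untouched.
        return loss_scale
    if rlhf_type == 'orpo' and not is_multimodal:
        return 'default'
    return 'last_round'


for rlhf_type, is_multimodal in [('orpo', False), ('orpo', True), ('dpo', False)]:
    print(rlhf_type, is_multimodal, resolve_loss_scale(None, rlhf_type, is_multimodal))
```

Only text-only ORPO falls back to `'default'`. Every other combination, including multimodal ORPO, keeps the previous `'last_round'` behaviour.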

swift/llm/model/utils.py

Lines changed: 3 additions & 1 deletion
@@ -207,6 +207,8 @@ def _get_arch_mapping():
         res = {}
         for model_type, model_meta in MODEL_MAPPING.items():
             architectures = model_meta.architectures
+            if not architectures:
+                architectures.append('null')
             for arch in architectures:
                 if arch not in res:
                     res[arch] = []
@@ -216,7 +218,7 @@ def _get_arch_mapping():
     @staticmethod
     def get_matched_model_types(config: Union[PretrainedConfig, Dict[str, Any]]) -> List[str]:
         """Get possible model_type."""
-        arch = HfConfigFactory.get_config_attr(config, 'architectures')
+        arch = HfConfigFactory.get_config_attr(config, 'architectures') or ['null']
         if arch:
             arch = arch[0]
         arch_mapping = HfConfigFactory._get_arch_mapping()
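The effect of the `'null'` fallback is that model types registered without architectures, and configs that carry no `architectures` field, now meet under the same key. The sketch below illustrates the idea with a made-up registry. `MODEL_ARCHS`, the model type names, and the final `.get(arch, [])` lookup are assumptions, since the hunk does not show how the mapping is consumed.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for MODEL_MAPPING: model_type -> registered architectures.
MODEL_ARCHS: Dict[str, List[str]] = {
    'model_a': ['FooForCausalLM'],
    'model_b': ['FooForCausalLM'],
    'model_without_arch': [],
}


def build_arch_mapping(model_archs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the registry; types with no architectures are filed under 'null'."""
    res: Dict[str, List[str]] = {}
    for model_type, architectures in model_archs.items():
        for arch in architectures or ['null']:
            res.setdefault(arch, []).append(model_type)
    return res


def matched_model_types(config_architectures: Optional[List[str]]) -> List[str]:
    """Look up candidate model types for a config's `architectures` value."""
    arch = (config_architectures or ['null'])[0]
    return build_arch_mapping(MODEL_ARCHS).get(arch, [])


print(matched_model_types(['FooForCausalLM']))
print(matched_model_types(None))
```

The sketch uses `architectures or ['null']` rather than appending to the registered list, so the registry itself is never modified; the hunk above appends in place.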
