Commit 4d9e27f

Merge commit '2626106a398a96cf4df6801247f8c4aa91820f64' into release/1.5
* commit '2626106a398a96cf4df6801247f8c4aa91820f64': update default_lr; fix do_sample in vllm (#336)
2 parents 234fefe + 2626106 commit 4d9e27f

File tree: 16 files changed (+23, −16 lines)

docs/source/LLM/命令行参数.md

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@
  - `--num_train_epochs`: number of training epochs, default `1`. If `max_steps >= 0`, it overrides `num_train_epochs`.
  - `--max_steps`: maximum number of training steps, default `-1`. If `max_steps >= 0`, it overrides `num_train_epochs`.
  - `--optim`: default `'adamw_torch'`.
- - `--learning_rate`: default `None`, i.e. set to 1e-4 if `sft_type` is lora, and to 2e-5 if `sft_type` is full.
+ - `--learning_rate`: default `None`, i.e. set to 1e-4 if `sft_type` is lora, and to 1e-5 if `sft_type` is full.
  - `--weight_decay`: default `0.01`.
  - `--gradient_accumulation_steps`: gradient accumulation, default `None`, set to `math.ceil(16 / self.batch_size / world_size)`. `total_batch_size = batch_size * gradient_accumulation_steps * world_size`.
  - `--max_grad_norm`: gradient clipping, default `0.5`.
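The default-resolution behavior described above can be sketched as follows (a minimal illustration with a hypothetical helper name, not the actual swift source):

```python
def default_learning_rate(sft_type: str, learning_rate=None):
    """Resolve --learning_rate when it is left as None.

    Per the docs above: lora -> 1e-4, full -> 1e-5 (previously 2e-5).
    An explicitly passed value is kept as-is.
    """
    if learning_rate is not None:
        return learning_rate
    if sft_type == 'lora':
        return 1e-4
    if sft_type == 'full':
        return 1e-5
    raise ValueError(f'unknown sft_type: {sft_type}')
```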

examples/pytorch/llm/scripts/qwen_1_8b_chat/full/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ python llm_sft.py \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.01 \
-   --learning_rate 2e-5 \
+   --learning_rate 1e-5 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \

examples/pytorch/llm/scripts/qwen_1_8b_chat/full_ddp/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ torchrun \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.01 \
-   --learning_rate 2e-5 \
+   --learning_rate 1e-5 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
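The `$(expr 16 / $nproc_per_node)` expression in the DDP scripts keeps the total batch size constant as the process count changes, matching the documented default `math.ceil(16 / batch_size / world_size)`. A quick illustrative sketch:

```python
import math

def grad_accum_steps(batch_size: int, world_size: int, target: int = 16) -> int:
    # Documented default: math.ceil(16 / batch_size / world_size)
    return math.ceil(target / batch_size / world_size)

# total_batch_size = batch_size * gradient_accumulation_steps * world_size
for world_size in (1, 2, 4):
    steps = grad_accum_steps(batch_size=1, world_size=world_size)
    total = 1 * steps * world_size
    print(world_size, steps, total)  # total stays at 16
```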

examples/pytorch/llm/scripts/qwen_7b_chat/full/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ swift sft \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 4096 \
-   --learning_rate 2e-5 \
+   --learning_rate 1e-5 \
    --use_flash_attn true \
    --save_only_model true \
    --dataset codefuse-evol-instruction-zh \

examples/pytorch/llm/scripts/qwen_7b_chat/full_freeze_ddp/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ swift sft \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 4096 \
-   --learning_rate 2e-5 \
+   --learning_rate 1e-5 \
    --use_flash_attn true \
    --save_only_model true \
    --dataset codefuse-evol-instruction-zh \

examples/pytorch/llm/scripts/qwen_7b_chat/full_mp/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ python llm_sft.py \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.01 \
-   --learning_rate 2e-5 \
+   --learning_rate 1e-5 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \

examples/pytorch/llm/scripts/qwen_7b_chat/full_mp_ddp/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ torchrun \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.01 \
-   --learning_rate 2e-5 \
+   --learning_rate 1e-5 \
    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \

examples/pytorch/llm/scripts/qwen_audio_chat/full_mp/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ swift sft \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
-   --learning_rate 2e-5 \
+   --learning_rate 1e-5 \
    --use_flash_attn true \
    --save_only_model true \
    --dataset aishell1-mini-zh \

examples/pytorch/llm/scripts/qwen_audio_chat/full_mp_ddp/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ swift sft \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
-   --learning_rate 2e-5 \
+   --learning_rate 1e-5 \
    --use_flash_attn true \
    --save_only_model true \
    --dataset aishell1-mini-zh \

examples/pytorch/llm/scripts/qwen_vl_chat/full_mp/sft.sh

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ swift sft \
    --output_dir output \
    --num_train_epochs 1 \
    --max_length 2048 \
-   --learning_rate 2e-5 \
+   --learning_rate 1e-5 \
    --use_flash_attn true \
    --save_only_model true \
    --dataset coco-mini-en \
