Commit a875f16

Add baichuan2 (#40)
1 parent 20aef70 commit a875f16

20 files changed: +172 -45 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ Key features:
 [code link](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm)

 1. supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full(full parameter fine-tuning)
-2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b
+2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat
 3. supported features: quantization, ddp, model parallelism(device map), gradient checkpointing, gradient accumulation, pushing to modelscope hub, custom datasets, multimodal and agent SFT, mutli-round chat, ...
 4. supported datasets:
     1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en

README_CN.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ SWIFT(Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
 [code link](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm)

 1. Supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full-parameter fine-tuning
-2. Supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b
+2. Supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat
 3. Supported features: model quantization, DDP, model parallelism (device_map), gradient checkpointing, gradient accumulation, pushing to ModelScope Hub, custom datasets, multimodal and Agent SFT, multi-round chat, ...
 4. Supported datasets:
     1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en

examples/pytorch/llm/README.md

Lines changed: 10 additions & 6 deletions
@@ -16,7 +16,7 @@

 ## Features
 1. supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full(full parameter fine-tuning)
-2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b
+2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat
 3. supported features: quantization, ddp, model parallelism(device map), gradient checkpointing, gradient accumulation, pushing to modelscope hub, custom datasets, multimodal and agent SFT, mutli-round chat, ...
 4. supported datasets:
     1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en
@@ -59,20 +59,24 @@ pip install .
 git clone https://github.com/modelscope/swift.git
 cd swift/examples/pytorch/llm

+# sft lora and infer qwen-7b, Requires 22GB VRAM.
+# If you want to push weights into modelscope hub during training, you need to set '--push_to_hub true'
+bash scripts/qwen_7b_chat/lora/sft.sh
+bash scripts/qwen_7b_chat/lora/infer.sh
+
+# sft(lora+ddp) and infer qwen-7b, Requires 4*22GB VRAM.
+bash scripts/qwen_7b_chat/lora_ddp/sft.sh
+bash scripts/qwen_7b_chat/lora_ddp/infer.sh
+
 # sft(qlora) and infer qwen-7b, Requires 16GB VRAM.
 # If you want to use quantification, you need to `pip install bitsandbytes -U`
-# If you want to push weights into modelscope hub during training, you need to set '--push_to_hub true'
 bash scripts/qwen_7b_chat/qlora/sft.sh
 bash scripts/qwen_7b_chat/qlora/infer.sh

 # sft(qlora+ddp) and infer qwen-7b, Requires 4*16GB VRAM.
 bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
 bash scripts/qwen_7b_chat/qlora_ddp/infer.sh

-# sft(lora+ddp) and infer qwen-7b, Requires 4*22GB VRAM.
-bash scripts/qwen_7b_chat/lora_ddp/sft.sh
-bash scripts/qwen_7b_chat/lora_ddp/infer.sh
-
 # sft(full) and infer qwen-7b, Requires 95GB VRAM.
 bash scripts/qwen_7b_chat/full/sft.sh
 bash scripts/qwen_7b_chat/full/infer.sh
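
The README comments in this hunk pair each fine-tuning method with a script directory and a VRAM figure. A minimal sketch of that mapping follows; the helper names `vram_for_method` and `script_dir_for_method` are hypothetical and not part of the repository, and the figures are only those quoted in the comments above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: it only restates the
# VRAM figures and script paths quoted in the README comments above.

vram_for_method() {
  case "$1" in
    lora)      echo "22GB" ;;
    lora_ddp)  echo "4*22GB" ;;
    qlora)     echo "16GB" ;;
    qlora_ddp) echo "4*16GB" ;;
    full)      echo "95GB" ;;
    *)         echo "unknown method: $1" >&2; return 1 ;;
  esac
}

script_dir_for_method() {
  # Reject unknown methods so callers never build a path that does not exist.
  vram_for_method "$1" >/dev/null || return 1
  echo "scripts/qwen_7b_chat/$1"
}

# Print the train and infer commands for every method listed in the README.
for method in lora lora_ddp qlora qlora_ddp full; do
  dir="$(script_dir_for_method "$method")"
  echo "$method: $(vram_for_method "$method") VRAM -> bash $dir/sft.sh && bash $dir/infer.sh"
done
```

The loop only prints commands; it does not launch training, so it is safe to run on a machine without a GPU.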

examples/pytorch/llm/README_CN.md

Lines changed: 10 additions & 6 deletions
@@ -17,7 +17,7 @@

 ## Features
 1. Supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full-parameter fine-tuning
-2. Supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b
+2. Supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat
 3. Supported features: model quantization, DDP, model parallelism (device_map), gradient checkpointing, gradient accumulation, pushing to ModelScope Hub, custom datasets, multimodal and Agent SFT, multi-round chat, ...
 4. Supported datasets:
     1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en
@@ -61,20 +61,24 @@ pip install .
 git clone https://github.com/modelscope/swift.git
 cd swift/examples/pytorch/llm

+# Fine-tuning (lora) + inference for qwen-7b, requires 22GB VRAM.
+# If you want to push weights to the modelscope hub during training, you need to set `--push_to_hub true`
+bash scripts/qwen_7b_chat/lora/sft.sh
+bash scripts/qwen_7b_chat/lora/infer.sh
+
+# Fine-tuning (lora+ddp) + inference for qwen-7b, requires 4 GPUs * 22GB VRAM.
+bash scripts/qwen_7b_chat/lora_ddp/sft.sh
+bash scripts/qwen_7b_chat/lora_ddp/infer.sh
+
 # Fine-tuning (qlora) + inference for qwen-7b, requires 16GB VRAM.
 # If you want to use quantization, you need to `pip install bitsandbytes -U`
-# If you want to push weights to the modelscope hub during training, you need to set `--push_to_hub true`
 bash scripts/qwen_7b_chat/qlora/sft.sh
 bash scripts/qwen_7b_chat/qlora/infer.sh

 # Fine-tuning (qlora+ddp) + inference for qwen-7b, requires 4 GPUs * 16GB VRAM.
 bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
 bash scripts/qwen_7b_chat/qlora_ddp/infer.sh

-# Fine-tuning (lora+ddp) + inference for qwen-7b, requires 4 GPUs * 22GB VRAM.
-bash scripts/qwen_7b_chat/lora_ddp/sft.sh
-bash scripts/qwen_7b_chat/lora_ddp/infer.sh
-
 # Fine-tuning (full) + inference for qwen-7b, requires 95G VRAM.
 bash scripts/qwen_7b_chat/full/sft.sh
 bash scripts/qwen_7b_chat/full/infer.sh
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# 16G
+CUDA_VISIBLE_DEVICES=0 \
+python src/llm_infer.py \
+    --model_type baichuan2-7b-chat \
+    --sft_type lora \
+    --template_type baichuan \
+    --dtype bf16 \
+    --ckpt_dir "runs/baichuan2-7b-chat/vx_xxx/checkpoint-xxx" \
+    --eval_human true \
+    --max_new_tokens 1024 \
+    --temperature 0.9 \
+    --top_k 50 \
+    --top_p 0.9 \
+    --do_sample true \
@@ -1,26 +1,36 @@
-# 4 * 17G
+# 4 * 22GB VRAM
 nproc_per_node=4
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
 torchrun \
     --nproc_per_node=$nproc_per_node \
     --master_port 29500 \
     src/llm_sft.py \
-    --model_type baichuan-13b-chat \
+    --model_type baichuan2-7b-chat \
     --sft_type lora \
+    --template_type baichuan \
+    --dtype bf16 \
     --output_dir runs \
     --ddp_backend nccl \
     --dataset alpaca-en,alpaca-zh \
-    --dataset_sample -1 \
+    --dataset_sample 20000 \
     --num_train_epochs 1 \
     --max_length 1024 \
-    --quantization_bit 4 \
     --lora_rank 8 \
     --lora_alpha 32 \
-    --lora_dropout_p 0.1 \
+    --lora_dropout_p 0.05 \
+    --lora_target_modules W_pack o_proj \
+    --gradient_checkpointing true \
     --batch_size 1 \
+    --weight_decay 0. \
     --learning_rate 1e-4 \
     --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
     --eval_steps 50 \
     --save_steps 50 \
     --save_total_limit 2 \
     --logging_steps 10 \
+    --push_to_hub false \
+    --hub_model_id baichuan2-7b-chat-lora \
+    --hub_private_repo true \
+    --hub_token 'your-sdk-token' \
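
The `$(expr 16 / $nproc_per_node)` expression in this script appears intended to keep the global batch size at 16 regardless of how many processes are launched. The sketch below works through that arithmetic. It assumes `--batch_size` is a per-process value, which the diff does not state, and the function names `grad_accum_steps` and `global_batch` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch under an assumption: --batch_size is per process, so the global batch
# is batch_size * nproc_per_node * gradient_accumulation_steps.

grad_accum_steps() {
  # $1 = target global batch, $2 = per-process batch size, $3 = nproc_per_node
  # Integer division: the result truncates if the target is not divisible.
  echo $(( $1 / ($2 * $3) ))
}

global_batch() {
  # $1 = per-process batch size, $2 = nproc_per_node, $3 = accumulation steps
  echo $(( $1 * $2 * $3 ))
}

# Show how the accumulation steps shrink as more processes are added.
for nproc_per_node in 1 2 4 8; do
  steps="$(grad_accum_steps 16 1 "$nproc_per_node")"
  echo "nproc_per_node=$nproc_per_node gradient_accumulation_steps=$steps global_batch=$(global_batch 1 "$nproc_per_node" "$steps")"
done
```

With `nproc_per_node=4` and `--batch_size 1`, this yields 4 accumulation steps, matching the single-GPU scripts in this commit that pass `--gradient_accumulation_steps 16` directly.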

examples/pytorch/llm/scripts/qwen_7b/qlora_ddp/infer.sh renamed to examples/pytorch/llm/scripts/qwen_7b/lora_ddp/infer.sh

Lines changed: 1 addition & 3 deletions
@@ -1,4 +1,4 @@
-# 10G
+# 16G
 CUDA_VISIBLE_DEVICES=0 \
 python src/llm_infer.py \
     --model_type qwen-7b \
@@ -7,8 +7,6 @@ python src/llm_infer.py \
     --dtype bf16 \
     --ckpt_dir "runs/qwen-7b/vx_xxx/checkpoint-xxx" \
     --eval_human true \
-    --quantization_bit 4 \
-    --bnb_4bit_comp_dtype bf16 \
     --max_new_tokens 1024 \
     --temperature 0.9 \
     --top_k 50 \

examples/pytorch/llm/scripts/qwen_7b/qlora_ddp/sft.sh renamed to examples/pytorch/llm/scripts/qwen_7b/lora_ddp/sft.sh

Lines changed: 4 additions & 6 deletions
@@ -1,4 +1,4 @@
-# 4 * 16GB VRAM
+# 4 * 22GB VRAM
 nproc_per_node=4
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
 torchrun \
@@ -15,12 +15,10 @@ torchrun \
     --dataset_sample -1 \
     --num_train_epochs 1 \
     --max_length 1024 \
-    --quantization_bit 4 \
-    --bnb_4bit_comp_dtype bf16 \
-    --lora_rank 64 \
+    --lora_rank 8 \
     --lora_alpha 32 \
     --lora_dropout_p 0.05 \
-    --lora_target_modules ALL \
+    --lora_target_modules c_attn c_proj \
     --gradient_checkpointing true \
     --batch_size 1 \
     --weight_decay 0. \
@@ -34,6 +32,6 @@ torchrun \
     --logging_steps 10 \
     --use_flash_attn false \
     --push_to_hub false \
-    --hub_model_id qwen-7b-qlora \
+    --hub_model_id qwen-7b-lora \
     --hub_private_repo true \
     --hub_token 'your-sdk-token' \
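
This rename from qlora to lora also drops `--lora_rank` from 64 to 8 while leaving `--lora_alpha` at 32. In the LoRA paper the low-rank update is scaled by alpha / rank, so the change would raise that factor if the repository follows the same convention, which the diff does not confirm. A small sketch of the arithmetic, with a hypothetical `lora_scale` helper:

```shell
#!/usr/bin/env bash
# Sketch: the LoRA paper scales the low-rank update by lora_alpha / lora_rank.
# Whether this repository applies exactly that factor is an assumption.

lora_scale() {
  # $1 = lora_alpha, $2 = lora_rank; awk is used because bash has no float math.
  awk -v alpha="$1" -v rank="$2" 'BEGIN { printf "%.2f\n", alpha / rank }'
}

echo "before: rank=64 alpha=32 scale=$(lora_scale 32 64)"
echo "after:  rank=8  alpha=32 scale=$(lora_scale 32 8)"
```

Under that assumption the scale moves from 0.50 to 4.00, so the two configurations are not directly comparable at the same learning rate.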

examples/pytorch/llm/scripts/baichuan_13b_chat/qlora_ddp/infer.sh renamed to examples/pytorch/llm/scripts/qwen_7b_chat/lora/infer.sh

Lines changed: 5 additions & 4 deletions
@@ -1,11 +1,12 @@
-# 12G
+# 16G
 CUDA_VISIBLE_DEVICES=0 \
 python src/llm_infer.py \
-    --model_type baichuan-13b-chat \
+    --model_type qwen-7b-chat \
     --sft_type lora \
-    --ckpt_dir "runs/baichuan-13b-chat/vx_xxx/checkpoint-xxx" \
+    --template_type chatml \
+    --dtype bf16 \
+    --ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
     --eval_human true \
-    --quantization_bit 4 \
     --max_new_tokens 1024 \
     --temperature 0.9 \
     --top_k 50 \
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# 22GB VRAM
+CUDA_VISIBLE_DEVICES=0 \
+python src/llm_sft.py \
+    --model_type qwen-7b-chat \
+    --sft_type lora \
+    --template_type chatml \
+    --dtype bf16 \
+    --output_dir runs \
+    --dataset alpaca-en,alpaca-zh \
+    --dataset_sample -1 \
+    --num_train_epochs 1 \
+    --max_length 1024 \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --lora_dropout_p 0.05 \
+    --lora_target_modules c_attn c_proj \
+    --gradient_checkpointing true \
+    --batch_size 1 \
+    --weight_decay 0. \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps 16 \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 50 \
+    --save_steps 50 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
+    --use_flash_attn false \
+    --push_to_hub false \
+    --hub_model_id qwen-7b-chat-lora \
+    --hub_private_repo true \
+    --hub_token 'your-sdk-token' \
