
Commit 70d956a

Update sh2 (#72)
1 parent 025c525 commit 70d956a

42 files changed: +361 −229 lines changed


examples/pytorch/llm/README.md

Lines changed: 8 additions & 8 deletions

@@ -27,7 +27,7 @@
     8. other: polylm-13b, seqgpt-560m
 3. supported features: quantization, DDP, model parallelism(device map), gradient checkpointing, gradient accumulation, pushing to modelscope hub, custom datasets, multimodal and agent SFT, mutli-round chat, ...
 4. supported datasets:
-    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en, cmnli-zh, jd-zh, dureader-robust-zh, medical-en, medical-zh, medical-mini-zh, sharegpt-en, sharegpt-zh
+    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en, cmnli-zh, jd-zh, dureader-robust-zh, medical-en, medical-zh, medical-mini-zh, sharegpt-en, sharegpt-zh, code-python-zh, advertise-gen
     2. agent: [damo-agent-zh](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary), damo-agent-mini-zh
     3. multi-modal: coco-en
     4. other: cls-fudan-news-zh, ner-jave-zh
@@ -71,40 +71,40 @@ Training GPU memory: qlora(low,3090) > lora > full(2*A100)
 git clone https://github.com/modelscope/swift.git
 cd swift/examples/pytorch/llm

-# sft lora and infer qwen-7b-chat, Requires 27GB GPU memory.
+# sft lora and infer qwen-7b-chat, Requires 38GB GPU memory.
 # You can save GPU memory by setting `--gradient_checkpointing true`, but this will slightly decrease the training speed.
 # If you want to push weights into modelscope hub during training, you need to set '--push_to_hub true'.
 # Recommended experimental environment: A100
 bash scripts/qwen_7b_chat/lora/sft.sh
 bash scripts/qwen_7b_chat/lora/infer.sh

-# sft(lora+ddp) and infer qwen-7b-chat, Requires 2*27GB GPU memory.
+# sft(lora+ddp) and infer qwen-7b-chat, Requires 2*38GB GPU memory.
 # Recommended experimental environment: A100
 bash scripts/qwen_7b_chat/lora_ddp/sft.sh
 bash scripts/qwen_7b_chat/lora_ddp/infer.sh

-# sft(lora+mp+ddp) and infer qwen-7b-chat, Requires 4*14GB GPU memory.
+# sft(lora+mp+ddp) and infer qwen-7b-chat, Requires 4*15GB GPU memory.
 # Recommended experimental environment: V100, A10, 3090
 bash scripts/qwen_7b_chat/lora_mp_ddp/sft.sh
 bash scripts/qwen_7b_chat/lora_mp_ddp/infer.sh

-# sft(qlora) and infer qwen-7b-chat, Requires 13GB GPU memory.
+# sft(qlora) and infer qwen-7b-chat, Requires 12GB GPU memory.
 # If you want to use quantification, you need to `pip install bitsandbytes -U`
 # Recommended experimental environment: A10, 3090
 bash scripts/qwen_7b_chat/qlora/sft.sh
 bash scripts/qwen_7b_chat/qlora/infer.sh

-# sft(qlora+ddp) and infer qwen-7b-chat, Requires 2*13GB GPU memory.
+# sft(qlora+ddp) and infer qwen-7b-chat, Requires 2*14GB GPU memory.
 # Recommended experimental environment: A10, 3090
 bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
 bash scripts/qwen_7b_chat/qlora_ddp/infer.sh

-# sft(full+mp) and infer qwen-7b-chat, Requires 2*50GB GPU memory.
+# sft(full+mp) and infer qwen-7b-chat, Requires 2*75GB GPU memory.
 # Recommended experimental environment: A100
 bash scripts/qwen_7b_chat/full_mp/sft.sh
 bash scripts/qwen_7b_chat/full_mp/infer.sh

-# sft(full+mp+ddp) and infer qwen-7b-chat, Requires 4*50GB GPU memory.
+# sft(full+mp+ddp) and infer qwen-7b-chat, Requires 4*75GB GPU memory.
 # Recommended experimental environment: A100
 bash scripts/qwen_7b_chat/full_mp_ddp/sft.sh
 bash scripts/qwen_7b_chat/full_mp_ddp/infer.sh

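The qwen_7b_chat wrapper scripts named above are not shown in this commit's hunks. As a rough, hedged sketch of what the single-GPU qlora case expands to, here is a direct src/llm_sft.py invocation assembled only from flags that appear elsewhere in this diff; the real scripts/qwen_7b_chat/qlora/sft.sh may set additional options (LoRA rank, learning rate, logging) and different values.

# sketch only; flags and values copied from other scripts in this commit, not from the qwen qlora script itself
CUDA_VISIBLE_DEVICES=0 \
python src/llm_sft.py \
    --model_type qwen-7b-chat \
    --sft_type lora \
    --template_type chatml \
    --dtype bf16 \
    --output_dir runs \
    --dataset alpaca-en,alpaca-zh \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 2048 \
    --quantization_bit 4 \
    --bnb_4bit_comp_dtype bf16
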
examples/pytorch/llm/README_CN.md

Lines changed: 8 additions & 8 deletions

@@ -28,7 +28,7 @@
     8. other: polylm-13b, seqgpt-560m
 3. 支持的特性: 模型量化, DDP, 模型并行(device_map), gradient checkpointing, 梯度累加, 支持推送ModelScope Hub, 自定义数据集, 多模态和Agent SFT, 多轮对话, ...
 4. 支持的数据集:
-    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en, cmnli-zh, jd-zh, dureader-robust-zh, medical-en, medical-zh, medical-mini-zh, sharegpt-en, sharegpt-zh
+    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en, cmnli-zh, jd-zh, dureader-robust-zh, medical-en, medical-zh, medical-mini-zh, sharegpt-en, sharegpt-zh, code-python-zh, advertise-gen
     2. agent: [damo-agent-zh](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary), damo-agent-mini-zh
     3. 多模态: coco-en
     4. 其他: cls-fudan-news-zh, ner-jave-zh
@@ -73,40 +73,40 @@ pip install .
 git clone https://github.com/modelscope/swift.git
 cd swift/examples/pytorch/llm

-# 微调(lora)+推理 qwen-7b-chat, 需要27GB显存.
+# 微调(lora)+推理 qwen-7b-chat, 需要38GB显存.
 # 你可以通过设置`--gradient_checkpointing true`来节约显存, 但这会略微降低训练速度.
 # 如果你想在训练时, 将权重push到modelscope hub中, 你需要设置`--push_to_hub true`.
 # 推荐的实验环境: A100
 bash scripts/qwen_7b_chat/lora/sft.sh
 bash scripts/qwen_7b_chat/lora/infer.sh

-# 微调(lora+ddp)+推理 qwen-7b-chat, 需要2卡*27GB显存.
+# 微调(lora+ddp)+推理 qwen-7b-chat, 需要2卡*38GB显存.
 # 推荐的实验环境: A100
 bash scripts/qwen_7b_chat/lora_ddp/sft.sh
 bash scripts/qwen_7b_chat/lora_ddp/infer.sh

-# 微调(lora+mp+ddp)+推理 qwen-7b-chat, 需要4卡*14GB显存.
+# 微调(lora+mp+ddp)+推理 qwen-7b-chat, 需要4卡*15GB显存.
 # 推荐的实验环境: V100, 3090, A10
 bash scripts/qwen_7b_chat/lora_mp_ddp/sft.sh
 bash scripts/qwen_7b_chat/lora_mp_ddp/infer.sh

-# 微调(qlora)+推理 qwen-7b-chat, 需要13GB显存.
+# 微调(qlora)+推理 qwen-7b-chat, 需要12GB显存.
 # 如果你想要使用量化, 你需要`pip install bitsandbytes -U`
 # 推荐的实验环境: 3090, A10
 bash scripts/qwen_7b_chat/qlora/sft.sh
 bash scripts/qwen_7b_chat/qlora/infer.sh

-# 微调(qlora+ddp)+推理 qwen-7b-chat, 需要2卡*13GB显存.
+# 微调(qlora+ddp)+推理 qwen-7b-chat, 需要2卡*14GB显存.
 # 推荐的实验环境: 3090, A10
 bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
 bash scripts/qwen_7b_chat/qlora_ddp/infer.sh

-# 微调(full+mp)+推理 qwen-7b-chat, 需要2卡*50G显存.
+# 微调(full+mp)+推理 qwen-7b-chat, 需要2卡*75G显存.
 # 推荐的实验环境: A100
 bash scripts/qwen_7b_chat/full_mp/sft.sh
 bash scripts/qwen_7b_chat/full_mp/infer.sh

-# 微调(full+mp+ddp)+推理 qwen-7b-chat, 需要4卡*50G显存.
+# 微调(full+mp+ddp)+推理 qwen-7b-chat, 需要4卡*75G显存.
 # 推荐的实验环境: A100
 bash scripts/qwen_7b_chat/full_mp_ddp/sft.sh
 bash scripts/qwen_7b_chat/full_mp_ddp/infer.sh

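Both READMEs point the multi-GPU variants at lora_ddp scripts, which launch training through torchrun. A minimal sketch of that launch pattern, assembled from the unchanged lines visible in the lora_ddp scripts later in this commit (the port and GPU indices are simply the values shown there); each per-model sft.sh appends its own model, dataset, and LoRA flags to this same invocation.

# launch pattern shared by the lora_ddp scripts in this commit (sketch, not a complete script by itself)
nproc_per_node=2
CUDA_VISIBLE_DEVICES=0,1 \
torchrun \
    --nproc_per_node=$nproc_per_node \
    --master_port 29500 \
    src/llm_sft.py \
    --sft_type lora \
    --dtype bf16 \
    --output_dir runs \
    --ddp_backend nccl
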
examples/pytorch/llm/scripts/baichuan2_7b_chat/lora_ddp/infer.sh

Lines changed: 3 additions & 1 deletion

@@ -5,7 +5,9 @@ python src/llm_infer.py \
     --template_type baichuan \
     --dtype bf16 \
     --ckpt_dir "runs/baichuan2-7b-chat/vx_xxx/checkpoint-xxx" \
-    --eval_human true \
+    --eval_human false \
+    --dataset damo-agent-mini-zh \
+    --max_length 4096 \
     --max_new_tokens 1024 \
     --temperature 0.9 \
     --top_k 50 \

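Combined with the opening lines of the script, which sit outside this hunk (the first four lines below are assumptions copied from the baichuan2 qlora infer script later in this commit), the inference command now reads roughly as follows; any flags after --top_k are also outside the hunk and omitted here.

# sketch; lines above --template_type and below --top_k are assumptions, not shown in this hunk
CUDA_VISIBLE_DEVICES=0 \
python src/llm_infer.py \
    --model_type baichuan2-7b-chat \
    --sft_type lora \
    --template_type baichuan \
    --dtype bf16 \
    --ckpt_dir "runs/baichuan2-7b-chat/vx_xxx/checkpoint-xxx" \
    --eval_human false \
    --dataset damo-agent-mini-zh \
    --max_length 4096 \
    --max_new_tokens 1024 \
    --temperature 0.9 \
    --top_k 50
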
examples/pytorch/llm/scripts/baichuan2_7b_chat/lora_ddp/sft.sh

Lines changed: 4 additions & 3 deletions

@@ -1,4 +1,5 @@
 # Experimental environment: 2 * A100
+# 2 * 44GB GPU memory
 nproc_per_node=2
 CUDA_VISIBLE_DEVICES=0,1 \
 torchrun \
@@ -11,10 +12,10 @@ torchrun \
     --dtype bf16 \
     --output_dir runs \
     --ddp_backend nccl \
-    --dataset alpaca-en,alpaca-zh \
-    --dataset_sample 20000 \
+    --dataset damo-agent-mini-zh \
+    --train_dataset_sample -1 \
     --num_train_epochs 1 \
-    --max_length 2048 \
+    --max_length 4096 \
     --lora_rank 8 \
    --lora_alpha 32 \
     --lora_dropout_p 0. \

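The new header pins the expected footprint at 2 * 44GB. A simple way to check the actual per-GPU usage on your own hardware while sft.sh runs, using standard NVIDIA tooling rather than anything from this repo:

# poll per-GPU memory once a second in a second terminal
watch -n 1 nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
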
examples/pytorch/llm/scripts/qwen_agent/lora_ddp/infer.sh renamed to examples/pytorch/llm/scripts/baichuan2_7b_chat/qlora/infer.sh

Lines changed: 6 additions & 6 deletions

@@ -1,15 +1,15 @@
 CUDA_VISIBLE_DEVICES=0 \
 python src/llm_infer.py \
-    --model_type qwen-7b-chat \
+    --model_type baichuan2-7b-chat \
     --sft_type lora \
-    --template_type chatml \
+    --template_type baichuan \
     --dtype bf16 \
-    --ckpt_dir "runs/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
+    --ckpt_dir "runs/baichuan2-7b-chat/vx_xxx/checkpoint-xxx" \
     --eval_human false \
-    --dataset damo-agent-mini-zh \
-    --dataset_sample -1 \
+    --dataset advertise-gen \
     --max_length 2048 \
-    --use_flash_attn true \
+    --quantization_bit 4 \
+    --bnb_4bit_comp_dtype bf16 \
     --max_new_tokens 1024 \
     --temperature 0.9 \
     --top_k 50 \
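
After the rename, this pair is presumably invoked like the other script pairs in the README; the infer.sh path comes from the rename header above, while the sft.sh path is an assumption that mirrors it. The hunks that follow appear to be that matching single-GPU qlora training script.

cd swift/examples/pytorch/llm
bash scripts/baichuan2_7b_chat/qlora/sft.sh    # path assumed to mirror the renamed infer.sh
bash scripts/baichuan2_7b_chat/qlora/infer.sh
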
@@ -1,20 +1,18 @@
-# Experimental environment: 2 * A100
-nproc_per_node=2
-CUDA_VISIBLE_DEVICES=0,1 \
-torchrun \
-    --nproc_per_node=$nproc_per_node \
-    --master_port 29500 \
-    src/llm_sft.py \
-    --model_type qwen-7b-chat \
+# Experimental environment: 3090
+# 12GB GPU memory
+CUDA_VISIBLE_DEVICES=0 \
+python src/llm_sft.py \
+    --model_type baichuan2-7b-chat \
     --sft_type lora \
-    --template_type chatml \
+    --template_type baichuan \
     --dtype bf16 \
     --output_dir runs \
-    --ddp_backend nccl \
-    --dataset damo-agent-mini-zh \
-    --dataset_sample -1 \
+    --dataset advertise-gen \
+    --train_dataset_sample -1 \
     --num_train_epochs 1 \
     --max_length 2048 \
+    --quantization_bit 4 \
+    --bnb_4bit_comp_dtype bf16 \
     --lora_rank 8 \
     --lora_alpha 32 \
     --lora_dropout_p 0. \
@@ -23,15 +21,14 @@ torchrun \
     --batch_size 1 \
     --weight_decay 0. \
     --learning_rate 1e-4 \
-    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
+    --gradient_accumulation_steps 16 \
     --max_grad_norm 0.5 \
     --warmup_ratio 0.03 \
     --eval_steps 100 \
     --save_steps 100 \
     --save_total_limit 2 \
     --logging_steps 10 \
-    --use_flash_attn true \
     --push_to_hub false \
-    --hub_model_id qwen-7b-chat-qlora \
+    --hub_model_id baichuan2-7b-chat-qlora \
     --hub_private_repo true \
     --hub_token 'your-sdk-token' \

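The accumulation change keeps the effective batch size at 16 samples per optimizer step: the old two-process launch ran batch_size 1 with $(expr 16 / $nproc_per_node) = 8 accumulation steps in each of 2 DDP processes, while the single-process script now sets 16 directly. A quick check of the arithmetic, assuming the usual DDP convention that every process accumulates its own micro-batches:

# old 2-GPU setting
nproc_per_node=2
expr 16 / $nproc_per_node    # prints 8 -> 1 sample * 8 steps * 2 processes = 16 per update
# new single-GPU setting: 1 sample * 16 steps * 1 process = 16 per update
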
examples/pytorch/llm/scripts/chatglm2_6b/lora_ddp/infer.sh

Lines changed: 3 additions & 1 deletion

@@ -5,7 +5,9 @@ python src/llm_infer.py \
     --template_type chatglm2 \
     --dtype bf16 \
     --ckpt_dir "runs/chatglm2-6b/vx_xxx/checkpoint-xxx" \
-    --eval_human true \
+    --eval_human false \
+    --dataset code-python-zh \
+    --max_length 8192 \
     --max_new_tokens 1024 \
     --temperature 0.9 \
     --top_k 50 \
--top_k 50 \
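The --ckpt_dir value is a placeholder (vx_xxx/checkpoint-xxx). Assuming the runs/ layout implied by that placeholder, one way to find the newest checkpoint to substitute after training:

# newest checkpoint directory under runs/chatglm2-6b/; layout assumed from the placeholder
ls -dt runs/chatglm2-6b/*/checkpoint-* | head -n 1
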

examples/pytorch/llm/scripts/chatglm2_6b/lora_ddp/sft.sh

Lines changed: 6 additions & 3 deletions

@@ -1,3 +1,5 @@
+# Experimental environment: A100
+# 50GB GPU memory
 nproc_per_node=2
 CUDA_VISIBLE_DEVICES=0,1 \
 torchrun \
@@ -10,13 +12,14 @@ torchrun \
     --dtype bf16 \
     --output_dir runs \
     --ddp_backend nccl \
-    --dataset alpaca-en,alpaca-zh \
-    --dataset_sample -1 \
+    --dataset code-python-zh \
+    --train_dataset_sample -1 \
     --num_train_epochs 1 \
-    --max_length 2048 \
+    --max_length 8192 \
     --lora_rank 8 \
     --lora_alpha 32 \
     --lora_dropout_p 0. \
+    --lora_target_modules ALL \
     --gradient_checkpointing false \
     --batch_size 1 \
     --weight_decay 0. \

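The new header budgets 50GB per GPU for this configuration (max_length 8192, gradient_checkpointing false). If that does not fit, the README note above suggests enabling gradient checkpointing at some cost in speed; a minimal sketch of flipping the flag in place with a plain sed edit, which is not part of the commit:

# run from examples/pytorch/llm; GNU sed assumed
sed -i 's/--gradient_checkpointing false/--gradient_checkpointing true/' scripts/chatglm2_6b/lora_ddp/sft.sh
bash scripts/chatglm2_6b/lora_ddp/sft.sh
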
examples/pytorch/llm/scripts/internlm_7b_chat/lora_ddp/infer.sh

Lines changed: 3 additions & 1 deletion

@@ -5,7 +5,9 @@ python src/llm_infer.py \
     --template_type internlm \
     --dtype bf16 \
     --ckpt_dir "runs/internlm-7b-chat/vx_xxx/checkpoint-xxx" \
-    --eval_human true \
+    --eval_human false \
+    --dataset jd-zh \
+    --max_length 2048 \
     --max_new_tokens 1024 \
     --temperature 0.9 \
     --top_k 50 \

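The inference script now pins --max_length 2048, the same value the training script keeps in the next diff. A quick way to confirm the pair stays in sync, using a generic grep over the paths from the file headers:

# run from examples/pytorch/llm
grep -n -e '--max_length' scripts/internlm_7b_chat/lora_ddp/sft.sh scripts/internlm_7b_chat/lora_ddp/infer.sh
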
examples/pytorch/llm/scripts/internlm_7b_chat/lora_ddp/sft.sh

Lines changed: 2 additions & 2 deletions

@@ -10,8 +10,8 @@ torchrun \
     --dtype bf16 \
     --output_dir runs \
     --ddp_backend nccl \
-    --dataset alpaca-en,alpaca-zh \
-    --dataset_sample 20000 \
+    --dataset jd-zh \
+    --train_dataset_sample -1 \
     --num_train_epochs 1 \
     --max_length 2048 \
     --lora_rank 8 \

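Across these scripts the commit renames --dataset_sample to --train_dataset_sample. If you keep local copies or forks of the scripts, a generic scan for any that still pass the old flag (ordinary grep, not part of the commit):

# run from examples/pytorch/llm
grep -rl -e '--dataset_sample ' scripts/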