Commit a467046

Add bloom (#55)
1 parent a875f16 commit a467046

49 files changed: +266 additions, −209 deletions

(Large commits hide some file diffs by default; not all 49 changed files are shown below.)

README.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -32,13 +32,13 @@ Key features:
 [code link](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm)

 1. supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full(full parameter fine-tuning)
-2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat
+2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat, seqgpt-560m
 3. supported features: quantization, ddp, model parallelism(device map), gradient checkpointing, gradient accumulation, pushing to modelscope hub, custom datasets, multimodal and agent SFT, mutli-round chat, ...
 4. supported datasets:
-    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en
+    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en, cmnli-zh
     2. agent: [damo-agent-zh](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary), damo-agent-mini-zh
     3. multi-modal: coco-en
-5. supported templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy-llama, default
+5. supported templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy-llama, default, default-generation

 # Installation
```

README_CN.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -30,13 +30,13 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
 [code link](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm)

 1. supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full parameter fine-tuning
-2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat
+2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat, seqgpt-560m
 3. supported features: model quantization, DDP, model parallelism(device_map), gradient checkpointing, gradient accumulation, pushing to ModelScope Hub, custom datasets, multimodal and agent SFT, multi-round chat, ...
 4. supported datasets:
-    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en
+    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en, cmnli-zh
     2. agent: [damo-agent-zh](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary), damo-agent-mini-zh
-    3. multi-modal: coco-en
-5. supported chat templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy-llama, default
+    3. 多模态: coco-en
+5. supported chat templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy-llama, default, default-generation

 # Installation
```

examples/pytorch/llm/README.md

Lines changed: 7 additions & 7 deletions
```diff
@@ -16,13 +16,13 @@

 ## Features
 1. supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full(full parameter fine-tuning)
-2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat
+2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat, seqgpt-560m
 3. supported features: quantization, ddp, model parallelism(device map), gradient checkpointing, gradient accumulation, pushing to modelscope hub, custom datasets, multimodal and agent SFT, mutli-round chat, ...
 4. supported datasets:
-    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en
+    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en, cmnli-zh
     2. agent: [damo-agent-zh](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary), damo-agent-mini-zh
     3. multi-modal: coco-en
-5. supported templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy-llama, default
+5. supported templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy-llama, default, default-generation

 ## Prepare the Environment
 Experimental environment: A10, 3090, A100, ... (V100 does not support bf16, quantization)
@@ -59,21 +59,21 @@ pip install .
 git clone https://github.com/modelscope/swift.git
 cd swift/examples/pytorch/llm

-# sft lora and infer qwen-7b, Requires 22GB VRAM.
+# sft lora and infer qwen-7b, Requires 27GB VRAM.
 # If you want to push weights into modelscope hub during training, you need to set '--push_to_hub true'
 bash scripts/qwen_7b_chat/lora/sft.sh
 bash scripts/qwen_7b_chat/lora/infer.sh

-# sft(lora+ddp) and infer qwen-7b, Requires 4*22GB VRAM.
+# sft(lora+ddp) and infer qwen-7b, Requires 4*27GB VRAM.
 bash scripts/qwen_7b_chat/lora_ddp/sft.sh
 bash scripts/qwen_7b_chat/lora_ddp/infer.sh

-# sft(qlora) and infer qwen-7b, Requires 16GB VRAM.
+# sft(qlora) and infer qwen-7b, Requires 20GB VRAM.
 # If you want to use quantification, you need to `pip install bitsandbytes -U`
 bash scripts/qwen_7b_chat/qlora/sft.sh
 bash scripts/qwen_7b_chat/qlora/infer.sh

-# sft(qlora+ddp) and infer qwen-7b, Requires 4*16GB VRAM.
+# sft(qlora+ddp) and infer qwen-7b, Requires 4*20GB VRAM.
 bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
 bash scripts/qwen_7b_chat/qlora_ddp/infer.sh
```

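The revised VRAM figures above (27GB for bf16 lora, 20GB for qlora) are consistent with a rough back-of-envelope check. The sketch below is illustrative only and not from the commit: it counts weight memory alone for a 7B-parameter model, assuming 2 bytes/param at bf16 versus ~0.5 bytes/param at 4-bit; activations, optimizer state, and CUDA overhead account for the rest of the measured totals.

```shell
# Illustrative weight-memory estimate for a 7B-parameter model.
# Assumption (not from the commit): bf16 = 2 bytes/param, 4-bit = 0.5 bytes/param.
params=7000000000
bf16_mb=$((params * 2 / 1024 / 1024))   # bf16 weights, in MiB
int4_mb=$((params / 2 / 1024 / 1024))   # 4-bit (qlora) weights, in MiB
echo "bf16 weights: ${bf16_mb} MiB, 4-bit weights: ${int4_mb} MiB"
```

Weights alone drop from roughly 13 GiB to about 3.3 GiB, which is why the qlora scripts fit in markedly less VRAM than the bf16 lora ones.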
examples/pytorch/llm/README_CN.md

Lines changed: 8 additions & 8 deletions
```diff
@@ -17,13 +17,13 @@

 ## Features
 1. supported SFT methods: [lora](https://arxiv.org/abs/2106.09685), [qlora](https://arxiv.org/abs/2305.14314), full parameter fine-tuning
-2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat
+2. supported models: qwen-7b, [qwen-7b-chat](https://github.com/QwenLM/Qwen-7B), qwen-vl, [qwen-vl-chat](https://github.com/QwenLM/Qwen-VL), baichuan-7b, baichuan-13b, baichuan-13b-chat, chatglm2-6b, chatglm2-6b-32k, llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat, openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b, polylm-13b, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat, seqgpt-560m
 3. supported features: model quantization, DDP, model parallelism(device_map), gradient checkpointing, gradient accumulation, pushing to ModelScope Hub, custom datasets, multimodal and agent SFT, multi-round chat, ...
 4. supported datasets:
-    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en
+    1. NLP: alpaca-en(gpt4), alpaca-zh(gpt4), finance-en, multi-alpaca-all, code-en, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, poetry-zh, instruct-en, gpt4all-en, cmnli-zh
     2. agent: [damo-agent-zh](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary), damo-agent-mini-zh
-    3. multi-modal: coco-en
-5. supported chat templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy-llama, default
+    3. 多模态: coco-en
+5. supported chat templates: chatml(qwen), baichuan, chatglm2, llama, openbuddy-llama, default, default-generation

 ## Prepare the Environment
 Environment: A10, 3090, or A100. (V100 does not support bf16 or quantization)
@@ -61,21 +61,21 @@ pip install .
 git clone https://github.com/modelscope/swift.git
 cd swift/examples/pytorch/llm

-# sft(lora) + inference, qwen-7b, requires 22GB VRAM.
+# sft(lora) + inference, qwen-7b, requires 27GB VRAM.
 # If you want to push weights to the modelscope hub during training, set `--push_to_hub true`
 bash scripts/qwen_7b_chat/lora/sft.sh
 bash scripts/qwen_7b_chat/lora/infer.sh

-# sft(lora+ddp) + inference, qwen-7b, requires 4*22GB VRAM.
+# sft(lora+ddp) + inference, qwen-7b, requires 4*27GB VRAM.
 bash scripts/qwen_7b_chat/lora_ddp/sft.sh
 bash scripts/qwen_7b_chat/lora_ddp/infer.sh

-# sft(qlora) + inference, qwen-7b, requires 16GB VRAM.
+# sft(qlora) + inference, qwen-7b, requires 20GB VRAM.
 # If you want to use quantization, you need to `pip install bitsandbytes -U`
 bash scripts/qwen_7b_chat/qlora/sft.sh
 bash scripts/qwen_7b_chat/qlora/infer.sh

-# sft(qlora+ddp) + inference, qwen-7b, requires 4*16GB VRAM.
+# sft(qlora+ddp) + inference, qwen-7b, requires 4*20GB VRAM.
 bash scripts/qwen_7b_chat/qlora_ddp/sft.sh
 bash scripts/qwen_7b_chat/qlora_ddp/infer.sh
```

examples/pytorch/llm/scripts/baichuan2_7b_chat/lora_ddp/infer.sh

Lines changed: 0 additions & 1 deletion
```diff
@@ -1,4 +1,3 @@
-# 16G
 CUDA_VISIBLE_DEVICES=0 \
 python src/llm_infer.py \
     --model_type baichuan2-7b-chat \
```

examples/pytorch/llm/scripts/baichuan2_7b_chat/lora_ddp/sft.sh

Lines changed: 6 additions & 6 deletions
```diff
@@ -1,6 +1,6 @@
-# 4 * 22GB VRAM
-nproc_per_node=4
-CUDA_VISIBLE_DEVICES=0,1,2,3 \
+# Experimental environment: 2 * A100
+nproc_per_node=2
+CUDA_VISIBLE_DEVICES=0,1 \
 torchrun \
     --nproc_per_node=$nproc_per_node \
     --master_port 29500 \
@@ -19,15 +19,15 @@ torchrun \
     --lora_alpha 32 \
     --lora_dropout_p 0.05 \
     --lora_target_modules W_pack o_proj \
-    --gradient_checkpointing true \
+    --gradient_checkpointing false \
     --batch_size 1 \
     --weight_decay 0. \
     --learning_rate 1e-4 \
     --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
     --max_grad_norm 0.5 \
     --warmup_ratio 0.03 \
-    --eval_steps 50 \
-    --save_steps 50 \
+    --eval_steps 100 \
+    --save_steps 100 \
     --save_total_limit 2 \
     --logging_steps 10 \
     --push_to_hub false \
```

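The `--gradient_accumulation_steps $(expr 16 / $nproc_per_node)` pattern in the script above keeps the effective batch size constant when the GPU count changes. A minimal sketch of the invariant (the loop and variable names are illustrative, not part of the repo):

```shell
# Effective batch per optimizer step = batch_size * nproc_per_node * grad_accum.
batch_size=1
for nproc_per_node in 1 2 4; do
    grad_accum=$(expr 16 / $nproc_per_node)
    effective=$((batch_size * nproc_per_node * grad_accum))
    echo "nproc=$nproc_per_node grad_accum=$grad_accum effective_batch=$effective"
done
```

With batch_size 1, every configuration trains on 16 samples per optimizer step, so dropping nproc_per_node from 4 to 2 (as this commit does) leaves the optimization schedule unchanged.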
examples/pytorch/llm/scripts/chatglm_6b/lora_ddp/infer.sh renamed to examples/pytorch/llm/scripts/chatglm2_6b/lora_ddp/infer.sh

Lines changed: 2 additions & 1 deletion
```diff
@@ -1,8 +1,9 @@
-# 14G
 CUDA_VISIBLE_DEVICES=0 \
 python src/llm_infer.py \
     --model_type chatglm2-6b \
     --sft_type lora \
+    --template_type chatglm2 \
+    --dtype bf16 \
     --ckpt_dir "runs/chatglm2-6b/vx_xxx/checkpoint-xxx" \
     --eval_human true \
     --max_new_tokens 1024 \
```
```diff
@@ -1,26 +1,34 @@
-# 4 * 15G
-# ddp_backend gloo: support windows
-nproc_per_node=4
-CUDA_VISIBLE_DEVICES=0,1,2,3 \
+nproc_per_node=2
+CUDA_VISIBLE_DEVICES=0,1 \
 torchrun \
     --nproc_per_node=$nproc_per_node \
     --master_port 29500 \
     src/llm_sft.py \
     --model_type chatglm2-6b \
     --sft_type lora \
+    --template_type chatglm2 \
+    --dtype bf16 \
     --output_dir runs \
-    --ddp_backend gloo \
+    --ddp_backend nccl \
     --dataset alpaca-en,alpaca-zh \
     --dataset_sample -1 \
     --num_train_epochs 1 \
     --max_length 1024 \
     --lora_rank 8 \
     --lora_alpha 32 \
     --lora_dropout_p 0.1 \
+    --gradient_checkpointing false \
     --batch_size 1 \
+    --weight_decay 0. \
     --learning_rate 1e-4 \
     --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
-    --eval_steps 50 \
-    --save_steps 50 \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
     --save_total_limit 2 \
     --logging_steps 10 \
+    --push_to_hub false \
+    --hub_model_id chatglm2-6b-lora \
+    --hub_private_repo true \
+    --hub_token 'your-sdk-token' \
```

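The diff above swaps `--ddp_backend gloo` (kept previously for Windows support, per the removed comment) for `nccl`. The usual rule of thumb: nccl needs CUDA GPUs on a non-Windows host, while gloo is the portable CPU/Windows fallback. A hypothetical helper, not part of swift, sketching that decision:

```shell
# Illustrative backend chooser; 'nccl' requires CUDA on a non-Windows host.
pick_ddp_backend() {
    os="$1"; has_cuda="$2"
    if [ "$has_cuda" = "yes" ] && [ "$os" != "windows" ]; then
        echo nccl
    else
        echo gloo
    fi
}
pick_ddp_backend linux yes     # -> nccl
pick_ddp_backend windows yes   # -> gloo
```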
examples/pytorch/llm/scripts/llama2_70b_chat/qlora/infer.sh

Lines changed: 0 additions & 1 deletion
```diff
@@ -1,4 +1,3 @@
-# 40G
 CUDA_VISIBLE_DEVICES=0,1 \
 python src/llm_infer.py \
     --model_type llama2-7b-chat \
```

examples/pytorch/llm/scripts/llama2_70b_chat/qlora/sft.sh

Lines changed: 3 additions & 4 deletions
```diff
@@ -1,5 +1,4 @@
-# 44G
-# llama2 is not good at Chinese
+# Experimental environment: 2 * 3090
 CUDA_VISIBLE_DEVICES=0,1 \
 python src/llm_sft.py \
     --model_type llama2-70b-chat \
@@ -16,7 +15,7 @@ python src/llm_sft.py \
     --batch_size 1 \
     --learning_rate 1e-4 \
     --gradient_accumulation_steps 16 \
-    --eval_steps 50 \
-    --save_steps 50 \
+    --eval_steps 100 \
+    --save_steps 100 \
     --save_total_limit 2 \
     --logging_steps 10 \
```
--logging_steps 10 \
