
Commit 1a8819b

fix bug: internlm-20b (#75)
1 parent 8e51ccc commit 1a8819b

File tree

6 files changed: +27 −31 lines changed


examples/pytorch/llm/README.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 4. chatglm2 series: chatglm2-6b, chatglm2-6b-32k
 5. llama series: llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat
 6. openbuddy-llama series: openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b
-7. internlm series: internlm-7b, internlm-7b-chat, internlm-7b-chat-8k
+7. internlm series: internlm-7b, internlm-7b-chat, internlm-7b-chat-8k, internlm-20b, internlm-20b-chat
 8. other: polylm-13b, seqgpt-560m
 3. supported features: quantization, DDP, model parallelism (device map), gradient checkpointing, gradient accumulation, pushing to ModelScope Hub, custom datasets, multimodal and agent SFT, multi-round chat, ...
 4. supported datasets:

examples/pytorch/llm/README_CN.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 4. chatglm2 series: chatglm2-6b, chatglm2-6b-32k
 5. llama series: llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat
 6. openbuddy-llama series: openbuddy-llama2-13b, openbuddy-llama-65b, openbuddy-llama2-70b
-7. internlm series: internlm-7b, internlm-7b-chat, internlm-7b-chat-8k
+7. internlm series: internlm-7b, internlm-7b-chat, internlm-7b-chat-8k, internlm-20b, internlm-20b-chat
 8. other: polylm-13b, seqgpt-560m
 3. supported features: model quantization, DDP, model parallelism (device_map), gradient checkpointing, gradient accumulation, pushing to ModelScope Hub, custom datasets, multimodal and agent SFT, multi-round chat, ...
 4. supported datasets:

examples/pytorch/llm/scripts/internlm_20b/qlora_ddp/infer.sh

Lines changed: 0 additions & 17 deletions
This file was deleted.
New file: Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+CUDA_VISIBLE_DEVICES=0 \
+python src/llm_infer.py \
+    --model_type internlm-20b-chat \
+    --sft_type lora \
+    --template_type internlm \
+    --dtype bf16 \
+    --ckpt_dir "output/internlm-20b-chat/vx_xxx/checkpoint-xxx" \
+    --eval_human false \
+    --dataset damo-agent-mini-zh \
+    --max_length 4096 \
+    --max_new_tokens 2048 \
+    --temperature 0.9 \
+    --top_k 20 \
+    --top_p 0.9 \
+    --do_sample true \
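
The new infer.sh turns on sampling (`--do_sample true`) and sets `--temperature 0.9`, `--top_k 20`, `--top_p 0.9`. For readers who want to see how those three knobs interact during decoding, here is a minimal, self-contained sketch of temperature + top-k + top-p (nucleus) sampling; it is an illustration only, not the decoding code used by src/llm_infer.py:

```python
import numpy as np

def sample_next_token(logits, temperature=0.9, top_k=20, top_p=0.9, rng=None):
    """Temperature + top-k + top-p sampling over one decoding step's logits.

    Mirrors the knobs set in infer.sh; illustration only, not swift's
    actual implementation.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature  # temperature scaling
    if top_k > 0:  # keep only the top_k highest-scoring tokens
        kth = np.sort(logits)[-min(top_k, logits.size)]
        logits = np.where(logits < kth, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                 # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1   # smallest nucleus with mass >= top_p
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()             # renormalize within the nucleus
    return int(rng.choice(keep, p=p))
```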
Lines changed: 8 additions & 10 deletions
@@ -1,28 +1,26 @@
-# Experimental environment: 2 * A10
-# 2 * 20GB GPU memory
+# Experimental environment: 2 * A100
+# 2 * 60GB GPU memory
 nproc_per_node=2
 CUDA_VISIBLE_DEVICES=0,1 \
 torchrun \
     --nproc_per_node=$nproc_per_node \
     --master_port 29500 \
     src/llm_sft.py \
-    --model_type internlm-20b \
+    --model_type internlm-20b-chat \
     --sft_type lora \
-    --template_type default-generation \
+    --template_type internlm \
     --dtype bf16 \
     --output_dir output \
     --ddp_backend nccl \
-    --dataset cmnli-zh \
+    --dataset damo-agent-mini-zh \
     --train_dataset_sample 20000 \
     --num_train_epochs 1 \
-    --max_length 2048 \
+    --max_length 4096 \
     --lora_rank 8 \
     --lora_alpha 32 \
     --lora_dropout_p 0. \
     --lora_target_modules ALL \
-    --quantization_bit 4 \
-    --bnb_4bit_comp_dtype bf16 \
-    --gradient_checkpointing false \
+    --gradient_checkpointing true \
     --batch_size 1 \
     --weight_decay 0. \
     --learning_rate 1e-4 \
@@ -34,6 +32,6 @@ torchrun \
     --save_total_limit 2 \
     --logging_steps 10 \
     --push_to_hub false \
-    --hub_model_id internlm-20b-lora \
+    --hub_model_id internlm-20b-chat-lora \
     --hub_private_repo true \
     --hub_token 'your-sdk-token' \
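
Compared with the old qlora_ddp script, this sft.sh drops the 4-bit settings (`--quantization_bit 4`, `--bnb_4bit_comp_dtype bf16`) and trains a plain bf16 LoRA on internlm-20b-chat, which is why the stated memory budget grows from 2 * 20GB (A10) to 2 * 60GB (A100) and gradient checkpointing is turned on. For orientation, the LoRA flags roughly correspond to the following HuggingFace peft config; this is only a familiar reference point (swift ships its own tuner implementation), and the `target_modules` list is a guess at what `--lora_target_modules ALL` expands to:

```python
from peft import LoraConfig

# Rough peft equivalent of the script's LoRA flags. Illustration only:
# swift has its own tuner, and the target_modules list is a guess at
# what `--lora_target_modules ALL` resolves to for InternLM.
lora_config = LoraConfig(
    r=8,                  # --lora_rank 8
    lora_alpha=32,        # --lora_alpha 32
    lora_dropout=0.0,     # --lora_dropout_p 0.
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],  # guess
    task_type='CAUSAL_LM',
)
```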

examples/pytorch/llm/src/llm_sft.py

Lines changed: 2 additions & 2 deletions
@@ -90,8 +90,8 @@ def llm_sft(args: SftArguments) -> None:
         args.dataset.split(','), args.dataset_test_ratio,
         args.dataset_split_seed)
     if args.train_dataset_sample >= 0:
-        val_dataset_sample = int(args.train_dataset_sample
-                                 * args.dataset_test_ratio)
+        val_dataset_sample = max(
+            int(args.train_dataset_sample * args.dataset_test_ratio), 1)
         train_idxs = np.random.permutation(args.train_dataset_sample)
         train_dataset = train_dataset.select(train_idxs)
     if val_dataset.shape[0] > val_dataset_sample:
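
This is the bug the commit title refers to: with a small `--train_dataset_sample`, the old formula could round the validation sample count down to zero, leaving an empty validation set. Wrapping it in `max(..., 1)` guarantees at least one validation sample. A minimal repro of the before/after arithmetic, with hypothetical values:

```python
# Hypothetical values: a small sample size and a small test ratio.
train_dataset_sample = 50
dataset_test_ratio = 0.01

old = int(train_dataset_sample * dataset_test_ratio)           # int(0.5) -> 0: empty val set
new = max(int(train_dataset_sample * dataset_test_ratio), 1)   # -> 1: at least one sample

print(old, new)  # 0 1
```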
