Commit 82a5ae7

Support internlm xcomposer2 (#354)

1 parent e8ff04d
17 files changed: +441 additions, -58 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -62,7 +62,8 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用.md)
 
 
 ## 🎉 News
-- 2024.1.30: Support ZeRO-3, just specify `--deepspeed_config_path default-zero3`.
+- 2024.1.30: Support [internlm-xcomposer2-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/internlm_xcomposer2_7b_chat).
+- 2024.1.30: Support [ZeRO-3](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/), just specify `--deepspeed_config_path default-zero3`.
 - 2024.1.29: Support internlm2-math series: internlm2-math-7b, internlm2-math-7b-chat, internlm2-math-20b, internlm2-math-20b-chat.
 - 🔥2024.1.26: Support [yi-vl-6b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat), yi-vl-34b-chat.
 - 2024.1.24: Support codefuse-codegeex2-6b-chat, codefuse-qwen-14b-chat.
@@ -198,6 +199,7 @@ app_ui_main(infer_args)
   - [qwen-audio](https://github.com/QwenLM/Qwen-Audio) series: qwen-audio, qwen-audio-chat.
   - [yi-vl](https://github.com/01-ai/Yi) series: yi-vl-6b-chat, yi-vl-34b-chat.
   - [cogagent](https://github.com/THUDM/CogVLM) series: cogagent-18b-chat, cogagent-18b-instruct.
+  - [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
 - General:
   - [qwen](https://github.com/QwenLM/Qwen) series: qwen-1_8b, qwen-1_8b-chat, qwen-1_8b-chat-int4, qwen-1_8b-chat-int8, qwen-7b, qwen-7b-chat, qwen-7b-chat-int4, qwen-7b-chat-int8, qwen-14b, qwen-14b-chat, qwen-14b-chat-int4, qwen-14b-chat-int8, qwen-72b, qwen-72b-chat, qwen-72b-chat-int4, qwen-72b-chat-int8.
   - [chatglm](https://github.com/THUDM/ChatGLM-6B) series: chatglm2-6b, chatglm2-6b-32k, chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k.
@@ -246,7 +248,7 @@ app_ui_main(infer_args)
 - Custom Dataset
 - Supported Templates:
   - Text Generation: default-generation, default-generation-bos, chatglm-generation.
-  - Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, internlm2, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, codefuse, cogagent-chat, cogagent-instruct, yi-vl.
+  - Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, internlm2, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, codefuse, cogagent-chat, cogagent-instruct, yi-vl, internlm-xcomposer2.
 
 
 ## 🔥SCEdit
@@ -464,7 +466,6 @@ You can contact and communicate with us by joining our WeChat Group:
 <img src="asset/wechat.png" width="250" style="display: inline-block;">
 </p>
 
-
 ## Star History
 
 [![Star History Chart](https://api.star-history.com/svg?repos=modelscope/swift&type=Date)](https://star-history.com/#modelscope/swift&Date)
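For a quick smoke test of the newly supported model, the `swift infer` entrypoint used throughout this commit can presumably also be pointed at the base model directly via `--model_type`, with no fine-tuned checkpoint. A minimal sketch, not part of the commit itself; flag values are borrowed from the infer.sh script added below:

```bash
# Interactive chat with the newly added model type (assumes the swift CLI
# from this repo is installed and --model_type inference is supported).
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --model_type internlm-xcomposer2-7b-chat \
    --use_flash_attn false \
    --max_new_tokens 2048 \
    --temperature 0.5
```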

README_CN.md

Lines changed: 5 additions & 3 deletions
@@ -60,9 +60,10 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
 Users can check the [SWIFT official documentation](docs/source/GetStarted/快速使用.md) for details.
 
 ## 🎉 News
-- 2024.1.30: Support ZeRO-3; just specify `--deepspeed_config_path default-zero3`.
+- 2024.1.30: Support [internlm-xcomposer2-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/internlm_xcomposer2_7b_chat).
+- 2024.1.30: Support [ZeRO-3](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_14b_chat/full_ddp_zero3/); just specify `--deepspeed_config_path default-zero3`.
 - 2024.1.29: Support the internlm2-math series: internlm2-math-7b, internlm2-math-7b-chat, internlm2-math-20b, internlm2-math-20b-chat.
-- 2024.1.26: Support [yi-vl-6b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat), yi-vl-34b-chat.
+- 🔥2024.1.26: Support [yi-vl-6b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat), yi-vl-34b-chat.
 - 2024.1.24: Support codefuse-codegeex2-6b-chat, codefuse-qwen-14b-chat.
 - 2024.1.23: Support the orion series: orion-14b, [orion-14b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/orion_14b_chat).
 - 2024.1.20: Support [xverse-13b-256k](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/xverse_13b_256k), xverse-65b-v2, xverse-65b-chat.
@@ -198,6 +199,7 @@ app_ui_main(infer_args)
   - [qwen-audio](https://github.com/QwenLM/Qwen-Audio) series: qwen-audio, qwen-audio-chat.
   - [yi-vl](https://github.com/01-ai/Yi) series: yi-vl-6b-chat, yi-vl-34b-chat.
   - [cogagent](https://github.com/THUDM/CogVLM) series: cogagent-18b-chat, cogagent-18b-instruct.
+  - [internlm-xcomposer2](https://github.com/InternLM/InternLM-XComposer) series: internlm-xcomposer2-7b-chat.
 - General:
   - [qwen](https://github.com/QwenLM/Qwen) series: qwen-1_8b, qwen-1_8b-chat, qwen-1_8b-chat-int4, qwen-1_8b-chat-int8, qwen-7b, qwen-7b-chat, qwen-7b-chat-int4, qwen-7b-chat-int8, qwen-14b, qwen-14b-chat, qwen-14b-chat-int4, qwen-14b-chat-int8, qwen-72b, qwen-72b-chat, qwen-72b-chat-int4, qwen-72b-chat-int8.
   - [chatglm](https://github.com/THUDM/ChatGLM-6B) series: chatglm2-6b, chatglm2-6b-32k, chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k.
@@ -246,7 +248,7 @@ app_ui_main(infer_args)
 - Custom Dataset
 - Supported Templates:
   - Text Generation: default-generation, default-generation-bos, chatglm-generation.
-  - Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, internlm2, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, codefuse, cogagent-chat, cogagent-instruct, yi-vl.
+  - Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, internlm2, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, codefuse, cogagent-chat, cogagent-instruct, yi-vl, internlm-xcomposer2.
 
 
 ## 🔥SCEdit

docs/source/LLM/支持的模型和数据集.md

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@
 |internlm2-math-7b-chat|[Shanghai_AI_Laboratory/internlm2-math-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-7b/summary)|wqkv|internlm2|&#x2714;|&#x2718;||
 |internlm2-math-20b|[Shanghai_AI_Laboratory/internlm2-math-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-20b/summary)|wqkv|default-generation-bos|&#x2714;|&#x2718;||
 |internlm2-math-20b-chat|[Shanghai_AI_Laboratory/internlm2-math-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-20b/summary)|wqkv|internlm2|&#x2714;|&#x2718;||
+|internlm-xcomposer2-7b-chat|[Shanghai_AI_Laboratory/internlm-xcomposer2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b/summary)|wqkv|internlm-xcomposer2|&#x2714;|&#x2718;||
 |deepseek-7b|[deepseek-ai/deepseek-llm-7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;||
 |deepseek-7b-chat|[deepseek-ai/deepseek-llm-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-chat/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||
 |deepseek-moe-16b|[deepseek-ai/deepseek-moe-16b-base](https://modelscope.cn/models/deepseek-ai/deepseek-moe-16b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2718;||
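Reading the new row: `wqkv` is the default LoRA target module and `internlm-xcomposer2` the chat template for this model, with flash attention supported (&#x2714;) and vLLM not (&#x2718;). A sketch that spells those defaults out instead of relying on `DEFAULT`/`AUTO`; this is my reading of the table, not code from this commit:

```bash
# Equivalent, per the table row above, to --lora_target_modules DEFAULT
# and --template_type AUTO for this model type.
swift sft \
    --model_type internlm-xcomposer2-7b-chat \
    --sft_type lora \
    --lora_target_modules wqkv \
    --template_type internlm-xcomposer2 \
    --dataset coco-mini-en
```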

docs/source/LLM/自定义与拓展.md

Lines changed: 14 additions & 15 deletions
@@ -99,9 +99,17 @@ AAAAA,BBBBB,CCCCC
 {"messages": [{"role": "user", "content": "AAAAA"}, {"role": "assistant", "content": "BBBBB"}, {"role": "user", "content": "CCCCC"}, {"role": "assistant", "content": "DDDDD"}]}
 ```
 
-**Qwen-VL series**
+**Reinforcement Learning (DPO)**
+
+```jsonl
+{"query": "11111", "response": "22222", "rejected_response": "33333"}
+{"query": "aaaaa", "response": "bbbbb", "rejected_response": "ccccc"}
+{"query": "AAAAA", "response": "BBBBB", "rejected_response": "CCCCC"}
+```
+
+**Qwen-VL, Internlm-XComposer2 series**
 
-The input format is compatible with [qwen-vl github](https://github.com/QwenLM/Qwen-VL#data-preparation); csv, json, and jsonl formats are supported as well.
+The input format is compatible with [qwen-vl github](https://github.com/QwenLM/Qwen-VL#data-preparation); csv, json, and jsonl formats are supported as well. img_path supports both local paths and URLs.
 
 ```json
 [{"conversations": [{"from": "user", "value": "Picture 1:<img>img_path</img>\n11111"}, {"from": "assistant", "value": "22222"}]},
@@ -117,12 +125,12 @@ AAAAA,BBBBB,CCCCC
 {"conversations": [{"from": "user", "value": "AAAAA"}, {"from": "assistant", "value": "BBBBB"}, {"from": "user", "value": "CCCCC"}, {"from": "assistant", "value": "DDDDD"}]}
 ```
 
-**Reinforcement Learning (DPO)**
+**Yi-VL series**
 
 ```jsonl
-{"query": "11111", "response": "22222", "rejected_response": "33333"}
-{"query": "aaaaa", "response": "bbbbb", "rejected_response": "ccccc"}
-{"query": "AAAAA", "response": "BBBBB", "rejected_response": "CCCCC"}
+{"query": "55555", "response": "66666", "images": ["image_path"]}
+{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
+{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path", "image_path2", "image_path3"]}
 ```
 
 **CogAgent series**
@@ -133,15 +141,6 @@ AAAAA,BBBBB,CCCCC
 {"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
 ```
 
-**Yi-VL series**
-```jsonl
-{"query": "55555", "response": "66666", "images": ["image_path"]}
-{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
-{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path", "image_path2", "image_path3"]}
-```
-
-The image field supports both local image files and http-accessible image URLs.
-
 ### How to register a dataset
 
 Below is an example of **registering a dataset**. The complete .py file can be found at [custom.py](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/custom.py), and the sh script at [custom](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/custom).
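To train on a local file in the Qwen-VL/Internlm-XComposer2 format above without registering it, this doc family passes the file on the command line. A hedged sketch assuming the `--custom_train_dataset_path` flag; the file name is hypothetical:

```bash
# my_vl_data.jsonl: one {"conversations": [...]} object per line, with
# <img>img_path</img> tags as shown above (local path or URL).
# --custom_train_dataset_path is assumed here, not part of this diff.
swift sft \
    --model_type internlm-xcomposer2-7b-chat \
    --sft_type lora \
    --custom_train_dataset_path my_vl_data.jsonl
```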
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Experimental environment: V100, A10, 3090
+CUDA_VISIBLE_DEVICES=0 \
+swift infer \
+    --ckpt_dir "output/internlm-xcomposer2-7b-chat/vx_xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --max_length 2048 \
+    --use_flash_attn false \
+    --max_new_tokens 2048 \
+    --temperature 0.5 \
+    --top_p 0.7 \
+    --repetition_penalty 1. \
+    --do_sample true \
+    --merge_lora_and_save false \
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Experimental environment: V100, A10, 3090
+# 21GB GPU memory
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_type internlm-xcomposer2-7b-chat \
+    --sft_type lora \
+    --tuner_backend swift \
+    --template_type AUTO \
+    --dtype AUTO \
+    --output_dir output \
+    --dataset coco-mini-en \
+    --train_dataset_sample -1 \
+    --num_train_epochs 1 \
+    --max_length 2048 \
+    --check_dataset_strategy warning \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --lora_dropout_p 0.05 \
+    --lora_target_modules DEFAULT \
+    --gradient_checkpointing true \
+    --batch_size 1 \
+    --weight_decay 0.01 \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps 16 \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
+    --use_flash_attn false \
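These two new scripts are presumably meant to run as a pair: train first, then substitute the real run directory for the `vx_xxx/checkpoint-xxx` placeholder in the infer script. For example, with a hypothetical run directory name:

```bash
# After sft finishes, checkpoints land under output/<model_type>/<run>/;
# the directory name below is illustrative only.
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir "output/internlm-xcomposer2-7b-chat/v0-20240130-120000/checkpoint-100" \
    --load_dataset_config true \
    --use_flash_attn false
```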
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Experimental environment: A100
+CUDA_VISIBLE_DEVICES=0 \
+swift infer \
+    --ckpt_dir "output/qwen-14b-chat/vx_xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --max_length 2048 \
+    --use_flash_attn true \
+    --max_new_tokens 2048 \
+    --temperature 0.1 \
+    --top_p 0.7 \
+    --repetition_penalty 1. \
+    --do_sample true \
+    --merge_lora_and_save false \
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Experimental environment: 8 * A100
+nproc_per_node=8
+NPROC_PER_NODE=$nproc_per_node \
+MASTER_PORT=29500 \
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+swift sft \
+    --model_id_or_path qwen/Qwen-14B-Chat \
+    --model_revision master \
+    --sft_type full \
+    --tuner_backend swift \
+    --template_type AUTO \
+    --dtype AUTO \
+    --output_dir output \
+    --ddp_backend nccl \
+    --dataset blossom-math-zh \
+    --train_dataset_sample -1 \
+    --num_train_epochs 5 \
+    --max_length 2048 \
+    --check_dataset_strategy warning \
+    --gradient_checkpointing true \
+    --batch_size 1 \
+    --weight_decay 0.01 \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps $(expr 64 / $nproc_per_node) \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
+    --use_flash_attn true \
+    --deepspeed_config_path 'default-zero3' \
+    --save_only_model true \
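The `--gradient_accumulation_steps $(expr 64 / $nproc_per_node)` line keeps the effective global batch size pinned at 64 regardless of GPU count: global batch = per-device batch size × number of processes × accumulation steps. A quick check with this script's values:

```bash
# batch_size=1, nproc_per_node=8  ->  accumulation steps = 64 / 8 = 8
# global batch = 1 * 8 * 8 = 64; with 4 GPUs it would be 1 * 4 * 16 = 64.
echo $(( 1 * 8 * (64 / 8) ))   # prints 64
```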
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Experimental environment: A100
+CUDA_VISIBLE_DEVICES=0 \
+swift infer \
+    --ckpt_dir "output/qwen-7b-chat/vx_xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --max_length 2048 \
+    --use_flash_attn true \
+    --max_new_tokens 2048 \
+    --temperature 0.1 \
+    --top_p 0.7 \
+    --repetition_penalty 1. \
+    --do_sample true \
+    --merge_lora_and_save false \
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Experimental environment: 8 * A100
+nproc_per_node=8
+NPROC_PER_NODE=$nproc_per_node \
+MASTER_PORT=29500 \
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+swift sft \
+    --model_id_or_path qwen/Qwen-7B-Chat \
+    --model_revision master \
+    --sft_type full \
+    --tuner_backend swift \
+    --template_type AUTO \
+    --dtype AUTO \
+    --output_dir output \
+    --ddp_backend nccl \
+    --dataset blossom-math-zh \
+    --train_dataset_sample -1 \
+    --num_train_epochs 5 \
+    --max_length 2048 \
+    --check_dataset_strategy warning \
+    --gradient_checkpointing true \
+    --batch_size 1 \
+    --weight_decay 0.01 \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps $(expr 64 / $nproc_per_node) \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
+    --use_flash_attn true \
+    --deepspeed_config_path 'default-zero3' \
+    --save_only_model true \
