Commit eea5463

Support yi vl (#345)

1 parent e7b6a6f

27 files changed: +919, -737 lines changed

README.md

Lines changed: 4 additions & 2 deletions
@@ -62,6 +62,7 @@ Users can check the [documentation of SWIFT](docs/source/GetStarted/快速使用.md)
 
 
 ## 🎉 News
+- 2024.1.26: Support [yi-vl-6b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat), yi-vl-34b-chat.
 - 2024.1.24: Support codefuse-codegeex2-6b-chat, codefuse-qwen-14b-chat.
 - 2024.1.23: Support orion series: orion-14b, [orion-14b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/orion_14b_chat).
 - 2024.1.20: Support [xverse-13b-256k](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/xverse_13b_256k), xverse-65b-v2, xverse-65b-chat.
@@ -154,7 +155,8 @@ Here is a simple introduction of web-ui:
   - Multi-Modal:
     - qwen-vl series: qwen-vl, qwen-vl-chat, qwen-vl-chat-int4.
     - qwen-audio series: qwen-audio, qwen-audio-chat.
-    - cogagent series: cogagent-chat, cogagent-vqa.
+    - yi-vl series: yi-vl-6b-chat, yi-vl-34b-chat.
+    - cogagent series: cogagent-18b-chat, cogagent-18b-instruct.
   - General:
     - qwen series: qwen-1_8b, qwen-1_8b-chat, qwen-1_8b-chat-int4, qwen-1_8b-chat-int8, qwen-7b, qwen-7b-chat, qwen-7b-chat-int4, qwen-7b-chat-int8, qwen-14b, qwen-14b-chat, qwen-14b-chat-int4, qwen-14b-chat-int8, qwen-72b, qwen-72b-chat, qwen-72b-chat-int4, qwen-72b-chat-int8.
     - chatglm series: chatglm2-6b, chatglm2-6b-32k, chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k.
@@ -200,7 +202,7 @@ Here is a simple introduction of web-ui:
 - Custom Dataset
 - Supported Templates:
   - Text Generation: default-generation, default-generation-bos, chatglm-generation.
-  - Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, internlm2, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, codefuse, cogagent.
+  - Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, internlm2, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, codefuse, cogagent-chat, cogagent-instruct.
 
 
## 🔥SCEdit

README_CN.md

Lines changed: 4 additions & 2 deletions
@@ -60,6 +60,7 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
 Users can check the [official SWIFT documentation](docs/source/GetStarted/快速使用.md) for details.
 
 ## 🎉 News
+- 2024.1.26: Support [yi-vl-6b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/yi_vl_6b_chat), yi-vl-34b-chat.
 - 2024.1.24: Support codefuse-codegeex2-6b-chat, codefuse-qwen-14b-chat.
 - 2024.1.23: Support the orion series: orion-14b, [orion-14b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/orion_14b_chat).
 - 2024.1.20: Support [xverse-13b-256k](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/xverse_13b_256k), xverse-65b-v2, xverse-65b-chat.
@@ -154,7 +155,8 @@ swift web-ui
   - Multi-Modal:
     - qwen-vl series: qwen-vl, qwen-vl-chat, qwen-vl-chat-int4.
     - qwen-audio series: qwen-audio, qwen-audio-chat.
-    - cogagent series: cogagent-chat, cogagent-vqa.
+    - yi-vl series: yi-vl-6b-chat, yi-vl-34b-chat.
+    - cogagent series: cogagent-18b-chat, cogagent-18b-instruct.
   - General:
     - qwen series: qwen-1_8b, qwen-1_8b-chat, qwen-1_8b-chat-int4, qwen-1_8b-chat-int8, qwen-7b, qwen-7b-chat, qwen-7b-chat-int4, qwen-7b-chat-int8, qwen-14b, qwen-14b-chat, qwen-14b-chat-int4, qwen-14b-chat-int8, qwen-72b, qwen-72b-chat, qwen-72b-chat-int4, qwen-72b-chat-int8.
     - chatglm series: chatglm2-6b, chatglm2-6b-32k, chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k.
@@ -200,7 +202,7 @@ swift web-ui
 - Custom Dataset
 - Supported Templates:
   - Text Generation: default-generation, default-generation-bos, chatglm-generation.
-  - Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, internlm2, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, codefuse, cogagent.
+  - Chat: default, qwen, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, internlm2, yi, yuan, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek, deepseek-coder, codefuse-codellama, codefuse, cogagent-chat, cogagent-instruct.
 
 
## 🔥SCEdit

docs/source/LLM/VLLM推理加速与部署.md

Lines changed: 25 additions & 0 deletions
@@ -249,6 +249,18 @@ CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-7b-chat
 
 **Client:**
 
+Test:
+```bash
+curl http://localhost:8000/v1/chat/completions \
+-H "Content-Type: application/json" \
+-d '{
+"model": "qwen-7b-chat",
+"messages": [{"role": "user", "content": "晚上睡不着觉怎么办?"}],
+"max_tokens": 256,
+"temperature": 0
+}'
+```
+
 Using swift:
 ```python
 from swift.llm import get_model_list_client, XRequestConfig, inference_client
@@ -340,6 +352,19 @@ CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-7b
 
 **Client:**
 
+Test:
+```bash
+curl http://localhost:8000/v1/completions \
+-H "Content-Type: application/json" \
+-d '{
+"model": "qwen-7b",
+"prompt": "浙江 -> 杭州\n安徽 -> 合肥\n四川 ->",
+"max_tokens": 32,
+"temperature": 0.1,
+"seed": 42
+}'
+```
+
 Using swift:
 ```python
 from swift.llm import get_model_list_client, XRequestConfig, inference_client
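Both `Using swift:` hunks above are cut off by the diff context right after the import line. For orientation only, here is a minimal sketch of how those imports are typically combined into a client call; the argument names, the `data[0].id` access, and the response fields are assumptions, not content of this commit:

```python
from swift.llm import get_model_list_client, XRequestConfig, inference_client

# Ask the deployed server which models it is serving.
model_list = get_model_list_client()
model_type = model_list.data[0].id  # assumed OpenAI-style model list

# Mirror the curl test above: deterministic decoding, 256-token cap.
request_config = XRequestConfig(max_tokens=256, temperature=0)
resp = inference_client(model_type, 'Hello!', request_config=request_config)
print(resp.choices[0].message.content)
```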

docs/source/LLM/命令行参数.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@
 - `--hub_token`: the SDK token needed for pushing. It can be obtained from [https://modelscope.cn/my/myaccesstoken](https://modelscope.cn/my/myaccesstoken). Defaults to `None`, i.e. it is read from the environment variable `MODELSCOPE_API_TOKEN`. This argument only takes effect when `push_to_hub` is set to True.
 - `--test_oom_error`: checks whether training will hit OOM. Defaults to `False`. If set to True, the training set is sorted in descending order of max_length to make OOM testing convenient. This argument is generally used for testing; set it with care.
 - `--disable_tqdm`: whether to disable tqdm, which is useful when launching the script with `nohup`. Defaults to `False`, i.e. tqdm is enabled.
-- `--lazy_tokenize`: defers encoding of the text, reducing preprocessing wait time and memory usage, which is useful when handling large datasets. Defaults to `False`, i.e. all text is preprocessed before `trainer.train()`.
+- `--lazy_tokenize`: if set to False, all text is preprocessed before `trainer.train()`. If set to True, text encoding is deferred, reducing preprocessing wait time and memory usage, which is useful when handling large datasets. Defaults to `None`, i.e. the value is chosen automatically based on the template type: LLM models usually get False, multi-modal models usually get True (to avoid excessive memory usage from loading images and audio).
 - `--preprocess_num_proc`: uses multiprocessing when preprocessing the dataset (tokenizing the text). Defaults to `1`. Like `lazy_tokenize`, it addresses slow preprocessing, but it cannot reduce memory usage, so for huge datasets `lazy_tokenize` is recommended instead. Recommended values: 4, 8. Note: with qwen-audio this argument is forced to 1, because qwen-audio's preprocessing function uses torch multiprocessing, which would cause compatibility problems.
 - `--use_flash_attn`: whether to use flash attn. Defaults to `None`. Installation steps for flash_attn can be found at [https://github.com/Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention). The models that support flash_attn are listed in [Supported Models](./支持的模型和数据集.md#模型).
 - `--ignore_args_error`: whether to ignore the Error raised by command-line argument mistakes. Defaults to `False`. Set it to True if you need to copy the code into a notebook to run it.
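To make the new `--lazy_tokenize` default concrete, here is an illustrative sketch (not part of this commit) of opting into deferred tokenization from the Python API; treat the `SftArguments`/`sft_main` wiring and the dataset choice as assumptions:

```python
from swift.llm import SftArguments, sft_main

# Hypothetical run: a multi-modal model, for which lazy_tokenize=None
# would auto-select True; we set it explicitly here for clarity.
args = SftArguments(
    model_type='yi-vl-6b-chat',
    dataset=['coco-mini-en'],
    lazy_tokenize=True)  # tokenize each sample on access, not up front
result = sft_main(args)
```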

docs/source/LLM/支持的模型和数据集.md

Lines changed: 6 additions & 4 deletions
@@ -33,8 +33,8 @@
 |qwen-vl|[qwen/Qwen-VL](https://modelscope.cn/models/qwen/Qwen-VL/summary)|c_attn|default-generation|✔|✘||
 |qwen-vl-chat|[qwen/Qwen-VL-Chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary)|c_attn|qwen|✔|✘||
 |qwen-vl-chat-int4|[qwen/Qwen-VL-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary)|c_attn|qwen|✔|✘|auto_gptq>=0.5|
-|qwen-audio|[qwen/Qwen-Audio](https://modelscope.cn/models/qwen/Qwen-Audio/summary)|c_attn|default-generation|✔|✘||
-|qwen-audio-chat|[qwen/Qwen-Audio-Chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary)|c_attn|qwen|✔|✘||
+|qwen-audio|[qwen/Qwen-Audio](https://modelscope.cn/models/qwen/Qwen-Audio/summary)|c_attn|qwen-audio-generation|✔|✘||
+|qwen-audio-chat|[qwen/Qwen-Audio-Chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary)|c_attn|qwen-audio|✔|✘||
 |chatglm2-6b|[ZhipuAI/chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary)|query_key_value|chatglm2|✘|✔||
 |chatglm2-6b-32k|[ZhipuAI/chatglm2-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary)|query_key_value|chatglm2|✘|✔||
 |chatglm3-6b-base|[ZhipuAI/chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary)|query_key_value|chatglm-generation|✘|✔||
@@ -53,6 +53,8 @@
 |yi-34b|[01ai/Yi-34B](https://modelscope.cn/models/01ai/Yi-34B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔||
 |yi-34b-200k|[01ai/Yi-34B-200K](https://modelscope.cn/models/01ai/Yi-34B-200K/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔||
 |yi-34b-chat|[01ai/Yi-34B-Chat](https://modelscope.cn/models/01ai/Yi-34B-Chat/summary)|q_proj, k_proj, v_proj|yi|✔|✔||
+|yi-vl-6b-chat|[01ai/Yi-VL-6B](https://modelscope.cn/models/01ai/Yi-VL-6B/summary)|q_proj, k_proj, v_proj|yi-vl|✘|✘|transformers>=4.34|
+|yi-vl-34b-chat|[01ai/Yi-VL-34B](https://modelscope.cn/models/01ai/Yi-VL-34B/summary)|q_proj, k_proj, v_proj|yi-vl|✘|✘|transformers>=4.34|
 |internlm-7b|[Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary)|q_proj, k_proj, v_proj|default-generation-bos|✘|✔||
 |internlm-7b-chat|[Shanghai_AI_Laboratory/internlm-chat-7b-v1_1](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-v1_1/summary)|q_proj, k_proj, v_proj|internlm|✘|✔||
 |internlm-7b-chat-8k|[Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary)|q_proj, k_proj, v_proj|internlm|✘|✔||
@@ -131,8 +133,8 @@
 |deepseek-coder-33b|[deepseek-ai/deepseek-coder-33b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-base/summary)|q_proj, k_proj, v_proj|default-generation-bos|✔|✔||
 |deepseek-coder-33b-instruct|[deepseek-ai/deepseek-coder-33b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|✔|✔||
 |phi2-3b|[AI-ModelScope/phi-2](https://modelscope.cn/models/AI-ModelScope/phi-2/summary)|Wqkv|default-generation|✔|✔||
-|cogagent-chat|[ZhipuAI/cogagent-chat](https://modelscope.cn/models/ZhipuAI/cogagent-chat/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense|cogagent|✘|✘||
-|cogagent-vqa|[ZhipuAI/cogagent-vqa](https://modelscope.cn/models/ZhipuAI/cogagent-vqa/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense|cogagent|✘|✘||
+|cogagent-18b-chat|[ZhipuAI/cogagent-chat](https://modelscope.cn/models/ZhipuAI/cogagent-chat/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense|cogagent-chat|✘|✘||
+|cogagent-18b-instruct|[ZhipuAI/cogagent-vqa](https://modelscope.cn/models/ZhipuAI/cogagent-vqa/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense|cogagent-instruct|✘|✘||
 
 
 ## Datasets

docs/source/LLM/自定义与拓展.md

Lines changed: 11 additions & 4 deletions
@@ -125,12 +125,19 @@ AAAAA,BBBBB,CCCCC
 {"query": "AAAAA", "response": "BBBBB", "rejected_response": "CCCCC"}
 ```
 
-**CogAgent model**
+**CogAgent series**
 
 ```jsonl
-{"query": "55555", "response": "66666", "image": "some-local-image-path"}
-{"query": "eeeee", "response": "fffff", "history": [], "image": "some-http-image-path"}
-{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "image": "some-local-image-path"}
+{"query": "55555", "response": "66666", "images": ["image_path"]}
+{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
+{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path"]}
+```
+
+**Yi-VL series**
+```jsonl
+{"query": "55555", "response": "66666", "images": ["image_path"]}
+{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
+{"query": "EEEEE", "response": "FFFFF", "history": [["AAAAA", "BBBBB"], ["CCCCC", "DDDDD"]], "images": ["image_path", "image_path2", "image_path3"]}
 ```
 
 The image field supports two kinds of values: local image files and http-accessible image urls.
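For illustration (not from the commit), a few lines of Python suffice to emit samples in this `images` format; the file name and sample contents are placeholders:

```python
import json

# Placeholder samples using the new multi-image `images` list field.
samples = [
    {'query': 'What is in the picture?', 'response': 'A cat on a sofa.',
     'history': [], 'images': ['cat.jpg']},
]
with open('train.jsonl', 'w', encoding='utf-8') as f:
    for sample in samples:
        # One JSON object per line; keep non-ASCII readable.
        f.write(json.dumps(sample, ensure_ascii=False) + '\n')
```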

docs/source/cources/data_processing.md

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ template: Template = get_template(
     'qwen',
     tokenizer,
     max_length=256)
-resp = template.encode({'query': 'How are you?', "response": "I am fine"})
+resp = template.encode({'query': 'How are you?', "response": "I am fine"})[0]
 print(resp)
 # {'input_ids': [151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 4340, 525, 498, 30, 151645, 198, 151644, 77091, 198, 40, 1079, 6915, 151645], 'labels': [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 40, 1079, 6915, 151645]}
 ```
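The added `[0]`, together with the matching changes in `swift/llm/deploy.py` below, suggests that `template.encode` now returns a pair rather than a bare dict. A sketch of the assumed new contract follows, reusing the `template` from the example above; the name `tokenizer_kwargs` for the second element is an assumption:

```python
# Assumed shape: (inputs, tokenizer_kwargs). The first element holds
# 'input_ids' and 'labels'; the second carries extra tokenizer arguments
# (typically empty for pure-text templates).
inputs, tokenizer_kwargs = template.encode(
    {'query': 'How are you?', 'response': 'I am fine'})
print(inputs['input_ids'])
```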
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Experimental environment: V100, A10, 3090
+CUDA_VISIBLE_DEVICES=0 \
+swift infer \
+    --ckpt_dir "output/yi-vl-6b-chat/vx_xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --max_length 2048 \
+    --use_flash_attn false \
+    --max_new_tokens 2048 \
+    --temperature 0.5 \
+    --top_p 0.7 \
+    --repetition_penalty 1. \
+    --do_sample true \
+    --merge_lora_and_save false \
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Experimental environment: V100, A10, 3090
+# 18GB GPU memory
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model_type yi-vl-6b-chat \
+    --sft_type lora \
+    --tuner_backend swift \
+    --template_type AUTO \
+    --dtype AUTO \
+    --output_dir output \
+    --dataset coco-mini-en \
+    --train_dataset_sample -1 \
+    --num_train_epochs 1 \
+    --max_length 2048 \
+    --check_dataset_strategy warning \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --lora_dropout_p 0.05 \
+    --lora_target_modules DEFAULT \
+    --gradient_checkpointing true \
+    --batch_size 1 \
+    --weight_decay 0.01 \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps 16 \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
+    --use_flash_attn false \

swift/llm/deploy.py

Lines changed: 2 additions & 3 deletions
@@ -1,6 +1,5 @@
 # Copyright (c) Alibaba, Inc. and its affiliates.
 import time
-from copy import deepcopy
 from dataclasses import asdict
 from http import HTTPStatus
 from typing import List, Optional, Union
@@ -88,7 +87,7 @@ async def inference_vllm_async(request: Union[ChatCompletionRequest,
                 f'the model `{llm_engine.model_type}` is in text generation format. '
                 'Please use the `completions` API.')
         example = messages_to_history(request.messages)
-        input_ids = template.encode(example)['input_ids']
+        input_ids = template.encode(example)[0]['input_ids']
         request_id = f'chatcmpl-{random_uuid()}'
     else:
         if not is_generation_template(template.template_type):
@@ -97,7 +96,7 @@ async def inference_vllm_async(request: Union[ChatCompletionRequest,
                 f'The chat template `{template.template_type}` corresponding to '
                 f'the model `{llm_engine.model_type}` is in chat format. '
                 'Please use the `chat.completions` API.')
-        input_ids = template.encode({'query': request.prompt})['input_ids']
+        input_ids = template.encode({'query': request.prompt})[0]['input_ids']
         request_id = f'cmpl-{random_uuid()}'
 
     error_msg = await check_length(request, input_ids)
