modelscope
diff --git a/README.md b/README.md
Lines changed: 8 additions & 4 deletions
diff --git a/README_CN.md b/README_CN.md
Lines changed: 8 additions & 4 deletions
diff --git a/docs/source/Customization/插件化.md b/docs/source/Customization/插件化.md
Lines changed: 0 additions & 5 deletions
diff --git a/docs/source/Customization/自定义数据集.md b/docs/source/Customization/自定义数据集.md
Lines changed: 1 addition & 0 deletions
diff --git a/docs/source/GetStarted/快速开始.md b/docs/source/GetStarted/快速开始.md
Lines changed: 4 additions & 2 deletions
diff --git a/docs/source/Instruction/ReleaseNote3.0.md b/docs/source/Instruction/ReleaseNote3.0.md
Lines changed: 0 additions & 1 deletion
diff --git a/docs/source/Instruction/命令行参数.md b/docs/source/Instruction/命令行参数.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/source_en/Customization/Custom-dataset.md b/docs/source_en/Customization/Custom-dataset.md
Lines changed: 7 additions & 6 deletions
diff --git a/docs/source_en/Customization/Pluginization.md b/docs/source_en/Customization/Pluginization.md
Lines changed: 0 additions & 6 deletions
diff --git a/docs/source_en/GetStarted/Quick-start.md b/docs/source_en/GetStarted/Quick-start.md
Lines changed: 4 additions & 2 deletions
@@ -146,7 +146,8 @@ After training is complete, use the following command to perform inference with
 CUDA_VISIBLE_DEVICES=0 \
 swift infer \
     --adapters output/vx-xxx/checkpoint-xxx \
-    --stream true
+    --stream true \
+    --max_new_tokens 2048
 
 # merge-lora and use vLLM for inference acceleration
 CUDA_VISIBLE_DEVICES=0 \
@@ -155,7 +156,8 @@ swift infer \
     --stream true \
     --merge_lora true \
     --infer_backend vllm \
-    --max_model_len 8192
+    --max_model_len 8192 \
+    --max_new_tokens 2048
 ```
 
 ### Web-UI
@@ -262,15 +264,17 @@ CUDA_VISIBLE_DEVICES=0 swift rlhf \
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --model Qwen/Qwen2.5-7B-Instruct \
     --stream true \
-    --infer_backend pt
+    --infer_backend pt \
+    --max_new_tokens 2048
 
 # LoRA
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --model Qwen/Qwen2.5-7B-Instruct \
     --adapters swift/test_lora \
     --stream true \
     --infer_backend pt \
-    --temperature 0
+    --temperature 0 \
+    --max_new_tokens 2048
 ```
 
 ### Deployment
 
@@ -139,7 +139,8 @@ swift sft \
 CUDA_VISIBLE_DEVICES=0 \
 swift infer \
     --adapters output/vx-xxx/checkpoint-xxx \
-    --stream true
+    --stream true \
+    --max_new_tokens 2048
 
 # merge-lora and use vLLM for inference acceleration
 CUDA_VISIBLE_DEVICES=0 \
@@ -148,7 +149,8 @@ swift infer \
     --stream true \
     --merge_lora true \
     --infer_backend vllm \
-    --max_model_len 8192
+    --max_model_len 8192 \
+    --max_new_tokens 2048
 ```
 
 ### Web-UI
@@ -254,15 +256,17 @@ CUDA_VISIBLE_DEVICES=0 swift rlhf \
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --model Qwen/Qwen2.5-7B-Instruct \
     --stream true \
-    --infer_backend pt
+    --infer_backend pt \
+    --max_new_tokens 2048
 
 # LoRA
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --model Qwen/Qwen2.5-7B-Instruct \
     --adapters swift/test_lora \
     --stream true \
     --infer_backend pt \
-    --temperature 0
+    --temperature 0 \
+    --max_new_tokens 2048
 ```
 
 ### Deployment
 
@@ -8,11 +8,6 @@ Examples can be found [here](https://github.com/modelscope/swift/blob/main/swift/plugin/ca
 
 Callbacks are registered into the trainer before the trainer is constructed. The example provides a simple version of the EarlyStop scheme.
 
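-## Customized Trainer
-
-Examples can be found [here](https://github.com/modelscope/swift/blob/main/swift/plugin/custom_trainer.py).
-
-Users can inherit existing trainers here and implement their own training logic, such as customizing data_loader or compute_loss. The example provides a trainer for a text-classification task.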
 
 ## Customized Loss
 
 
@@ -81,6 +81,7 @@ query-response format:
 
 Fine-tuning:
 ```jsonl
+{"messages": [{"role": "user", "content": "Where is the capital of Zhejiang?"}, {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."}]}
 {"messages": [{"role": "user", "content": "<image><image>What is the difference between the two images?"}, {"role": "assistant", "content": "The first one is a kitten, and the second one is a puppy."}], "images": ["/xxx/x.jpg", "xxx/x.png"]}
 {"messages": [{"role": "user", "content": "<audio>What did the audio say?"}, {"role": "assistant", "content": "The weather is really nice today."}], "audios": ["/xxx/x.mp3"]}
 {"messages": [{"role": "system", "content": "You are a helpful and harmless assistant."}, {"role": "user", "content": "<image>What is in the image, <video>What is in the video?"}, {"role": "assistant", "content": "The image shows an elephant, and the video shows a puppy running on the grass."}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
 
@@ -63,7 +63,8 @@ swift sft \
 CUDA_VISIBLE_DEVICES=0 \
 swift infer \
     --adapters output/vx-xxx/checkpoint-xxx \
-    --stream true
+    --stream true \
+    --max_new_tokens 2048
 
 # merge-lora and use vLLM for inference acceleration
 CUDA_VISIBLE_DEVICES=0 \
@@ -72,7 +73,8 @@ swift infer \
     --stream true \
     --merge_lora true \
     --infer_backend vllm \
-    --max_model_len 8192
+    --max_model_len 8192 \
+    --max_new_tokens 2048
 ```
 
 > [!TIP]
 
@@ -18,7 +18,6 @@
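     - Uses the messages format as the input parameter interface
 4. Added a plugin mechanism for customizing the training process. The currently supported plugins are:
     - callback: customize training callback methods
-    - custom_trainer: customize the trainer
     - loss: customize the loss method
     - loss_scale: customize the weight of each token
     - metric: customize cross-validation metrics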
 
@@ -16,12 +16,13 @@
 - custom_register_path: Path to the `.py` file that registers custom models, chat templates, and datasets
 
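 ### Model Arguments
-
+- task_type: Defaults to 'causal_lm'. Options are 'causal_lm' and 'seq_cls'. Examples can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls).
 - 🔥model: Model ID or local path to the model. For a custom model, use it together with `model_type` and `template`; see [Custom Model](../Customization/自定义模型.md) for details
 - model_type: Model type. The same model architecture, template, and model loading process are defined as one model_type
 - model_revision: Model revision
 - 🔥torch_dtype: Data type of the model weights. Supports `float16`, `bfloat16`, `float32`; by default it is read from the config file
 - attn_impl: Attention type. Supports `flash_attn`, `sdpa`, `eager`; defaults to sdpa
+- num_labels: Required for classification models. The number of labels; defaults to None
 - rope_scaling: Rope type. Supports `linear` and `dynamic`; use it together with `max_length`
 - device_map: The device_map configuration used by the model, for example 'auto', 'cpu', a JSON string, or a path to a JSON file
 - local_repo_path: Some models depend on a GitHub repo when loading. To avoid network issues during `git clone`, a local repo can be used directly. This parameter takes the path to the local repo; defaults to `None`
@@ -290,7 +291,6 @@ Vera uses three parameters: `target_modules`, `target_regex`, and `modules_to_save`.
 - resume_only_model: If resume_from_checkpoint is set, resume only the model weights; defaults to False
 - check_model: Check whether the local model files are corrupted or modified and report it; defaults to True. In an offline environment, set it to False
 - loss_type: Loss type; by default the model's built-in loss function is used
-- num_labels: Required for classification models. The number of labels; defaults to None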
 
 - packing: Whether to use packing; defaults to False
 - 🔥lazy_tokenize: Whether to use lazy_tokenize; defaults to False for LLM training and True for MLLM training
 
@@ -74,14 +74,15 @@ For multimodal datasets, the format is the same as the tasks mentioned above. Th
 Pre-training:
 ```jsonl
 {"messages": [{"role": "assistant", "content": "Pre-trained text goes here"}]}
-{"messages": [{"role": "assistant", "content": "<image> is a puppy, <image> is a kitten"}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
-{"messages": [{"role": "assistant", "content": "<audio> describes how nice the weather is today"}], "audios": ["/xxx/x.wav"]}
-{"messages": [{"role": "assistant", "content": "<image> is an elephant, <video> is a lion running"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
+{"messages": [{"role": "assistant", "content": "<image>is a puppy, <image>is a kitten"}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
+{"messages": [{"role": "assistant", "content": "<audio>describes how nice the weather is today"}], "audios": ["/xxx/x.wav"]}
+{"messages": [{"role": "assistant", "content": "<image>is an elephant, <video>is a lion running"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
 ```
 
 Supervised Fine-tuning:
 
 ```jsonl
+{"messages": [{"role": "user", "content": "Where is the capital of Zhejiang?"}, {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."}]}
 {"messages": [{"role": "user", "content": "<image><image>What is the difference between the two images?"}, {"role": "assistant", "content": "The first one is a kitten, and the second one is a puppy."}], "images": ["/xxx/x.jpg", "xxx/x.png"]}
 {"messages": [{"role": "user", "content": "<audio>What did the audio say?"}, {"role": "assistant", "content": "The weather is really nice today."}], "audios": ["/xxx/x.mp3"]}
 {"messages": [{"role": "system", "content": "You are a helpful and harmless assistant."}, {"role": "user", "content": "<image>What is in the image, <video>What is in the video?"}, {"role": "assistant", "content": "The image shows an elephant, and the video shows a puppy running on the grass."}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
@@ -93,7 +94,7 @@ The data format for RLHF can refer to the format used for pure text large models
 For grounding (object detection) tasks, SWIFT supports two methods:
 1. Maintain consistency with the above multimodal dataset format, adding special characters in the dataset, for example:
 ```jsonl
-{"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "<image> Find a <ref> elephant </ref>"}, {"role": "assistant", "content": "<box>(200,450),(500,800)</box>"}], "images": ["/xxx/x.jpg"]}
+{"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "<image>Find a <ref> elephant </ref>"}, {"role": "assistant", "content": "<box>(200,450),(500,800)</box>"}], "images": ["/xxx/x.jpg"]}
 ```
 With this type of data, please note:
   - Grounding tasks often require special characters. You need to determine which model to use, read the model paper to identify special characters for grounding tasks, and combine the data accordingly.
@@ -104,9 +105,9 @@ With this type of data, please note:
 
 ```jsonl
 # Object detection
-{"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "<image> Identify <bbox>"}, {"role": "assistant", "content": "<ref-object>"}], "images": ["/coco2014/train2014/COCO_train2014_000000001507.jpg"], "objects": "[{\"caption\": \"guy in red\", \"bbox\": [138, 136, 235, 359], \"bbox_type\": \"real\", \"image\": 0}]"}
+{"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "<image>Identify <bbox>"}, {"role": "assistant", "content": "<ref-object>"}], "images": ["/coco2014/train2014/COCO_train2014_000000001507.jpg"], "objects": "[{\"caption\": \"guy in red\", \"bbox\": [138, 136, 235, 359], \"bbox_type\": \"real\", \"image\": 0}]"}
 # Grounding to multiple bboxes
-{"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "<image> Find <ref-object>"}, {"role": "assistant", "content": "<bbox>"}], "images": ["/coco2014/train2014/COCO_train2014_000000001507.jpg"], "objects": "[{\"caption\": \"guy in red\", \"bbox\": [[138, 136, 235, 359], [1,2,3,4]], \"bbox_type\": \"real\", \"image\": 0}]"}
+{"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "<image>Find <ref-object>"}, {"role": "assistant", "content": "<bbox>"}], "images": ["/coco2014/train2014/COCO_train2014_000000001507.jpg"], "objects": "[{\"caption\": \"guy in red\", \"bbox\": [[138, 136, 235, 359], [1,2,3,4]], \"bbox_type\": \"real\", \"image\": 0}]"}
 ```
 
 This format adds the objects field, which includes:
 
@@ -8,12 +8,6 @@ Examples can be found [here](https://github.com/modelscope/swift/blob/main/swift
 
 Callbacks are registered into the trainer before constructing the trainer. The example provides a simple version of the EarlyStop scheme.
 
-## Customized Trainer
-
-Examples can be found [here](https://github.com/modelscope/swift/blob/main/swift/plugin/custom_trainer.py).
-
-Users can inherit existing trainers and implement their own training logic here, such as customizing data loaders, customizing compute_loss, etc. The example demonstrates a trainer for a text-classification task.
-
 ## Customized Loss
 
 Examples can be found [here](https://github.com/modelscope/swift/blob/main/swift/plugin/loss.py).
 
@@ -63,7 +63,8 @@ After training is complete, use the following command to perform inference with
 CUDA_VISIBLE_DEVICES=0 \
 swift infer \
     --adapters output/vx-xxx/checkpoint-xxx \
-    --stream true
+    --stream true \
+    --max_new_tokens 2048
 
 # merge-lora and use vLLM for inference acceleration
 CUDA_VISIBLE_DEVICES=0 \
@@ -72,7 +73,8 @@ swift infer \
     --stream true \
     --merge_lora true \
     --infer_backend vllm \
-    --max_model_len 8192
+    --max_model_len 8192 \
+    --max_new_tokens 2048
 ```
 
 > [!TIP]