modelscope
diff --git a/README.md b/README.md
Lines changed: 0 additions & 4 deletions
diff --git a/README_CN.md b/README_CN.md
Lines changed: 0 additions & 4 deletions
diff --git a/docs/source/Instruction/命令行参数.md b/docs/source/Instruction/命令行参数.md
Lines changed: 1 addition & 0 deletions
diff --git a/docs/source_en/Instruction/Command-line-parameters.md b/docs/source_en/Instruction/Command-line-parameters.md
Lines changed: 1 addition & 0 deletions
diff --git a/examples/infer/demo_agent.py b/examples/infer/demo_agent.py
Lines changed: 3 additions & 3 deletions
diff --git a/examples/train/agent/deepseek_r1.sh b/examples/train/agent/deepseek_r1.sh
Lines changed: 27 additions & 0 deletions
diff --git a/examples/train/agent/loss_scale/infer.md b/examples/train/agent/loss_scale/infer.md
Lines changed: 0 additions & 163 deletions
diff --git a/examples/train/agent/loss_scale/infer_lora.py b/examples/train/agent/loss_scale/infer_lora.py
Lines changed: 90 additions & 0 deletions
@@ -28,10 +28,6 @@
 <p align="center">
         <a href="https://arxiv.org/abs/2408.05517">Paper</a> &nbsp ｜ <a href="https://swift.readthedocs.io/en/latest/">Swift3.x En Doc</a> &nbsp ｜ &nbsp <a href="https://swift.readthedocs.io/zh-cn/latest/">Swift3.x中文文档</a> &nbsp
 </p>
-<p align="center">
-        <a href="https://swift2x-en.readthedocs.io/en/latest/">Swift2.x En Doc</a> &nbsp ｜ &nbsp <a href="https://swift2x.readthedocs.io/zh-cn/latest/">Swift2.x中文文档</a> &nbsp
-</p>
-
 
 ## 📖 Table of Contents
 - [Groups](#-Groups)
 
@@ -29,10 +29,6 @@
 <p align="center">
         <a href="https://arxiv.org/abs/2408.05517">论文</a> &nbsp ｜ <a href="https://swift.readthedocs.io/en/latest/">Swift3.x En Doc</a> &nbsp ｜ &nbsp <a href="https://swift.readthedocs.io/zh-cn/latest/">Swift3.x中文文档</a> &nbsp
 </p>
-<p align="center">
-        <a href="https://swift2x-en.readthedocs.io/en/latest/">Swift2.x En Doc</a> &nbsp ｜ &nbsp <a href="https://swift2x.readthedocs.io/zh-cn/latest/">Swift2.x中文文档</a> &nbsp
-</p>
-
 
 ##  📖 目录
 - [用户群](#-用户群)
 
@@ -70,6 +70,7 @@
 - 🔥max_pixels: 多模态模型输入图片的最大像素数（H\*W），将超过该限制的图像进行缩放。默认为None，不限制最大像素数
 - 🔥agent_template: Agent模板，确定如何将工具列表转换成system，如何从模型回复中提取toolcall，以及确定`{"role": "tool_call", "content": "xxx"}`, `{"role": "tool_response", "content": "xxx"}`的模板格式。可选为"react_en", "hermes", "glm4", "qwen_en", "toolbench"等，更多请查看[这里](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/agent_template/__init__.py)。默认为None，根据模型类型进行选择。
 - response_prefix: response的前缀字符，例如QwQ-32B将response_prefix设置为`'<think>\n'`。默认为None，根据模型自动设置
+  - 注意：若对deepseek-r1/qwq模型使用不包含`<think>...</think>`的数据集进行训练，请在推理训练后模型时额外传入`--response_prefix ''`
 - padding_side: 当训练`batch_size>=2`时的padding_side，可选值为'left'、'right'，默认为'right'。（推理时的batch_size>=2时，只进行左padding）
 - loss_scale: 训练tokens的loss权重设置。默认为`'default'`，代表所有response（含history）以1计算交叉熵损失。可选值为'default'、'last_round'、'all'，以及agent需要的loss_scale: 'react'、'agentflan'、'alpha_umi'和'qwen'。其中'last_round'代表只计算最后一轮response的损失，'all'代表计算所有tokens的损失。agent部分可以查看[插件化](../Customization/插件化.md)和[Agent文档](./Agent支持.md)
 - use_chat_template: 使用chat模板或generation模板，默认为`True`。`swift pt`会自动设置为generation模板
 
@@ -73,6 +73,7 @@ Hints:
 - 🔥agent_template: Agent template, which determines how to convert the list of tools into a system, how to extract tool calls from the model's response, and specifies the template format for `{"role": "tool_call", "content": "xxx"}` and `{"role": "tool_response", "content": "xxx"}`. Optional values include "react_en", "hermes", "glm4", "qwen_en", "toolbench", etc. For more details, please check [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/agent_template/__init__.py). The default value is None, meaning it will be selected based on the model type.
 - norm_bbox: Controls how to scale bounding boxes (bbox). Options are 'norm1000' and 'none'. 'norm1000' represents scaling bbox coordinates to one-thousandths, and 'none' means no scaling. Default is None, automatically selected based on the model.
 - response_prefix: The prefix character for the response, for example, setting the response_prefix to `'<think>\n'` for QwQ-32B. The default is None, and it is automatically set according to the model.
+  - Note: If you train a deepseek-r1/qwq model on a dataset that does not include `<think>...</think>`, additionally pass `--response_prefix ''` when running inference on the trained model.
 - padding_side: Padding side when `batch_size>=2` during training. Options are 'left' and 'right', with 'right' as the default. (For inference with batch_size>=2, only left padding is applied.)
 - loss_scale: Setting for the loss weight of training tokens. Default is `'default'`, meaning all responses (including history) are calculated with a cross-entropy loss of 1. Options are 'default', 'last_round', 'all', and agent-specific loss scales: 'react', 'agentflan', 'alpha_umi', and 'qwen'. 'last_round' means calculating only the loss of the last round's response, and 'all' calculates the loss for all tokens. For agent parts, see [Pluginization](../Customization/Pluginization.md) and [Agent Training](./Agent-support.md).
 - use_chat_template: Use chat template or generation template, default is `True`. `swift pt` is automatically set to the generation template.
 
@@ -109,10 +109,10 @@ def infer_continue_generate(engine):
         from swift.llm import LmdeployEngine
         engine = LmdeployEngine(model)
 
-    agent_template = agent_templates['hermes']()  # react_en/qwen_en/qwen_en_parallel
-    engine.default_template.agent_template = agent_template
+    # agent_template = agent_templates['hermes']()  # react_en/qwen_en/qwen_en_parallel
+    # engine.default_template.agent_template = agent_template
 
     infer(engine, get_infer_request())
     infer_stream(engine, get_infer_request())
 
-    infer_continue_generate(engine)
+    # infer_continue_generate(engine)
@@ -0,0 +1,27 @@
+CUDA_VISIBLE_DEVICES=0 \
+swift sft \
+    --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
+    --train_type full \
+    --dataset AI-ModelScope/function-calling-chatml \
+    --agent_template react_en \
+    --loss_scale react \
+    --response_prefix '' \
+    --torch_dtype bfloat16 \
+    --num_train_epochs 2 \
+    --per_device_train_batch_size 1 \
+    --per_device_eval_batch_size 1 \
+    --learning_rate 1e-5 \
+    --gradient_accumulation_steps 8 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 5 \
+    --max_length 8192 \
+    --save_only_model true \
+    --packing true \
+    --use_liger_kernel true \
+    --output_dir output \
+    --warmup_ratio 0.05 \
+    --attn_impl flash_attn \
+    --dataloader_num_workers 4 \
+    --dataset_num_proc 16
@@ -0,0 +1,90 @@
+# Copyright (c) Alibaba, Inc. and its affiliates.
+import os
+
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+# os.environ['SWIFT_DEBUG'] = '1'
+
+
+def infer(engine: 'InferEngine', infer_request: 'InferRequest'):
+    stop = [engine.default_template.agent_template.keyword.observation]  # compat react_en
+    request_config = RequestConfig(max_tokens=512, temperature=0, stop=stop)
+    resp_list = engine.infer([infer_request], request_config)
+    query = infer_request.messages[0]['content']
+    response = resp_list[0].choices[0].message.content
+    print(f'query: {query}')
+    print(f'response: {response}')
+    print(f'tool_calls: {resp_list[0].choices[0].message.tool_calls}')
+
+    tool = '{"temperature": 32, "condition": "Sunny", "humidity": 50}'
+    print(f'tool_response: {tool}')
+    infer_request.messages += [{'role': 'assistant', 'content': response}, {'role': 'tool', 'content': tool}]
+    resp_list = engine.infer([infer_request], request_config)
+    response2 = resp_list[0].choices[0].message.content
+    print(f'response2: {response2}')
+
+
+def infer_stream(engine: 'InferEngine', infer_request: 'InferRequest'):
+    stop = [engine.default_template.agent_template.keyword.observation]
+    request_config = RequestConfig(max_tokens=512, temperature=0, stream=True, stop=stop)
+    gen_list = engine.infer([infer_request], request_config)
+    query = infer_request.messages[0]['content']
+    response = ''
+    print(f'query: {query}\nresponse: ', end='')
+    for resp in gen_list[0]:
+        if resp is None:
+            continue
+        delta = resp.choices[0].delta.content
+        response += delta
+        print(delta, end='', flush=True)
+    print()
+    print(f'tool_calls: {resp.choices[0].delta.tool_calls}')
+
+    tool = '{"temperature": 32, "condition": "Sunny", "humidity": 50}'
+    print(f'tool_response: {tool}\nresponse2: ', end='')
+    infer_request.messages += [{'role': 'assistant', 'content': response}, {'role': 'tool', 'content': tool}]
+    gen_list = engine.infer([infer_request], request_config)
+    for resp in gen_list[0]:
+        if resp is None:
+            continue
+        print(resp.choices[0].delta.content, end='', flush=True)
+    print()
+
+
+def get_infer_request():
+    return InferRequest(
+        messages=[{
+            'role': 'user',
+            'content': "How's the weather in Beijing today?"
+        }],
+        tools=[{
+            'name': 'get_current_weather',
+            'description': 'Get the current weather in a given location',
+            'parameters': {
+                'type': 'object',
+                'properties': {
+                    'location': {
+                        'type': 'string',
+                        'description': 'The city and state, e.g. San Francisco, CA'
+                    },
+                    'unit': {
+                        'type': 'string',
+                        'enum': ['celsius', 'fahrenheit']
+                    }
+                },
+                'required': ['location']
+            }
+        }])
+
+
+if __name__ == '__main__':
+    from swift.llm import InferEngine, InferRequest, PtEngine, RequestConfig
+    from swift.plugin import agent_templates
+    model = 'Qwen/Qwen2.5-3B'
+    adapters = ['output/vx-xxx/checkpoint-xxx']
+    engine = PtEngine(model, adapters=adapters, max_batch_size=8)
+
+    # agent_template = agent_templates['hermes']()  # react_en/qwen_en/qwen_en_parallel
+    # engine.default_template.agent_template = agent_template
+
+    infer(engine, get_infer_request())
+    infer_stream(engine, get_infer_request())
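
The `infer` and `infer_stream` functions in `infer_lora.py` both run a two-step tool-calling loop: take the first model reply, append it as an `assistant` message, append the tool output as a `tool` message, then call the engine again. The message bookkeeping can be sketched without a model as below. This is a minimal illustration, not part of swift: the helper name `append_tool_turn` is hypothetical, and `first_reply` is made-up placeholder text standing in for what `engine.infer(...)` would return.

```python
import copy
from typing import Dict, List

Message = Dict[str, str]


def append_tool_turn(messages: List[Message], assistant_content: str, tool_content: str) -> List[Message]:
    """Return a new history with the assistant reply and the tool result appended.

    Hypothetical helper (not a swift API); the input list is left unmodified.
    """
    history = copy.deepcopy(messages)
    history.append({'role': 'assistant', 'content': assistant_content})
    history.append({'role': 'tool', 'content': tool_content})
    return history


messages = [{'role': 'user', 'content': "How's the weather in Beijing today?"}]
# Placeholder only: a real reply would come from the first engine.infer(...) call.
first_reply = 'placeholder tool-call text produced by the model'
# Same example tool output as in infer_lora.py.
tool_result = '{"temperature": 32, "condition": "Sunny", "humidity": 50}'

next_messages = append_tool_turn(messages, first_reply, tool_result)
print([m['role'] for m in next_messages])
```

Unlike the script, which extends `infer_request.messages` in place, this sketch returns a copy so the original history can be reused for another request.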