update docs (#3961)

Jintao-Huang · web-flow · commit 4ea46fbaea44 · 2025-04-23T10:19:26.000+08:00
diff --git a/docs/source/Instruction/Agent支持.md b/docs/source/Instruction/Agent支持.md
@@ -16,7 +16,7 @@
 
 以下为上述两条数据样本由qwen2_5和qwen2_5_vl的template进行encode后的input_ids和labels，选择的agent_template为**hermes**：
 
-样本一：
+样本一（并行工具调用）：
 ```text
 [INPUT_IDS] <|im_start|>system
 You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
@@ -61,7 +61,7 @@ For each function call, return a json object with function name and arguments wi
 </tool_call><|im_end|>[-100 * 67]根据天气预报工具，北京今天的空气质量指数为10，属于良好水平；上海今天的空气质量指数为72，属于轻度污染水平。<|im_end|>
 ```
 
-样本二：
+样本二（多模态，混合assistant和tool_call）：
 ```text
 [INPUT_IDS] <|im_start|>system
 You are a helpful assistant.
@@ -103,7 +103,7 @@ For each function call, return a json object with function name and arguments wi
 </tool_call><|im_end|>[-100 * 759]成功打开日历App，现在的时间为中午11点<|im_end|>
 ```
 
-**react_en**也是最常使用的agent template格式，以下为样本一由qwen2_5使用`agent_template='react_en'`进行encode后的input_ids和labels：
+**react_en**是常用的agent template格式之一，以下为样本一由qwen2_5使用`agent_template='react_en'`进行encode后的input_ids和labels：
 
 ```text
 [INPUT_IDS] <|im_start|>system
@@ -142,7 +142,18 @@ Action Input: {'city': '上海'}
 Observation:[-100 * 45]根据天气预报工具，北京今天的空气质量指数为10，属于良好水平；上海今天的空气质量指数为72，属于轻度污染水平。<|im_end|>
 ```
 
-更多的agent template可选值参考[这里](https://github.com/modelscope/swift/blob/main/swift/plugin/agent_template/__init__.py).
+更多模型和agent_template的尝试可以使用以下代码，更多的agent template可选值参考[这里](https://github.com/modelscope/swift/blob/main/swift/plugin/agent_template/__init__.py)。
+```python
+from swift.llm import get_model_tokenizer, get_template
+
+_, tokenizer = get_model_tokenizer('ZhipuAI/GLM-4-9B-0414', load_model=False)
+template = get_template(tokenizer.model_meta.template, tokenizer, agent_template='hermes')
+data = {...}
+template.set_mode('train')
+encoded = template.encode(data)
+print(f'[INPUT_IDS] {template.safe_decode(encoded["input_ids"])}\n')
+print(f'[LABELS] {template.safe_decode(encoded["labels"])}')
+```
 
 
 ## tools格式
@@ -174,21 +185,23 @@ tools = [{
 
 ## loss_scale的使用
 
-loss_scale可以对模型输出部分的训练权重进行调节。例如在ReACT格式中，可以设置`--loss_scale react`（loss_scale配置文件书写在[这里](https://github.com/modelscope/swift/blob/main/swift/plugin/loss_scale/config/default_loss_scale_config.json)），该参数起到的作用是：
+loss_scale可以对模型输出部分的训练损失权重进行调节。例如在ReACT格式中，可以设置`--loss_scale react`（loss_scale配置文件书写在[这里](https://github.com/modelscope/swift/blob/main/swift/plugin/loss_scale/config/default_loss_scale_config.json)），该参数起到的作用是：
 
 'Thought:'和'Final Answer:'部分权重为1，'Action:'和'Action Input:'部分权重为2，'Observation:'字段本身权重为2，'Observation:'后面的工具调用结果权重为0。
 
 具体的loss_scale插件设计，请参考[插件化](../Customization/插件化.md)文档.
 
 
 ## 训练
-参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/train/agent)，支持不同模型的丝滑切换。
+- 训练Base模型的Agent能力，通过修改`--model`切换不同模型，参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/agent/qwen2_5.sh)。
+- 训练GLM4的agent_template为hermes，参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/agent/glm4.sh)。
+- 使用`--loss_scale`对模型输出部分的损失权重进行调整，参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/train/agent/loss_scale)。
 
 ## 推理
 
-- 原始模型或者全参数训练参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_agent.py)。
-- LoRA训练参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/train/agent/loss_scale/infer.md)。
+- 原始模型或者全参数训练后模型的推理，参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_agent.py)。
+- LoRA训练后推理，参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/train/agent/loss_scale/infer.md)。
 
 ## 部署
 
-参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/deploy/agent)。
+服务端和客户端代码，参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/deploy/agent)。
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -27,9 +27,9 @@ Swift DOCUMENTATION
    Instruction/导出与推送.md
    Instruction/强化微调.md
    Instruction/GRPO.md
+   Instruction/Agent支持.md
    Instruction/支持的模型和数据集.md
    Instruction/使用tuners.md
-   Instruction/Agent支持.md
    Instruction/常见问题整理.md
 
 .. toctree::
diff --git a/docs/source_en/Customization/Pluginization.md b/docs/source_en/Customization/Pluginization.md
@@ -95,7 +95,7 @@ In the above definition, we added a new `custom` metric. Its value consists of t
 ## Customizing Optimizers
 
 An example can be found [here](https://github.com/modelscope/swift/blob/main/swift/plugin/optimizer.py).
-- Apply different learning rates to different parts of the model. For example, use separate learning rates for ViT and LLM, as referenced [here ](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/lora_llm_full_vit/custom_plugin.py).
+- Apply different learning rates to different parts of the model. For example, use separate learning rates for ViT and LLM, as referenced [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/lora_llm_full_vit/custom_plugin.py).
 
 Users can add their own optimizers and learning rate schedulers here:
 
@@ -126,8 +126,8 @@ The example is [here](https://github.com/modelscope/swift/blob/main/swift/plugin
 ## Customizing Tuners
 
 An example can be found [here](https://github.com/modelscope/swift/blob/main/swift/plugin/tuner.py).
-- For the multimodal model, full-parameter training is applied to the ViT part, while LoRA training is used for the LLM part. Refer to [here ](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal/lora_llm_full_vit).
-- For Phi4-multimodal, train its existing LoRA directly without adding extra LoRA. Refer to [here ](https://github.com/modelscope/ms-swift/blob/main/examples/train/plugins/tuner_phi4_mm.sh).
+- For the multimodal model, full-parameter training is applied to the ViT part, while LoRA training is used for the LLM part. Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal/lora_llm_full_vit).
+- For Phi4-multimodal, train its existing LoRA directly without adding extra LoRA. Refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/plugins/tuner_phi4_mm.sh).
 
 Tuner customization is another unique feature of SWIFT. Developers can bypass the complex tuner initialization process and code integration costs by registering new tuners here:
 
diff --git a/docs/source_en/Instruction/Agent-support.md b/docs/source_en/Instruction/Agent-support.md
@@ -18,7 +18,7 @@ Example data samples for the pure text Agent and multimodal Agent are as follows
 
 The following are the `input_ids` and `labels` after encoding the two data samples mentioned above using the templates for **qwen2_5** and **qwen2_5_vl** , with the selected `agent_template` being **hermes** :
 
-Sample One:
+Sample One (Parallel Tool Calls):
 
 ```text
 [INPUT_IDS] <|im_start|>system
@@ -64,7 +64,7 @@ According to the weather forecast tool, the air quality index (AQI) in Beijing i
 </tool_call><|im_end|>[-100 * 67]According to the weather forecast tool, the air quality index (AQI) in Beijing is 10, which indicates good air quality; whereas in Shanghai, the AQI is 72, indicating mild pollution.<|im_end|>
 ```
 
-Sample Two:
+Sample Two (Multimodal, Mixed Assistant and Tool Call):
 
 ```text
 [INPUT_IDS] <|im_start|>system
@@ -107,7 +107,7 @@ I can check the current time by opening the calendar app.
 </tool_call><|im_end|>[-100 * 759]Successfully opened the calendar app. The current time is 11 o'clock in the morning.<|im_end|>
 ```
 
-**react_en** is also the most commonly used agent template format. The following are the `input_ids` and `labels` after encoding Sample One using qwen2_5 with `agent_template='react_en'`:
+**react_en** is one of the commonly used agent template formats. The following are the `input_ids` and `labels` after encoding Sample One with the qwen2_5 template and `agent_template='react_en'`:
 
 ```text
 [INPUT_IDS] <|im_start|>system
@@ -146,7 +146,19 @@ Action Input: {'city': 'Shanghai'}
 Observation:[-100 * 45]According to the weather forecast tool, the air quality index (AQI) in Beijing is 10, which indicates good air quality; whereas in Shanghai, the AQI is 72, indicating mild pollution.<|im_end|>
 ```
 
-For more optional values of the agent template, refer to [here](https://github.com/modelscope/swift/blob/main/swift/plugin/agent_template/__init__.py).
+The following code can be used to try other models and `agent_template` options. For the available `agent_template` values, refer to [here](https://github.com/modelscope/swift/blob/main/swift/plugin/agent_template/__init__.py).
+
+```python
+from swift.llm import get_model_tokenizer, get_template
+
+_, tokenizer = get_model_tokenizer('ZhipuAI/GLM-4-9B-0414', load_model=False)
+template = get_template(tokenizer.model_meta.template, tokenizer, agent_template='hermes')
+data = {...}
+template.set_mode('train')
+encoded = template.encode(data)
+print(f'[INPUT_IDS] {template.safe_decode(encoded["input_ids"])}\n')
+print(f'[LABELS] {template.safe_decode(encoded["labels"])}')
+```
 
 ## Tools Format
 
@@ -178,7 +190,7 @@ tools = [{
 
 ## Usage of loss_scale
 
-`loss_scale` can be used to adjust the training weight of specific parts in the model's output. For example, in the ReACT format, you can set `--loss_scale react` (the `loss_scale` configuration file can be found [here ](https://github.com/modelscope/swift/blob/main/swift/plugin/loss_scale/config/default_loss_scale_config.json)). The role of this parameter is as follows:
+`loss_scale` can be used to adjust the training loss weight of the model's output. For example, in the ReACT format, you can set `--loss_scale react` (the `loss_scale` configuration file can be found [here](https://github.com/modelscope/swift/blob/main/swift/plugin/loss_scale/config/default_loss_scale_config.json)). This parameter has the following effect:
 
 - The weight for the 'Thought:' and 'Final Answer:' sections is 1.
 - The weight for the 'Action:' and 'Action Input:' sections is 2.
@@ -189,13 +201,15 @@ For the detailed design of the `loss_scale` plugin, please refer to the [Plugin-
 
 ## Training
 
-Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/agent)for smooth switching between different models.
+- To train the Agent capabilities of Base models, switch between models by changing `--model`. Refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/agent/qwen2_5.sh).
+- The agent_template for training GLM4 is hermes. Refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/agent/glm4.sh).
+- Use `--loss_scale` to adjust the loss weight of the model output section. Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/agent/loss_scale).
 
 ## Inference
 
-- For the original model or full-parameter training, refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_agent.py).
-- For LoRA training, refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/agent/loss_scale/infer.md).
+- For inference with the original model or a model after full-parameter training, refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_agent.py).
+- For inference after LoRA training, refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/agent/loss_scale/infer.md).
 
 ## Deployment
 
-Refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/deploy/agent).
+For server and client code, refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/deploy/agent).
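
Note on the `--loss_scale react` weights described in the Agent docs above: the following is a minimal sketch of how a ReACT-style response could be split into weighted segments. The weights come from the documentation text; the function name `split_react_segments`, the regex-based splitting, and the default weight of 1 for any text before the first keyword are assumptions made for illustration and are not taken from the swift `loss_scale` plugin.

```python
import re
from typing import List, Tuple

# Weights as described in the docs; everything else here is illustrative.
REACT_WEIGHTS = {
    'Thought:': 1.0,
    'Final Answer:': 1.0,
    'Action:': 2.0,
    'Action Input:': 2.0,
    'Observation:': 2.0,  # weight of the 'Observation:' field itself
}
OBSERVATION_RESULT_WEIGHT = 0.0  # the tool result after 'Observation:' is not trained on


def split_react_segments(text: str) -> List[Tuple[str, float]]:
    """Split a ReACT response into (segment, weight) pairs (hypothetical helper)."""
    # Try longer keywords first so 'Action Input:' is not matched as 'Action:'.
    keys = sorted(REACT_WEIGHTS, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(k) for k in keys) + ')'
    segments = []
    weight = 1.0  # assumed weight for text before the first keyword
    for part in re.split(pattern, text):
        if not part:
            continue
        if part in REACT_WEIGHTS:
            segments.append((part, REACT_WEIGHTS[part]))
            # Text following a keyword inherits its weight, except after 'Observation:'.
            weight = OBSERVATION_RESULT_WEIGHT if part == 'Observation:' else REACT_WEIGHTS[part]
        else:
            segments.append((part, weight))
    return segments


example = ("Thought: check the weather\nAction: get_weather\n"
           "Action Input: {'city': 'Beijing'}\nObservation: sunny\n"
           "Final Answer: It is sunny.")
for segment, weight in split_react_segments(example):
    print(repr(segment), weight)
```

How the per-segment weights are turned into per-token loss weights and normalized is not shown here; see the plugin configuration linked in the docs for the actual behavior.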
diff --git a/docs/source_en/index.rst b/docs/source_en/index.rst
@@ -27,9 +27,9 @@ Swift DOCUMENTATION
    Instruction/Export-and-push.md
    Instruction/Reinforced-Fine-tuning.md
    Instruction/GRPO.md
+   Instruction/Agent-support.md
    Instruction/Supported-models-and-datasets.md
    Instruction/Use-tuners.md
-   Instruction/Agent-support.md
    Instruction/Frequently-asked-questions.md
 
 
diff --git a/examples/deploy/agent/client.py b/examples/deploy/agent/client.py
@@ -36,9 +36,10 @@ def infer(client, model: str, messages, tools):
     response = resp.choices[0].message.content
     print(f'query: {query}')
     print(f'response: {response}')
+    print(f'tool_calls: {resp.choices[0].message.tool_calls}')
 
     tool = '{"temperature": 32, "condition": "Sunny", "humidity": 50}'
-    print(f'tool: {tool}')
+    print(f'tool_response: {tool}')
     messages += [{'role': 'assistant', 'content': response}, {'role': 'tool', 'content': tool}]
     resp = client.chat.completions.create(model=model, messages=messages, tools=tools, max_tokens=512, temperature=0)
     response2 = resp.choices[0].message.content
@@ -58,9 +59,10 @@ def infer_stream(client, model: str, messages, tools):
         response += delta
         print(delta, end='', flush=True)
     print()
+    print(f'tool_calls: {chunk.choices[0].delta.tool_calls}')
 
     tool = '{"temperature": 32, "condition": "Sunny", "humidity": 50}'
-    print(f'tool: {tool}')
+    print(f'tool_response: {tool}')
     messages += [{'role': 'assistant', 'content': response}, {'role': 'tool', 'content': tool}]
     gen = client.chat.completions.create(
         model=model, messages=messages, tools=tools, max_tokens=512, temperature=0, stream=True)
diff --git a/examples/infer/demo_agent.py b/examples/infer/demo_agent.py
@@ -12,9 +12,10 @@ def infer(engine: 'InferEngine', infer_request: 'InferRequest'):
     response = resp_list[0].choices[0].message.content
     print(f'query: {query}')
     print(f'response: {response}')
+    print(f'tool_calls: {resp_list[0].choices[0].message.tool_calls}')
 
     tool = '{"temperature": 32, "condition": "Sunny", "humidity": 50}'
-    print(f'tool: {tool}')
+    print(f'tool_response: {tool}')
     infer_request.messages += [{'role': 'assistant', 'content': response}, {'role': 'tool', 'content': tool}]
     resp_list = engine.infer([infer_request], request_config)
     response2 = resp_list[0].choices[0].message.content
@@ -35,9 +36,10 @@ def infer_stream(engine: 'InferEngine', infer_request: 'InferRequest'):
         response += delta
         print(delta, end='', flush=True)
     print()
+    print(f'tool_calls: {resp.choices[0].delta.tool_calls}')
 
     tool = '{"temperature": 32, "condition": "Sunny", "humidity": 50}'
-    print(f'tool: {tool}\nresponse2: ', end='')
+    print(f'tool_response: {tool}\nresponse2: ', end='')
     infer_request.messages += [{'role': 'assistant', 'content': response}, {'role': 'tool', 'content': tool}]
     gen_list = engine.infer([infer_request], request_config)
     for resp in gen_list[0]:
@@ -73,6 +75,24 @@ def get_infer_request():
         }])
 
 
+def infer_continue_generate(engine):
+    # Continue generating after the assistant message.
+    infer_request = InferRequest(messages=[{
+        'role': 'user',
+        'content': 'How is the weather today?'
+    }, {
+        'role': 'assistant',
+        'content': 'It is sunny today, '
+    }, {
+        'role': 'assistant',
+        'content': None
+    }])
+    request_config = RequestConfig(max_tokens=512, temperature=0)
+    resp_list = engine.infer([infer_request], request_config)
+    response = resp_list[0].choices[0].message.content
+    print(f'response: {response}')
+
+
 if __name__ == '__main__':
     from swift.llm import InferEngine, InferRequest, PtEngine, RequestConfig
     model = 'Qwen/Qwen2.5-1.5B-Instruct'
@@ -89,3 +109,5 @@ def get_infer_request():
 
     infer(engine, get_infer_request())
     infer_stream(engine, get_infer_request())
+
+    infer_continue_generate(engine)
diff --git a/swift/llm/dataset/dataset/llm.py b/swift/llm/dataset/dataset/llm.py
@@ -744,11 +744,6 @@ def preprocess(self, row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
         messages = res['messages']
         if messages[0]['role'] == 'system':
             messages.pop(0)
-        for message in messages:
-            if message['role'] == 'function-call':
-                message['role'] = 'tool_call'
-            elif message['role'] == 'function-response':
-                message['role'] = 'tool_response'
         return res
 
 
diff --git a/swift/llm/dataset/preprocessor/core.py b/swift/llm/dataset/preprocessor/core.py
@@ -401,6 +401,8 @@ def __init__(
         self.content_keys = ['content', 'value'] if content_key is None else [content_key]
         self.user_roles = ['user', 'human'] if user_role is None else [user_role]
         self.assistant_roles = ['assistant', 'gpt', 'bot'] if assistant_role is None else [assistant_role]
+        self.tool_call_roles = ['function_call']
+        self.tool_response_roles = ['function_response', 'observation', 'observations']
 
         self.system_role = system_role
         self.repair_messages = repair_messages
@@ -446,6 +448,10 @@ def to_std_messages(self, messages: List[Dict[str, str]], system: Optional[str])
                 message['role'] = 'user'
             elif role in self.assistant_roles:
                 message['role'] = 'assistant'
+            elif role.replace('-', '_') in self.tool_call_roles:
+                message['role'] = 'tool_call'
+            elif role.replace('-', '_') in self.tool_response_roles:
+                message['role'] = 'tool_response'
 
     @staticmethod
     def _to_std_key(messages: List[Dict[str, str]], std_key: str, optional_keys: List[str]) -> None:
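
Note on the role handling added to `to_std_messages` above: the sketch below reproduces the mapping in isolation so the effect of `role.replace('-', '_')` is easy to see. The role lists are copied from the diff; the standalone function `normalize_roles` is a hypothetical name used only for this illustration and is not part of the preprocessor.

```python
from typing import Dict, List

# Role aliases as they appear in the diff.
USER_ROLES = ['user', 'human']
ASSISTANT_ROLES = ['assistant', 'gpt', 'bot']
TOOL_CALL_ROLES = ['function_call']
TOOL_RESPONSE_ROLES = ['function_response', 'observation', 'observations']


def normalize_roles(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Map role aliases to standard role names in place (hypothetical helper)."""
    for message in messages:
        role = message['role']
        if role in USER_ROLES:
            message['role'] = 'user'
        elif role in ASSISTANT_ROLES:
            message['role'] = 'assistant'
        # Hyphenated spellings such as 'function-call' are treated like 'function_call'.
        elif role.replace('-', '_') in TOOL_CALL_ROLES:
            message['role'] = 'tool_call'
        elif role.replace('-', '_') in TOOL_RESPONSE_ROLES:
            message['role'] = 'tool_response'
    return messages


messages = [
    {'role': 'human', 'content': 'How is the weather today?'},
    {'role': 'function-call', 'content': '{"name": "get_weather"}'},
    {'role': 'observation', 'content': '{"condition": "Sunny"}'},
    {'role': 'gpt', 'content': 'It is sunny today.'},
]
print([m['role'] for m in normalize_roles(messages)])
```

Moving this mapping into the shared preprocessor is what allows the dataset-specific loop over `function-call` and `function-response` in `llm.py` to be removed.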