[docs] update loss_scale docs (#5516)

Jintao-Huang · web-flow · commit d586583573bd · 2025-08-25T11:29:21.000+08:00
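
Note for readers of this change: the two matching modes described in the updated docs can be illustrated with a minimal Python sketch. This is not the ms-swift implementation. The helper names `scale_by_keywords` and `scale_by_regex` are hypothetical, the `react` dictionary is reconstructed from the weights listed in the docs rather than copied from `react.json`, and text that matches nothing is assumed to keep a default weight of 1.

```python
import re
from typing import Dict, List, Tuple

Segment = Tuple[str, float]  # (text span, loss weight)


def scale_by_keywords(text: str, config: Dict[str, List[float]],
                      default: float = 1.0) -> List[Segment]:
    """Exact string matching: keyword -> [weight of keyword, weight of following text]."""
    # Try longer keywords first so 'Action Input:' is not shadowed by 'Action:'.
    keys = sorted(config, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    segments: List[Segment] = []
    pos, follow = 0, default
    for m in pattern.finditer(text):
        if m.start() > pos:
            # Content between two keywords inherits the previous keyword's second weight.
            segments.append((text[pos:m.start()], follow))
        own, follow = config[m.group()]
        segments.append((m.group(), own))
        pos = m.end()
    if pos < len(text):
        segments.append((text[pos:], follow))
    return segments


def scale_by_regex(text: str, config: Dict[str, List[float]],
                   default: float = 1.0) -> List[Segment]:
    """Regex matching: pattern -> [weight of the matched text]."""
    weights = {f'p{i}': w[0] for i, w in enumerate(config.values())}
    pattern = re.compile('|'.join(f'(?P<p{i}>{p})' for i, p in enumerate(config)))
    segments: List[Segment] = []
    pos = 0
    for m in pattern.finditer(text):
        if m.end() == m.start():
            continue  # ignore empty matches
        if m.start() > pos:
            segments.append((text[pos:m.start()], default))
        segments.append((m.group(), weights[m.lastgroup]))
        pos = m.end()
    if pos < len(text):
        segments.append((text[pos:], default))
    return segments


# Assumed config, reconstructed from the documented ReACT weights.
react = {
    'Action:': [2.0, 2.0],
    'Action Input:': [2.0, 2.0],
    'Thought:': [1.0, 1.0],
    'Final Answer:': [1.0, 1.0],
    'Observation:': [2.0, 0.0],
}
react_text = ('Thought: look it up\nAction: search\nAction Input: {"q": "x"}\n'
              'Observation: 42\nThought: done\nFinal Answer: 42')
react_segments = scale_by_keywords(react_text, react)

# Assumed config for the empty-think case; the regex is written unescaped here.
ignore_empty_think = {r'<think>\s*</think>\s*': [0.0]}
think_segments = scale_by_regex('<think>\n\n</think>\n\nThe answer is 4.', ignore_empty_think)
print(think_segments)
# [('<think>\n\n</think>\n\n', 0.0), ('The answer is 4.', 1.0)]
```

In both helpers the segments concatenate back to the input, so every character receives exactly one weight. The tool result after `'Observation:'` gets weight 0 and the empty think block is excluded from the loss.
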
diff --git a/docs/source/Instruction/Agent支持.md b/docs/source/Instruction/Agent支持.md
@@ -186,11 +186,23 @@ tools = [{
 
 ## Usage of loss_scale
 
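-loss_scale can adjust the training loss weight of the model's output. For example, in the ReACT format you can set `--loss_scale react` (the loss_scale configuration file is [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss_scale/config/react.json)). This parameter has the following effect:
+The loss_scale parameter adjusts the loss weight of the model's output during training. Two configuration methods are currently supported: exact string matching and regular expression matching.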
 
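-The 'Thought:' and 'Final Answer:' parts have a weight of 1, the 'Action:' and 'Action Input:' parts have a weight of 2, the 'Observation:' field itself has a weight of 2, and the tool call result following 'Observation:' has a weight of 0.
+1. String matching example: ReACT format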
 
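-For the detailed design of the loss_scale plugin, see the [Pluginization](../Customization/插件化.md) documentation.
+Taking the ReACT format as an example, `--loss_scale react` enables the corresponding loss_scale configuration (see [react.json](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss_scale/config/react.json)). This method uses exact string matching. Each dictionary entry in the configuration must provide a list of two elements: the loss weight of the matched string itself,
+and the loss weight of the content from after that string up to the next specified string. The resulting behavior is:
+- The 'Action:' and 'Action Input:' fields and the content that follows them have a loss weight of 2;
+- The 'Thought:' and 'Final Answer:' fields and the content that follows them have a loss weight of 1;
+- The 'Observation:' field itself has a weight of 2, but the tool call result that follows it has a loss weight of 0.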
+
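+2. Regex matching example: ignoring empty thought blocks
+
+When training reasoning models, we may need to exclude empty thought markers in the dataset, such as `<think>\n\n</think>\n\n`, from the loss computation. In that case, use `--loss_scale ignore_empty_think` (see [ignore_empty_think.json](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss_scale/config/ignore_empty_think.json)). This configuration uses regular expression matching; each list in the dictionary mapping only needs one value, the loss weight of the matched content. The resulting behavior is:
+
+- Every string matching the regular expression `<think>\\s*</think>\\s*` has a loss_scale of 0, meaning no loss is computed for it.
+
+For more on the design of loss_scale plugins, see the [Pluginization](../Customization/插件化.md) documentation.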
 
 
 ## Training
diff --git a/docs/source_en/Instruction/Agent-support.md b/docs/source_en/Instruction/Agent-support.md
@@ -190,14 +190,32 @@ tools = [{
 
 ## Usage of loss_scale
 
-`loss_scale` can be used to adjust the training loss weight for the model's output section. For example, in the ReACT format, you can set `--loss_scale react` (the loss_scale configuration file is written [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss_scale/config/react.json)). The role of this parameter is as follows:
+The `loss_scale` parameter can be used to adjust the loss weights for different parts of the model output during training. Currently, two configuration methods are supported: exact string matching and regular expression (regex) matching.
 
-- The weight for the 'Thought:' and 'Final Answer:' sections is 1.
-- The weight for the 'Action:' and 'Action Input:' sections is 2.
-- The weight for the 'Observation:' field itself is 2.
-- The weight for the tool invocation results following the 'Observation:' field is 0.
+1. String Matching Example: ReACT Format
 
-For the detailed design of the `loss_scale` plugin, please refer to the [Plugin-based Architecture](../Customization/Pluginization.md)documentation.
+Take the ReACT format as an example. You can enable the corresponding `loss_scale` configuration via `--loss_scale react` (see configuration file [react.json](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss_scale/config/react.json)). This method relies on exact string matching. Each key in the configuration dictionary maps to a list of two elements:
+
+- The loss weight for the matched string itself;
+- The loss weight for the content following the matched string, up to (but not including) the next specified string.
+
+The specific effects of this configuration are as follows:
+
+- The `'Action:'` and `'Action Input:'` keywords and their subsequent content both have a loss weight of 2;
+- The `'Thought:'` and `'Final Answer:'` keywords and their subsequent content both have a loss weight of 1;
+- The `'Observation:'` field itself has a loss weight of 2, but the subsequent tool call result content has a loss weight of 0.
+
+
+2. Regular Expression Matching Example: Ignoring Empty Thought Blocks
+
+When training reasoning models, it may be necessary to exclude empty thought blocks in the dataset, such as `<think>\n\n</think>\n\n`, from loss computation.
+
+In such cases, use `--loss_scale ignore_empty_think` (see configuration file [ignore_empty_think.json](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss_scale/config/ignore_empty_think.json)). This configuration uses regular expression matching, where each list in the dictionary mapping only needs a single value: the loss weight for the matched content.
+
+The specific effect of this setting is:
+- Any string matching the regular expression `<think>\\s*</think>\\s*` is assigned a `loss_scale` of 0, meaning no loss is computed for these segments.
+
+For more `loss_scale` plugin designs, please refer to the [Pluginization](../Customization/Pluginization.md) documentation.
 
 ## Training
 
diff --git a/swift/plugin/agent_template/glm4.py b/swift/plugin/agent_template/glm4.py
@@ -109,13 +109,15 @@ def get_toolcall(self, response: str) -> List['Function']:
 
     def _format_tools(self, tools: List[Union[str, dict]], system: str, user_message=None) -> str:
         tool_descs = [
-            '# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>'
+            '# Tools\n\nYou may call one or more functions to assist with the user query.\n\n'
+            'You are provided with function signatures within <tools></tools> XML tags:\n<tools>'
         ]
         for tool in tools:
             tool_descs.append(f'{json.dumps(tool, ensure_ascii=False)}')
-        tool_descs.append(
-            '</tools>\n\nFor each function call, output the function name and arguments within the following XML format:\n<tool_call>{function-name}\n<arg_key>{arg-key-1}</arg_key>\n<arg_value>{arg-value-1}</arg_value>\n<arg_key>{arg-key-2}</arg_key>\n<arg_value>{arg-value-2}</arg_value>\n...\n</tool_call>'
-        )
+        tool_descs.append('</tools>\n\nFor each function call, output the function name and arguments within '
+                          'the following XML format:\n<tool_call>{function-name}\n<arg_key>{arg-key-1}</arg_key>\n'
+                          '<arg_value>{arg-value-1}</arg_value>\n<arg_key>{arg-key-2}</arg_key>\n'
+                          '<arg_value>{arg-value-2}</arg_value>\n...\n</tool_call>')
         tool_descs = '\n'.join(tool_descs)
         if system.strip():
             tool_descs += '<|system|>\n' + system.strip()