Commit b5f1789
support alpha-umi algo and dataset (#752)
1 parent 37f27e8 commit b5f1789

6 files changed: +74 −2 lines

docs/source/LLM/Agent微调最佳实践.md

Lines changed: 1 addition & 1 deletion

@@ -440,7 +440,7 @@ print()
 ### Fine-tuning
-Change `dataset` to `ms-agent-for-agentfabric` `ms-agent-for-agentfabric-default`
+Change `dataset` to `ms-agent-for-agentfabric-default` `ms-agent-for-agentfabric-addition`
 ```shell
 # Experimental environment: 8GPU
 nproc_per_node=8

docs/source/LLM/支持的模型和数据集.md

Lines changed: 4 additions & 0 deletions

@@ -277,6 +277,10 @@
 |damo-agent-zh|[damo/MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary)|422115|161|965.7±440.9, min=321, max=31535|chat, agent, multi-round|-|
 |damo-agent-mini-zh|[damo/MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary)|39964|152|1230.9±350.1, min=558, max=4982|chat, agent, multi-round|-|
 |agent-instruct-all-en|[huangjintao/AgentInstruct_copy](https://modelscope.cn/datasets/huangjintao/AgentInstruct_copy/summary)|1866|0|1144.3±635.5, min=206, max=6412|chat, agent, multi-round|-|
+|toolbench-for-alpha-umi-backbone|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|500951|0|1479.3±885.4, min=221, max=18467|chat, agent|-|
+|toolbench-for-alpha-umi-caller|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|369279|0|1405.7±784.8, min=214, max=17912|chat, agent|-|
+|toolbench-for-alpha-umi-planner|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|494975|0|1391.1±866.5, min=173, max=18451|chat, agent|-|
+|toolbench-for-alpha-umi-summarizer|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|83132|0|1641.5±839.0, min=123, max=9852|chat, agent|-|
 |code-alpaca-en|[wyj123456/code_alpaca_en](https://modelscope.cn/datasets/wyj123456/code_alpaca_en/summary)|20016|0|100.1±60.1, min=29, max=1776|chat, coding|[sahil2801/CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)|
 |🔥leetcode-python-en|[AI-ModelScope/leetcode-solutions-python](https://modelscope.cn/datasets/AI-ModelScope/leetcode-solutions-python/summary)|2359|0|723.8±233.5, min=259, max=2117|chat, coding|-|
 |🔥codefuse-python-en|[codefuse-ai/CodeExercise-Python-27k](https://modelscope.cn/datasets/codefuse-ai/CodeExercise-Python-27k/summary)|27224|0|483.6±193.9, min=45, max=3082|chat, coding|-|

docs/source_en/LLM/Agent-best-practice.md

Lines changed: 1 addition & 1 deletion

@@ -430,7 +430,7 @@ This section focuses on the interactive framework AgentFabric within Modelscope-
 Due to the mismatch between the system prompt in ms-agent and that in Modelscope-Agent, direct training yields suboptimal results. To address this, we have created a new dataset [ms_agent_for_agentfabric](https://modelscope.cn/datasets/AI-ModelScope/ms_agent_for_agentfabric/summary) by converting the format from ms-agent, which is now integrated into SWIFT. The `ms-agent-for-agentfabric-default` includes 30,000 entries converted from ms-agent data, while `ms-agent-for-agentfabric-additional` contains 488 entries filtered from actual function call access data by the open-source AgentFabric framework.

 ### Fine-tuning
-Replace `dataset` with `ms-agent-for-agentfabric` and `ms-agent-for-agentfabric-default`:
+Replace `dataset` with `ms-agent-for-agentfabric-default` and `ms-agent-for-agentfabric-addition`:
 ```shell
 # Experimental environment: 8GPU
 nproc_per_node=8

docs/source_en/LLM/Supported-models-datasets.md

Lines changed: 4 additions & 0 deletions

@@ -277,6 +277,10 @@ The table below introduces the datasets supported by SWIFT:
 |damo-agent-zh|[damo/MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary)|422115|161|965.7±440.9, min=321, max=31535|chat, agent, multi-round|-|
 |damo-agent-mini-zh|[damo/MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary)|39964|152|1230.9±350.1, min=558, max=4982|chat, agent, multi-round|-|
 |agent-instruct-all-en|[huangjintao/AgentInstruct_copy](https://modelscope.cn/datasets/huangjintao/AgentInstruct_copy/summary)|1866|0|1144.3±635.5, min=206, max=6412|chat, agent, multi-round|-|
+|toolbench-for-alpha-umi-backbone|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|500951|0|1479.3±885.4, min=221, max=18467|chat, agent|-|
+|toolbench-for-alpha-umi-caller|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|369279|0|1405.7±784.8, min=214, max=17912|chat, agent|-|
+|toolbench-for-alpha-umi-planner|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|494975|0|1391.1±866.5, min=173, max=18451|chat, agent|-|
+|toolbench-for-alpha-umi-summarizer|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|83132|0|1641.5±839.0, min=123, max=9852|chat, agent|-|
 |code-alpaca-en|[wyj123456/code_alpaca_en](https://modelscope.cn/datasets/wyj123456/code_alpaca_en/summary)|20016|0|100.1±60.1, min=29, max=1776|chat, coding|[sahil2801/CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)|
 |🔥leetcode-python-en|[AI-ModelScope/leetcode-solutions-python](https://modelscope.cn/datasets/AI-ModelScope/leetcode-solutions-python/summary)|2359|0|723.8±233.5, min=259, max=2117|chat, coding|-|
 |🔥codefuse-python-en|[codefuse-ai/CodeExercise-Python-27k](https://modelscope.cn/datasets/codefuse-ai/CodeExercise-Python-27k/summary)|27224|0|483.6±193.9, min=45, max=3082|chat, coding|-|
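Once these entries are registered (see the changes to `swift/llm/utils/dataset.py` below), the new subsets should be loadable by the names listed in the table. A minimal sketch, assuming `swift.llm` exposes `get_dataset(dataset_name_list, dataset_test_ratio)` as in other SWIFT examples — verify against the installed version:

```python
# Minimal sketch, not taken from this commit: assumes swift.llm exposes
# get_dataset(dataset_name_list, dataset_test_ratio) as in other SWIFT examples.
from swift.llm import get_dataset

# Any name from the table above should work, e.g. the planner subset;
# 0.01 holds out 1% of the data as a validation split.
train_dataset, val_dataset = get_dataset(['toolbench-for-alpha-umi-planner'], 0.01)
print(train_dataset[0])
```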

swift/llm/agent/utils.py

Lines changed: 19 additions & 0 deletions

@@ -55,5 +55,24 @@ def calculate_loss_scale(response: str,
             agent_content.append(c['key'])
             agent_content.append(c['content'])
         return agent_content, weights
+    elif ('Action:' in response
+          or 'Next:' in response) and use_loss_scale:  # alpha-umi
+        agent_keyword = ['Next:', 'Action:', 'Action Input:']
+        agent_parts = split_str_parts_by(response, agent_keyword)
+        weights = []
+        agent_content = []
+        for c in agent_parts:
+            if c['key'] in ('Action:', 'Action Input:', 'Next:'):
+                weights += [2.0]
+                weights += [2.0]
+            elif c['key'] in ('Thought:', 'Final Answer:', ''):
+                weights += [1.0]
+                weights += [1.0]
+            elif c['key'] in ('Observation:', ):
+                weights += [2.0]
+                weights += [0.0]
+            agent_content.append(c['key'])
+            agent_content.append(c['content'])
+        return agent_content, weights
     else:
         return [response], [1.0]
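For readers without the surrounding module, the weighting scheme added above can be reproduced in isolation. The sketch below is a standalone re-implementation for illustration only: `split_by_keywords` is a simplified stand-in for SWIFT's `split_str_parts_by` (assumed behavior: split the response into key/content parts at each keyword), and `alpha_umi_loss_scale` mirrors the branch in the diff.

```python
import re
from typing import Dict, List, Tuple


def split_by_keywords(text: str, keywords: List[str]) -> List[Dict[str, str]]:
    # Simplified stand-in (assumption) for swift's split_str_parts_by:
    # split `text` into {'key': ..., 'content': ...} parts at each keyword.
    pattern = '(' + '|'.join(re.escape(k) for k in keywords) + ')'
    parts, key = [], ''
    for piece in re.split(pattern, text):
        if piece in keywords:
            key = piece
        elif piece:
            parts.append({'key': key, 'content': piece})
            key = ''
    return parts


def alpha_umi_loss_scale(response: str) -> Tuple[List[str], List[float]]:
    # Mirrors the weighting of the new elif branch: routing/action keywords and
    # the text that follows them get weight 2.0, plain text keeps 1.0, and the
    # content after 'Observation:' is masked out (0.0) while its keyword stays 2.0.
    agent_keyword = ['Next:', 'Action:', 'Action Input:']
    weights, content = [], []
    for c in split_by_keywords(response, agent_keyword):
        if c['key'] in ('Action:', 'Action Input:', 'Next:'):
            weights += [2.0, 2.0]
        elif c['key'] in ('Thought:', 'Final Answer:', ''):
            weights += [1.0, 1.0]
        elif c['key'] == 'Observation:':
            weights += [2.0, 0.0]
        content += [c['key'], c['content']]
    return content, weights


if __name__ == '__main__':
    demo = 'Next: caller. Action: get_weather Action Input: {"city": "Beijing"}'
    for part, w in zip(*alpha_umi_loss_scale(demo)):
        print(repr(part), w)
```

The net effect is that the tokens steering the alpha-umi agent (which sub-model to call next, which tool to invoke and with what arguments) are up-weighted in the loss, while tool observations do not contribute to it.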

swift/llm/utils/dataset.py

Lines changed: 45 additions & 0 deletions

@@ -75,6 +75,10 @@ class DatasetName:
     damo_agent_zh = 'damo-agent-zh'
     damo_agent_mini_zh = 'damo-agent-mini-zh'
     agent_instruct_all_en = 'agent-instruct-all-en'
+    toolbench_for_alpha_umi_backbone = 'toolbench-for-alpha-umi-backbone'
+    toolbench_for_alpha_umi_caller = 'toolbench-for-alpha-umi-caller'
+    toolbench_for_alpha_umi_planner = 'toolbench-for-alpha-umi-planner'
+    toolbench_for_alpha_umi_summarizer = 'toolbench-for-alpha-umi-summarizer'
     # coding
     code_alpaca_en = 'code-alpaca-en'
     leetcode_python_en = 'leetcode-python-en'

@@ -1115,6 +1119,14 @@ def _preprocess_capcha_images(dataset: HfDataset) -> HfDataset:
     return dataset


+def _repair_planner(conversations: list) -> list:
+    if isinstance(conversations, str):
+        conversations = ast.literal_eval(conversations)
+    if len(conversations) == 2 and conversations[0]['from'] != 'user':
+        conversations[0]['from'] = 'user'
+    return conversations
+
+
 register_dataset(
     DatasetName.capcha_images,
     'AI-ModelScope/captcha-images', [('default', 'train')],

@@ -1158,6 +1170,39 @@ def _preprocess_capcha_images(dataset: HfDataset) -> HfDataset:
     get_dataset_from_repo,
     tags=['chat', 'coding', '🔥'])

+register_dataset(
+    DatasetName.toolbench_for_alpha_umi_backbone,
+    'shenweizhou/alpha-umi-toolbench-processed-v2', [('backbone', 'train')],
+    None,
+    ConversationsPreprocessor('system', system_role=None),
+    get_dataset_from_repo,
+    tags=['chat', 'agent'])
+
+register_dataset(
+    DatasetName.toolbench_for_alpha_umi_caller,
+    'shenweizhou/alpha-umi-toolbench-processed-v2', [('caller', 'train')],
+    None,
+    ConversationsPreprocessor('system', 'caller', None),
+    get_dataset_from_repo,
+    tags=['chat', 'agent'])
+
+register_dataset(
+    DatasetName.toolbench_for_alpha_umi_planner,
+    'shenweizhou/alpha-umi-toolbench-processed-v2', [('planner', 'train')],
+    None,
+    ConversationsPreprocessor(
+        repair_conversations=_repair_planner, error_strategy='delete'),
+    get_dataset_from_repo,
+    tags=['chat', 'agent'])
+
+register_dataset(
+    DatasetName.toolbench_for_alpha_umi_summarizer,
+    'shenweizhou/alpha-umi-toolbench-processed-v2', [('summarizer', 'train')],
+    None,
+    ConversationsPreprocessor('system', 'conclusion', None),
+    get_dataset_from_repo,
+    tags=['chat', 'agent'])
+

 def _preprocess_blossom_math(dataset: HfDataset) -> HfDataset:
     response = []
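`_repair_planner` handles planner records that arrive as stringified lists whose first turn is not tagged `user`. The quick check below shows the repair in isolation; the sample record is invented for illustration and is not taken from the dataset.

```python
import ast


def _repair_planner(conversations: list) -> list:
    # Copied from the diff above: parse stringified records, then force the
    # first turn of a two-turn conversation to come from 'user'.
    if isinstance(conversations, str):
        conversations = ast.literal_eval(conversations)
    if len(conversations) == 2 and conversations[0]['from'] != 'user':
        conversations[0]['from'] = 'user'
    return conversations


# Hypothetical planner record, serialized as a string as it might appear in the raw data.
raw = ("[{'from': 'system', 'value': 'plan the next step'},"
       " {'from': 'assistant', 'value': 'Next: caller.'}]")
print(_repair_planner(raw))  # first turn is relabelled to 'from': 'user'
```

Records that still fail preprocessing are dropped rather than raised, per `error_strategy='delete'` in the planner registration above.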
