Commit aceb51f

Merge pull request #97 from MOLYHECI/main
add quick eval general text & func call operators docs
2 parents 325e1a9 + a5df1e3 commit aceb51f

File tree

9 files changed: +923 −10 lines


docs/.vuepress/notes/en/guide.ts

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,7 @@ export const Guide: ThemeNote = defineNoteConfig({
       'prompted_vqa',
       'mathquestion_extract',
       'knowledge_cleaning',
+      'quick_general_text_evaluation'
     ],
   },
   // {
@@ -79,6 +80,7 @@ export const Guide: ThemeNote = defineNoteConfig({
       "rare_operators",
       "knowledgebase_QA_operators",
       "agenticrag_operators",
+      "funccall_operators"
     ]
   },
   {

docs/.vuepress/notes/zh/guide.ts

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,7 @@ export const Guide: ThemeNote = defineNoteConfig({
       "prompted_vqa",
       "mathquestion_extract",
       'knowledge_cleaning',
+      'quick_general_text_evaluation'
     ],
   },
   // {
@@ -79,6 +80,7 @@ export const Guide: ThemeNote = defineNoteConfig({
       "rare_operators",
       "knowledgebase_QA_operators",
       "agenticrag_operators",
+      "funccall_operators"
       // "video_process",
     ]
   },

docs/.vuepress/public/dim_eval.png

61.2 KB
Lines changed: 252 additions & 0 deletions
@@ -0,0 +1,252 @@
---
title: Function Call Data Synthesis Operators
createTime: 2025/07/20 21:50:53
permalink: /en/guide/qdq6vy95/
---

# Function Call Data Synthesis Operators

## Overview

Function call data synthesis operators are designed to synthesize structured function call data from dialogues or real-world task descriptions. These operators cover scenario extraction and expansion, task generation and validation, function generation, and multi-agent multi-turn conversation generation.

All related operators are located in [dataflow/operators/conversations/func_call_operators.py](https://github.com/OpenDCAI/DataFlow/blob/main/dataflow/operators/conversations/func_call_operators.py). The table below summarizes their applicable scenarios:

<table class="tg">
<thead>
  <tr>
    <th class="tg-0pky">Name</th>
    <th class="tg-0pky">Type</th>
    <th class="tg-0pky">Description</th>
    <th class="tg-0pky">Repo or Paper</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td class="tg-0pky">ScenarioExtractor</td>
    <td class="tg-0pky">Scenario Extraction</td>
    <td class="tg-0pky">Extracts scenario descriptions from conversations using an LLM.</td>
    <td class="tg-0pky" rowspan="8">
      <a href="https://github.com/PKU-Baichuan-MLSystemLab/BUTTON">Data</a><br>
      <a href="https://arxiv.org/abs/2410.12952">Paper</a><br>
    </td>
  </tr>
  <tr>
    <td class="tg-0pky">ScenarioExpander</td>
    <td class="tg-0pky">Scenario Expansion</td>
    <td class="tg-0pky">Generates alternative scenarios based on the original ones using an LLM.</td>
  </tr>
  <tr>
    <td class="tg-0pky">AtomTaskGenerator</td>
    <td class="tg-0pky">Task Generation</td>
    <td class="tg-0pky">Generates atomic tasks from scenario descriptions using an LLM.</td>
  </tr>
  <tr>
    <td class="tg-0pky">SequentialTaskGenerator</td>
    <td class="tg-0pky">Task Generation</td>
    <td class="tg-0pky">Generates subsequent tasks and composes them into sequential tasks.</td>
  </tr>
  <tr>
    <td class="tg-0pky">ParaSeqTaskGenerator</td>
    <td class="tg-0pky">Task Generation</td>
    <td class="tg-0pky">Generates parallel and subsequent tasks and combines them with the original task.</td>
  </tr>
  <tr>
    <td class="tg-0pky">CompositionTaskFilter</td>
    <td class="tg-0pky">Task Filtering</td>
    <td class="tg-0pky">Validates compositional tasks and filters out incomplete ones using an LLM.</td>
  </tr>
  <tr>
    <td class="tg-0pky">FunctionGenerator</td>
    <td class="tg-0pky">Function Generation</td>
    <td class="tg-0pky">Generates function definitions for a given task composition and its subtasks.</td>
  </tr>
  <tr>
    <td class="tg-0pky">MultiTurnConversationGenerator</td>
    <td class="tg-0pky">Dialogue Generation</td>
    <td class="tg-0pky">Generates multi-turn conversations with User, Assistant, and Tool agents based on tasks and functions.</td>
  </tr>
</tbody>
</table>
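
The operators on this page follow a common pattern: construct the operator with an `llm_serving` instance, then call `run()` with a storage object and the relevant field keys. The snippet below is a minimal, illustrative sketch of the shared setup that the short per-operator examples further down assume. The import paths and the `llm_serving` placeholder are assumptions (the storage configuration mirrors the Function Call Data Synthesis Pipeline example), so adjust them to your installed DataFlow version.

```python
# Shared setup assumed by the short per-operator sketches below (illustrative only).
# NOTE: import paths are assumptions based on the file locations named above;
# adjust them to match your installed DataFlow version.
from dataflow.utils.storage import FileStorage
from dataflow.operators.conversations.func_call_operators import (
    ScenarioExtractor,
    ScenarioExpander,
    AtomTaskGenerator,
    SequentialTaskGenerator,
    ParaSeqTaskGenerator,
    CompositionTaskFilter,
    FunctionGenerator,
    MultiTurnConversationGenerator,
)

# Storage configuration mirroring the FuncCallPipeline example.
storage = FileStorage(
    first_entry_file_name="../example_data/FuncCallPipeline/chat_data.jsonl",
    cache_path="./cache",
    file_name_prefix="dataflow_cache_step",
    cache_type="jsonl",
)

# Placeholder: construct an LLM serving backend as described in the pipeline documentation.
llm_serving = ...
```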
## Operator Details

### 1. ScenarioExtractor ✨

**Description:**
Extracts concise task scenario descriptions from dialogues using an LLM.

**Parameters:**

- `__init__()`
  - `llm_serving`: LLM interface instance
- `run()`
  - `storage`: data storage interface
  - `input_chat_key`: field name for conversation input
  - `output_key`: output field name (default: `"scenario"`)

**Highlights:**

- Strong contextual understanding
- Forms the basis for downstream task generation
- Supports batch processing

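A minimal usage sketch with the parameters above, assuming the shared `storage` and `llm_serving` objects from the Overview sketch; the `input_chat_key` value is a hypothetical field name for your dialogue data:

```python
# Hedged sketch: extract a scenario description from each conversation record.
scenario_extractor = ScenarioExtractor(llm_serving=llm_serving)
scenario_extractor.run(
    storage=storage,
    input_chat_key="chat",   # hypothetical field holding the dialogue
    output_key="scenario",   # documented default
)
```
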
---

### 2. ScenarioExpander ✨

**Description:**
Expands extracted task scenarios to generate varied alternatives via an LLM.

**Parameters:**

- `__init__()`
  - `llm_serving`: LLM interface instance
- `run()`
  - `storage`: data storage interface
  - `input_scenario_key`: field name of the original scenario
  - `output_key`: output field name (default: `"modified_scenario"`)

**Highlights:**

- Enhances scenario diversity
- Useful for data augmentation

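A minimal sketch, assuming the shared setup from the Overview and that `ScenarioExtractor` has already written the `scenario` field:

```python
# Hedged sketch: expand each extracted scenario into an alternative one.
scenario_expander = ScenarioExpander(llm_serving=llm_serving)
scenario_expander.run(
    storage=storage,
    input_scenario_key="scenario",    # produced by ScenarioExtractor above
    output_key="modified_scenario",   # documented default
)
```
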
---

### 3. AtomTaskGenerator ✨

**Description:**
Generates fine-grained atomic tasks from a given scenario.

**Parameters:**

- `__init__()`
  - `llm_serving`: LLM interface instance
- `run()`
  - `storage`: data storage interface
  - `input_scenario_key`: field name for scenario input
  - `output_key`: output field name (default: `"atom_task"`)

**Highlights:**

- Atomic-level task granularity
- Task decomposition from scenario

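A minimal sketch under the same assumptions, reading the `scenario` field and writing the documented default output key:

```python
# Hedged sketch: derive an atomic task from each scenario.
atom_task_generator = AtomTaskGenerator(llm_serving=llm_serving)
atom_task_generator.run(
    storage=storage,
    input_scenario_key="scenario",
    output_key="atom_task",   # documented default
)
```
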
---

### 4. SequentialTaskGenerator ✨

**Description:**
Creates follow-up tasks and combines them with atomic tasks into a sequential flow.

**Parameters:**

- `__init__()`
  - `llm_serving`: LLM interface instance
- `run()`
  - `storage`: data storage interface
  - `input_task_key`: field name for the atomic task
  - `output_subsequent_task_key`: subsequent task field (default: `"subsequent_task"`)
  - `output_composition_task_key`: composed task field (default: `"composition_task"`)

**Highlights:**

- Supports multi-step task flow generation
- Clear structure and traceability

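A minimal sketch, again assuming the shared setup and an existing `atom_task` field:

```python
# Hedged sketch: generate a follow-up task and compose it with the atomic task.
sequential_task_generator = SequentialTaskGenerator(llm_serving=llm_serving)
sequential_task_generator.run(
    storage=storage,
    input_task_key="atom_task",
    output_subsequent_task_key="subsequent_task",     # documented default
    output_composition_task_key="composition_task",   # documented default
)
```
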
---

### 5. ParaSeqTaskGenerator ✨

**Description:**
Generates parallel and sequential extensions for an atomic task and composes them into a complex task.

**Parameters:**

- `__init__()`
  - `llm_serving`: LLM interface instance
- `run()`
  - `storage`: data storage interface
  - `input_task_key`: atomic task field
  - `output_parallel_task_key`: parallel task field (default: `"parallel_task"`)
  - `output_subsequent_task_key`: subsequent task field (default: `"subsequent_task"`)
  - `output_composition_task_key`: composed task field (default: `"composition_task"`)

**Highlights:**

- Multi-dimensional task modeling
- Captures concurrency and sequencing

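A minimal sketch under the same assumptions, showing the three documented output keys:

```python
# Hedged sketch: generate parallel and subsequent tasks and compose all three.
para_seq_task_generator = ParaSeqTaskGenerator(llm_serving=llm_serving)
para_seq_task_generator.run(
    storage=storage,
    input_task_key="atom_task",
    output_parallel_task_key="parallel_task",
    output_subsequent_task_key="subsequent_task",
    output_composition_task_key="composition_task",
)
```
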
---

### 6. CompositionTaskFilter ✨

**Description:**
Validates whether a composed task is logically complete and executable, and filters out invalid or incoherent compositions.

**Parameters:**

- `__init__()`
  - `llm_serving`: LLM interface instance
- `run()`
  - `storage`: data storage interface
  - `input_composition_task_key`: composed task field
  - `input_sub_tasks_keys`: list of subtask field names
  - `output_key`: label field for executability (default: `"runable_label"`)

**Highlights:**

- Logical and semantic validation
- Filters data for downstream function generation

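A minimal sketch assuming the shared setup; the `input_sub_tasks_keys` list is a hypothetical choice matching the fields produced by `SequentialTaskGenerator` above:

```python
# Hedged sketch: label each composed task as executable or not.
composition_task_filter = CompositionTaskFilter(llm_serving=llm_serving)
composition_task_filter.run(
    storage=storage,
    input_composition_task_key="composition_task",
    input_sub_tasks_keys=["atom_task", "subsequent_task"],  # hypothetical subtask fields
    output_key="runable_label",   # documented default
)
```
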
---

### 7. FunctionGenerator ✨

**Description:**
Generates structured function call specifications (name, parameters, doc) for a composed task and its subtasks.

**Parameters:**

- `__init__()`
  - `llm_serving`: LLM interface instance
- `run()`
  - `storage`: data storage interface
  - `input_composition_task_key`: composed task field
  - `input_sub_tasks_keys`: subtask field names
  - `output_key`: output field for functions (default: `"functions"`)

**Highlights:**

- LLM-based function synthesis
- Designed for tool/agent integration
- Structured JSON-like output

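A minimal sketch under the same assumptions as the filter example, writing the documented default `functions` field:

```python
# Hedged sketch: synthesize function specifications for the composed task and its subtasks.
function_generator = FunctionGenerator(llm_serving=llm_serving)
function_generator.run(
    storage=storage,
    input_composition_task_key="composition_task",
    input_sub_tasks_keys=["atom_task", "subsequent_task"],  # hypothetical subtask fields
    output_key="functions",   # documented default
)
```
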
---

### 8. MultiTurnConversationGenerator ✨🚀

**Description:**
Simulates multi-turn conversations involving User, Assistant, and Tool agents to complete the composed task via function calls.

**Parameters:**

- `__init__()`
  - `llm_serving`: LLM interface instance
- `run()`
  - `storage`: data storage interface
  - `input_task_key`: composed task field
  - `input_sub_tasks_keys`: list of subtask fields
  - `input_functions_key`: field name for the function list
  - `output_conversations_key`: output field for conversations (default: `"conversations"`)

**Highlights:**

- Multi-agent interactive generation
- Supports function call injection
- Up to 5 full interaction rounds

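A minimal sketch assuming the shared setup and the fields produced by the earlier operators; the subtask key list is hypothetical, and the agent orchestration itself is handled inside the operator:

```python
# Hedged sketch: simulate a User/Assistant/Tool conversation that completes the composed task.
multi_turn_generator = MultiTurnConversationGenerator(llm_serving=llm_serving)
multi_turn_generator.run(
    storage=storage,
    input_task_key="composition_task",
    input_sub_tasks_keys=["atom_task", "subsequent_task"],  # hypothetical subtask fields
    input_functions_key="functions",
    output_conversations_key="conversations",   # documented default
)
```
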
---

For code examples, refer to the [Function Call Data Synthesis Pipeline](https://opendcai.github.io/DataFlow-Doc/en/guide/e6kz1s79/) or the [GitHub source file](https://github.com/OpenDCAI/DataFlow/blob/main/dataflow/operators/conversations/func_call_operators.py).

docs/en/notes/guide/pipelines/FuncCallPipeline.md

Lines changed: 7 additions & 4 deletions
@@ -35,7 +35,7 @@ This input data can be stored in a specified file (e.g., `json`, `jsonl`) and ma
 
 ```python
 self.storage = FileStorage(
-    first_entry_file_name="./dataflow/example/FuncCallPipeline/chat_data.jsonl",
+    first_entry_file_name="../example_data/FuncCallPipeline/chat_data.jsonl",
     cache_path="./cache",
     file_name_prefix="dataflow_cache_step",
     cache_type="jsonl",
@@ -171,10 +171,13 @@ multi_turn_conversations_generator = MultiTurnConversationGenerator(
 
 ## 3. How to Run
 
-This pipeline can be executed with a simple Python command:
+You can create a new working directory outside the `DataFlow` project path, for example, `workspace`, and run `dataflow init` inside it. This command will copy the pipelines and example data into your working directory. Then, navigate to the `api_pipelines/` path to execute the pipelines.
 
 ```bash
-python test/test_func_call.py
+cd workspace
+dataflow init
+cd api_pipelines/
+python func_call_synthesis.py
 ```
 
 ## 4. Pipeline Example
@@ -199,7 +202,7 @@ class FuncCallPipeline:
     def __init__(self):
 
         self.storage = FileStorage(
-            first_entry_file_name="./dataflow/example/FuncCallPipeline/chat_data.jsonl",
+            first_entry_file_name="../example_data/FuncCallPipeline/chat_data.jsonl",
             cache_path="./cache",
             file_name_prefix="dataflow_cache_step",
             cache_type="jsonl",
