|
18 | 18 |
|
19 | 19 | </div> |
20 | 20 |
|
| 21 | + |
| 22 | +## News |
| 23 | +- [2025-07-25] 🎉 We release the dataflow-agent. |
| 24 | +- [2025-06-30] 🎉 We release the documentation of dataflow. |
| 25 | +<!-- - [2025-05-30] 🎉 We added two data processing pipelines, i.e. knowledge base cleaning, and agentic rag data construction pipeline. --> |
| 26 | +<!-- - [2025-04-30] 🎉 We added four data processing pipelines, i.e. text, code, nl2sql, and reasoning data pipeline. --> |
| 27 | +<!-- - [2024-12-26] 🎉 Our first data evaluation and processing system is now open source. --> |
| 28 | +- [2024-10-14] 🎉 We summarize data evaluation papers and codes in [👋 Awesome Data Evaluation](./Awesome_Data_Evaluation.md) |
| 29 | +- [2024-10-14] 🎉 Our first data-centric evaluation system is now open source. |
| 30 | +- |
21 | 31 | ## Overview |
22 | | -DataFlow is a data evaluation and processing system designed to 1) evaluate data quality from multiple dimensions; 2) filter out high-quality data and 3) generate chain-of-thought or other types of augmentation. We mainly support SOTA algorithms within academic papers with strong theoretical support. |
| 32 | +DataFlow is a data evaluation and processing system designed to: |
| 33 | +1. Evaluate data quality from multiple dimensions; |
| 34 | +2. Select high-quality data; and |
| 35 | +3. Generate chain-of-thought or other types of augmentation. We mainly support SOTA algorithms within academic papers with strong theoretical support. |
23 | 36 |
|
24 | 37 | <!-- We now support text, image, video, and multimodality data types. --> |
25 | 38 | Specifically, we first build various `operators` based on rules, LLMs, and LLM APIs, which are then assembled into six `pipelines`. These pipelines form the complete `Dataflow` system. We also build an `agent` that can flexibly compose new pipelines from existing `operators` on demand. |
26 | 39 |
|
| 40 | + |
| 41 | +## Pipelines |
27 | 42 | Current Pipelines in Dataflow are as follows: |
28 | 43 | - **Reasoning Pipeline**: Enhances existing question–answer pairs with (1) extended chain-of-thought, (2) category classification, and (3) difficulty estimation. |
29 | 44 | - **Text2SQL Pipeline**: Translates natural language questions into SQL queries, supplemented with explanations, chain-of-thought reasoning, and contextual schema information. |
30 | 45 |
|
31 | 46 |
|
32 | | -## News |
33 | | -- [2025-07-25] 🎉 We release the dataflow-agent. |
34 | | -- [2025-06-30] 🎉 We release the documentation of dataflow. |
35 | | -- [2025-05-30] 🎉 We added two data processing pipelines, i.e. knowledge base cleaning, and agentic rag data construction pipeline. |
36 | | -- [2025-04-30] 🎉 We added four data processing pipelines, i.e. text, code, nl2sql, and reasoning data pipeline. |
37 | | -- [2024-12-26] 🎉 Our first data evaluation and processing system is now open source. |
38 | | -- [2024-10-14] 🎉 We summarize data evaluation papers and codes in [👋 Awesome Data Evaluation](./Awesome_Data_Evaluation.md) |
39 | | -- [2024-10-14] 🎉 Our first data-centric evaluation system is now open source. |
| 47 | + |
40 | 48 |
|
41 | 49 | ## Installation |
42 | 50 | For environment setup, please use the following commands👇 |
|