Commit cb1098c

[update] revise sections
1 parent 85ff51c commit cb1098c

1 file changed, +17 -9 lines changed

README.md

Lines changed: 17 additions & 9 deletions
@@ -18,25 +18,33 @@
 
 </div>
 
+
+## News
+- [2025-07-25] 🎉 We release the dataflow-agent.
+- [2025-06-30] 🎉 We release the documentation of dataflow.
+<!-- - [2025-05-30] 🎉 We added two data processing pipelines, i.e. knowledge base cleaning, and agentic rag data construction pipeline. -->
+<!-- - [2025-04-30] 🎉 We added four data processing pipelines, i.e. text, code, nl2sql, and reasoning data pipeline. -->
+<!-- - [2024-12-26] 🎉 Our first data evaluation and processing system is now open source. -->
+- [2024-10-14] 🎉 We summarize data evaluation papers and codes in [👋 Awesome Data Evaluation](./Awesome_Data_Evaluation.md)
+- [2024-10-14] 🎉 Our first data-centric evaluation system is now open source.
+-
 ## Overview
-DataFlow is a data evaluation and processing system designed to 1) evaluate data quality from multiple dimensions; 2) filter out high-quality data and 3) generate chain-of-thought or other types of augmentation. We mainly support SOTA algorithms within academic papers with strong theoretical support.
+DataFlow is a data evaluation and processing system designed to:
+1. Evaluate data quality from multiple dimensions;
+2. Filter out high-quality data and
+3. Generate chain-of-thought or other types of augmentation. We mainly support SOTA algorithms within academic papers with strong theoretical support.
 
 <!-- We now support text, image, video, and multimodality data types. -->
 Specifically, we first build various `operators` based on rules, LLMs, and LLM APIs, which are then assembled into six `pipelines`. These pipelines form the complete `Dataflow` system. Further, we also build an `agent` that can flexibly compose new pipelines with existing `operators` on demand.
 
+
+## Pipelines
 Current Pipelines in Dataflow are as follows:
 - **Reasoning Pipeline**: Enhances existing question–answer pairs with (1) extended chain-of-thought, (2) category classification, and (3) difficulty estimation.
 - **Text2SQL Pipeline**: Translates natural language questions into SQL queries, supplemented with explanations, chain-of-thought reasoning, and contextual schema information.
 
 
-## News
-- [2025-07-25] 🎉 We release the dataflow-agent.
-- [2025-06-30] 🎉 We release the documentation of dataflow.
-- [2025-05-30] 🎉 We added two data processing pipelines, i.e. knowledge base cleaning, and agentic rag data construction pipeline.
-- [2025-04-30] 🎉 We added four data processing pipelines, i.e. text, code, nl2sql, and reasoning data pipeline.
-- [2024-12-26] 🎉 Our first data evaluation and processing system is now open source.
-- [2024-10-14] 🎉 We summarize data evaluation papers and codes in [👋 Awesome Data Evaluation](./Awesome_Data_Evaluation.md)
-- [2024-10-14] 🎉 Our first data-centric evaluation system is now open source.
+
 
 ## Installation
 For environment setup, please use the following commands👇
