Commit 2c6f978

Update README.md
1 parent 31b2a6a commit 2c6f978

1 file changed: +11 -10 lines changed

README.md

Lines changed: 11 additions & 10 deletions
@@ -17,32 +17,32 @@
 [![Discord](https://img.shields.io/discord/1314801574169673738?logo=discord&color=5B5BD6&logoColor=white)](https://discord.com/invite/zpA9S2DR7s)
 </div>
 
-CocoIndex is ultra performant data transformation framework, core engine written in Rust. The problem it tries to solve is to make it easy to prepare fresh data for AI - either embedding, knowledge graph, or a series of data transformation - and take the real-time data pipeline beyond traditional SQL.
+CocoIndex is an ultra-performant data transformation framework, with its core engine written in Rust. The problem it tries to solve is to make it easy to prepare fresh data for AI - either creating embeddings, building knowledge graphs, or performing other data transformations - and to take real-time data pipelines beyond traditional SQL.
 
 <p align="center">
 <img src="https://cocoindex.io/images/cocoindex-features.png" alt="CocoIndex Features" width="500">
 </p>
 
-The philosophy is to have the framework handle the source updates, and having developers only focus on defining a series of data transformation, inspired by spreadsheet.
+The philosophy is to have the framework handle source updates and have developers focus only on defining a series of data transformations, inspired by spreadsheets.
 
 ## Data Flow programming
-CocoIndex follows [Data flow](https://en.wikipedia.org/wiki/Dataflow_programming) programming model. Compare with traditional orchestration framework, where data is opaque. In CocoIndex data and data operation are first class citizen, and there's no side effects for each data operation. All data are observable in each transformation, with lineage out of the box.
+Unlike a workflow orchestration framework, where data is usually opaque, in CocoIndex data and data operations are first-class citizens. CocoIndex follows the [Dataflow](https://en.wikipedia.org/wiki/Dataflow_programming) programming model. Each transformation creates a new field based solely on input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.
 
-Particularly, user don't define data operations like creation, update, deletion. But rather, they define something like - for a set of source data, this is the transformation or formula. The framework takes care of the data operations like when to create, update, or delete. For example:
+In particular, users don't define data operations like creation, update, or deletion. Rather, they define something like: for a set of source data, this is the transformation or formula. The framework takes care of the data operations, such as when to create, update, or delete. For example:
 
 ```python
-// ingest
+# import
 data['content'] = flow_builder.add_source(...)
 
-// transform
+# transform
 data['out'] = data['content']
     .transform(...)
     .transform(...)
 
-// collect data
+# collect data
 collector.collect(...)
 
-// export to db, vector db, graph db ...
+# export to db, vector db, graph db ...
 collector.export(...)
 ```
 
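Note on the hunk above: the README's claim that each transformation creates a new field based only on input fields, with lineage available, can be illustrated with a small standalone sketch. This is not CocoIndex's API; `Row`, `derive`, `split_words`, and `count` are hypothetical names invented for this illustration, and the real framework may track lineage quite differently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Row:
    """Hypothetical container of named fields plus lineage (not a CocoIndex type)."""

    fields: dict[str, Any] = field(default_factory=dict)
    lineage: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def derive(self, name: str, fn: Callable[..., Any], *inputs: str) -> "Row":
        """Return a new Row where `name` is computed only from `inputs`.

        The existing Row is never mutated, and existing fields are never overwritten.
        """
        if name in self.fields:
            raise ValueError(f"field {name!r} already exists")
        value = fn(*(self.fields[i] for i in inputs))
        return Row({**self.fields, name: value}, {**self.lineage, name: inputs})


def split_words(text: str) -> list[str]:
    return text.split()


def count(items: list[str]) -> int:
    return len(items)


source = Row({"content": "fresh data for AI"})
row = source.derive("words", split_words, "content").derive("n_words", count, "words")

# Every intermediate field stays observable, and lineage records its inputs.
print(row.fields)
print(row.lineage)
```

Because `derive` returns a new `Row` rather than editing the old one, `source` still holds only `content` after the chain runs, which is the "no hidden state, no value mutation" property the paragraph describes.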

@@ -54,8 +54,9 @@ As a data framework, CocoIndex takes it to the next level on data freshness. **I
 </p>
 
 The framework takes care of:
-- Change data capture
-- Figuring out what exactly needs to be updated, and only updating that without having to recompute everything throughout.
+- Change data capture.
+- Figuring out exactly what needs to be updated, and updating only that, without having to recompute everything.
+
 This makes it fast to reflect any source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to optimize latency, the framework handles it for you.
 
 
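Note on the hunk above: the idea of working out what needs to be updated and recomputing only that can be sketched as follows. This is a simplified illustration under stated assumptions, not how CocoIndex is implemented: it compares content fingerprints between snapshots instead of consuming a real change-data-capture stream, and `IncrementalIndex`, `sync`, and `fingerprint` are hypothetical names.

```python
import hashlib
from typing import Callable


def fingerprint(content: str) -> str:
    """Stable digest of the source content, used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class IncrementalIndex:
    """Hypothetical target store that recomputes only rows whose source changed."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.seen: dict[str, str] = {}    # source key -> fingerprint last processed
        self.target: dict[str, str] = {}  # source key -> transformed output
        self.calls = 0                    # number of times transform actually ran

    def sync(self, source: dict[str, str]) -> dict[str, list[str]]:
        """Bring the target in line with `source`; report what was touched."""
        changes: dict[str, list[str]] = {"created": [], "updated": [], "deleted": []}
        for key, content in source.items():
            fp = fingerprint(content)
            if self.seen.get(key) == fp:
                continue  # unchanged: skip recomputation entirely
            changes["updated" if key in self.seen else "created"].append(key)
            self.target[key] = self.transform(content)
            self.calls += 1
            self.seen[key] = fp
        for key in list(self.seen):
            if key not in source:  # source row disappeared: drop its output
                changes["deleted"].append(key)
                del self.seen[key]
                del self.target[key]
        return changes


index = IncrementalIndex(str.upper)
first = index.sync({"a.md": "alpha", "b.md": "beta"})
second = index.sync({"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"})
third = index.sync({"b.md": "beta v2", "c.md": "gamma"})
```

The caller only ever passes the current source state; create, update, and delete decisions fall out of the comparison, and the `calls` counter shows that unchanged rows are not recomputed.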
