-CocoIndex is ultra performant data transformation framework, core engine written in Rust. The problem it tries to solve is to make it easy to prepare fresh data for AI - either embedding, knowledge graph, or a series of data transformation - and take the real-time data pipeline beyond traditional SQL.
+CocoIndex is ultra performant data transformation framework, core engine written in Rust. The problem it tries to solve is to make it easy to prepare fresh data for AI - either creating embedding, building knowledge graph, or performing other data transformations - and take the real-time data pipeline beyond traditional SQL.
-The philosophy is to have the framework handle the source updates, and having developers only focus on defining a series of data transformation, inspired by spreadsheet.
+The philosophy is to have the framework handle the source updates, and having developers only focus on defining a series of data transformation, inspired by spreadsheets.

## Data Flow programming
-CocoIndexfollows [Data flow](https://en.wikipedia.org/wiki/Dataflow_programming) programming model. Compare with traditional orchestration framework, where data is opaque. In CocoIndex data and data operation are first class citizen, and there's no side effects for each data operation. All data are observable in each transformation, with lineage out of the box.
+Unlike a workflow orchestration framework where data is usually opaque, in CocoIndex, data and data operations are first class citizens. CocoIndex follows the idea of [Dataflow](https://en.wikipedia.org/wiki/Dataflow_programming) programming model. Each transformation creates a new field solely based on input fields, without hidden states and value mutation. All data before/after each each transformation is observable, with lineage out of the box.

-Particularly, user don't define data operations like creation, update, deletion. But rather, they define something like - for a set of source data, this is the transformation or formula. The framework takes care of the data operations like when to create, update, or delete. For example:
+Particularly, users don't define data operations like creation, update, deletion. But rather, they define something like - for a set of source data, this is the transformation or formula. The framework takes care of the data operations like when to create, update, or delete. For example:

```python
-// ingest
+# import
data['content'] = flow_builder.add_source(...)

-// transform
+# transform
data['out'] = data['content']
    .transform(...)
    .transform(...)

-// collect data
+# collect data
collector.collect(...)

-// export to db, vector db, graph db ...
+# export to db, vector db, graph db ...
collector.export(...)
```
@@ -54,8 +54,9 @@ As a data framework, CocoIndex takes it to the next level on data freshness. **I
</p>

The frameworks takes care of
-- Change data capture
-- Figuring out what exactly needs to be updated, and only updating that without having to recompute everything throughout.
+- Change data capture.
+- Figure out what exactly needs to be updated, and only updating that without having to recompute everything.
+

This makes it fast to reflect any source updates to the target store. If you have concerns with surfacing stale data to AI agents and are spending lots of efforts working on infra piece to optimize the latency, the framework actually handles it for you.
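
The idea of recomputing only what changed, while the framework decides when to create, update, or delete, can be illustrated with a small standalone sketch. This is not CocoIndex's API or implementation; the names `fingerprint`, `sync`, `seen`, and `shout` are hypothetical, and content hashing is just one assumed way to detect changes.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def fingerprint(content: str) -> str:
    """Return a stable digest used to detect whether a source row changed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def sync(
    source: Dict[str, str],
    target: Dict[str, str],
    seen: Dict[str, str],
    transform: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Bring `target` in line with `source`, transforming only changed rows.

    `seen` maps each key to the digest of the content last processed.
    Returns the list of (operation, key) pairs that were applied.
    """
    ops: List[Tuple[str, str]] = []
    for key, content in source.items():
        digest = fingerprint(content)
        if seen.get(key) == digest:
            continue  # unchanged row: skip the transformation entirely
        ops.append(("update" if key in seen else "create", key))
        target[key] = transform(content)
        seen[key] = digest
    for key in list(seen):
        if key not in source:  # row disappeared from the source
            ops.append(("delete", key))
            del target[key]
            del seen[key]
    return ops


if __name__ == "__main__":
    target: Dict[str, str] = {}
    seen: Dict[str, str] = {}
    calls: List[str] = []

    def shout(text: str) -> str:
        calls.append(text)  # record which rows were actually recomputed
        return text.upper()

    print(sync({"a": "hello", "b": "world"}, target, seen, shout))
    # [('create', 'a'), ('create', 'b')]
    print(sync({"a": "hello", "b": "there"}, target, seen, shout))
    # [('update', 'b')]
    print(sync({"b": "there"}, target, seen, shout))
    # [('delete', 'a')]
    print(calls)
    # ['hello', 'world', 'there']
```

The caller only supplies the transformation (`shout`); the create, update, and delete operations are derived by comparing the current source against what was processed before, and unchanged rows are never transformed again.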