Merged
`docs/docs/getting_started/overview.md`: 40 changes (36 additions, 4 deletions)
@@ -5,10 +5,42 @@ slug: /

# Welcome to CocoIndex

Preparing high-quality data tailored to its purpose is essential for a successful AI application in production.
CocoIndex is an ultra-performant, real-time data transformation framework for AI, with incremental processing.

CocoIndex is a data indexing platform for AI use cases such as semantic search, RAG, and agentic workflows built on embeddings or knowledge graphs. CocoIndex aims to be best-in-class, scalable data indexing infrastructure with built-in observability and lineage.
As a data framework, CocoIndex takes data freshness to the next level. **Incremental processing**, one of CocoIndex's core values, means that only new or changed source data is reprocessed instead of rebuilding the whole index.
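To make the idea of incremental processing concrete, here is a minimal, framework-agnostic sketch. It is not CocoIndex's implementation or API: the `IncrementalIndex` class, the content-hash fingerprinting, and the toy `expensive_transform` are all hypothetical. It assumes each source row has a stable key and only illustrates the principle of recomputing new or changed rows and dropping outputs for deleted ones.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect whether a source row changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def expensive_transform(text: str) -> str:
    """Hypothetical stand-in for chunking/embedding; here it just upper-cases."""
    return text.upper()


class IncrementalIndex:
    """Hypothetical incremental indexer (not the CocoIndex API)."""

    def __init__(self) -> None:
        self.fingerprints: dict[str, str] = {}
        self.outputs: dict[str, str] = {}
        self.recomputed: list[str] = []

    def update(self, source: dict[str, str]) -> None:
        self.recomputed = []
        # Drop outputs whose source rows no longer exist.
        for key in list(self.outputs):
            if key not in source:
                del self.outputs[key]
                del self.fingerprints[key]
        # Recompute only rows that are new or whose content changed.
        for key, text in source.items():
            fp = fingerprint(text)
            if self.fingerprints.get(key) != fp:
                self.outputs[key] = expensive_transform(text)
                self.fingerprints[key] = fp
                self.recomputed.append(key)


index = IncrementalIndex()
index.update({"a.md": "hello", "b.md": "world"})
print(index.recomputed)  # ['a.md', 'b.md']
index.update({"a.md": "hello", "b.md": "world!"})
print(index.recomputed)  # ['b.md']  (a.md is unchanged, so it is skipped)
index.update({"b.md": "world!"})
print(index.recomputed)  # []  (a.md was deleted; b.md is unchanged)
```

The cost of an update in this sketch is proportional to the amount of changed data rather than the size of the whole source, which is the property that keeps an index fresh cheaply.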

CocoIndex helps you connect to your data sources, identify the best indexing strategy, and set up a robust pipeline (chunking, embedding models, deduplication/reconciliation, vector stores, knowledge graphs, etc.), and then provides a standard API to access the index.
<p align="center">
<img src="https://github.com/user-attachments/assets/f4eb29b3-84ee-4fa0-a1e2-80eedeeabde6" alt="Incremental Processing" width="700" />
</p>


## Programming Model
CocoIndex follows the [dataflow programming](https://en.wikipedia.org/wiki/Dataflow_programming) model. Each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage available out of the box.
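The following is a minimal sketch of the dataflow principles described above, written in plain Python rather than with CocoIndex itself. The `apply` and `upstream` helpers are hypothetical: each step derives one new, write-once field from named input fields, returns a new read-only row instead of mutating the old one, and records which fields it came from, so lineage can be traced afterwards.

```python
from types import MappingProxyType
from typing import Callable, Mapping

Row = Mapping[str, object]


def apply(row: Row, out_field: str, fn: Callable[..., object], *in_fields: str,
          lineage: dict[str, tuple[str, ...]]) -> Row:
    """Hypothetical helper: derive `out_field` from `in_fields` only."""
    if out_field in row:
        raise ValueError(f"field {out_field!r} already exists; fields are write-once")
    value = fn(*(row[f] for f in in_fields))
    lineage[out_field] = in_fields  # record the direct inputs of this field
    # Return a new read-only row; the input row is left untouched.
    return MappingProxyType({**row, out_field: value})


def upstream(field: str, lineage: dict[str, tuple[str, ...]]) -> set[str]:
    """All fields that `field` transitively depends on."""
    deps: set[str] = set()
    for parent in lineage.get(field, ()):
        deps.add(parent)
        deps |= upstream(parent, lineage)
    return deps


lineage: dict[str, tuple[str, ...]] = {}
r0 = MappingProxyType({"content": "  Hello Dataflow  "})
r1 = apply(r0, "clean", str.strip, "content", lineage=lineage)
r2 = apply(r1, "words", str.split, "clean", lineage=lineage)

print(dict(r2))  # {'content': '  Hello Dataflow  ', 'clean': 'Hello Dataflow', 'words': ['Hello', 'Dataflow']}
print(lineage)   # {'clean': ('content',), 'words': ('clean',)}
print(sorted(upstream("words", lineage)))  # ['clean', 'content']
```

Because every intermediate row (`r0`, `r1`, `r2`) survives unchanged, the data before and after each step stays inspectable, which is what makes observability and lineage cheap in this style.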

The gist of an example data transformation:
```python
# import data from a source
data['content'] = flow_builder.add_source(...)

# transform: chain transformations, each producing a new value
data['out'] = (data['content']
    .transform(...)
    .transform(...))

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
```
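To show how the import, transform, collect, and export stages fit together end to end, here is a runnable toy version in plain Python. It only mirrors the shape of the gist: the `Collector` class, its upsert-by-primary-key `export`, the fixed-size `split_chunks` chunker, and the `toy_embed` function are all hypothetical stand-ins, not CocoIndex's sources, functions, or storage targets. An in-memory dict plays the role of the vector database.

```python
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Hypothetical collector: accumulates output rows during a flow run."""
    rows: list[dict] = field(default_factory=list)

    def collect(self, **fields) -> None:
        self.rows.append(dict(fields))

    def export(self, target: dict, primary_key: str) -> None:
        """Upsert collected rows into `target`, keyed by `primary_key`,
        so re-running the export does not create duplicates."""
        for row in self.rows:
            target[row[primary_key]] = row


def split_chunks(text: str, size: int) -> list[str]:
    """Toy chunker: fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(chunk: str) -> list[float]:
    """Stand-in for an embedding model: [length, vowel ratio]."""
    vowels = sum(ch in "aeiou" for ch in chunk.lower())
    return [float(len(chunk)), vowels / max(len(chunk), 1)]


# import: an in-memory "source" of documents
documents = {"intro.md": "CocoIndex indexes data", "faq.md": "Why incremental?"}

collector = Collector()
for filename, content in documents.items():
    # transform: content -> chunks -> embeddings
    for i, chunk in enumerate(split_chunks(content, size=10)):
        # collect one row per chunk
        collector.collect(id=f"{filename}#{i}", text=chunk, embedding=toy_embed(chunk))

# export: a dict stands in for a vector/graph database
vector_store: dict = {}
collector.export(vector_store, primary_key="id")
collector.export(vector_store, primary_key="id")  # idempotent re-export
print(len(vector_store))  # 5
```

Keying exported rows by a primary key is one simple way to make exports repeatable; a real target would also need to delete rows whose source disappeared.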


An example dataflow diagram:
<p align="center">
<img width="700" alt="DataFlow" src="https://github.com/user-attachments/assets/22069379-99b1-478b-a131-15e2a9539d35" />
</p>


Get Started:
- [Quick Start](https://cocoindex.io/docs/getting_started/quickstart)

CocoIndex does the heavy lifting and data plumbing, so you can focus on your business logic and build your AI application on top of robust data indexes.