2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/💡-feature-request.md
@@ -17,4 +17,4 @@ assignees: ''

---
❤️ Contributors, please refer to 📙[Contributing Guide](https://cocoindex.io/docs/about/contributing).
Unless the PR can be sent immediately (e.g. just a few lines of code), we recommend leaving a comment on the issue like **`I'm working on it`** or **`Can I work on this issue?`** to avoid duplicating work. Our [Discord server](https://discord.com/invite/zpA9S2DR7s) is always open and friendly.
2 changes: 1 addition & 1 deletion .github/scripts/update_version.sh
@@ -19,4 +19,4 @@ else
fi

# Update Cargo.toml
sed "${SED_INLINE[@]}" "s/^version = .*/version = \"$VERSION\"/" Cargo.toml
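
The `sed` expression above rewrites every line of `Cargo.toml` that begins with `version = `. As a minimal sketch of the same substitution in Python (the function name `bump_version` is hypothetical and not part of `update_version.sh`), `re.MULTILINE` makes `^` match at each line start, which mirrors sed applying the pattern line by line:

```python
import re


def bump_version(cargo_toml: str, version: str) -> str:
    """Hypothetical Python mirror of: s/^version = .*/version = "$VERSION"/"""
    # re.MULTILINE: '^' matches at the start of every line, like sed.
    # A function replacement avoids re interpreting backslashes in `version`.
    return re.sub(
        r"^version = .*",
        lambda _match: f'version = "{version}"',
        cargo_toml,
        flags=re.MULTILINE,
    )
```

One consequence of anchoring only at line start: a dependency declared as its own table (e.g. `[dependencies.foo]` followed by `version = "1"`) would also be rewritten, while inline entries such as `foo = { version = "1" }` are left alone.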
71 changes: 71 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,71 @@
ci:
autofix_prs: false
autoupdate_schedule: 'monthly'

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-case-conflict
# Check for files with names that would conflict on a case-insensitive
# filesystem like MacOS HFS+ or Windows FAT.
- id: check-merge-conflict
# Check for files that contain merge conflict strings.
- id: check-symlinks
# Checks for symlinks which do not point to anything.
exclude: ".*(.github.*)$"
- id: detect-private-key
# Checks for the existence of private keys.
- id: end-of-file-fixer
# Makes sure files end in a newline and only a newline.
exclude: ".*(data.*|licenses.*|_static.*|\\.ya?ml|\\.jpe?g|\\.png|\\.svg|\\.webp)$"
- id: trailing-whitespace
# Trims trailing whitespace.
exclude_types: [python] # Covered by Ruff W291.
exclude: ".*(data.*|licenses.*|_static.*|\\.ya?ml|\\.jpe?g|\\.png|\\.svg|\\.webp)$"

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.12.0
hooks:
- id: ruff-format
types: [python]
pass_filenames: true

- repo: https://github.com/christophmeissner/pytest-pre-commit
rev: 1.0.0
hooks:
- id: pytest
language: system
types: [python]
pass_filenames: false
always_run: false

- repo: local
hooks:
- id: mypy-check
name: mypy type check
entry: mypy
language: system
types: [python]
pass_filenames: false

- id: maturin-develop
name: maturin develop
entry: maturin develop
language: system
types: [rust]
pass_filenames: false

- id: cargo-fmt
name: cargo fmt
entry: cargo fmt
language: system
types: [rust]
pass_filenames: false

- id: cargo-test
name: cargo test
entry: cargo test
language: system
types: [rust]
pass_filenames: false
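
The `end-of-file-fixer` and `trailing-whitespace` hooks above share one `exclude` regex. The following is a minimal sketch of what they check, not the hooks' source code: it only reports problems instead of fixing files in place, ignores `exclude_types: [python]`, and assumes pre-commit applies `exclude` with Python's `re.search` against the file path. The function names are hypothetical.

```python
import re

# The `exclude` pattern shared by both hooks (YAML "\\." becomes regex "\.").
EXCLUDE = re.compile(
    r".*(data.*|licenses.*|_static.*|\.ya?ml|\.jpe?g|\.png|\.svg|\.webp)$"
)


def is_excluded(path: str) -> bool:
    # Assumption: pre-commit matches `exclude` with re.search on the path.
    return EXCLUDE.search(path) is not None


def whitespace_problems(path: str, text: str) -> list[str]:
    """Report what trailing-whitespace / end-of-file-fixer would change."""
    if is_excluded(path):
        return []
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            problems.append(f"{path}:{lineno}: trailing whitespace")
    # "End in a newline and only a newline": exactly one final '\n'.
    if text and (not text.endswith("\n") or text.endswith("\n\n")):
        problems.append(f"{path}: file must end with exactly one newline")
    return problems
```

Because `data.*` is not anchored to a directory boundary, the pattern also excludes any path containing `data`, such as a hypothetical `src/metadata.py`; if that is unintended, something like `(^|/)data/` would be narrower.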
2 changes: 1 addition & 1 deletion .vscode/settings.json
@@ -6,4 +6,4 @@
],
"editor.formatOnSave": true,
"python.formatting.provider": "ruff"
}
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -1 +1 @@
We love contributions from our community ❤️. Please check out our [contributing guide](https://cocoindex.io/docs/about/contributing).
18 changes: 9 additions & 9 deletions README.md
@@ -32,10 +32,10 @@ Unlike a workflow orchestration framework where data is usually opaque, in CocoI

```python
# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = data['content']
.transform(...)
.transform(...)

@@ -56,17 +56,17 @@ As a data framework, CocoIndex takes it to the next level on data freshness. **I
The framework takes care of:
- Change data capture.
- Figuring out exactly what needs to be updated, and updating only that without recomputing everything.

This makes source updates show up in the target store quickly. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce that latency, the framework handles it for you.

## Quick Start:
If you're new to CocoIndex, we recommend checking out
- 📖 [Documentation](https://cocoindex.io/docs)
- ⚡ [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart)
- 🎬 [Quick Start Video Tutorial](https://youtu.be/gv5R8nOXsWU?si=9ioeKYkMEnYevTXT)

### Setup

1. Install CocoIndex Python library

@@ -136,8 +136,8 @@ It defines an index flow like this:
| [Google Drive Text Embedding](examples/gdrive_text_embedding) | Index text documents from Google Drive |
| [Docs to Knowledge Graph](examples/docs_to_knowledge_graph) | Extract relationships from Markdown documents and build a knowledge graph |
| [Embeddings to Qdrant](examples/text_embedding_qdrant) | Index documents in a Qdrant collection for semantic search |
| [FastAPI Server with Docker](examples/fastapi_server_docker) | Run the semantic search server in a Dockerized FastAPI setup |
| [Product Recommendation](examples/product_recommendation) | Build real-time product recommendations with LLM and graph database|
| [Image Search with Vision API](examples/image_search) | Generate detailed captions for images with a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |

More are coming, so stay tuned 👀!
@@ -159,7 +159,7 @@ Join our community here:
- 📜 [Read our blog posts](https://cocoindex.io/blogs/)

## Support us:
We are constantly improving, and more features and examples are coming soon. If you love this project, please give us a star ⭐ on our GitHub repo [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex) to stay tuned and help us grow.

## License
CocoIndex is Apache 2.0 licensed.
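
The README hunk above describes change data capture and updating only what changed instead of recomputing everything. As a toy sketch of that idea (not CocoIndex's implementation; every name here is hypothetical), each source row is fingerprinted, compared against the fingerprint from the last run, and `transform` is re-run only for added or changed rows, while rows that vanished from the source are deleted from the target:

```python
import hashlib
from typing import Callable


def fingerprint(content: str) -> str:
    """Content hash used to detect whether a source row changed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(
    source: dict[str, str],
    state: dict[str, str],
    target: dict[str, str],
    transform: Callable[[str], str],
) -> dict[str, int]:
    """Sync `target` with `source`, re-running `transform` only on changed rows.

    `state` remembers the fingerprint last processed for each key, loosely
    analogous to a framework's internal tracking storage.
    """
    stats = {"added": 0, "removed": 0, "updated": 0}
    for key, content in source.items():
        new_fp = fingerprint(content)
        old_fp = state.get(key)
        if old_fp == new_fp:
            continue  # unchanged: skip the (possibly expensive) transform
        target[key] = transform(content)
        state[key] = new_fp
        stats["added" if old_fp is None else "updated"] += 1
    # Keys that disappeared from the source are deleted from the target too.
    for key in [k for k in state if k not in source]:
        del state[key]
        target.pop(key, None)
        stats["removed"] += 1
    return stats


if __name__ == "__main__":
    state: dict[str, str] = {}
    target: dict[str, str] = {}
    docs = {"a.md": "hello", "b.md": "world", "c.md": "!"}
    print(incremental_update(docs, state, target, str.upper))
    # {'added': 3, 'removed': 0, 'updated': 0}
    docs = {"a.md": "hello", "b.md": "WORLD"}
    print(incremental_update(docs, state, target, str.upper))
    # {'added': 0, 'removed': 1, 'updated': 1}
```

The second run calls `transform` once (for `b.md`), which is the point: cost scales with the size of the change, not the size of the source.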
12 changes: 0 additions & 12 deletions check.sh

This file was deleted.

29 changes: 19 additions & 10 deletions docs/docs/about/contributing.md
@@ -15,22 +15,22 @@ We use [GitHub Issues](https://github.com/cocoindex-io/cocoindex/issues) to trac

We tag issues with the ["good first issue"](https://github.com/cocoindex-io/cocoindex/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) label for beginner contributors.

## How to Contribute
- If you decide to work on an issue, unless the PR can be sent immediately (e.g. just a few lines of code), we recommend leaving a comment on the issue like **`I'm working on it`** or **`Can I work on this issue?`** to avoid duplicating work.
- For larger features, we recommend discussing them with us first in our [Discord server](https://discord.com/invite/zpA9S2DR7s) to coordinate the design and work.
- Our [Discord server](https://discord.com/invite/zpA9S2DR7s) is always open. If you are unsure about anything, it is a good place to discuss! We'd love to collaborate and will always be friendly.

## Start hacking! Setting Up Development Environment
Follow the steps below to build cocoindex locally from the latest codebase if you are changing cocoindex functionality and want to test it out.

- 🦀 [Install Rust](https://rust-lang.org/tools/install)

If you don't have Rust installed, run
```sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
Already have Rust? Make sure it's up to date
```sh
rustup update
```

@@ -46,14 +46,19 @@ Following the steps below to get cocoindex build on latest codebase locally - if

- Install required tools:
```sh
pip install maturin mypy ruff
pip install maturin mypy pre-commit
```

- Build the library. Run at the root of cocoindex directory:
```sh
maturin develop
```

- Install and enable pre-commit hooks. This ensures all checks run automatically before each commit:
```sh
pre-commit install
```

- Before running a specific example, set extra environment variables to expose extra traces, enable the dev UI, etc.
```sh
. ./.env.lib_debug
@@ -67,10 +72,14 @@ To submit your code:
1. Fork the [CocoIndex repository](https://github.com/cocoindex-io/cocoindex)
2. [Create a new branch](https://docs.github.com/en/desktop/making-changes-in-a-branch/managing-branches-in-github-desktop) on your fork
3. Make your changes
4. Make sure all tests and linting pass by running
```sh
./check.sh
```
4. Run the pre-commit checks (automatically triggered on `git commit`)

:::tip
To run them manually (same as CI):
```sh
pre-commit run --all-files
```
:::

5. [Open a Pull Request (PR)](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) when your work is ready for review
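
The `local` hooks in `.pre-commit-config.yaml` (and the `pytest` hook) use `language: system`, which means pre-commit runs their commands from your own environment instead of installing them. A small pre-flight check can catch a missing tool before `pre-commit run --all-files` fails. This is a sketch, not a script in the repository; the tool list is our reading of the hook entries. Note that the `pip install` line above does not include `pytest`, so it may need to be installed separately.

```python
import shutil
import sys
from typing import Sequence

# Our reading of the executables the `language: system` hooks call, plus
# pre-commit itself. Adjust if the hook entries differ.
REQUIRED_TOOLS = ("pre-commit", "pytest", "mypy", "maturin", "cargo")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def main() -> int:
    """Print a summary and return a process exit code (0 = all found)."""
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing), file=sys.stderr)
        return 1
    print("All hook tools found; run `pre-commit run --all-files`.")
    return 0
```

Calling `sys.exit(main())` from a small script before your first commit is one way to use it.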

6 changes: 3 additions & 3 deletions docs/docs/ai/llm.mdx
@@ -136,9 +136,9 @@ pip install 'litellm[proxy]'
**Example for OpenAI:**
```yaml
model_list:
  - model_name: "*"
    litellm_params:
      model: openai/*
      api_key: os.environ/LITELLM_API_KEY
```

@@ -176,7 +176,7 @@ litellm --config config.yml
```python
cocoindex.LlmSpec(
    api_type=cocoindex.LlmApiType.LITE_LLM,
    model="deepseek-r1",
    address="http://127.0.0.1:4000",  # default url of LiteLLM
)
```
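
Two idioms in the LiteLLM config above are easy to miss: `os.environ/LITELLM_API_KEY` tells LiteLLM to read the key from an environment variable rather than storing it in the file, and the wildcard entry (`model_name: "*"` with `model: openai/*`) forwards whatever model name the client requests to OpenAI. The sketch below models that resolution as we understand it; it is not LiteLLM's code, and `resolve_request` / `resolve_secret` are hypothetical names.

```python
import os
from typing import Any, Mapping, Optional

# Hypothetical Python rendering of the single model_list entry above.
WILDCARD_ENTRY: dict[str, Any] = {
    "model_name": "*",
    "litellm_params": {
        "model": "openai/*",
        "api_key": "os.environ/LITELLM_API_KEY",
    },
}

ENV_PREFIX = "os.environ/"


def resolve_secret(value: str, environ: Mapping[str, str]) -> Optional[str]:
    """Expand an `os.environ/NAME` reference; return literal values unchanged."""
    if value.startswith(ENV_PREFIX):
        return environ.get(value[len(ENV_PREFIX):])
    return value


def resolve_request(
    requested_model: str,
    entry: Mapping[str, Any] = WILDCARD_ENTRY,
    environ: Mapping[str, str] = os.environ,
) -> tuple[str, Optional[str]]:
    """Return (upstream model, api key) for a request routed through `entry`."""
    if entry["model_name"] not in ("*", requested_model):
        raise KeyError(f"entry does not serve {requested_model!r}")
    params = entry["litellm_params"]
    # Wildcard routing: the requested name replaces '*' in the upstream model.
    upstream = params["model"].replace("*", requested_model)
    return upstream, resolve_secret(params["api_key"], environ)
```

Keeping the key behind `os.environ/` means `config.yml` can be committed or shared without leaking credentials.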
4 changes: 2 additions & 2 deletions docs/docs/core/basics.md
@@ -71,7 +71,7 @@ An indexing flow, once set up, maintains a long-lived relationship between data

* **One time update**: Once triggered, CocoIndex updates the target data to reflect the version of source data up to the current moment.
* **Live update**: CocoIndex continuously reacts to changes of source data and updates the target data accordingly, based on various **change capture mechanisms** for the source.

See more details in the [build / update target data](flow_methods#build--update-target-data) section.

3. CocoIndex intelligently reprocesses to propagate source changes to target by:
@@ -101,4 +101,4 @@ As an indexing flow is long-lived, it needs to store intermediate data to keep t
CocoIndex uses internal storage for this purpose.

Currently, CocoIndex uses Postgres database as the internal storage.
See [Settings](settings#databaseconnectionspec) for configuring its location; the `cocoindex setup` CLI command (see [CocoIndex CLI](cli)) creates the tables for the internal storage.
2 changes: 1 addition & 1 deletion docs/docs/core/cli.mdx
@@ -72,4 +72,4 @@ Use `--help` to see the full list of subcommands, and `subcommand --help` to see
```sh
cocoindex --help # Show all subcommands
cocoindex show --help # Show usage of "show" subcommand
```
2 changes: 1 addition & 1 deletion docs/docs/core/custom_function.mdx
@@ -133,7 +133,7 @@ The cocoindex repository contains the following examples of custom functions def
* In the [pdf_embedding](https://github.com/cocoindex-io/cocoindex/blob/main/examples/pdf_embedding/main.py) example, we define a custom function `PdfToMarkdown`
* The `SentenceTransformerEmbed` function shipped with the CocoIndex Python package is defined using the Python SDK.
Search for [`SentenceTransformerEmbedExecutor`](https://github.com/search?q=repo%3Acocoindex-io%2Fcocoindex+lang%3Apython+SentenceTransformerEmbedExecutor&type=code) to see the code.

## Parameters for custom functions

Custom functions take the following additional parameters:
2 changes: 1 addition & 1 deletion docs/docs/core/flow_def.mdx
@@ -56,7 +56,7 @@ A data scope has a bunch of fields and collectors, and users can add new fields

### Get or Add a Field

You can get or add a field of a data scope (which is a data slice).

:::note

6 changes: 3 additions & 3 deletions docs/docs/core/flow_methods.mdx
@@ -182,10 +182,10 @@ CocoIndex also provides asynchronous versions of APIs for blocking operations, i
my_updater = cocoindex.FlowLiveUpdater(demo_flow)
# Start the updater.
await my_updater.start_async()

# Perform your own logic (e.g. a query loop).
...

# Print the update stats.
print(my_updater.update_stats())
# Abort the updater.
@@ -245,4 +245,4 @@ demo_flow.evaluate_and_dump(EvaluateAndDumpOptions(output_dir="./eval_output"))
```

</TabItem>
</Tabs>
2 changes: 1 addition & 1 deletion docs/docs/core/settings.mdx
@@ -113,4 +113,4 @@ This is the list of environment variables, each of which has a corresponding fie
| `COCOINDEX_DATABASE_URL` | `database.url` | Yes |
| `COCOINDEX_DATABASE_USER` | `database.user` | No |
| `COCOINDEX_DATABASE_PASSWORD` | `database.password` | No |
| `COCOINDEX_APP_NAMESPACE` | `app_namespace` | No |
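
The table maps each environment variable to a field in the settings object. As a minimal sketch of that mapping (the loader `load_settings_from_env` is hypothetical, not CocoIndex's own loader), the rows visible in this hunk can be turned into a nested dict, reading the Yes/No column as "required". The full table above the hunk may list more variables.

```python
import os
from typing import Any, Mapping

# (environment variable, dotted settings field, required?) for the rows shown.
ENV_FIELDS = [
    ("COCOINDEX_DATABASE_URL", "database.url", True),
    ("COCOINDEX_DATABASE_USER", "database.user", False),
    ("COCOINDEX_DATABASE_PASSWORD", "database.password", False),
    ("COCOINDEX_APP_NAMESPACE", "app_namespace", False),
]


def load_settings_from_env(environ: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build a nested settings dict; raise KeyError if a required var is unset."""
    settings: dict[str, Any] = {}
    for var, field, required in ENV_FIELDS:
        value = environ.get(var)
        if value is None:
            if required:
                raise KeyError(f"{var} must be set (maps to `{field}`)")
            continue
        # Walk or create nested dicts for dotted fields such as "database.url".
        *parents, leaf = field.split(".")
        node = settings
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return settings
```

Optional variables that are unset are simply omitted, so only `COCOINDEX_DATABASE_URL` is needed for a minimal local setup.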
5 changes: 2 additions & 3 deletions docs/docs/getting_started/installation.md
@@ -1,5 +1,5 @@
---
title: Installation
description: Set up the CocoIndex environment in 0-3 min
---

@@ -17,7 +17,7 @@ pip install -U cocoindex

## 📦 Install Postgres

You can skip this step if you already have a Postgres database with the pgvector extension installed.

If you don't have a Postgres database:

@@ -31,4 +31,3 @@ docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoi
## 🎉 All set!

You can now start using CocoIndex.

7 changes: 3 additions & 4 deletions docs/docs/getting_started/overview.md
@@ -5,7 +5,7 @@ slug: /

# Welcome to CocoIndex

CocoIndex is an ultra-performant real-time data transformation framework for AI, with incremental processing.

As a data framework, CocoIndex takes it to the next level on data freshness. **Incremental processing** is one of the core values provided by CocoIndex.

@@ -17,10 +17,10 @@ CocoIndex follows the idea of [Dataflow programming](https://en.wikipedia.org/wi
The gist of an example data transformation:
```python
# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = data['content']
.transform(...)
.transform(...)

@@ -33,4 +33,3 @@ collector.export(...)

Get Started:
- [Quick Start](https://cocoindex.io/docs/getting_started/quickstart)

6 changes: 3 additions & 3 deletions docs/docs/getting_started/quickstart.md
@@ -19,7 +19,7 @@ This guide will help you get up and running with CocoIndex in just a few minutes
We'll need to install a bunch of dependencies for this project.

1. Install CocoIndex:

```bash
pip install -U cocoindex
```
@@ -149,7 +149,7 @@ documents: 3 added, 0 removed, 0 updated

## Step 4 (optional): Run queries against the index

CocoIndex excels at transforming your data and storing it (a.k.a. indexing).
The goal of transforming your data is usually to query against it.
Once your index is built, you can directly access the transformed data in the target database.
CocoIndex also provides utilities for you to do this more seamlessly.
@@ -291,4 +291,4 @@ Next, you may want to:
* Learn about [CocoIndex Basics](../core/basics.md).
* Learn about other examples in the [examples](https://github.com/cocoindex-io/cocoindex/tree/main/examples) directory.
* The `text_embedding` example is this quickstart.
* Pick other examples that match your interests.