Merged
Changes from 4 commits
136 changes: 136 additions & 0 deletions docs/docs/tutorials/live_updates.md
@@ -0,0 +1,136 @@
---
title: Live Updates
description: "Keep your indexes up-to-date with live updates in CocoIndex."
---

# Live Updates

CocoIndex is designed to keep your indexes synchronized with your data sources. This is achieved through a feature called **live updates**, which automatically detects changes in your sources and updates your indexes accordingly. This ensures that your search results and data analysis are always based on the most current information.

## How Live Updates Work

Live updates in CocoIndex can be triggered in two main ways:

1. **Refresh Interval:** You can configure a `refresh_interval` for any data source. CocoIndex will then periodically check the source for any new, updated, or deleted data. This is a simple and effective way to keep your index fresh, especially for sources that don't have a built-in change notification system.

2. **Change Capture Mechanisms:** Some data sources offer more sophisticated ways to track changes. For example:
   * **Amazon S3:** You can configure an SQS queue to receive notifications whenever a file is added, modified, or deleted in your S3 bucket. CocoIndex can listen to this queue and trigger an update instantly.
   * **Google Drive:** The Google Drive source can be configured to poll for recent changes, which is more efficient than a full refresh.

When a change is detected, CocoIndex performs an **incremental update**. This means it only re-processes the data that has been affected by the change, without having to re-index your entire dataset. This makes the update process fast and efficient.
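
To make the idea of an incremental update concrete, here is a minimal, self-contained sketch of change detection between two snapshots of a source. It is a conceptual illustration only and does not show CocoIndex's internal implementation; `fingerprint`, `diff_snapshots`, and the dictionary-based snapshot format are hypothetical names chosen for this example.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable content hash used to detect modifications."""
    return hashlib.sha256(content).hexdigest()


def diff_snapshots(
    previous: dict[str, str], current: dict[str, str]
) -> dict[str, list[str]]:
    """Compare two {key: fingerprint} snapshots and report what changed."""
    added = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    modified = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


# Hypothetical snapshots: b.md was edited and c.md is new.
previous = {"a.md": fingerprint(b"alpha"), "b.md": fingerprint(b"beta")}
current = {
    "a.md": fingerprint(b"alpha"),
    "b.md": fingerprint(b"beta v2"),
    "c.md": fingerprint(b"gamma"),
}

changes = diff_snapshots(previous, current)
print(changes)
```

Only the keys reported as added, modified, or deleted would need re-processing; unchanged entries such as `a.md` are skipped, which is what keeps an incremental update cheaper than a full re-index.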

## Implementing Live Updates

You can enable live updates using either the CocoIndex CLI or the Python library.

### Using the CLI

To start a live update process from the command line, use the `update` command with the `-L` or `--live` flag:

```bash
cocoindex update -L your_flow_definition_file.py
```

This starts a long-running process that continuously monitors your data sources and updates your indexes as changes are detected. You can stop the process by pressing `Ctrl+C`.

### Using the Python Library

For more control over the live update process, you can use the `FlowLiveUpdater` class in your Python code. This is particularly useful when you want to integrate CocoIndex into a larger application.

The `FlowLiveUpdater` can be used as a context manager, which automatically starts the updater when you enter the `with` block and stops it when you exit. The `wait()` method will block until the updater is aborted (e.g., by pressing `Ctrl+C`).

Here's how you can use `FlowLiveUpdater` to start and manage a live update process:

```python
import cocoindex

# Assume you have a flow defined as 'my_flow'
# from my_flows import my_flow

# Create a FlowLiveUpdater instance
with cocoindex.FlowLiveUpdater(my_flow, cocoindex.FlowLiveUpdaterOptions(print_stats=True)) as updater:
    print("Live updater started. Press Ctrl+C to stop.")
    # The updater runs in the background.
    # The wait() method blocks until the updater is stopped.
    updater.wait()

print("Live updater stopped.")
```

#### Getting Status Updates

You can also get status updates from the `FlowLiveUpdater` to monitor the update process. The `next_status_updates()` method blocks until there is a new status update.

```python
import cocoindex

# Assume you have a flow defined as 'my_flow'
# from my_flows import my_flow

updater = cocoindex.FlowLiveUpdater(my_flow)
updater.start()

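while True:
    # Blocks until a new status update is available
    updates = updater.next_status_updates()

    # Handle updated sources before checking for completion,
    # so the final batch of updates is not skipped
    for source_name in updates.updated_sources:
        print(f"Source '{source_name}' has been updated.")

    if not updates.active_sources:
        print("All sources have finished processing.")
        break

updater.wait()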
```

This allows you to react to updates in your application, for example, by notifying users or triggering downstream processes.
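
As a rough sketch of what reacting to updates could look like, the snippet below routes each updated source name to a handler. It deliberately avoids calling CocoIndex so that it runs on its own; `dispatch_updates`, the `handlers` mapping, and the source names are hypothetical and exist only for this illustration.

```python
from typing import Callable, Iterable


def dispatch_updates(
    updated_sources: Iterable[str],
    handlers: dict[str, Callable[[str], None]],
) -> list[str]:
    """Call the handler registered for each updated source.

    Returns the names of the sources that had a handler.
    """
    handled = []
    for source_name in updated_sources:
        handler = handlers.get(source_name)
        if handler is not None:
            handler(source_name)
            handled.append(source_name)
    return handled


# Hypothetical downstream action: record a notification per source.
notifications: list[str] = []
handlers = {"documents": lambda name: notifications.append(f"reindexed:{name}")}

handled = dispatch_updates(["documents", "images"], handlers)
print(handled, notifications)
```

Inside the status loop, the same function could be called as `dispatch_updates(updates.updated_sources, handlers)`. Sources without a registered handler are ignored rather than raising an error.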

## Example

Let's walk through an example of how to set up a live update flow. For the complete, runnable code, see the [live updates example](https://github.com/cocoindex-io/cocoindex/tree/main/examples/live_updates) in the CocoIndex repository.

### Setting up the Source

The first step is to define a source and configure a `refresh_interval`. In this example, we'll use a `LocalFile` source to monitor a directory named `data`.

```python
import datetime

import cocoindex


@cocoindex.flow_def(name="LiveUpdateExample")
def live_update_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Source: local files in the 'data' directory
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="data"),
        # refresh_interval takes a standard-library datetime.timedelta
        refresh_interval=datetime.timedelta(seconds=5),
    )
    # ...
```

By setting `refresh_interval` to 5 seconds, we're telling CocoIndex to check for changes in the `data` directory every 5 seconds.
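
For intuition, interval-based refreshing behaves much like a polling loop. The sketch below is a generic illustration and not how CocoIndex schedules its checks; `poll`, `check`, and the injectable `sleep` parameter are hypothetical names, and the sleep function is replaced with a recorder so the example finishes instantly.

```python
import datetime
from typing import Callable


def poll(
    check: Callable[[], bool],
    interval: datetime.timedelta,
    max_checks: int,
    sleep: Callable[[float], None],
) -> int:
    """Run `check` up to `max_checks` times, sleeping `interval` between runs.

    Returns how many checks reported a change.
    """
    changes = 0
    for i in range(max_checks):
        if check():
            changes += 1
        # No need to sleep after the final check.
        if i < max_checks - 1:
            sleep(interval.total_seconds())
    return changes


# Record requested sleeps instead of actually waiting.
slept: list[float] = []
results = iter([False, True, True])

changes_seen = poll(
    check=lambda: next(results),
    interval=datetime.timedelta(seconds=5),
    max_checks=3,
    sleep=slept.append,
)
print(changes_seen, slept)
```

A shorter interval picks up changes sooner at the cost of more frequent checks against the source.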

### Running the Live Updater

Once the flow is defined, you can use the `FlowLiveUpdater` to start the live update process.

```python
def main():
    # Initialize CocoIndex
    cocoindex.init()

    # Setup the flow
    live_update_flow.setup(report_to_stdout=True)

    # Start the live updater
    with cocoindex.FlowLiveUpdater(live_update_flow, cocoindex.FlowLiveUpdaterOptions(print_stats=True)) as updater:
        print("Live updater started. Watching for changes in the 'data' directory.")
        updater.wait()

if __name__ == "__main__":
    main()
```

The `FlowLiveUpdater` will run in the background, and the `updater.wait()` call will block until the process is stopped.

## Conclusion

Live updates is a powerful feature of CocoIndex that ensures your indexes are always fresh. By using a combination of refresh intervals and source-specific change capture mechanisms, you can build responsive, real-time applications that are always in sync with your data.

For more detailed information on the `FlowLiveUpdater` and other live update options, please refer to the [Run a Flow documentation](https://cocoindex.io/docs/core/flow_methods#live-update).
8 changes: 8 additions & 0 deletions docs/sidebars.ts
@@ -12,6 +12,14 @@ const sidebars: SidebarsConfig = {
        'getting_started/installation',
      ],
    },
    {
      type: 'category',
      label: 'Tutorials',
      collapsed: false,
      items: [
        'tutorials/live_updates',
      ],
    },
    {
      type: 'category',
      label: 'CocoIndex Core',
1 change: 1 addition & 0 deletions examples/live_updates/.env
@@ -0,0 +1 @@
COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex
58 changes: 58 additions & 0 deletions examples/live_updates/README.md
@@ -0,0 +1,58 @@
# Applying Live Updates to CocoIndex Flow Example
[![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)

We appreciate a star ⭐ at [CocoIndex GitHub](https://github.com/cocoindex-io/cocoindex) if this is helpful.

This example demonstrates how to use CocoIndex's live update feature to keep an index synchronized with a local directory.

## How it Works

The `main.py` script defines a CocoIndex flow that:

1. **Sources** data from a local directory named `data`. It uses a `refresh_interval` of 5 seconds to check for changes.
2. **Collects** the `filename` and `content` of each file.
3. **Exports** the collected data to a Postgres database table.

The script then starts a `FlowLiveUpdater`, which runs in the background and continuously monitors the `data` directory for changes.

## Running the Example

1. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.

2. **Install the dependencies:**

```bash
pip install -e .
```

3. **Run the example:**

You can run the live update example in two ways:

**Option 1: Using the Python script**

This method uses CocoIndex [Library API](https://cocoindex.io/docs/core/flow_methods#library-api-2) to perform live updates.

```bash
python main.py
```

**Option 2: Using the CocoIndex CLI**

This method is useful for managing your indexes from the command line, through CocoIndex [CLI](https://cocoindex.io/docs/core/flow_methods#cli-2).

```bash
cocoindex update main.py -L --setup
```

4. **Test the live updates:**

While the script is running, you can try adding, modifying, or deleting files in the `data` directory. You will see the changes reflected in the logs as CocoIndex updates the index.
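
If you would rather script these changes than make them by hand, a small helper along the following lines could add, modify, and then delete a scratch file. This is a sketch under stated assumptions: `exercise_directory` and the file name `scratch_note.md` are hypothetical, and the demonstration runs against a temporary directory so it is safe to execute anywhere.

```python
import tempfile
import time
from pathlib import Path


def exercise_directory(data_dir: Path, pause_seconds: float = 0.0) -> list[str]:
    """Add, modify, then delete a scratch file; return the actions performed."""
    actions = []
    scratch = data_dir / "scratch_note.md"  # hypothetical file name

    scratch.write_text("first version\n", encoding="utf-8")
    actions.append("added")
    time.sleep(pause_seconds)

    scratch.write_text("second version\n", encoding="utf-8")
    actions.append("modified")
    time.sleep(pause_seconds)

    scratch.unlink()
    actions.append("deleted")
    return actions


# Demonstrated against a temporary directory rather than the real 'data' folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    actions = exercise_directory(demo_dir)
    remaining = sorted(p.name for p in demo_dir.iterdir())

print(actions, remaining)
```

To try it against this example, call `exercise_directory(Path("data"), pause_seconds=6)` from a second terminal while the updater is running, so that each change spans at least one 5-second refresh interval.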

## Cleaning Up

To remove the database table created by this example, you can run:

```bash
cocoindex drop main.py
```
21 changes: 21 additions & 0 deletions examples/live_updates/data/bizarre_animals.md
@@ -0,0 +1,21 @@
In the spirit of Project Zeta’s innovative chaos, here’s a collection of absurdly true facts about the weirdest animals you’ve never heard of:

1. **Tardigrade (Water Bear)**: This microscopic beast can survive outer space, radiation, and being boiled alive. It once crashed a team meeting by stowing away in Bob’s coffee mug and demanding admin access to the server.

2. **Aye-Aye**: A Madagascar primate with a creepy long finger it uses to tap trees for grubs. It tried to “debug” our codebase by tapping the keyboard, resulting in 47 nested for-loops.

3. **Saiga Antelope**: This goofy-nosed critter looks like it’s auditioning for a sci-fi flick. Its sneezes are so powerful they once blew out the office Wi-Fi during a sprint review.

4. **Glaucus Atlanticus (Blue Dragon Sea Slug)**: This tiny ocean dragon steals venom from jellyfish and uses it like a borrowed superpower. It infiltrated our water cooler and left behind a sparkly, toxic trail.

5. **Pink Fairy Armadillo**: A palm-sized digger that looks like a cotton candy tank. It burrowed into the office carpet, mistaking it for a desert, and now we have a “no armadillos” policy.

6. **Dumbo Octopus**: A deep-sea octopus with ear-like fins, flapping around like it’s late for a Zoom call. It once rewired our projector to display memes of itself across the office.

7. **Jerboa**: A hopping desert rodent with kangaroo vibes. It stole the team’s snacks and leaped over three cubicles before anyone noticed, earning the codename "Snack Bandit."

8. **Mantis Shrimp**: This crustacean sees more colors than our graphic designer and punches harder than a failing CI pipeline. It shattered a monitor when we tried to pair-program with it.

9. **Okapi**: A zebra-giraffe hybrid that looks like a Photoshop error. It wandered into our sprint planning and suggested we pivot to a “forest-themed” microservices architecture.

10. **Blobfish**: The ocean’s saddest-looking blob, voted “Most Likely to Crash a Stand-Up” by the team. Its mere presence caused our morale bot to send 200 crying emojis.
19 changes: 19 additions & 0 deletions examples/live_updates/data/chunk_norris.md
@@ -0,0 +1,19 @@
# Chuck Norris Project Facts
Date: 2025-07-20
Author: Anonymous (because Chuck Norris knows who you are)

Here are some totally true facts about Chuck Norris's involvement in Project Omega:

1. Chuck Norris doesn't write code; he stares at the computer until it writes itself out of fear.
2. The project deadline was yesterday, but time rescheduled itself to accommodate Chuck Norris.
3. Chuck Norris's code never has bugs—just "features" that are too scared to misbehave.
4. When the database crashed, Chuck Norris roundhouse-kicked the server, and it apologized.
5. The team tried to use Agile, but Chuck Norris declared, "I am the only methodology you need."
6. Version control? Chuck Norris is the only version that matters.
7. The project scope expanded because Chuck Norris added "world domination" as a deliverable.
8. When the CI/CD pipeline failed, Chuck Norris rebuilt it with a single grunt.
9. The codebase is 100% documented because no one dares ask Chuck Norris, "What does this do?"
10. Chuck Norris doesn't deploy to production; production deploys to Chuck Norris.

Last updated: 2025-07-20 06:36 AM MST
Note: If you modify this file, Chuck Norris will know... and he’ll find you.
54 changes: 54 additions & 0 deletions examples/live_updates/main.py
@@ -0,0 +1,54 @@
import datetime

import cocoindex
from dotenv import load_dotenv


@cocoindex.flow_def(name="LiveUpdates")
def live_update_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
):
    # Source: local files in the 'data' directory
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="data"),
        refresh_interval=datetime.timedelta(seconds=5),
    )

    # Collector
    collector = data_scope.add_collector()
    with data_scope["documents"].row() as doc:
        collector.collect(
            filename=doc["filename"],
            content=doc["content"],
        )

    # Target: Postgres database
    collector.export(
        "documents_index",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename"],
    )


def main():
    # Setup the flow
    live_update_flow.setup(report_to_stdout=True)

    # Start the live updater
    print("Starting live updater...")
    with cocoindex.FlowLiveUpdater(
        live_update_flow, cocoindex.FlowLiveUpdaterOptions(print_stats=True)
    ) as updater:
        print("Live updater started. Watching for changes in the 'data' directory.")
        print("Try adding, modifying, or deleting files in the 'data' directory.")
        print("Press Ctrl+C to stop.")
        try:
            updater.wait()
        except KeyboardInterrupt:  # handle graceful shutdown
            print("Stopping live updater...")


if __name__ == "__main__":
    load_dotenv()
    cocoindex.init()
    main()
12 changes: 12 additions & 0 deletions examples/live_updates/pyproject.toml
@@ -0,0 +1,12 @@
[project]
name = "live-updates-example"
version = "0.1.0"
description = "Simple example for cocoindex: perform live updates based on local markdown files."
requires-python = ">=3.11"
dependencies = [
"cocoindex>=0.1.70",
"python-dotenv>=1.1.0",
]

[tool.setuptools]
packages = []