Commit 0165dde

example: add the tutorial of performing live updates (#781)
* example: add the tutorial of performing live updates
* fix: improve live updater control flow with error handling
* docs: add live updates guide
* docs(tutorials): move live-updates guide into a new category
* docs: add example for setting up live updates with refresh interval
1 parent 7bc6a70 commit 0165dde

File tree

8 files changed

+330
-0
lines changed

Lines changed: 156 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,156 @@
1+
---
2+
title: Live Updates
3+
description: "Keep your indexes up-to-date with live updates in CocoIndex."
4+
---
5+
6+
# Live Updates
7+
8+
CocoIndex is designed to keep your indexes synchronized with your data sources. This is achieved through a feature called **live updates**, which automatically detects changes in your sources and updates your indexes accordingly. This ensures that your search results and data analysis are always based on the most current information.
9+
10+
## How Live Updates Work
11+
12+
Live updates in CocoIndex can be triggered in two main ways:
13+
14+
1. **Refresh Interval:** You can configure a `refresh_interval` for any data source. CocoIndex will then periodically check the source for any new, updated, or deleted data. This is a simple and effective way to keep your index fresh, especially for sources that don't have a built-in change notification system.
15+
16+
2. **Change Capture Mechanisms:** Some data sources offer more sophisticated ways to track changes. For example:
17+
* **Amazon S3:** You can configure an SQS queue to receive notifications whenever a file is added, modified, or deleted in your S3 bucket. CocoIndex can listen to this queue and trigger an update instantly.
18+
* **Google Drive:** The Google Drive source can be configured to poll for recent changes, which is more efficient than a full refresh.
19+
20+
When a change is detected, CocoIndex performs an **incremental update**. This means it only re-processes the data that has been affected by the change, without having to re-index your entire dataset. This makes the update process fast and efficient.
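To make the idea of an incremental update concrete, here is a minimal, self-contained sketch of change detection between two scans of a directory. It is only an illustration under stated assumptions: the helper names (`snapshot`, `diff_snapshots`, `demo`) are hypothetical, content hashing is just one possible way to fingerprint files, and this is not how CocoIndex itself tracks changes internally.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(directory: Path) -> dict[str, str]:
    """Map each file's relative path to a SHA-256 digest of its content."""
    return {
        str(p.relative_to(directory)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, modified, or deleted between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "modified": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
        "deleted": sorted(old.keys() - new.keys()),
    }


def demo() -> dict[str, list[str]]:
    """Scan a temporary directory twice and report what changed in between."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp)
        (data / "a.md").write_text("alpha")
        (data / "b.md").write_text("beta")
        before = snapshot(data)

        (data / "a.md").write_text("alpha v2")  # modify an existing file
        (data / "b.md").unlink()                # delete a file
        (data / "c.md").write_text("gamma")     # add a new file
        after = snapshot(data)
        return diff_snapshots(before, after)
```

Only the entries reported by `diff_snapshots` would need re-processing; unchanged files are skipped, which is the property that makes incremental updates cheaper than a full re-index.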
21+
22+
Here's an example of how to set up a source with a `refresh_interval`:
23+
24+
```python
import datetime

import cocoindex


@cocoindex.flow_def(name="LiveUpdateExample")
def live_update_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Source: local files in the 'data' directory
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="data"),
        # refresh_interval takes a standard datetime.timedelta
        refresh_interval=datetime.timedelta(seconds=5),
    )
    # ...
```
34+
35+
By setting `refresh_interval` to 5 seconds, we're telling CocoIndex to check for changes in the `data` directory every 5 seconds.
36+
37+
## Implementing Live Updates
38+
39+
You can enable live updates using either the CocoIndex CLI or the Python library.
40+
41+
### Using the CLI
42+
43+
To start a live update process from the command line, use the `update` command with the `-L` or `--live` flag:
44+
45+
```bash
cocoindex update -L your_flow_definition_file.py
```
48+
49+
This starts a long-running process that continuously monitors your data sources for changes and updates your indexes in real time. You can stop the process by pressing `Ctrl+C`.
50+
51+
### Using the Python Library
52+
53+
For more control over the live update process, you can use the `FlowLiveUpdater` class in your Python code. This is particularly useful when you want to integrate CocoIndex into a larger application.
54+
55+
The `FlowLiveUpdater` can be used as a context manager, which automatically starts the updater when you enter the `with` block and stops it when you exit. The `wait()` method will block until the updater is aborted (e.g., by pressing `Ctrl+C`).
56+
57+
Here's how you can use `FlowLiveUpdater` to start and manage a live update process:
58+
59+
```python
import cocoindex

# Create a FlowLiveUpdater instance
with cocoindex.FlowLiveUpdater(
    live_update_flow, cocoindex.FlowLiveUpdaterOptions(print_stats=True)
) as updater:
    print("Live updater started. Press Ctrl+C to stop.")
    # The updater runs in the background.
    # The wait() method blocks until the updater is stopped.
    updater.wait()

print("Live updater stopped.")
```
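The start-on-enter, stop-on-exit behavior described above is the standard Python context-manager pattern. The sketch below illustrates that pattern with a toy background worker; `ToyUpdater` is a hypothetical stand-in written for this guide, not the real `FlowLiveUpdater`, and it makes no claim about how CocoIndex implements its updater.

```python
import threading
import time


class ToyUpdater:
    """Hypothetical stand-in that runs a background loop between enter and exit."""

    def __init__(self, interval: float = 0.01) -> None:
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.ticks = 0

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop is requested.
        while not self._stop.wait(self._interval):
            self.ticks += 1

    def __enter__(self) -> "ToyUpdater":
        self._thread.start()  # start background work when the block is entered
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self._stop.set()      # request shutdown when the block is left
        self._thread.join()   # wait for the background loop to finish


def demo() -> tuple[bool, bool]:
    """Return whether the worker was alive inside and after the with block."""
    with ToyUpdater() as updater:
        time.sleep(0.1)
        running_inside = updater._thread.is_alive()
    return running_inside, updater._thread.is_alive()
```

Because `__exit__` also runs when an exception such as `KeyboardInterrupt` propagates out of the block, the background work is stopped on every exit path.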
71+
72+
#### Getting Status Updates
73+
74+
You can also get status updates from the `FlowLiveUpdater` to monitor the update process. The `next_status_updates()` method blocks until there is a new status update.
75+
76+
```python
import cocoindex

updater = cocoindex.FlowLiveUpdater(live_update_flow)
updater.start()

while True:
    updates = updater.next_status_updates()

    # Report updated sources first so the final batch is not skipped.
    for source_name in updates.updated_sources:
        print(f"Source '{source_name}' has been updated.")

    if not updates.active_sources:
        print("All sources have finished processing.")
        break

updater.wait()
```
94+
95+
This allows you to react to updates in your application, for example, by notifying users or triggering downstream processes.
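As a rough sketch of how an application might react, the snippet below routes each updated source to a handler. It assumes only the two fields used in the loop above (`active_sources` and `updated_sources`); the `StatusUpdates` dataclass is a hypothetical stand-in so the example runs without CocoIndex, and `dispatch` and the handler are illustrative names rather than library API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StatusUpdates:
    """Stand-in mirroring the two fields read from a status update."""

    active_sources: list[str] = field(default_factory=list)
    updated_sources: list[str] = field(default_factory=list)


def dispatch(updates: StatusUpdates, handlers: dict[str, Callable[[str], None]]) -> bool:
    """Call the handler for each updated source; return False once no source is active."""
    for name in updates.updated_sources:
        handler = handlers.get(name)
        if handler is not None:
            handler(name)
    return bool(updates.active_sources)


# Hypothetical downstream action: record which sources triggered a reaction.
notified: list[str] = []
handlers = {"documents": lambda name: notified.append(f"reindexed:{name}")}

keep_going = dispatch(
    StatusUpdates(active_sources=["documents"], updated_sources=["documents"]),
    handlers,
)
finished = dispatch(StatusUpdates(), handlers)
```

In a real loop, the return value of `dispatch` could decide whether to keep polling for further status updates.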
96+
97+
## Example
98+
99+
Let's walk through an example of how to set up a live update flow. For the complete, runnable code, see the [live updates example](https://github.com/cocoindex-io/cocoindex/tree/main/examples/live_updates) in the CocoIndex repository.
100+
101+
### 1. Setting up the Source
102+
103+
The first step is to define a source and configure a `refresh_interval`. In this example, we'll use a `LocalFile` source to monitor a directory named `data`.
104+
105+
```python
import datetime

import cocoindex


@cocoindex.flow_def(name="LiveUpdateExample")
def live_update_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Source: local files in the 'data' directory
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="data"),
        refresh_interval=datetime.timedelta(seconds=5),
    )

    # Collector
    collector = data_scope.add_collector()
    with data_scope["documents"].row() as doc:
        collector.collect(filename=doc["filename"], content=doc["content"])

    # Target: Postgres database
    collector.export(
        "documents_index",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename"],
    )
```
126+
127+
By setting `refresh_interval` to 5 seconds, we're telling CocoIndex to check for changes in the `data` directory every 5 seconds.
128+
129+
### 2. Running the Live Updater
130+
131+
Once the flow is defined, you can use the `FlowLiveUpdater` to start the live update process.
132+
133+
```python
def main():
    # Initialize CocoIndex
    cocoindex.init()

    # Setup the flow
    live_update_flow.setup(report_to_stdout=True)

    # Start the live updater
    with cocoindex.FlowLiveUpdater(
        live_update_flow, cocoindex.FlowLiveUpdaterOptions(print_stats=True)
    ) as updater:
        print("Live updater started. Watching for changes in the 'data' directory.")
        updater.wait()


if __name__ == "__main__":
    main()
```
149+
150+
The `FlowLiveUpdater` will run in the background, and the `updater.wait()` call will block until the process is stopped.
151+
152+
## Conclusion
153+
154+
Live updates are a powerful CocoIndex feature that keeps your indexes fresh. By combining refresh intervals with source-specific change capture mechanisms, you can build responsive, real-time applications that stay in sync with your data.
155+
156+
For more detailed information on the `FlowLiveUpdater` and other live update options, please refer to the [Run a Flow documentation](https://cocoindex.io/docs/core/flow_methods#live-update).

docs/sidebars.ts

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,14 @@ const sidebars: SidebarsConfig = {
       'getting_started/installation',
     ],
   },
+  {
+    type: 'category',
+    label: 'Tutorials',
+    collapsed: false,
+    items: [
+      'tutorials/live_updates',
+    ],
+  },
   {
     type: 'category',
     label: 'CocoIndex Core',

examples/live_updates/.env

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex

examples/live_updates/README.md

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
# Applying Live Updates to CocoIndex Flow Example
2+
[![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)
3+
4+
We appreciate a star ⭐ at [CocoIndex GitHub](https://github.com/cocoindex-io/cocoindex) if this is helpful.
5+
6+
This example demonstrates how to use CocoIndex's live update feature to keep an index synchronized with a local directory.
7+
8+
## How it Works
9+
10+
The `main.py` script defines a CocoIndex flow that:
11+
12+
1. **Sources** data from a local directory named `data`. It uses a `refresh_interval` of 5 seconds to check for changes.
13+
2. **Collects** the `filename` and `content` of each file.
14+
3. **Exports** the collected data to a Postgres database table.
15+
16+
The script then starts a `FlowLiveUpdater`, which runs in the background and continuously monitors the `data` directory for changes.
17+
18+
## Running the Example
19+
20+
1. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't already have it installed.
21+
22+
2. **Install the dependencies:**
23+
24+
    ```bash
    pip install -e .
    ```
27+
28+
3. **Run the example:**
29+
30+
You can run the live update example in two ways:
31+
32+
**Option 1: Using the Python script**
33+
34+
    This method uses the CocoIndex [Library API](https://cocoindex.io/docs/core/flow_methods#library-api-2) to perform live updates.

    ```bash
    python main.py
    ```
39+
40+
**Option 2: Using the CocoIndex CLI**
41+
42+
    This method uses the CocoIndex [CLI](https://cocoindex.io/docs/core/flow_methods#cli-2), which is useful for managing your indexes from the command line.

    ```bash
    cocoindex update main.py -L --setup
    ```
47+
48+
4. **Test the live updates:**
49+
50+
While the script is running, you can try adding, modifying, or deleting files in the `data` directory. You will see the changes reflected in the logs as CocoIndex updates the index.
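If you would rather script these changes than make them by hand, the following sketch adds, modifies, and then deletes a scratch file, pausing between steps. The function name `exercise_directory` and the file name `scratch_note.md` are hypothetical; the default 6-second pause is simply chosen to be longer than the example's 5-second `refresh_interval`, so each change should fall into a separate refresh.

```python
import time
from pathlib import Path


def exercise_directory(data_dir: Path, pause: float = 6.0) -> list[str]:
    """Add, modify, then delete a scratch file, pausing between steps."""
    data_dir.mkdir(parents=True, exist_ok=True)
    scratch = data_dir / "scratch_note.md"  # hypothetical file name
    steps: list[str] = []

    scratch.write_text("first version\n")   # shows up as a new file
    steps.append("added")
    time.sleep(pause)

    scratch.write_text("second version\n")  # shows up as a modification
    steps.append("modified")
    time.sleep(pause)

    scratch.unlink()                         # shows up as a deletion
    steps.append("deleted")
    return steps
```

Calling `exercise_directory(Path("data"))` from a second terminal while the live updater is running should produce three separate rounds of changes in its logs.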
51+
52+
## Cleaning Up
53+
54+
To remove the database table created by this example, you can run:
55+
56+
```bash
cocoindex drop main.py
```
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
In the spirit of Project Zeta’s innovative chaos, here’s a collection of absurdly true facts about the weirdest animals you’ve never heard of:
2+
3+
1. **Tardigrade (Water Bear)**: This microscopic beast can survive outer space, radiation, and being boiled alive. It once crashed a team meeting by stowing away in Bob’s coffee mug and demanding admin access to the server.
4+
5+
2. **Aye-Aye**: A Madagascar primate with a creepy long finger it uses to tap trees for grubs. It tried to “debug” our codebase by tapping the keyboard, resulting in 47 nested for-loops.
6+
7+
3. **Saiga Antelope**: This goofy-nosed critter looks like it’s auditioning for a sci-fi flick. Its sneezes are so powerful they once blew out the office Wi-Fi during a sprint review.
8+
9+
4. **Glaucus Atlanticus (Blue Dragon Sea Slug)**: This tiny ocean dragon steals venom from jellyfish and uses it like a borrowed superpower. It infiltrated our water cooler and left behind a sparkly, toxic trail.
10+
11+
5. **Pink Fairy Armadillo**: A palm-sized digger that looks like a cotton candy tank. It burrowed into the office carpet, mistaking it for a desert, and now we have a “no armadillos” policy.
12+
13+
6. **Dumbo Octopus**: A deep-sea octopus with ear-like fins, flapping around like it’s late for a Zoom call. It once rewired our projector to display memes of itself across the office.
14+
15+
7. **Jerboa**: A hopping desert rodent with kangaroo vibes. It stole the team’s snacks and leaped over three cubicles before anyone noticed, earning the codename "Snack Bandit."
16+
17+
8. **Mantis Shrimp**: This crustacean sees more colors than our graphic designer and punches harder than a failing CI pipeline. It shattered a monitor when we tried to pair-program with it.
18+
19+
9. **Okapi**: A zebra-giraffe hybrid that looks like a Photoshop error. It wandered into our sprint planning and suggested we pivot to a “forest-themed” microservices architecture.
20+
21+
10. **Blobfish**: The ocean’s saddest-looking blob, voted “Most Likely to Crash a Stand-Up” by the team. Its mere presence caused our morale bot to send 200 crying emojis.
Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
# Chuck Norris Project Facts
2+
Date: 2025-07-20
3+
Author: Anonymous (because Chuck Norris knows who you are)
4+
5+
Here are some totally true facts about Chuck Norris's involvement in Project Omega:
6+
7+
1. Chuck Norris doesn't write code; he stares at the computer until it writes itself out of fear.
8+
2. The project deadline was yesterday, but time rescheduled itself to accommodate Chuck Norris.
9+
3. Chuck Norris's code never has bugs—just "features" that are too scared to misbehave.
10+
4. When the database crashed, Chuck Norris roundhouse-kicked the server, and it apologized.
11+
5. The team tried to use Agile, but Chuck Norris declared, "I am the only methodology you need."
12+
6. Version control? Chuck Norris is the only version that matters.
13+
7. The project scope expanded because Chuck Norris added "world domination" as a deliverable.
14+
8. When the CI/CD pipeline failed, Chuck Norris rebuilt it with a single grunt.
15+
9. The codebase is 100% documented because no one dares ask Chuck Norris, "What does this do?"
16+
10. Chuck Norris doesn't deploy to production; production deploys to Chuck Norris.
17+
18+
Last updated: 2025-07-20 06:36 AM MST
19+
Note: If you modify this file, Chuck Norris will know... and he’ll find you.

examples/live_updates/main.py

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
import datetime

import cocoindex
from dotenv import load_dotenv


# Define the flow
@cocoindex.flow_def(name="LiveUpdates")
def live_update_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
) -> None:
    # Source: local files in the 'data' directory
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="data"),
        refresh_interval=datetime.timedelta(seconds=5),
    )

    # Collector
    collector = data_scope.add_collector()
    with data_scope["documents"].row() as doc:
        collector.collect(
            filename=doc["filename"],
            content=doc["content"],
        )

    # Target: Postgres database
    collector.export(
        "documents_index",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename"],
    )


def main() -> None:
    # Setup the flow
    live_update_flow.setup(report_to_stdout=True)

    # Start the live updater
    print("Starting live updater...")
    with cocoindex.FlowLiveUpdater(
        live_update_flow, cocoindex.FlowLiveUpdaterOptions(print_stats=True)
    ) as updater:
        print("Live updater started. Watching for changes in the 'data' directory.")
        print("Try adding, modifying, or deleting files in the 'data' directory.")
        print("Press Ctrl+C to stop.")
        try:
            updater.wait()
        except KeyboardInterrupt:  # handle graceful shutdown
            print("Stopping live updater...")


if __name__ == "__main__":
    load_dotenv()
    cocoindex.init()
    main()
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
[project]
name = "live-updates-example"
version = "0.1.0"
description = "Simple example for cocoindex: perform live updates based on local markdown files."
requires-python = ">=3.11"
dependencies = [
    "cocoindex>=0.1.70",
    "python-dotenv>=1.1.0",
]

[tool.setuptools]
packages = []
