---
title: Live Updates
description: "Keep your indexes up-to-date with live updates in CocoIndex."
---

# Live Updates

CocoIndex keeps your indexes synchronized with your data sources through a feature called **live updates**, which automatically detects changes in your sources and updates your indexes accordingly. Your search results and data analysis therefore reflect the current state of the source data.

## How Live Updates Work

Live updates in CocoIndex can be triggered in two main ways:

1.  **Refresh Interval:** You can configure a `refresh_interval` for any data source. CocoIndex will then periodically check the source for any new, updated, or deleted data. This is a simple and effective way to keep your index fresh, especially for sources that don't have a built-in change notification system.

2.  **Change Capture Mechanisms:** Some data sources offer more sophisticated ways to track changes. For example:
    *   **Amazon S3:** You can configure an SQS queue to receive notifications whenever a file is added, modified, or deleted in your S3 bucket. CocoIndex can listen to this queue and trigger an update instantly.
    *   **Google Drive:** The Google Drive source can be configured to poll for recent changes, which is more efficient than a full refresh.
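
To make the refresh-interval approach concrete, here is a minimal sketch of what a periodic check has to compute: given the snapshot from the previous poll and the current one, which entries are new, updated, or deleted. It is an illustration only, not CocoIndex's implementation; `ChangeSet`, `diff_snapshots`, and the key-to-fingerprint dictionaries are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """Keys that differ between two snapshots of a source (hypothetical type)."""
    added: set[str] = field(default_factory=set)
    updated: set[str] = field(default_factory=set)
    deleted: set[str] = field(default_factory=set)


def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> ChangeSet:
    """Compare two {key: fingerprint} snapshots (hypothetical helper).

    A fingerprint can be anything that changes when the content changes,
    such as a modification time or a content hash.
    """
    changes = ChangeSet()
    for key, fingerprint in current.items():
        if key not in previous:
            changes.added.add(key)
        elif previous[key] != fingerprint:
            changes.updated.add(key)
    # Anything seen last time but missing now has been deleted.
    changes.deleted = set(previous) - set(current)
    return changes


# Two polls of the same made-up source, one refresh interval apart.
first_poll = {"a.md": "v1", "b.md": "v1", "c.md": "v1"}
second_poll = {"a.md": "v1", "b.md": "v2", "d.md": "v1"}

changes = diff_snapshots(first_poll, second_poll)
print(changes)
```

A change capture mechanism replaces this comparison with notifications pushed by the source, so nothing has to be scanned between changes.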

When a change is detected, CocoIndex performs an **incremental update**: it re-processes only the data affected by the change instead of re-indexing your entire dataset. The cost of an update therefore depends on how much changed, not on the total size of the source.
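
The idea of an incremental update can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not CocoIndex's internals: `process` stands in for whatever transformation a flow applies, and `apply_changes` is a hypothetical helper that touches only the keys named in a change.

```python
from __future__ import annotations

# Records which entries were actually (re)processed.
processed_keys: list[str] = []


def process(key: str, content: str) -> str:
    """Stand-in for an expensive transformation such as chunking or embedding."""
    processed_keys.append(key)
    return content.upper()


def apply_changes(index: dict[str, str], source: dict[str, str],
                  changed: set[str], deleted: set[str]) -> None:
    """Update `index` in place, touching only changed or deleted keys."""
    for key in sorted(changed):
        index[key] = process(key, source[key])
    for key in deleted:
        index.pop(key, None)


# Initial build: every entry is processed once.
source = {"a.md": "alpha", "b.md": "beta", "c.md": "gamma"}
index = {key: process(key, text) for key, text in source.items()}

# Later, b.md is edited and c.md is removed; a.md is untouched.
source = {"a.md": "alpha", "b.md": "beta v2"}
processed_keys.clear()
apply_changes(index, source, changed={"b.md"}, deleted={"c.md"})

print(index)           # {'a.md': 'ALPHA', 'b.md': 'BETA V2'}
print(processed_keys)  # ['b.md']
```

Only `b.md` goes through `process` again; the entry for `a.md` is reused as it was.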

## Implementing Live Updates

You can enable live updates using either the CocoIndex CLI or the Python library.

### Using the CLI

To start a live update process from the command line, use the `update` command with the `-L` or `--live` flag:

```bash
cocoindex update -L your_flow_definition_file.py
```

This starts a long-running process that continuously monitors your data sources for changes and updates your indexes as changes arrive. You can stop the process by pressing `Ctrl+C`.

### Using the Python Library

For more control over the live update process, you can use the `FlowLiveUpdater` class in your Python code. This is particularly useful when you want to integrate CocoIndex into a larger application.

`FlowLiveUpdater` can be used as a context manager: it starts the updater when you enter the `with` block and stops it when you exit. The `wait()` method blocks until the updater finishes or is aborted (for example, by pressing `Ctrl+C`).

Here's how you can use `FlowLiveUpdater` to start and manage a live update process:

```python
import cocoindex

# `my_flow` must be a flow you have already defined, for example:
# from my_flows import my_flow

# Create a FlowLiveUpdater; it starts on entering the `with` block
with cocoindex.FlowLiveUpdater(
    my_flow, cocoindex.FlowLiveUpdaterOptions(print_stats=True)
) as updater:
    print("Live updater started. Press Ctrl+C to stop.")
    # The updater runs in the background.
    # The wait() method blocks until the updater is stopped.
    updater.wait()

print("Live updater stopped.")
```
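
The lifecycle described here (start on entering the `with` block, stop on leaving it, `wait()` blocking until the work ends) follows the usual Python context-manager pattern. The toy class below illustrates that pattern with a background thread. `ToyUpdater` and its methods are hypothetical stand-ins written for this page; this is not how `FlowLiveUpdater` is implemented.

```python
import threading
import time


class ToyUpdater:
    """Hypothetical stand-in that mimics a start/abort/wait lifecycle."""

    def __init__(self, poll_seconds: float = 0.01) -> None:
        self._poll_seconds = poll_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Pretend to check the sources until asked to stop.
        while not self._stop.is_set():
            self._stop.wait(self._poll_seconds)

    def start(self) -> None:
        self._thread.start()

    def abort(self) -> None:
        self._stop.set()

    def wait(self) -> None:
        # Block until the background thread has ended.
        self._thread.join()

    def is_running(self) -> bool:
        return self._thread.is_alive()

    def __enter__(self) -> "ToyUpdater":
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Leaving the block stops the background work, even on an exception.
        self.abort()
        self.wait()


with ToyUpdater() as updater:
    time.sleep(0.05)  # the main program does other work here
    running_inside = updater.is_running()

running_after = updater.is_running()
print(running_inside, running_after)
```

Putting the stop logic in `__exit__` means the background work is shut down on every exit path, including an exception raised inside the block.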

#### Getting Status Updates

You can also get status updates from the `FlowLiveUpdater` to monitor the update process. The `next_status_updates()` method blocks until there is a new status update. Each update reports which sources are still active (`active_sources`) and which were updated (`updated_sources`).

```python
import cocoindex

# `my_flow` must be a flow you have already defined, for example:
# from my_flows import my_flow

updater = cocoindex.FlowLiveUpdater(my_flow)
updater.start()

while True:
    updates = updater.next_status_updates()

    # Handle updated sources first so the final batch is not skipped.
    for source_name in updates.updated_sources:
        print(f"Source '{source_name}' has been updated.")

    # Stop once no source is active any longer.
    if not updates.active_sources:
        print("All sources have finished processing.")
        break

updater.wait()
```

This allows you to react to updates in your application, for example by notifying users or triggering downstream processes.
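
One way to structure such a reaction is to keep the loop thin and route each updated source to a handler. The sketch below runs without CocoIndex: `StatusUpdates` is a hypothetical stand-in that has only the two attributes used in this section (`active_sources` and `updated_sources`), and `drain`, the handler functions, and the source names are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class StatusUpdates:
    """Hypothetical stand-in with the two attributes used in this section."""
    active_sources: list[str] = field(default_factory=list)
    updated_sources: list[str] = field(default_factory=list)


def drain(status_stream: Iterable[StatusUpdates],
          handlers: dict[str, Callable[[str], None]]) -> None:
    """Call the handler registered for each updated source, then stop
    once no source is active."""
    for updates in status_stream:
        for source_name in updates.updated_sources:
            handler = handlers.get(source_name)
            if handler is not None:
                handler(source_name)
        if not updates.active_sources:
            break


notifications: list[str] = []

handlers = {
    "docs": lambda name: notifications.append(f"refresh search cache for {name}"),
    "tickets": lambda name: notifications.append(f"notify support team about {name}"),
}

# A scripted sequence standing in for repeated status updates.
stream = [
    StatusUpdates(active_sources=["docs", "tickets"], updated_sources=["docs"]),
    StatusUpdates(active_sources=["tickets"], updated_sources=["tickets"]),
    StatusUpdates(active_sources=[], updated_sources=["tickets"]),
]

drain(stream, handlers)
print(notifications)
```

With a real updater, the stream could be a generator that repeatedly yields the result of `next_status_updates()`.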

## Example

For a complete, runnable example of how to use live updates, see the [live updates example](https://github.com/cocoindex-io/cocoindex/tree/main/examples/live_updates) in the CocoIndex repository.

## Conclusion

Live updates keep your indexes in sync with your data sources. By combining refresh intervals with source-specific change capture mechanisms, you can build responsive applications whose indexes track your data as it changes.