docs/docs/core/basics.md
Lines changed: 4 additions & 2 deletions
@@ -70,15 +70,17 @@ An indexing flow, once set up, maintains a long-lived relationship between data
 at a certain pace, according to the update mode:
 
 * **One-time update**: Once triggered, CocoIndex updates the target data to reflect the version of source data up to the current moment.
-* **Live update**: CocoIndex continuously watches the source data and updates the target data accordingly.
+* **Live update**: CocoIndex continuously reacts to changes in the source data and updates the target data accordingly, based on the **change capture mechanisms** enabled for the source.
 
 See more details in the [build / update target data](flow_methods#build--update-target-data) section.
 
-3. CocoIndex intelligently manages these updates by:
+3. CocoIndex intelligently reprocesses data to propagate source changes to the target by:
+
 * Determining which parts of the target data need to be recomputed
 * Reusing existing computations where possible
 * Only reprocessing the minimum necessary data
 
+This is known as **incremental processing**.
 
 You can think of an indexing flow as similar to formulas in a spreadsheet:
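As a rough illustration of the incremental processing described in this hunk, the Python sketch below keeps a target in sync with a source by fingerprinting each source row, recomputing only rows whose fingerprint changed, and dropping rows whose source disappeared. `IncrementalIndex` and `fingerprint` are hypothetical names, and content hashing is an assumed change signal; this is not CocoIndex's implementation or API.

```python
import hashlib
from typing import Callable, Dict, Tuple


def fingerprint(content: str) -> str:
    """Stable fingerprint of one source row's content (assumed change signal)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class IncrementalIndex:
    """Hypothetical sketch: keeps one target value per source key and
    recomputes only keys whose source content changed."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        # key -> (fingerprint of the source content, computed target value)
        self.target: Dict[str, Tuple[str, str]] = {}
        self.recomputed = 0  # how many times transform() has run

    def update(self, source: Dict[str, str]) -> None:
        # Remove target rows whose source rows are gone.
        for key in list(self.target):
            if key not in source:
                del self.target[key]
        for key, content in source.items():
            fp = fingerprint(content)
            cached = self.target.get(key)
            if cached is not None and cached[0] == fp:
                continue  # unchanged: reuse the existing computation
            self.target[key] = (fp, self.transform(content))
            self.recomputed += 1

    def values(self) -> Dict[str, str]:
        return {key: value for key, (_, value) in self.target.items()}


index = IncrementalIndex(str.upper)
index.update({"a.md": "hello", "b.md": "world"})  # computes both rows
index.update({"a.md": "hello", "b.md": "there"})  # recomputes only b.md
index.update({"a.md": "hello"})                   # drops b.md, computes nothing
print(index.recomputed)  # 3
print(index.values())    # {'a.md': 'HELLO'}
```

Unchanged rows skip `transform` entirely, which corresponds to "reusing existing computations where possible". A real engine also has to track which target rows each source row produced; this sketch sidesteps that by assuming a one-to-one mapping between source keys and target rows.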
-Live update is *eligible* for certain data sources, including:
+A data source may enable one or more *change capture mechanisms*:
 
-* Data sources configured with a [refresh interval](flow_def#refresh-interval).
-* Data sources provides a **change stream**.
+* A [refresh interval](flow_def#refresh-interval), which is generally applicable to all data sources.
+* Mechanisms specific to a particular data source, provided by that source.
+See the documentation of each data source for details.
+
+Change capture mechanisms enable CocoIndex to continuously capture changes from the source data and update the target data accordingly under live update mode.
 
 To perform live update, you need to create a `cocoindex.FlowLiveUpdater` object using the `cocoindex.Flow` object.
 It takes an optional `cocoindex.FlowLiveUpdaterOptions` argument, with the following fields:
 
 * `live_mode` (type: `bool`, default: `True`):
-  Whether to perform live update for eligible data sources.
+  Whether to perform live update for data sources with change capture mechanisms.
+  It has no effect on data sources without any change capture mechanism.
 
 * `print_stats` (type: `bool`, default: `False`): Whether to print stats during update.
 
-For data sources ineligible for live updates, or when the `live_mode` is `False`,
-the `FlowLiveUpdater` only performs a one-time update, i.e. similar to the one-time update (`update()` method) above,
-under a unified interface.
+Note that `cocoindex.FlowLiveUpdater` provides a unified interface for both one-time update and live update.
+It only performs live update when `live_mode` is `True`, and only for sources with change capture mechanisms enabled.
+If a source has multiple change capture mechanisms enabled, all of them can trigger updates.
 
 <Tabs>
 <TabItem value="python" label="Python" default>
@@ -126,7 +130,7 @@ A `FlowLiveUpdater` object supports the following methods:
 * `wait()` (async): Wait for the updater to finish. It only unblocks in one of the following cases:
   * The updater was aborted.
   * A one-time update is done, and live update is not enabled:
-    either `live_mode` is `False`, or all data sources are ineligible for live updates.
+    either `live_mode` is `False`, or no data source has any change capture mechanism enabled.
 * `update_stats()`: Returns the stats of the updater.
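The rules in these hunks for when live update applies, and when `wait()` returns without an abort, can be restated as a small stdlib-only Python model. `SourceSketch`, `OptionsSketch`, `live_update_plan` and `wait_unblocks_after_one_time_update` are hypothetical names; only the `live_mode` and `print_stats` fields mirror the documented `cocoindex.FlowLiveUpdaterOptions`, and nothing here calls the real `cocoindex` package.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Dict, List, Optional


@dataclass
class SourceSketch:
    """Hypothetical stand-in for a data source attached to a flow."""
    name: str
    refresh_interval: Optional[timedelta] = None
    # Names of enabled source-specific mechanisms,
    # e.g. "recent_changes_poll_interval" for Google Drive.
    specific_mechanisms: List[str] = field(default_factory=list)

    def change_capture_mechanisms(self) -> List[str]:
        mechanisms = list(self.specific_mechanisms)
        if self.refresh_interval is not None:
            # A refresh interval is applicable to any source.
            mechanisms.append("refresh_interval")
        return mechanisms


@dataclass
class OptionsSketch:
    """Mirrors the two documented FlowLiveUpdaterOptions fields."""
    live_mode: bool = True
    print_stats: bool = False


def live_update_plan(sources: List[SourceSketch],
                     options: OptionsSketch) -> Dict[str, List[str]]:
    """Map each source that keeps updating after the initial one-time
    update to every enabled mechanism that can trigger its updates."""
    if not options.live_mode:
        return {}  # behaves like a one-time update for every source
    plan: Dict[str, List[str]] = {}
    for source in sources:
        mechanisms = source.change_capture_mechanisms()
        if mechanisms:  # sources without any mechanism are unaffected
            plan[source.name] = mechanisms
    return plan


def wait_unblocks_after_one_time_update(sources: List[SourceSketch],
                                        options: OptionsSketch) -> bool:
    """True if wait() returns once the one-time update is done (no abort
    needed): live mode is off, or no source has a mechanism enabled."""
    return not live_update_plan(sources, options)


if __name__ == "__main__":
    sources = [
        SourceSketch("local_docs", refresh_interval=timedelta(minutes=1)),
        SourceSketch("drive", refresh_interval=timedelta(hours=1),
                     specific_mechanisms=["recent_changes_poll_interval"]),
        SourceSketch("archive"),  # no change capture mechanism
    ]
    print(live_update_plan(sources, OptionsSketch()))
    print(wait_unblocks_after_one_time_update(sources, OptionsSketch(live_mode=False)))
```

In this model `drive` maps to both of its mechanisms, matching the statement that all enabled mechanisms take effect, while `archive` never appears in the plan, which is why `live_mode` "has no effect" on it.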
docs/docs/ops/sources.md
Lines changed: 3 additions & 3 deletions
@@ -59,7 +59,7 @@ The spec takes the following fields:
 * `service_account_credential_path` (type: `str`, required): full path to the service account credential file in JSON format.
 * `root_folder_ids` (type: `list[str]`, required): a list of Google Drive folder IDs to import files from.
 * `binary` (type: `bool`, optional): whether to read files as binary (instead of text).
-* `recent_changes_poll_interval` (type: `datetime.timedelta`, optional): when set, this source provides a *change stream* by polling Google Drive for recent modified files periodically.
+* `recent_changes_poll_interval` (type: `datetime.timedelta`, optional): when set, this source provides a *change capture mechanism* by periodically polling Google Drive for recently modified files.
 
 :::info
 
@@ -70,8 +70,8 @@ The spec takes the following fields:
 On the other hand, this only detects changes to files that still exist.
 If a file is deleted (or the current account no longer has access to it), the change will not be detected by this change stream.
 
-So when a source is configured with a change stream, it's still recommended to set a `refresh_interval`, with a larger value.
-So for most changes can be covered by the change stream (with low latency), and remaining changes (files no longer exist or accessible) will still be covered (with a higher latency).
+So when a `GoogleDrive` source has `recent_changes_poll_interval` enabled, it's still recommended to also set a `refresh_interval`, with a larger value.
+This way, most changes are covered by polling recent changes (with low latency), and the remaining changes (files that no longer exist or are no longer accessible) are still covered (with higher latency).
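To see why a larger `refresh_interval` complements `recent_changes_poll_interval`, the following Python sketch models a folder as a map from file ID to modification time. A recent-changes query can only report files that still exist, while a full refresh compares the complete listing with what was indexed and therefore also notices deletions. The functions and data are hypothetical and do not reflect how the `GoogleDrive` source or the Google Drive API is actually implemented.

```python
from typing import Dict, Set

# Hypothetical model: file ID -> last-modified timestamp (seconds).
Listing = Dict[str, int]


def poll_recent_changes(current: Listing, since: int) -> Set[str]:
    """Like a 'recently modified files' query: it can only return files
    that still exist, so deletions are invisible to it."""
    return {fid for fid, mtime in current.items() if mtime > since}


def full_refresh(current: Listing, indexed: Listing) -> Set[str]:
    """Compare the full listing with what was indexed; this also catches
    files that were deleted or became inaccessible."""
    changed = {fid for fid, mtime in current.items()
               if indexed.get(fid) != mtime}
    deleted = set(indexed) - set(current)
    return changed | deleted


indexed = {"a": 100, "b": 100, "c": 100}
current = {"a": 100, "b": 250}  # "b" was modified, "c" was deleted

print(sorted(poll_recent_changes(current, since=200)))  # ['b']
print(sorted(full_refresh(current, indexed)))            # ['b', 'c']
```

Running the poll often keeps the latency of ordinary edits low; the less frequent full refresh is the safety net that eventually picks up the deletion of `c`.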