cocoindex-io · badmonster0 · Apr 3, 2025 · Apr 3, 2025
diff --git a/docs/docs/core/basics.md b/docs/docs/core/basics.md
@@ -1,6 +1,6 @@
 ---
 title: Basics
-description: CocoIndex Basics
+description: "CocoIndex basic concepts: indexing flow, data, operations, data updates, etc."
 ---
 
 # CocoIndex Basics
@@ -9,7 +9,7 @@ An **index** is a collection of data stored in a way that is easy for retrieval.
 
 CocoIndex is an ETL framework for building indexes from specified data sources, a.k.a. indexing. It also offers utilities for users to retrieve data from the indexes.
 
-## Indexing Flow
+## Indexing flow
 
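-An indexing flow extracts data from speicfied data sources, upon specified transformations, and puts the transformed data into specified storage for later retrieval.
+An indexing flow extracts data from specified data sources, applies specified transformations, and puts the transformed data into specified storage for later retrieval.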
 
@@ -36,7 +36,7 @@ An **operation** in an indexing flow defines a step in the flow. An operation is
 *   **Action**, which defines the behavior of the operation, e.g. *import*, *transform*, *for each*, *collect* and *export*.
     See [Flow Definition](flow_def) for more details for each action.
 
-*   Some actions (i.e. "import", "transform" and "export") require an **Operation Spec**, which describes the specific behavior of the operation, e.g. a source to import from, a function describing the transformation behavior, a storage to export to as an index.
+*   Some actions (i.e. "import", "transform" and "export") require an **Operation Spec**, which describes the specific behavior of the operation, e.g. a source to import from, a function describing the transformation behavior, a target storage to export to (as an index).
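-    *   Each operation spec has a **operation type**, e.g. `LocalFile` (data source), `SplitRecursively` (function), `SentenceTransformerEmbed` (function), `Postgres` (storage).
+    *   Each operation spec has an **operation type**, e.g. `LocalFile` (data source), `SplitRecursively` (function), `SentenceTransformerEmbed` (function), `Postgres` (storage).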
     *   CocoIndex framework maintains a set of supported operation types. Users can also implement their own.
 
@@ -60,31 +60,40 @@ This shows schema and example data for the indexing flow:
 
 ![Data Example](data_example.svg)
 
-### Life Cycle of an Indexing Flow
+### Life cycle of an indexing flow
 
-An indexing flow, once set up, maintains a long-lived relationship between source data and indexes. This means:
+An indexing flow, once set up, maintains a long-lived relationship between data source and data in target storage. This means:
+
+1.  The target storage created by the flow remains available for querying at any time.
+
+2.  As source data changes (new data added, existing data updated or deleted), data in the target storage is updated to reflect those changes,
+    at a pace determined by the update mode:
+
+    *   **One-time update**: Once triggered, CocoIndex updates the target data to reflect the version of source data up to the current moment.
+    *   **Live update**: CocoIndex continuously watches the source data and updates the target data accordingly.
+
+    See more details in the [build / update target data](flow_methods#build--update-target-data) section.
+
+3.  CocoIndex intelligently manages these updates by:
+    *   Determining which parts of the target data need to be recomputed
+    *   Reusing existing computations where possible
+    *   Only reprocessing the minimum necessary data
 
-1. The indexes created by the flow remain available for querying at any time
-2. When source data changes, the indexes are automatically updated to reflect those changes
-3. CocoIndex intelligently manages these updates by:
-   - Determining which parts of the index need to be recomputed
-   - Reusing existing computations where possible
-   - Only reprocessing the minimum necessary data
 
 You can think of an indexing flow similar to formulas in a spreadsheet:
 
-- In a spreadsheet, you define formulas that transform input cells into output cells
-- When input values change, the spreadsheet automatically recalculates affected outputs
-- You focus on defining the transformation logic, not managing updates
+*   In a spreadsheet, you define formulas that transform input cells into output cells
+*   When input values change, the spreadsheet recalculates affected outputs
+*   You focus on defining the transformation logic, not managing updates
 
 CocoIndex works the same way, but with more powerful capabilities:
 
-- Instead of flat tables, CocoIndex models data in nested data structures, making it more natural to model complex data
-- Instead of simple cell-level formulas, you have operations like "for each" to apply the same formula across rows without repeating yourself
+*   Instead of flat tables, CocoIndex models data in nested data structures, making it more natural to model complex data
+*   Instead of simple cell-level formulas, you have operations like "for each" to apply the same formula across rows without repeating yourself
 
-This means when writing your flow operations, you can treat source data as if it were static - focusing purely on defining the transformation logic. CocoIndex takes care of maintaining the dynamic relationship between sources and indexes behind the scenes.
+This means when writing your flow operations, you can treat source data as if it were static, focusing purely on defining the transformation logic. CocoIndex takes care of maintaining the dynamic relationship between sources and target data behind the scenes.
 
-### Internal Storage
+### Internal storage
 
 As an indexing flow is long-lived, it needs to store intermediate data to keep track of the states.
 CocoIndex uses internal storage for this purpose.
@@ -94,9 +103,9 @@ See [Initialization](initialization) for configuring its location, and `cocoinde
 
 ## Retrieval
 
-There are two ways to retrieve data from indexes built by an indexing flow:
+There are two ways to retrieve data from target storage built by an indexing flow:
 
-*   Query the underlying index storage directly for maximum flexibility.
-*   Use CocoIndex *query handlers* for a more convenient experience with built-in tooling support (e.g. CocoInsight) to understand query performance against the index.
+*   Query the underlying target storage directly for maximum flexibility.
+*   Use CocoIndex *query handlers* for a more convenient experience with built-in tooling support (e.g. CocoInsight) to understand query performance against the target data.
 
-Query handlers are tied to specific indexing flows. They accept query inputs, transform them by defined operations, and retrieve matching data from the index storage that was created by the flow.
+Query handlers are tied to specific indexing flows. They accept query inputs, transform them by defined operations, and retrieve matching data from the target storage that was created by the flow.
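The life cycle described in the basics.md changes above (recompute only what changed, reuse everything else, delete what disappeared) can be illustrated with a small, self-contained Python sketch. This is not CocoIndex's implementation: `fingerprint`, `incremental_update`, and the plain dicts standing in for the data source, the target storage, and the internal storage are hypothetical names chosen for illustration. Rows are keyed by source key, and a content hash decides whether a row needs reprocessing.

```python
import hashlib
from typing import Callable


def fingerprint(content: str) -> str:
    """Hash of a source row's content, used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(
    source: dict[str, str],
    target: dict[str, str],
    state: dict[str, str],
    transform: Callable[[str], str],
) -> list[str]:
    """Sync `target` with `source`, recomputing only rows whose content changed.

    `state` maps each source key to the fingerprint it had when last
    processed (a stand-in for internal storage). Returns recomputed keys.
    """
    recomputed = []
    for key, content in source.items():
        fp = fingerprint(content)
        if state.get(key) == fp:
            continue  # unchanged: keep (reuse) the existing target row
        target[key] = transform(content)
        state[key] = fp
        recomputed.append(key)
    # Rows deleted from the source are also deleted from the target.
    for key in list(state):
        if key not in source:
            del state[key]
            target.pop(key, None)
    return recomputed


if __name__ == "__main__":
    source = {"a.md": "hello", "b.md": "world"}
    target: dict[str, str] = {}
    state: dict[str, str] = {}
    print(incremental_update(source, target, state, str.upper))  # ['a.md', 'b.md']
    source["b.md"] = "there"  # update one row
    del source["a.md"]  # delete another
    print(incremental_update(source, target, state, str.upper))  # ['b.md']
    print(target)  # {'b.md': 'THERE'}
```

Keying by source key is what lets a refresh touch only changed rows; a real system also has to track which target rows each source row produced, which this sketch simplifies to one target row per source key.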
diff --git a/docs/docs/core/flow_def.mdx b/docs/docs/core/flow_def.mdx
@@ -1,14 +1,15 @@
 ---
 title: Flow Definition
-description: CocoIndex Flow Definition
+description: Define a CocoIndex flow by specifying sources, transformations and storages, and connecting their input/output data.
 ---
 
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
 # CocoIndex Flow Definition
 
-In CocoIndex, to define an indexing flow, you provide a function to construct the flow, by adding operations and connecting them with fields.
+In CocoIndex, to define an indexing flow, you provide a function that imports source data, transforms it, and puts it into target storage (sinks).
+You connect the inputs and outputs of these operations with fields of data scopes.
 
 ## Entry Point
 
@@ -43,7 +44,7 @@ demo_flow = cocoindex.flow.add_flow_def("DemoFlow", demo_flow_def)
 ```
 
 In both cases, `demo_flow` will be an object with `cocoindex.Flow` class type.
-See [Flow Methods](/docs/core/flow_methods) for more details on it.
+See [Flow Running](/docs/core/flow_methods) for more details on it.
 
 </TabItem>
 </Tabs>
@@ -52,7 +53,7 @@ See [Flow Methods](/docs/core/flow_methods) for more details on it.
 
 The `FlowBuilder` object is the starting point to construct a flow.
 
-### Import From Source
+### Import from source
 
-`FlowBuilder` provides a `add_source()` method to import data from external sources.
+`FlowBuilder` provides an `add_source()` method to import data from external sources.
 A *source spec* needs to be provided for any import operation, to describe the source and parameters related to the source.
@@ -64,7 +65,7 @@ Import must happen at the top level, and the field created by import must be in
 ```python
 @cocoindex.flow_def(name="DemoFlow")
 def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
-  data_scope["documents"] = flow_builder.add_source(DemoSourceSpec(...))
+    data_scope["documents"] = flow_builder.add_source(DemoSourceSpec(...))
-  ......
+    ......
 ```
 
@@ -74,17 +75,56 @@ def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataSco
 `add_source()` returns a `DataSlice`. Once external data sources are imported, you can further transform them using methods exposed by these data objects, as discussed in the following sections.
 
-We'll describe different data objects in next few sections.
+We'll describe different data objects in the next few sections.
-Note that the actual value of data is not available at the time when we define the flow: it's only available at runtime.
+
+:::note
+
+The actual value of data is not available at the time when we define the flow: it's only available at runtime.
 In a flow definition, you can use a data representation as input for operations, but not access the actual value.
 
+:::
+
+#### Refresh interval
+
+You can provide a `refresh_interval` argument to `add_source()`.
+When present, in [live update mode](/docs/core/flow_methods#live-update), the data source is refreshed at the specified interval.
+
+<Tabs>
+<TabItem value="python" label="Python" default>
+
+The `refresh_interval` argument is of type `datetime.timedelta`. For example, this refreshes the data source every minute:
+
+```python
+@cocoindex.flow_def(name="DemoFlow")
+def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
+    data_scope["documents"] = flow_builder.add_source(
+        DemoSourceSpec(...), refresh_interval=datetime.timedelta(minutes=1))
+    ......
+```
+
+</TabItem>
+</Tabs>
+
+:::info
+
+In live update mode, on each refresh, CocoIndex traverses the data source to detect changes,
+and only performs transformations for source keys that changed.
+
+:::
+
 ## Data Scope
 
 A **data scope** represents data for a certain unit, e.g. the top level scope (involving all data for a flow), for a document, or for a chunk.
 A data scope has a bunch of fields and collectors, and users can add new fields and collectors to it.
 
-### Get or Add a Field
+### Get or add a field
 
-Get or add a field of a data scope (which is a data slice). Note that you cannot override an existing field.
+You can get or add a field of a data scope (which is a data slice).
+
+:::note
+
+You cannot override an existing field.
+
+:::
 
 <Tabs>
 <TabItem value="python" label="Python" default>
@@ -95,20 +135,20 @@ Getting and setting a field of a data scope is done by the `[]` operator with a
 @cocoindex.flow_def(name="DemoFlow")
 def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
 
-  # Add "documents" to the top-level data scope.
-  data_scope["documents"] = flow_builder.add_source(DemoSourceSpec(...))
+    # Add "documents" to the top-level data scope.
+    data_scope["documents"] = flow_builder.add_source(DemoSourceSpec(...))
 
-  # Each row of "documents" is a child scope.
-  with data_scope["documents"].row() as document:
+    # Each row of "documents" is a child scope.
+    with data_scope["documents"].row() as document:
 
-    # Get "content" from the document scope, transform, and add "summary" to scope.
-    document["summary"] = field1_row["content"].transform(DemoFunctionSpec(...))
+        # Get "content" from the document scope, transform, and add "summary" to scope.
+        document["summary"] = document["content"].transform(DemoFunctionSpec(...))
 ```
 
 </TabItem>
 </Tabs>
 
-### Add a Collector
+### Add a collector
 
 See [Data Collector](#data-collector) below for more details.
 
@@ -132,17 +172,17 @@ Other arguments can be passed in as positional arguments or keyword arguments, a
 ```python
 @cocoindex.flow_def(name="DemoFlow")
 def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
-  ...
-  data_scope["field2"] = data_scope["field1"].transform(
-                             DemoFunctionSpec(...),
-                             arg1, arg2, ..., key0=kwarg0, key1=kwarg1, ...)
-  ...
+    ...
+    data_scope["field2"] = data_scope["field1"].transform(
+                               DemoFunctionSpec(...),
+                               arg1, arg2, ..., key0=kwarg0, key1=kwarg1, ...)
+    ...
 ```
 
 </TabItem>
 </Tabs>
 
-### For Each Row
+### For each row
 
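-If the data slice has `Table` type, you can call `row()` method to obtain a child scope representing each row, to apply operations on each row.
+If the data slice has `Table` type, you can call the `row()` method to obtain a child scope representing each row, and apply operations on each row.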
 
@@ -161,7 +201,7 @@ def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataSco
 </TabItem>
 </Tabs>
 
-### Get a Sub Field
+### Get a sub field
 
 If the data slice has `Struct` type, you can obtain a data slice on a specific sub field of it, similar to getting a field of a data scope.
 
@@ -192,14 +232,14 @@ For example,
 ```python
 @cocoindex.flow_def(name="DemoFlow")
 def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
-  ...
-  demo_collector = data_scope.add_collector()
-  with data_scope["documents"].row() as document:
     ...
-    demo_collector.collect(id=cocoindex.GeneratedField.UUID,
-                           filename=document["filename"],
-                           summary=document["summary"])
-  ...
+    demo_collector = data_scope.add_collector()
+    with data_scope["documents"].row() as document:
+        ...
+        demo_collector.collect(id=cocoindex.GeneratedField.UUID,
+                               filename=document["filename"],
+                               summary=document["summary"])
+    ...
 ```
 
 </TabItem>
@@ -228,13 +268,13 @@ Export must happen at the top level of a flow, i.e. not within any child scopes
 ```python
 @cocoindex.flow_def(name="DemoFlow")
 def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
-  ...
-  demo_collector = data_scope.add_collector()
-  ...
-  demo_collector.export(
-      "demo_storage", DemoStorageSpec(...),
-      primary_key_fields=["field1"],
-      vector_index=[("field2", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
+    ...
+    demo_collector = data_scope.add_collector()
+    ...
+    demo_collector.export(
+        "demo_storage", DemoStorageSpec(...),
+        primary_key_fields=["field1"],
+        vector_index=[("field2", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
 ```
 
 </TabItem>
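The export example above asks the target storage to build a vector index on `field2` using `COSINE_SIMILARITY`. The storage itself (e.g. `Postgres`) builds and queries that index; as a hedged illustration of what the metric computes at query time, here is a brute-force Python stand-in. `cosine_similarity` and `top_k` are hypothetical helpers, not CocoIndex APIs, and the row data is made up; a real vector index may use approximate nearest-neighbour structures instead of scoring every row.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(
    rows: dict[str, list[float]], query: list[float], k: int
) -> list[tuple[str, float]]:
    """Brute-force search: score every row against the query, keep the k best."""
    scored = [(key, cosine_similarity(vec, query)) for key, vec in rows.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical exported rows: primary key ("field1") -> embedding ("field2").
    rows = {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0], "doc3": [1.0, 1.0]}
    for key, score in top_k(rows, [1.0, 0.0], k=2):
        print(key, round(score, 4))  # doc1 1.0, then doc3 0.7071
```

The primary key identifies which row a match refers to, while the vector field is what the similarity metric is computed on, which mirrors how `primary_key_fields` and `vector_index` are specified separately in `export()`.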