cocoindex-io
diff --git a/Cargo.toml b/Cargo.toml
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 2 additions & 1 deletion
diff --git a/docs/docs/core/basics.md b/docs/docs/core/basics.md
Lines changed: 10 additions & 19 deletions
diff --git a/docs/docs/core/cli.mdx b/docs/docs/core/cli.mdx
Lines changed: 2 additions & 2 deletions
diff --git a/docs/docs/core/flow_def.mdx b/docs/docs/core/flow_def.mdx
Lines changed: 34 additions & 2 deletions
diff --git a/docs/docs/core/flow_methods.mdx b/docs/docs/core/flow_methods.mdx
Lines changed: 1 addition & 1 deletion
diff --git a/docs/docs/core/initialization.mdx b/docs/docs/core/initialization.mdx
Lines changed: 13 additions & 0 deletions
@@ -3,7 +3,7 @@ name = "cocoindex"
 # Version used for local development is always higher than others to take precedence.
 # Will be overridden for specific release versions.
 version = "999.0.0"
-edition = "2021"
+edition = "2024"
 
 [profile.release]
 codegen-units = 1
 
@@ -132,11 +132,12 @@ It defines an index flow like this:
 | [Code Embedding](examples/code_embedding) | Index code embeddings for semantic search |
 | [PDF Embedding](examples/pdf_embedding) | Parse PDF and index text embeddings for semantic search |
 | [Manuals LLM Extraction](examples/manuals_llm_extraction) | Extract structured information from a manual using LLM |
+| [Amazon S3 Embedding](examples/amazon_s3_embedding) | Index text documents from Amazon S3 |
 | [Google Drive Text Embedding](examples/gdrive_text_embedding) | Index text documents from Google Drive |
 | [Docs to Knowledge Graph](examples/docs_to_knowledge_graph) | Extract relationships from Markdown documents and build a knowledge graph |
 | [Embeddings to Qdrant](examples/text_embedding_qdrant) | Index documents in a Qdrant collection for semantic search |
 | [FastAPI Server with Docker](examples/fastapi_server_docker) | Run the semantic search server in a Dockerized FastAPI setup | 
-| [Product_Taxonomy_Knowledge_Graph](examples/product_taxonomy_knowledge_graph) | Build knowledge graph for product recommendations | 
+| [Product Recommendation](examples/product_recommendation) | Build real-time product recommendations with LLM and graph database |
 | [Image Search with Vision API](examples/image_search_example) | Generates detailed captions for images using a vision model, embeds them, enables live-updating semantic search via FastAPI and served on a React frontend|
 
 More coming and stay tuned 👀!
 
@@ -1,17 +1,17 @@
 ---
-title: Basics
-description: "CocoIndex basic concepts: indexing flow, data, operations, data updates, etc."
+title: Indexing Basics
+description: "CocoIndex basic concepts for indexing: indexing flow, data, operations, data updates, etc."
 ---
 
-# CocoIndex Basics
+# CocoIndex Indexing Basics
 
 An **index** is a collection of data stored in a way that is easy for retrieval.
 
-CocoIndex is an ETL framework for building indexes from specified data sources, a.k.a. indexing. It also offers utilities for users to retrieve data from the indexes.
+CocoIndex is an ETL framework for building indexes from specified data sources, a.k.a. **indexing**. It also offers utilities for users to retrieve data from the indexes.
 
-## Indexing flow
+An **indexing flow** extracts data from specified data sources, applies specified transformations, and puts the transformed data into specified storage for later retrieval.
 
-An indexing flow extracts data from specified data sources, upon specified transformations, and puts the transformed data into specified storage for later retrieval.
+## Indexing flow elements
 
 An indexing flow has two aspects: data and operations on data.
 
@@ -42,7 +42,7 @@ An **operation** in an indexing flow defines a step in the flow. An operation is
 
 "import" and "transform" operations produce output data, whose data type is determined based on the operation spec and data types of input data (for "transform" operation only).
 
-### Example
+## An indexing flow example
 
 For the example shown in the [Quickstart](../getting_started/quickstart) section, the indexing flow is as follows:
 
@@ -60,7 +60,7 @@ This shows schema and example data for the indexing flow:
 
 ![Data Example](data_example.svg)
 
-### Life cycle of an indexing flow
+## Life cycle of an indexing flow
 
 An indexing flow, once set up, maintains a long-lived relationship between data source and data in target storage. This means:
 
@@ -95,19 +95,10 @@ CocoIndex works the same way, but with more powerful capabilities:
 
 This means when writing your flow operations, you can treat source data as if it were static - focusing purely on defining the transformation logic. CocoIndex takes care of maintaining the dynamic relationship between sources and target data behind the scenes.
 
-### Internal storage
+## Internal storage
 
 As an indexing flow is long-lived, it needs to store intermediate data to keep track of the states.
 CocoIndex uses internal storage for this purpose.
 
 Currently, CocoIndex uses Postgres database as the internal storage.
-See [Initialization](initialization) for configuring its location, and `cocoindex setup` CLI command (see [CocoIndex CLI](cli)) creates tables for the internal storage.
-
-## Retrieval
-
-There are two ways to retrieve data from target storage built by an indexing flow:
-
-*   Query the underlying target storage directly for maximum flexibility.
-*   Use CocoIndex *query handlers* for a more convenient experience with built-in tooling support (e.g. CocoInsight) to understand query performance against the target data.
-
-Query handlers are tied to specific indexing flows. They accept query inputs, transform them by defined operations, and retrieve matching data from the target storage that was created by the flow.
+See [Initialization](initialization) for configuring its location, and `cocoindex setup` CLI command (see [CocoIndex CLI](cli)) creates tables for the internal storage.
@@ -41,7 +41,7 @@ You may also provide a `cocoindex_cmd` argument to the `main_fn` decorator to ch
 
 ### Explicitly CLI Invoke
 
-An alterntive way is to use `cocoindex.cli.cli` (with type [`click.Group`](https://click.palletsprojects.com/en/stable/api/#click.Group)).
+An alternative way is to use `cocoindex.cli.cli` (with type [`click.Group`](https://click.palletsprojects.com/en/stable/api/#click.Group)).
 For example, you may invoke the CLI explicitly with additional arguments:
 
 <Tabs>
@@ -60,7 +60,7 @@ The following subcommands are available:
 
 | Subcommand | Description |
 | ---------- | ----------- |
-| `ls` | List all flows. |
+| `ls` | List all flows present in the current process, or all persisted flows under the current app namespace if `--all` is specified. |
 | `show` | Show the spec for a specific flow. |
 | `setup` | Check and apply backend setup changes for flows, including the internal and target storage (to export). |
 | `drop` | Drop the backend setup for specified flows. |
 
@@ -1,7 +1,6 @@
 ---
 title: Flow Definition
 description: Define a CocoIndex flow, by specifying source, transformations and storages, and connect input/output data of them.
-toc_max_heading_level: 4
 ---
 
 import Tabs from '@theme/Tabs';
@@ -146,8 +145,9 @@ def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataSco
 
 :::info
 
-In live update mode, for each refresh, CocoIndex will traverse the data source to figure out the changes,
+In live update mode, for each refresh, CocoIndex will list rows in the data source to figure out the changes based on metadata such as last modified time,
 and only perform transformations on changed source keys.
+If nothing changed during the last refresh cycle, only list operations will be performed, which is usually cheap for most data sources.
 
 :::
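
The refresh behavior described in this note can be illustrated with a minimal sketch. It is not CocoIndex's implementation: the function name `changed_keys` and the use of a plain dictionary mapping source keys to last modified times are assumptions made for illustration only. It shows why a refresh with no changes stays cheap: only the listing is compared, and no key is selected for transformation.

```python
from datetime import datetime, timezone


def changed_keys(
    previous: dict[str, datetime], listed: dict[str, datetime]
) -> set[str]:
    """Return source keys that are new, modified, or deleted since the last refresh.

    `previous` is the metadata recorded at the last refresh and `listed` is the
    result of listing the source now (both hypothetical structures).
    """
    # New or modified: the key is absent from `previous` or its timestamp differs.
    new_or_modified = {
        key for key, mtime in listed.items() if previous.get(key) != mtime
    }
    # Deleted: the key was seen before but no longer appears in the listing.
    deleted = previous.keys() - listed.keys()
    return new_or_modified | deleted


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2025, 1, 2, tzinfo=timezone.utc)
previous = {"a.md": t0, "b.md": t0, "c.md": t0}
listed = {"a.md": t0, "b.md": t1, "d.md": t1}

print(sorted(changed_keys(previous, listed)))  # ['b.md', 'c.md', 'd.md']
print(changed_keys(previous, previous))  # set(): nothing to transform
```

Only the keys returned here would need their transformations re-run; unchanged keys such as `a.md` are skipped.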
 
@@ -311,6 +311,38 @@ Following metrics are supported:
 
 ## Miscellaneous
 
+### Getting App Namespace
+
+You can use the [`app_namespace` setting](initialization#app-namespace) or `COCOINDEX_APP_NAMESPACE` environment variable to specify the app namespace,
+to organize flows across different environments (e.g., dev, staging, production), team members, etc.
+
+In the code, you can call `cocoindex.get_app_namespace()` to get the app namespace and use it to name certain backends. It takes the following arguments:
+
+*   `trailing_delimiter` (optional): a string to append to the app namespace when it's not empty.
+
+e.g. when the current app namespace is `Staging`, `cocoindex.get_app_namespace(trailing_delimiter='.')` will return `Staging.`.
+
+For example,
+
+<Tabs>
+<TabItem value="python" label="Python" default>
+
+```python
+doc_embeddings.export(
+    "doc_embeddings",
+    cocoindex.storages.Qdrant(
+        collection_name=cocoindex.get_app_namespace(trailing_delimiter='__') + "doc_embeddings",
+        ...
+    ),
+    ...
+)
+```
+
+</TabItem>
+</Tabs>
+
+It will use `Staging__doc_embeddings` as the collection name if the current app namespace is `Staging`, and use `doc_embeddings` if the app namespace is empty.
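
The documented `trailing_delimiter` behavior can be mirrored in a few lines of plain Python. This is a sketch of the described semantics only, not the library's code: `namespace_prefix` is a hypothetical stand-in that takes the namespace as an explicit argument so the example runs without CocoIndex installed.

```python
from typing import Optional


def namespace_prefix(
    app_namespace: str, trailing_delimiter: Optional[str] = None
) -> str:
    """Hypothetical stand-in mirroring the documented behavior.

    The delimiter is appended only when the namespace is not empty, so an
    empty namespace never produces a dangling delimiter.
    """
    if app_namespace == "" or trailing_delimiter is None:
        return app_namespace
    return app_namespace + trailing_delimiter


# With a namespace set, the collection name is prefixed.
print(namespace_prefix("Staging", "__") + "doc_embeddings")  # Staging__doc_embeddings
# With an empty namespace, the name is left untouched.
print(namespace_prefix("", "__") + "doc_embeddings")  # doc_embeddings
```

Appending the delimiter only for non-empty namespaces is what lets the same concatenation work in both cases without a conditional at the call site.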
+
 ### Target Declarations
 
 Most time a target storage is created by calling `export()` method on a collector, and this `export()` call comes with configurations needed for the target storage, e.g. options for storage indexes.
 
@@ -105,7 +105,7 @@ A data source may enable one or multiple *change capture mechanisms*:
 *   Configured with a [refresh interval](flow_def#refresh-interval), which is generally applicable to all data sources.
 
 *   Specific data sources also provide their specific change capture mechanisms.
-    For example, [`GoogleDrive` source](../ops/sources#googledrive) allows polling recent modified files.
+    For example, [`AmazonS3` source](../ops/sources/#amazons3) watches the S3 bucket's change events, and [`GoogleDrive` source](../ops/sources#googledrive) allows polling recently modified files.
     See documentations for specific data sources.
 
 Change capture mechanisms enable CocoIndex to continuously capture changes from the source data and update the target data accordingly, under live update mode.
 
@@ -83,8 +83,20 @@ if __name__ == "__main__":
 
 `cocoindex.Settings` is used to configure the CocoIndex library.  It's a dataclass that contains the following fields:
 
+*   `app_namespace` (type: `str`, optional): The namespace of the application.
 *   `database` (type: `DatabaseConnectionSpec`, required): The connection to the Postgres database.
 
+### App Namespace
+
+The `app_namespace` field helps organize flows across different environments (e.g., dev, staging, production), team members, etc. When set, it prefixes flow names with the namespace.
+
+For example, if the namespace is `Staging`, for a flow with name specified as `Flow1` in code, the full name of the flow will be `Staging.Flow1`.
+You can also get the current app namespace by calling `cocoindex.get_app_namespace()` (see [Getting App Namespace](flow_def#getting-app-namespace) for more details).
+
+If not set, all flows are in a default unnamed namespace.
+
+You can also control it by the `COCOINDEX_APP_NAMESPACE` environment variable.
+
 ### DatabaseConnectionSpec
 
 `DatabaseConnectionSpec` configures the connection to a database. Only Postgres is supported for now. It has the following fields:
@@ -116,6 +128,7 @@ Each setting field has a corresponding environment variable:
 
 | environment variable | corresponding field in `Settings` | required? |
 |---------------------|-------------------|----------|
+| `COCOINDEX_APP_NAMESPACE` | `app_namespace` | No |
 | `COCOINDEX_DATABASE_URL` | `database.url` | Yes |
 | `COCOINDEX_DATABASE_USER` | `database.user` | No |
 | `COCOINDEX_DATABASE_PASSWORD` | `database.password` | No |
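
As a rough illustration of how the app namespace setting relates to flow names, the sketch below resolves a namespace and builds the full flow name. Both helpers are hypothetical and not part of CocoIndex. In particular, letting an explicit value take precedence over `COCOINDEX_APP_NAMESPACE` is an assumption; this change does not state the precedence.

```python
import os
from typing import Mapping, Optional


def resolve_app_namespace(
    explicit: Optional[str] = None, environ: Mapping[str, str] = os.environ
) -> str:
    """Hypothetical helper: pick the namespace from a setting or the environment.

    Assumption: an explicit value wins over the environment variable.
    An unset namespace is represented as the empty string.
    """
    if explicit is not None:
        return explicit
    return environ.get("COCOINDEX_APP_NAMESPACE", "")


def full_flow_name(flow_name: str, app_namespace: str) -> str:
    """Prefix the flow name with the namespace, as described for `Staging.Flow1`."""
    return f"{app_namespace}.{flow_name}" if app_namespace else flow_name


env = {"COCOINDEX_APP_NAMESPACE": "Staging"}
print(full_flow_name("Flow1", resolve_app_namespace(environ=env)))  # Staging.Flow1
print(full_flow_name("Flow1", resolve_app_namespace(environ={})))  # Flow1
```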