From e6e897a3affa1ae459732a6a3425ee62b957717c Mon Sep 17 00:00:00 2001 From: Linghua Jin Date: Fri, 10 Oct 2025 00:24:33 -0700 Subject: [PATCH 1/6] fix links in examples --- docs/docs/core/flow_methods.mdx | 2 +- docs/docs/examples/examples/image_search.md | 2 +- docs/docs/examples/examples/multi_format_index.md | 2 +- docs/docs/examples/examples/photo_search.md | 4 ++-- docs/docs/examples/examples/postgres_source.md | 2 +- examples/amazon_s3_embedding/README.md | 2 +- examples/azure_blob_embedding/README.md | 2 +- examples/gdrive_text_embedding/README.md | 2 +- 8 files changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/docs/core/flow_methods.mdx b/docs/docs/core/flow_methods.mdx index c90986898..39374fc64 100644 --- a/docs/docs/core/flow_methods.mdx +++ b/docs/docs/core/flow_methods.mdx @@ -210,7 +210,7 @@ A data source may enable one or multiple *change capture mechanisms*: * Configured with a [refresh interval](flow_def#refresh-interval), which is generally applicable to all data sources. * Specific data sources also provide their specific change capture mechanisms. - For example, [`Postgres` source](../sources/#postgres) listens to PostgreSQL's change notifications, [`AmazonS3` source](../sources/#amazons3) watches S3 bucket's change events, and [`GoogleDrive` source](../sources#googledrive) allows polling recent modified files. + For example, [`Postgres` source](../sources/postgres) listens to PostgreSQL's change notifications, [`AmazonS3` source](../sources/amazons3) watches S3 bucket's change events, and [`GoogleDrive` source](../sources/googledrive) allows polling recent modified files. See documentations for specific data sources. Change capture mechanisms enable CocoIndex to continuously capture changes from the source data and update the target data accordingly, under live update mode. 
diff --git a/docs/docs/examples/examples/image_search.md b/docs/docs/examples/examples/image_search.md index 783108c00..3e4687841 100644 --- a/docs/docs/examples/examples/image_search.md +++ b/docs/docs/examples/examples/image_search.md @@ -66,7 +66,7 @@ def image_object_embedding_flow(flow_builder, data_scope): The `add_source` function sets up a table with fields like `filename` and `content`. Images are automatically re-scanned every minute. - + ## Process Each Image and Collect the Embedding diff --git a/docs/docs/examples/examples/multi_format_index.md b/docs/docs/examples/examples/multi_format_index.md index b062fa296..2b0e9e319 100644 --- a/docs/docs/examples/examples/multi_format_index.md +++ b/docs/docs/examples/examples/multi_format_index.md @@ -52,7 +52,7 @@ data_scope["documents"] = flow_builder.add_source( cocoindex.sources.LocalFile(path="source_files", binary=True) ) ``` - + ## Convert Files to Pages diff --git a/docs/docs/examples/examples/photo_search.md b/docs/docs/examples/examples/photo_search.md index 1474b6a2e..d17c998cf 100644 --- a/docs/docs/examples/examples/photo_search.md +++ b/docs/docs/examples/examples/photo_search.md @@ -65,8 +65,8 @@ def face_recognition_flow(flow_builder, data_scope): This creates a table with `filename` and `content` fields. 📂 -You can connect it to your [S3 Buckets](https://cocoindex.io/docs/ops/sources#amazons3) (with SQS integration, [example](https://cocoindex.io/blogs/s3-incremental-etl)) -or [Azure Blob store](https://cocoindex.io/docs/ops/sources#azureblob). +You can connect it to your [S3 Buckets](https://cocoindex.io/docs/ops/sources/amazons3) (with SQS integration, [example](https://cocoindex.io/blogs/s3-incremental-etl)) +or [Azure Blob store](https://cocoindex.io/docs/ops/sources/azureblob). 
## Detect and Extract Faces diff --git a/docs/docs/examples/examples/postgres_source.md b/docs/docs/examples/examples/postgres_source.md index 00cf99e51..5f3d49141 100644 --- a/docs/docs/examples/examples/postgres_source.md +++ b/docs/docs/examples/examples/postgres_source.md @@ -59,7 +59,7 @@ CocoIndex incrementally sync data from Postgres. When new or updated rows are fo - `notification` enables change capture based on Postgres LISTEN/NOTIFY. Each change triggers an incremental processing on the specific row immediately. - Regardless if `notification` is provided or not, CocoIndex still needs to scan the full table to detect changes in some scenarios (e.g. between two `update` invocation), and the `ordinal_column` provides a field that CocoIndex can use to quickly detect which row has changed without reading value columns. -Check [Postgres source](https://cocoindex.io/docs/ops/sources#postgres) for more details. +Check [Postgres source](https://cocoindex.io/docs/ops/sources/postgres) for more details. If you use the Postgres database hosted by Supabase, please click Connect on your project dashboard and find the URL there. Check [DatabaseConnectionSpec](https://cocoindex.io/docs/core/settings#databaseconnectionspec) for more details. diff --git a/examples/amazon_s3_embedding/README.md b/examples/amazon_s3_embedding/README.md index bae588f4e..4224498d0 100644 --- a/examples/amazon_s3_embedding/README.md +++ b/examples/amazon_s3_embedding/README.md @@ -9,7 +9,7 @@ Before running the example, you need to: 1. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. 2. Prepare for Amazon S3. - See [Setup for AWS S3](https://cocoindex.io/docs/ops/sources#setup-for-amazon-s3) for more details. + See [Setup for AWS S3](https://cocoindex.io/docs/sources/amazons3#setup-for-amazon-s3) for more details. 3. Create a `.env` file with your Amazon S3 bucket name and (optionally) prefix. 
Start from copying the `.env.example`, and then edit it to fill in your bucket name and prefix. diff --git a/examples/azure_blob_embedding/README.md b/examples/azure_blob_embedding/README.md index c5d250e2d..582b1b088 100644 --- a/examples/azure_blob_embedding/README.md +++ b/examples/azure_blob_embedding/README.md @@ -9,7 +9,7 @@ Before running the example, you need to: 1. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. 2. Prepare for Azure Blob Storage. - See [Setup for Azure Blob Storage](https://cocoindex.io/docs/ops/sources#setup-for-azure-blob-storage) for more details. + See [Setup for Azure Blob Storage](https://cocoindex.io/docs/sources/azureblob#setup-for-azure-blob-storage) for more details. 3. Create a `.env` file with your Azure Blob Storage container name and (optionally) prefix. Start from copying the `.env.example`, and then edit it to fill in your bucket name and prefix. diff --git a/examples/gdrive_text_embedding/README.md b/examples/gdrive_text_embedding/README.md index 6cb4cfa74..55bac06d8 100644 --- a/examples/gdrive_text_embedding/README.md +++ b/examples/gdrive_text_embedding/README.md @@ -30,7 +30,7 @@ Before running the example, you need to: - Setup a service account in Google Cloud, and download the credential file. - Share folders containing files you want to import with the service account's email address. - See [Setup for Google Drive](https://cocoindex.io/docs/ops/sources#setup-for-google-drive) for more details. + See [Setup for Google Drive](https://cocoindex.io/docs/sources/googledrive#setup-for-google-drive) for more details. 3. Create `.env` file with your credential file and folder IDs. Starting from copying the `.env.example`, and then edit it to fill in your credential file path and folder IDs. 
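Patch 1 replaces anchor-style source links (`../sources/#postgres`) with per-page paths (`../sources/postgres`). A quick way to catch any stragglers of the old style is a small scan like the following. This is a hypothetical helper: the path segments in the regex are inferred from the links touched in these diffs, not from any official list.

```python
import re

# Legacy link style that this patch series migrates away from:
# section anchors on the old combined pages, e.g. "docs/ops/sources#postgres",
# now split into per-page paths such as "docs/sources/postgres".
LEGACY_LINK = re.compile(r"(?:docs/)?ops/(?:sources|targets|storages)(?:/?#[\w-]+)?")

def find_legacy_links(text: str) -> list[str]:
    """Return every legacy-style doc link fragment found in the text."""
    return LEGACY_LINK.findall(text)

sample = (
    "See [Postgres source](https://cocoindex.io/docs/ops/sources#postgres) "
    "and [Qdrant](https://cocoindex.io/docs/targets/qdrant)."
)
print(find_legacy_links(sample))  # → ['docs/ops/sources#postgres']
```

Running this over `docs/` and `examples/` (e.g. via `pathlib.Path.rglob`) would surface any links the patches missed.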
From b546900a16f772effb2f61bf18be67db7b2e8420 Mon Sep 17 00:00:00 2001 From: Linghua Jin Date: Sun, 12 Oct 2025 16:25:52 -0700 Subject: [PATCH 2/6] update documentation on Kuzu --- docs/docs/targets/index.md | 1 - docs/docs/targets/kuzu.md | 4 +++- examples/docs_to_knowledge_graph/main.py | 18 ------------------ examples/product_recommendation/README.md | 6 ++---- examples/product_recommendation/main.py | 17 ----------------- 5 files changed, 5 insertions(+), 41 deletions(-) diff --git a/docs/docs/targets/index.md b/docs/docs/targets/index.md index 36d117b78..c90d76545 100644 --- a/docs/docs/targets/index.md +++ b/docs/docs/targets/index.md @@ -18,7 +18,6 @@ The way to map data from a data collector to a target depends on data model of t | [Qdrant](/docs/targets/qdrant) | Vector Database, Keyword Search | | [LanceDB](/docs/targets/lancedb) | Vector Database, Keyword Search | | [Neo4j](/docs/targets/neo4j) | [Property graph](#property-graph-targets) | -| [Kuzu](/docs/targets/kuzu) | [Property graph](#property-graph-targets) | If you are looking for targets beyond here, you can always use [custom targets](/docs/custom_ops/custom_targets) as building blocks. diff --git a/docs/docs/targets/kuzu.md b/docs/docs/targets/kuzu.md index ae129ef3e..441e9e784 100644 --- a/docs/docs/targets/kuzu.md +++ b/docs/docs/targets/kuzu.md @@ -5,7 +5,9 @@ toc_max_heading_level: 4 --- import { ExampleButton } from '../../src/components/GitHubButton'; -# Kuzu +# Kuzu (Archived) + +Note: [Kuzu](https://github.com/kuzudb/kuzu), an embedded graph database, is no longer maintained. Exports data to a [Kuzu](https://kuzu.com/) graph database.
diff --git a/examples/docs_to_knowledge_graph/main.py b/examples/docs_to_knowledge_graph/main.py index 7150809e0..e0fcd17d9 100644 --- a/examples/docs_to_knowledge_graph/main.py +++ b/examples/docs_to_knowledge_graph/main.py @@ -14,30 +14,12 @@ password="cocoindex", ), ) -kuzu_conn_spec = cocoindex.add_auth_entry( - "KuzuConnection", - cocoindex.targets.KuzuConnection( - api_server_url="http://localhost:8123", - ), -) - -# SELECT ONE GRAPH DATABASE TO USE -# This example can use either Neo4j or Kuzu as the graph database. -# Please make sure only one branch is live and others are commented out. -# Use Neo4j GraphDbSpec = cocoindex.targets.Neo4j GraphDbConnection = cocoindex.targets.Neo4jConnection GraphDbDeclaration = cocoindex.targets.Neo4jDeclaration conn_spec = neo4j_conn_spec -# Use Kuzu -# GraphDbSpec = cocoindex.targets.Kuzu -# GraphDbConnection = cocoindex.targets.KuzuConnection -# GraphDbDeclaration = cocoindex.targets.KuzuDeclaration -# conn_spec = kuzu_conn_spec - - @dataclasses.dataclass class DocumentSummary: """Describe a summary of a document.""" diff --git a/examples/product_recommendation/README.md b/examples/product_recommendation/README.md index c2e6e7f94..b348ae5fe 100644 --- a/examples/product_recommendation/README.md +++ b/examples/product_recommendation/README.md @@ -8,9 +8,8 @@ Please drop [CocoIndex on Github](https://github.com/cocoindex-io/cocoindex) a s ## Prerequisite -* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. -* Install [Neo4j](https://cocoindex.io/docs/ops/targets#neo4j-dev-instance) or [Kuzu](https://cocoindex.io/docs/ops/targets#kuzu-dev-instance) if you don't have one. - * The example uses Neo4j by default for now. If you want to use Kuzu, find out the "SELECT ONE GRAPH DATABASE TO USE" section and switch the active branch. 
+* [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) +* Install [Neo4j](https://cocoindex.io/docs/ops/targets#neo4j-dev-instance) * [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). ## Documentation @@ -43,7 +42,6 @@ cocoindex update main After the knowledge graph is built, you can explore the knowledge graph. * If you're using Neo4j, you can open the explorer at [http://localhost:7474](http://localhost:7474), with username `neo4j` and password `cocoindex`. -* If you're using Kuzu, you can start a Kuzu explorer locally. See [Kuzu dev instance](https://cocoindex.io/docs/ops/targets#kuzu-dev-instance) for more details. You can run the following Cypher query to get all relationships: diff --git a/examples/product_recommendation/main.py b/examples/product_recommendation/main.py index 4c2b91238..b63cf1436 100644 --- a/examples/product_recommendation/main.py +++ b/examples/product_recommendation/main.py @@ -15,29 +15,12 @@ password="cocoindex", ), ) -kuzu_conn_spec = cocoindex.add_auth_entry( - "KuzuConnection", - cocoindex.targets.KuzuConnection( - api_server_url="http://localhost:8123", - ), -) -# SELECT ONE GRAPH DATABASE TO USE -# This example can use either Neo4j or Kuzu as the graph database. -# Please make sure only one branch is live and others are commented out. 
- -# Use Neo4j GraphDbSpec = cocoindex.targets.Neo4j GraphDbConnection = cocoindex.targets.Neo4jConnection GraphDbDeclaration = cocoindex.targets.Neo4jDeclaration conn_spec = neo4j_conn_spec -# Use Kuzu -# GraphDbSpec = cocoindex.targets.Kuzu -# GraphDbConnection = cocoindex.targets.KuzuConnection -# GraphDbDeclaration = cocoindex.targets.KuzuDeclaration -# conn_spec = kuzu_conn_spec - # Template for rendering product information as markdown to provide information to LLMs PRODUCT_TEMPLATE = """ From a82935efc2ce71b34e45917b9d454d341cd55fe3 Mon Sep 17 00:00:00 2001 From: Linghua Jin Date: Sun, 12 Oct 2025 16:29:40 -0700 Subject: [PATCH 3/6] sources link --- docs/docs/examples/examples/academic_papers_index.md | 2 +- docs/docs/examples/examples/codebase_index.md | 2 +- docs/docs/examples/examples/custom_targets.md | 2 +- docs/docs/examples/examples/docs_to_knowledge_graph.md | 2 +- docs/docs/examples/examples/document_ai.md | 4 ++-- docs/docs/examples/examples/image_search.md | 4 ++-- docs/docs/examples/examples/manual_extraction.md | 2 +- docs/docs/examples/examples/multi_format_index.md | 4 ++-- docs/docs/examples/examples/patient_form_extraction.md | 4 ++-- docs/docs/examples/examples/photo_search.md | 4 ++-- docs/docs/examples/examples/postgres_source.md | 2 +- docs/docs/examples/examples/product_recommendation.md | 2 +- docs/docs/examples/examples/simple_vector_index.md | 2 +- docs/docs/getting_started/quickstart.md | 2 +- examples/docs_to_knowledge_graph/README.md | 6 +++--- examples/patient_intake_extraction/README.md | 2 +- examples/product_recommendation/README.md | 4 ++-- examples/text_embedding_qdrant/README.md | 2 +- 18 files changed, 26 insertions(+), 26 deletions(-) diff --git a/docs/docs/examples/examples/academic_papers_index.md b/docs/docs/examples/examples/academic_papers_index.md index 0db8f1566..278a1e4e7 100644 --- a/docs/docs/examples/examples/academic_papers_index.md +++ b/docs/docs/examples/examples/academic_papers_index.md @@ -64,7 +64,7 @@ 
def paper_metadata_flow( ``` `flow_builder.add_source` will create a table with sub fields (`filename`, `content`). - + ## Extract and collect metadata diff --git a/docs/docs/examples/examples/codebase_index.md b/docs/docs/examples/examples/codebase_index.md index 0e1caa672..9863b1dbe 100644 --- a/docs/docs/examples/examples/codebase_index.md +++ b/docs/docs/examples/examples/codebase_index.md @@ -70,7 +70,7 @@ def code_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoind - Exclude files and directories starting `.`, `target` in the root and `node_modules` under any directory. `flow_builder.add_source` will create a table with sub fields (`filename`, `content`). - + ## Process each file and collect the information diff --git a/docs/docs/examples/examples/custom_targets.md b/docs/docs/examples/examples/custom_targets.md index 3094f1a79..fa53b87de 100644 --- a/docs/docs/examples/examples/custom_targets.md +++ b/docs/docs/examples/examples/custom_targets.md @@ -36,7 +36,7 @@ flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope ) ``` This ingestion creates a table with `filename` and `content` fields. - + ## Process each file and collect diff --git a/docs/docs/examples/examples/docs_to_knowledge_graph.md b/docs/docs/examples/examples/docs_to_knowledge_graph.md index 9d90c196f..d301ca461 100644 --- a/docs/docs/examples/examples/docs_to_knowledge_graph.md +++ b/docs/docs/examples/examples/docs_to_knowledge_graph.md @@ -66,7 +66,7 @@ def docs_to_kg_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.D Here `flow_builder.add_source` creates a [KTable](https://cocoindex.io/docs/core/data_types#KTable). `filename` is the key of the KTable. 
- + ### Add data collectors diff --git a/docs/docs/examples/examples/document_ai.md b/docs/docs/examples/examples/document_ai.md index 6ef86de89..d35fa06d6 100644 --- a/docs/docs/examples/examples/document_ai.md +++ b/docs/docs/examples/examples/document_ai.md @@ -98,7 +98,7 @@ data_scope["documents"] = flow_builder.add_source( doc_embeddings = data_scope.add_collector() ``` - + @@ -154,4 +154,4 @@ For a step-by-step walkthrough of each indexing stage and the query path, check CocoIndex natively supports Google Drive, Amazon S3, Azure Blob Storage, and more with native incremental processing out of box - when new or updated files are detected, the pipeline will capture the changes and only process what's changed. - + diff --git a/docs/docs/examples/examples/image_search.md b/docs/docs/examples/examples/image_search.md index 3e4687841..db5cb1534 100644 --- a/docs/docs/examples/examples/image_search.md +++ b/docs/docs/examples/examples/image_search.md @@ -66,7 +66,7 @@ def image_object_embedding_flow(flow_builder, data_scope): The `add_source` function sets up a table with fields like `filename` and `content`. Images are automatically re-scanned every minute. - + ## Process Each Image and Collect the Embedding @@ -266,6 +266,6 @@ One of CocoIndex’s core strengths is its ability to connect to your existing d - Amazon S3 / SQS - Azure Blob Storage - + Once connected, CocoIndex continuously watches for changes — new uploads, updates, or deletions — and applies them to your index in real time. diff --git a/docs/docs/examples/examples/manual_extraction.md b/docs/docs/examples/examples/manual_extraction.md index 21c0367de..0d76bdc0e 100644 --- a/docs/docs/examples/examples/manual_extraction.md +++ b/docs/docs/examples/examples/manual_extraction.md @@ -67,7 +67,7 @@ def manual_extraction_flow( - `filename` (key, type: `str`): the filename of the file, e.g. 
`dir1/file1.md` - `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file - + ## Parse Markdown diff --git a/docs/docs/examples/examples/multi_format_index.md b/docs/docs/examples/examples/multi_format_index.md index 2b0e9e319..8880602f1 100644 --- a/docs/docs/examples/examples/multi_format_index.md +++ b/docs/docs/examples/examples/multi_format_index.md @@ -52,7 +52,7 @@ data_scope["documents"] = flow_builder.add_source( cocoindex.sources.LocalFile(path="source_files", binary=True) ) ``` - + ## Convert Files to Pages @@ -203,4 +203,4 @@ Follow the url `https://cocoindex.io/cocoinsight`. It connects to your local Co ## Connect to other sources CocoIndex natively supports Google Drive, Amazon S3, Azure Blob Storage, and more. - + diff --git a/docs/docs/examples/examples/patient_form_extraction.md b/docs/docs/examples/examples/patient_form_extraction.md index 5068d54e0..ab92d7762 100644 --- a/docs/docs/examples/examples/patient_form_extraction.md +++ b/docs/docs/examples/examples/patient_form_extraction.md @@ -66,7 +66,7 @@ def patient_intake_extraction_flow( `flow_builder.add_source` will create a table with a few sub fields. - + ## Parse documents with different formats to Markdown @@ -298,4 +298,4 @@ Click on the `markdown` column for `Patient_Intake_Form_Joe.pdf`, you could see ## Connect to other sources CocoIndex natively supports Google Drive, Amazon S3, Azure Blob Storage, and more. - + diff --git a/docs/docs/examples/examples/photo_search.md b/docs/docs/examples/examples/photo_search.md index d17c998cf..07a8ca3ab 100644 --- a/docs/docs/examples/examples/photo_search.md +++ b/docs/docs/examples/examples/photo_search.md @@ -65,8 +65,8 @@ def face_recognition_flow(flow_builder, data_scope): This creates a table with `filename` and `content` fields. 
📂 -You can connect it to your [S3 Buckets](https://cocoindex.io/docs/ops/sources/amazons3) (with SQS integration, [example](https://cocoindex.io/blogs/s3-incremental-etl)) -or [Azure Blob store](https://cocoindex.io/docs/ops/sources/azureblob). +You can connect it to your [S3 Buckets](https://cocoindex.io/docs/sources/amazons3) (with SQS integration, [example](https://cocoindex.io/blogs/s3-incremental-etl)) +or [Azure Blob store](https://cocoindex.io/docs/sources/azureblob). ## Detect and Extract Faces diff --git a/docs/docs/examples/examples/postgres_source.md b/docs/docs/examples/examples/postgres_source.md index 5f3d49141..d0b512920 100644 --- a/docs/docs/examples/examples/postgres_source.md +++ b/docs/docs/examples/examples/postgres_source.md @@ -59,7 +59,7 @@ CocoIndex incrementally sync data from Postgres. When new or updated rows are fo - `notification` enables change capture based on Postgres LISTEN/NOTIFY. Each change triggers an incremental processing on the specific row immediately. - Regardless if `notification` is provided or not, CocoIndex still needs to scan the full table to detect changes in some scenarios (e.g. between two `update` invocation), and the `ordinal_column` provides a field that CocoIndex can use to quickly detect which row has changed without reading value columns. -Check [Postgres source](https://cocoindex.io/docs/ops/sources/postgres) for more details. +Check [Postgres source](https://cocoindex.io/docs/sources/postgres) for more details. If you use the Postgres database hosted by Supabase, please click Connect on your project dashboard and find the URL there. Check [DatabaseConnectionSpec](https://cocoindex.io/docs/core/settings#databaseconnectionspec) for more details. 
diff --git a/docs/docs/examples/examples/product_recommendation.md b/docs/docs/examples/examples/product_recommendation.md index 110201b24..04af51a23 100644 --- a/docs/docs/examples/examples/product_recommendation.md +++ b/docs/docs/examples/examples/product_recommendation.md @@ -30,7 +30,7 @@ Product taxonomy is a way to organize product catalogs in a logical and hierarch ## Prerequisites * [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres). CocoIndex uses PostgreSQL internally for incremental processing. -* [Install Neo4j](https://cocoindex.io/docs/ops/storages#Neo4j), a graph database. +* [Install Neo4j](https://cocoindex.io/docs/targets#Neo4j), a graph database. * - [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Create a `.env` file from `.env.example`, and fill `OPENAI_API_KEY`. Alternatively, we have native support for Gemini, Ollama, LiteLLM. You can choose your favorite LLM provider and work completely on-premises. 
diff --git a/docs/docs/examples/examples/simple_vector_index.md b/docs/docs/examples/examples/simple_vector_index.md index 53a543887..017fcda57 100644 --- a/docs/docs/examples/examples/simple_vector_index.md +++ b/docs/docs/examples/examples/simple_vector_index.md @@ -51,7 +51,7 @@ def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoind ``` `flow_builder.add_source` will create a table with sub fields (`filename`, `content`) - + ## Process each file and collect the embeddings diff --git a/docs/docs/getting_started/quickstart.md b/docs/docs/getting_started/quickstart.md index eb5656993..72bfbd7b2 100644 --- a/docs/docs/getting_started/quickstart.md +++ b/docs/docs/getting_started/quickstart.md @@ -64,7 +64,7 @@ doc_embeddings = data_scope.add_collector() `flow_builder.add_source` will create a table with sub fields (`filename`, `content`) - + diff --git a/examples/docs_to_knowledge_graph/README.md b/examples/docs_to_knowledge_graph/README.md index 714c2046c..5a7b21012 100644 --- a/examples/docs_to_knowledge_graph/README.md +++ b/examples/docs_to_knowledge_graph/README.md @@ -14,12 +14,12 @@ Please drop [Cocoindex on Github](https://github.com/cocoindex-io/cocoindex) a s ## Prerequisite * [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. -* Install [Neo4j](https://cocoindex.io/docs/ops/targets#neo4j-dev-instance) or [Kuzu](https://cocoindex.io/docs/ops/targets#kuzu-dev-instance) if you don't have one. +* Install [Neo4j](https://cocoindex.io/docs/targets#neo4j-dev-instance) or [Kuzu](https://cocoindex.io/docs/targets#kuzu-dev-instance) if you don't have one. * The example uses Neo4j by default for now. If you want to use Kuzu, find out the "SELECT ONE GRAPH DATABASE TO USE" section and switch the active branch. * Install / configure LLM API. In this example we use Ollama, which runs LLM model locally. 
You need to get it ready following [this guide](https://cocoindex.io/docs/ai/llm#ollama). Alternatively, you can also follow the comments in source code to switch to OpenAI, and [configure OpenAI API key](https://cocoindex.io/docs/ai/llm#openai) before running the example. ## Documentation -You can read the official CocoIndex Documentation for Property Graph Targets [here](https://cocoindex.io/docs/ops/targets#property-graph-targets). +You can read the official CocoIndex Documentation for Property Graph Targets [here](https://cocoindex.io/docs/targets#property-graph-targets). ## Run @@ -48,7 +48,7 @@ cocoindex update main After the knowledge graph is built, you can explore the knowledge graph. * If you're using Neo4j, you can open the explorer at [http://localhost:7474](http://localhost:7474), with username `neo4j` and password `cocoindex`. -* If you're using Kuzu, you can start a Kuzu explorer locally. See [Kuzu dev instance](https://cocoindex.io/docs/ops/targets#kuzu-dev-instance) for more details. +* If you're using Kuzu, you can start a Kuzu explorer locally. See [Kuzu dev instance](https://cocoindex.io/docs/targets#kuzu-dev-instance) for more details. You can run the following Cypher query to get all relationships: diff --git a/examples/patient_intake_extraction/README.md b/examples/patient_intake_extraction/README.md index b25fe281a..0043d55e9 100644 --- a/examples/patient_intake_extraction/README.md +++ b/examples/patient_intake_extraction/README.md @@ -4,7 +4,7 @@ We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/c This repo shows how to use LLM to extract structured data from patient intake forms with different formats - like PDF, Docx, etc. -CocoIndex supports multiple [sources](https://cocoindex.io/docs/ops/sources) and [LLM models](https://cocoindex.io/docs/ai/llm) natively. +CocoIndex supports multiple [sources](https://cocoindex.io/docs/sources) and [LLM models](https://cocoindex.io/docs/ai/llm) natively. 
![Structured Data From Patient Intake Forms](https://github.com/user-attachments/assets/1f6afb69-d26d-4a08-8774-13982d6aec1e) diff --git a/examples/product_recommendation/README.md b/examples/product_recommendation/README.md index b348ae5fe..fa8bafb20 100644 --- a/examples/product_recommendation/README.md +++ b/examples/product_recommendation/README.md @@ -9,11 +9,11 @@ Please drop [CocoIndex on Github](https://github.com/cocoindex-io/cocoindex) a s ## Prerequisite * [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) -* Install [Neo4j](https://cocoindex.io/docs/ops/targets#neo4j-dev-instance) +* Install [Neo4j](https://cocoindex.io/docs/targets#neo4j-dev-instance) * [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). ## Documentation -You can read the official CocoIndex Documentation for Property Graph Targets [here](https://cocoindex.io/docs/ops/targets#property-graph-targets). +You can read the official CocoIndex Documentation for Property Graph Targets [here](https://cocoindex.io/docs/targets#property-graph-targets). ## Run diff --git a/examples/text_embedding_qdrant/README.md b/examples/text_embedding_qdrant/README.md index be307232a..da2b1cbf6 100644 --- a/examples/text_embedding_qdrant/README.md +++ b/examples/text_embedding_qdrant/README.md @@ -2,7 +2,7 @@ [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex) -CocoIndex supports Qdrant natively - [documentation](https://cocoindex.io/docs/ops/targets#qdrant). In this example, we will build index flow from text embedding from local markdown files, and query the index. We will use **Qdrant** as the vector database. +CocoIndex supports Qdrant natively - [documentation](https://cocoindex.io/docs/targets#qdrant). In this example, we will build index flow from text embedding from local markdown files, and query the index. We will use **Qdrant** as the vector database. 
We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful. From 14d3a5c465033cd884296632aabf0b841b3ca499 Mon Sep 17 00:00:00 2001 From: Linghua Jin Date: Sun, 12 Oct 2025 16:33:05 -0700 Subject: [PATCH 4/6] more links --- docs/docs/examples/examples/product_recommendation.md | 2 +- examples/docs_to_knowledge_graph/README.md | 5 +---- examples/product_recommendation/README.md | 2 +- examples/text_embedding_qdrant/README.md | 2 +- 4 files changed, 4 insertions(+), 7 deletions(-) diff --git a/docs/docs/examples/examples/product_recommendation.md b/docs/docs/examples/examples/product_recommendation.md index 04af51a23..e912b5946 100644 --- a/docs/docs/examples/examples/product_recommendation.md +++ b/docs/docs/examples/examples/product_recommendation.md @@ -30,7 +30,7 @@ Product taxonomy is a way to organize product catalogs in a logical and hierarch ## Prerequisites * [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres). CocoIndex uses PostgreSQL internally for incremental processing. -* [Install Neo4j](https://cocoindex.io/docs/targets#Neo4j), a graph database. +* [Install Neo4j](https://cocoindex.io/docs/targets/neo4j), a graph database. * - [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Create a `.env` file from `.env.example`, and fill `OPENAI_API_KEY`. Alternatively, we have native support for Gemini, Ollama, LiteLLM. You can choose your favorite LLM provider and work completely on-premises. diff --git a/examples/docs_to_knowledge_graph/README.md b/examples/docs_to_knowledge_graph/README.md index 5a7b21012..41b38ac1b 100644 --- a/examples/docs_to_knowledge_graph/README.md +++ b/examples/docs_to_knowledge_graph/README.md @@ -14,8 +14,7 @@ Please drop [Cocoindex on Github](https://github.com/cocoindex-io/cocoindex) a s ## Prerequisite * [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. 
-* Install [Neo4j](https://cocoindex.io/docs/targets#neo4j-dev-instance) or [Kuzu](https://cocoindex.io/docs/targets#kuzu-dev-instance) if you don't have one. - * The example uses Neo4j by default for now. If you want to use Kuzu, find out the "SELECT ONE GRAPH DATABASE TO USE" section and switch the active branch. +* Install [Neo4j](https://cocoindex.io/docs/targets/neo4j). * Install / configure LLM API. In this example we use Ollama, which runs LLM model locally. You need to get it ready following [this guide](https://cocoindex.io/docs/ai/llm#ollama). Alternatively, you can also follow the comments in source code to switch to OpenAI, and [configure OpenAI API key](https://cocoindex.io/docs/ai/llm#openai) before running the example. ## Documentation @@ -48,8 +47,6 @@ cocoindex update main After the knowledge graph is built, you can explore the knowledge graph. * If you're using Neo4j, you can open the explorer at [http://localhost:7474](http://localhost:7474), with username `neo4j` and password `cocoindex`. -* If you're using Kuzu, you can start a Kuzu explorer locally. See [Kuzu dev instance](https://cocoindex.io/docs/targets#kuzu-dev-instance) for more details. - You can run the following Cypher query to get all relationships: ```cypher diff --git a/examples/product_recommendation/README.md b/examples/product_recommendation/README.md index fa8bafb20..f3ce29b0a 100644 --- a/examples/product_recommendation/README.md +++ b/examples/product_recommendation/README.md @@ -9,7 +9,7 @@ Please drop [CocoIndex on Github](https://github.com/cocoindex-io/cocoindex) a s ## Prerequisite * [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) -* Install [Neo4j](https://cocoindex.io/docs/targets#neo4j-dev-instance) +* Install [Neo4j](https://cocoindex.io/docs/targets/neo4j) * [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). 
## Documentation diff --git a/examples/text_embedding_qdrant/README.md b/examples/text_embedding_qdrant/README.md index da2b1cbf6..60d1dced1 100644 --- a/examples/text_embedding_qdrant/README.md +++ b/examples/text_embedding_qdrant/README.md @@ -2,7 +2,7 @@ [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex) -CocoIndex supports Qdrant natively - [documentation](https://cocoindex.io/docs/targets#qdrant). In this example, we will build index flow from text embedding from local markdown files, and query the index. We will use **Qdrant** as the vector database. +CocoIndex supports Qdrant natively - [documentation](https://cocoindex.io/docs/targets/qdrant). In this example, we will build index flow from text embedding from local markdown files, and query the index. We will use **Qdrant** as the vector database. We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful. 
From f43b4a86daf60ba81723561ef9ea40d0a1391623 Mon Sep 17 00:00:00 2001
From: Linghua Jin
Date: Sun, 12 Oct 2025 16:34:28 -0700
Subject: [PATCH 5/6] Update main.py

---
 examples/docs_to_knowledge_graph/main.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/docs_to_knowledge_graph/main.py b/examples/docs_to_knowledge_graph/main.py
index 92a1d0782..4dc9e5057 100644
--- a/examples/docs_to_knowledge_graph/main.py
+++ b/examples/docs_to_knowledge_graph/main.py
@@ -20,6 +20,7 @@
 GraphDbDeclaration = cocoindex.targets.Neo4jDeclaration
 conn_spec = neo4j_conn_spec

+
 @dataclasses.dataclass
 class DocumentSummary:
     """Describe a summary of a document."""

From cea0b1f235eb7d2271bda2460485e1aec8e43fd3 Mon Sep 17 00:00:00 2001
From: Linghua Jin
Date: Tue, 14 Oct 2025 00:01:08 -0700
Subject: [PATCH 6/6] Update index.md

---
 docs/docs/sources/index.md | 343 -------------------------------------
 1 file changed, 343 deletions(-)

diff --git a/docs/docs/sources/index.md b/docs/docs/sources/index.md
index c61b79d71..09cbe1662 100644
--- a/docs/docs/sources/index.md
+++ b/docs/docs/sources/index.md
@@ -20,346 +20,3 @@ Related:
 - [Life cycle of an indexing flow](/docs/core/basics#life-cycle-of-an-indexing-flow)
 - [Live Update Tutorial](/docs/tutorials/live_updates) for change capture mechanisms.
-
-
-
-## LocalFile
-
-The `LocalFile` source imports files from a local file system.
-
-### Spec
-
-The spec takes the following fields:
-* `path` (`str`): full path of the root directory to import files from
-* `binary` (`bool`, optional): whether reading files as binary (instead of text)
-* `included_patterns` (`list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
-  If not specified, all files will be included.
-* `excluded_patterns` (`list[str]`, optional): a list of glob patterns to exclude files, e.g. `["tmp", "**/node_modules"]`.
-  Any file or directory matching these patterns will be excluded even if they match `included_patterns`.
-  If not specified, no files will be excluded.
-
-  :::info
-
-  `included_patterns` and `excluded_patterns` use Unix-style glob syntax. See [globset syntax](https://docs.rs/globset/latest/globset/index.html#syntax) for the details.
-
-  :::
-
-### Schema
-
-The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:
-* `filename` (*Str*, key): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`
-* `content` (*Str* if `binary` is `False`, *Bytes* otherwise): the content of the file
-
-## AmazonS3
-
-### Setup for Amazon S3
-
-#### Setup AWS accounts
-
-You need to set up AWS accounts to own and access Amazon S3. In particular:
-
-* Set up an AWS account from the [AWS homepage](https://aws.amazon.com/), or log in with an existing account.
-* AWS recommends that all programming access to AWS be done using [IAM users](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html) instead of the root account. You can create an IAM user at the [AWS IAM Console](https://console.aws.amazon.com/iam/home).
-* Make sure your IAM user has at least the following permissions in the IAM console:
-  * Attach permission policy `AmazonS3ReadOnlyAccess` for read-only access to Amazon S3.
-  * (optional) Attach permission policy `AmazonSQSFullAccess` to receive notifications from Amazon SQS, if you want to enable change event notifications.
-    Note that `AmazonSQSReadOnlyAccess` is not enough, as we need to be able to delete messages from the queue after they're processed.
-
-
-#### Setup Credentials for AWS SDK
-
-The AWS SDK needs credentials to access Amazon S3.
-The easiest way to set up credentials is to run:
-
-```sh
-aws configure
-```
-
-It will create a credentials file at `~/.aws/credentials` and config at `~/.aws/config`.
-
-See the following documents if you need more control:
-
-* [`aws configure`](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html)
-* [Globally configuring AWS SDKs and tools](https://docs.aws.amazon.com/sdkref/latest/guide/creds-config-files.html)
-
-
-#### Create Amazon S3 buckets
-
-You can create an Amazon S3 bucket in the [Amazon S3 Console](https://s3.console.aws.amazon.com/s3/home), and upload your files to it.
-
-You can also do this with the AWS CLI: `aws s3 mb` (to create buckets) and `aws s3 cp` (to upload files).
-When doing so, make sure your current user also has the permission policy `AmazonS3FullAccess`.
-
-#### (Optional) Setup SQS queue for event notifications
-
-You can set up an Amazon Simple Queue Service (Amazon SQS) queue to receive change event notifications from Amazon S3.
-It provides a change capture mechanism for your AmazonS3 data source, triggering reprocessing of your AWS S3 files on any creation, update or deletion. Please use a dedicated SQS queue for each of your S3 data sources.
-
-Here is how to set it up:
-
-* Create an SQS queue with a proper access policy.
-  * In the [Amazon SQS Console](https://console.aws.amazon.com/sqs/home), create a queue.
-  * Add access policy statements, to make sure Amazon S3 can send messages to the queue.
-    ```json
-    {
-      ...
-      "Statement": [
-        ...
-        {
-          "Sid": "__publish_statement",
-          "Effect": "Allow",
-          "Principal": {
-            "Service": "s3.amazonaws.com"
-          },
-          "Resource": "${SQS_QUEUE_ARN}",
-          "Action": "SQS:SendMessage",
-          "Condition": {
-            "ArnLike": {
-              "aws:SourceArn": "${S3_BUCKET_ARN}"
-            }
-          }
-        }
-      ]
-    }
-    ```
-
-    Here, you need to replace `${SQS_QUEUE_ARN}` and `${S3_BUCKET_ARN}` with the actual ARNs of your SQS queue and S3 bucket.
-    You can find the ARN of your SQS queue in the existing policy statement (it starts with `arn:aws:sqs:`), and the ARN of your S3 bucket in the S3 console (it starts with `arn:aws:s3:`).
- -* In the [Amazon S3 Console](https://s3.console.aws.amazon.com/s3/home), open your S3 bucket. Under *Properties* tab, click *Create event notification*. - * Fill in an arbitrary event name, e.g. `S3ChangeNotifications`. - * If you want your AmazonS3 data source to expose a subset of files sharing a prefix, set the same prefix here. Otherwise, leave it empty. - * Select the following event types: *All object create events*, *All object removal events*. - * Select *SQS queue* as the destination, and specify the SQS queue you created above. - -AWS's [Guide of Configuring a Bucket for Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html#step1-create-sqs-queue-for-notification) provides more details. - -### Spec - -The spec takes the following fields: -* `bucket_name` (`str`): Amazon S3 bucket name. -* `prefix` (`str`, optional): if provided, only files with path starting with this prefix will be imported. -* `binary` (`bool`, optional): whether reading files as binary (instead of text). -* `included_patterns` (`list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`. - If not specified, all files will be included. -* `excluded_patterns` (`list[str]`, optional): a list of glob patterns to exclude files, e.g. `["*.tmp", "**/*.log"]`. - Any file or directory matching these patterns will be excluded even if they match `included_patterns`. - If not specified, no files will be excluded. - - :::info - - `included_patterns` and `excluded_patterns` are using Unix-style glob syntax. See [globset syntax](https://docs.rs/globset/latest/globset/index.html#syntax) for the details. - - ::: - -* `sqs_queue_url` (`str`, optional): if provided, the source will receive change event notifications from Amazon S3 via this SQS queue. - - :::info - - We will delete messages from the queue after they're processed. - If there are unrelated messages in the queue (e.g. 
test messages that SQS will send automatically on queue creation, messages for a different bucket, for non-included files, etc.), we will delete the message upon receiving it, to avoid repeatedly receiving irrelevant messages after they're redelivered.
-
-  :::
-
-### Schema
-
-The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:
-
-* `filename` (*Str*, key): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`.
-* `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file.
-
-
-## AzureBlob
-
-The `AzureBlob` source imports files from Azure Blob Storage.
-
-### Setup for Azure Blob Storage
-
-#### Get Started
-
-If you don't have experience with Azure Blob Storage yet, you can refer to the [quickstart](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-portal).
-These are the actions you need to take:
-
-* Create a storage account in the [Azure Portal](https://portal.azure.com/).
-* Create a container in the storage account.
-* Upload your files to the container.
-* Grant the user / identity / service principal (depending on your authentication method; see below) access to the storage account. At minimum, a **Storage Blob Data Reader** role is needed. See [this doc](https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-data-operations-portal) for reference.
-
-#### Authentication
-
-We support the following authentication methods:
-
-* Shared access signature (SAS) tokens.
-  You can generate one from the Azure Portal in the settings for a specific container.
-  You need to provide at least *List* and *Read* permissions when generating the SAS token.
-  It's a query string in the form of
-  `sp=rl&st=2025-07-20T09:33:00Z&se=2025-07-19T09:48:53Z&sv=2024-11-04&sr=c&sig=i3FDjsadfklj3%23adsfkk`.
-
-* Storage account access key. You can find it in the Azure Portal in the settings for a specific storage account.
-
-* Default credential.
When none of the above is provided, it will use the default credential. - - This allows you to connect to Azure services without putting any secrets in the code or flow spec. - It automatically chooses the best authentication method based on your environment: - - * On your local machine: uses your Azure CLI login (`az login`) or environment variables. - - ```sh - az login - # Optional: Set a default subscription if you have more than one - az account set --subscription "" - ``` - * In Azure (VM, App Service, AKS, etc.): uses the resource’s Managed Identity. - * In automated environments: supports Service Principals via environment variables - * `AZURE_CLIENT_ID` - * `AZURE_TENANT_ID` - * `AZURE_CLIENT_SECRET` - -You can refer to [this doc](https://learn.microsoft.com/en-us/azure/developer/python/sdk/authentication/overview) for more details. - -### Spec - -The spec takes the following fields: - -* `account_name` (`str`): the name of the storage account. -* `container_name` (`str`): the name of the container. -* `prefix` (`str`, optional): if provided, only files with path starting with this prefix will be imported. -* `binary` (`bool`, optional): whether reading files as binary (instead of text). -* `included_patterns` (`list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`. - If not specified, all files will be included. -* `excluded_patterns` (`list[str]`, optional): a list of glob patterns to exclude files, e.g. `["*.tmp", "**/*.log"]`. - Any file or directory matching these patterns will be excluded even if they match `included_patterns`. - If not specified, no files will be excluded. -* `sas_token` (`cocoindex.TransientAuthEntryReference[str]`, optional): a SAS token for authentication. -* `account_access_key` (`cocoindex.TransientAuthEntryReference[str]`, optional): an account access key for authentication. - - :::info - - `included_patterns` and `excluded_patterns` are using Unix-style glob syntax. 
See [globset syntax](https://docs.rs/globset/latest/globset/index.html#syntax) for the details.
-
-  :::
-
-### Schema
-
-The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:
-
-* `filename` (*Str*, key): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`.
-* `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file.
-
-
-## GoogleDrive
-
-The `GoogleDrive` source imports files from Google Drive.
-
-### Setup for Google Drive
-
-To access files in Google Drive, the `GoogleDrive` source needs to authenticate with a service account.
-
-1. Register / log in to **Google Cloud**.
-2. In the [**Google Cloud Console**](https://console.cloud.google.com/), search for *Service Accounts*, to enter the *IAM & Admin / Service Accounts* page.
-   - **Create a new service account**: Click *+ Create Service Account*. Follow the instructions to finish service account creation.
-   - **Add a key and download the credential**: Under "Actions" for this new service account, click *Manage keys* → *Add key* → *Create new key* → *JSON*.
-     Download the key file to a safe place.
-3. In the **Google Cloud Console**, search for *Google Drive API*. Enable this API.
-4. In **Google Drive**, share the folders containing the files that your source needs to import with the service account's email address.
-   **Viewer permission** is sufficient.
-   - The email address can be found on the *IAM & Admin / Service Accounts* page (in Step 2), in the format `{service-account-id}@{gcp-project-id}.iam.gserviceaccount.com`.
-   - Copy the folder ID. The folder ID can be found in the last part of the folder's URL, e.g. `https://drive.google.com/drive/u/0/folders/{folder-id}` or `https://drive.google.com/drive/folders/{folder-id}?usp=drive_link`.
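To make the two URL shapes above concrete, here is a small hypothetical helper (not part of CocoIndex) that pulls the folder ID out of either form of Google Drive folder URL:

```python
import re


def drive_folder_id(url: str) -> str:
    """Extract the `{folder-id}` segment from a Google Drive folder URL.

    Hypothetical helper for illustration; handles both URL forms shown above.
    """
    # The folder ID is the path segment right after "/folders/", up to the
    # next "/", "?" or "#".
    m = re.search(r"/folders/([^/?#]+)", url)
    if m is None:
        raise ValueError(f"not a Google Drive folder URL: {url}")
    return m.group(1)


# Both URL shapes from the setup steps yield the same ID:
print(drive_folder_id("https://drive.google.com/drive/u/0/folders/1AbCdEfG"))
print(drive_folder_id("https://drive.google.com/drive/folders/1AbCdEfG?usp=drive_link"))
```

IDs extracted this way are what you would put into the `root_folder_ids` list described in the Spec.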
-
-
-### Spec
-
-The spec takes the following fields:
-
-* `service_account_credential_path` (`str`): full path to the service account credential file in JSON format.
-* `root_folder_ids` (`list[str]`): a list of Google Drive folder IDs to import files from.
-* `binary` (`bool`, optional): whether reading files as binary (instead of text).
-* `recent_changes_poll_interval` (`datetime.timedelta`, optional): when set, this source provides a change capture mechanism by periodically polling Google Drive for recently modified files.
-
-  :::info
-
-  Since each poll only retrieves metadata for files modified since the previous poll,
-  it's typically cheaper than a full refresh via the [refresh interval](/docs/core/flow_def#refresh-interval), especially when the folder contains a large number of files.
-  So you can usually set it to a smaller value than the `refresh_interval`.
-
-  On the other hand, this mechanism only detects changes for files that still exist.
-  If a file is deleted (or the current account no longer has access to it), this change will not be detected by this change stream.
-
-  So when a `GoogleDrive` source has `recent_changes_poll_interval` enabled, it's still recommended to also set a `refresh_interval` with a larger value,
-  so that most changes are covered by polling recent changes (with low latency, like 10 seconds), and the remaining changes (files no longer existing or accessible) are still covered (with a higher latency, like 5 minutes; larger if you have a huge number of files, like 1M).
-  In practice, configure them based on your requirement: how fresh does the target index need to be?
-
-  :::
-
-### Schema
-
-The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:
-
-* `file_id` (*Str*, key): the ID of the file in Google Drive.
-* `filename` (*Str*): the filename of the file, without the path, e.g. `"file1.md"`
-* `mime_type` (*Str*): the MIME type of the file.
-* `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file. - - -## Postgres - -The `Postgres` source imports rows from a PostgreSQL table. - -### Setup for PostgreSQL - -* Ensure the table exists and has a primary key. Tables without a primary key are not supported. -* Grant the connecting user read permissions on the target table (e.g. `SELECT`). -* Provide a database connection. You can: - * Use CocoIndex's default database connection, or - * Provide an explicit connection via a transient auth entry referencing a `DatabaseConnectionSpec` with a `url`, for example: - - ```python - cocoindex.add_transient_auth_entry( - cocoindex.sources.DatabaseConnectionSpec( - url="postgres://user:password@host:5432/dbname?sslmode=require", - ) - ) - ``` - -### Spec - -The spec takes the following fields: - -* `table_name` (`str`): the PostgreSQL table to read from. -* `database` (`cocoindex.TransientAuthEntryReference[DatabaseConnectionSpec]`, optional): database connection reference. If not provided, the default CocoIndex database is used. -* `included_columns` (`list[str]`, optional): non-primary-key columns to include. If not specified, all non-PK columns are included. -* `ordinal_column` (`str`, optional): to specify a non-primary-key column used for change tracking and ordering, e.g. can be a modified timestamp or a monotonic version number. Supported types are integer-like (`bigint`/`integer`) and timestamps (`timestamp`, `timestamptz`). - `ordinal_column` must not be a primary key column. -* `notification` (`cocoindex.sources.PostgresNotification`, optional): when present, enable change capture based on Postgres LISTEN/NOTIFY. It has the following fields: - * `channel_name` (`str`, optional): the Postgres notification channel to listen on. CocoIndex will automatically create the channel with the given name. If omitted, CocoIndex uses `{flow_name}__{source_name}__cocoindex`. 
-
-  :::info
-
-  If `notification` is provided, CocoIndex listens for row changes using Postgres LISTEN/NOTIFY and creates the required database objects on demand when the flow starts listening:
-
-  - Function to create the notification message: `{channel_name}_n`.
-  - Trigger to react to table changes: `{channel_name}_t` on the specified `table_name`.
-
-  Creation is automatic when listening begins.
-
-  Currently CocoIndex doesn't automatically clean up these objects when the flow is dropped (unlike targets).
-  It's usually OK to leave them as they are, but if you want to clean them up, you can run the following SQL statements to manually drop them:
-
-  ```sql
-  DROP TRIGGER IF EXISTS {channel_name}_t ON "{table_name}";
-  DROP FUNCTION IF EXISTS {channel_name}_n();
-  ```
-
-  :::
-
-### Schema
-
-The output is a [*KTable*](/docs/core/data_types#ktable) with a straightforward one-to-one mapping from Postgres table columns to CocoIndex table fields:
-
-* Key fields: all primary key columns in the Postgres table are automatically included as key fields.
-* Value fields: all non-primary-key columns in the Postgres table (those listed in `included_columns`, or all of them when not specified) appear as value fields.
-
-### Example
-
-You can find an end-to-end example using the Postgres source at:
-* [examples/postgres_source](https://github.com/cocoindex-io/cocoindex/tree/main/examples/postgres_source)
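As a quick sanity check of the naming rules above, here is a hypothetical sketch (these helpers are not CocoIndex APIs) that derives the default notification channel name and the corresponding manual-cleanup SQL:

```python
def default_channel_name(flow_name: str, source_name: str) -> str:
    """Default channel name when `channel_name` is omitted, per the docs:
    {flow_name}__{source_name}__cocoindex."""
    return f"{flow_name}__{source_name}__cocoindex"


def cleanup_sql(channel_name: str, table_name: str) -> str:
    """Render the manual-cleanup statements from the docs for a given
    channel: drop the trigger ({channel_name}_t) and function
    ({channel_name}_n) created for LISTEN/NOTIFY change capture."""
    return (
        f'DROP TRIGGER IF EXISTS {channel_name}_t ON "{table_name}";\n'
        f"DROP FUNCTION IF EXISTS {channel_name}_n();"
    )


# "DemoFlow" and "documents" are placeholder names for illustration.
channel = default_channel_name("DemoFlow", "documents")
print(channel)  # -> DemoFlow__documents__cocoindex
print(cleanup_sql(channel, "source_messages"))
```

This only mirrors the naming conventions stated in the docs; the actual objects are created by CocoIndex itself when the flow starts listening.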