cocoindex-io
diff --git a/docs/docs/contributing/guide.md b/docs/docs/contributing/guide.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/docs/contributing/setup_dev_environment.md b/docs/docs/contributing/setup_dev_environment.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/docs/examples/examples/academic_papers_index.md b/docs/docs/examples/examples/academic_papers_index.md
Lines changed: 16 additions & 16 deletions
diff --git a/docs/docs/examples/examples/codebase_index.md b/docs/docs/examples/examples/codebase_index.md
Lines changed: 10 additions & 10 deletions
diff --git a/docs/docs/examples/examples/custom_targets.md b/docs/docs/examples/examples/custom_targets.md
Lines changed: 12 additions & 10 deletions
diff --git a/docs/docs/examples/examples/docs_to_knowledge_graph.md b/docs/docs/examples/examples/docs_to_knowledge_graph.md
Lines changed: 11 additions & 10 deletions
diff --git a/docs/docs/examples/examples/image_search.md b/docs/docs/examples/examples/image_search.md
Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ description: How to contribute to CocoIndex
 
 [CocoIndex](https://github.com/cocoindex-io/cocoindex) is an open source project. We are respectful, open and friendly. This guide explains how to get involved and contribute to [CocoIndex](https://github.com/cocoindex-io/cocoindex).
 
-Our [Discord server](https://discord.com/invite/zpA9S2DR7s) is constantly open.
+Our [Discord server](https://discord.com/invite/zpA9S2DR7s) is constantly open. 
 If you are unsure about anything, it is a good place to discuss! We'd love to collaborate and will always be friendly.
 
 ## Good First Issues
 
@@ -44,4 +44,4 @@ Follow the steps below to get CocoIndex built on the latest codebase locally - i
 -   Before running a specific example, set extra environment variables, for exposing extra traces, allowing dev UI, etc.
     ```sh
     . ./.env.lib_debug
-    ```
+    ```
@@ -19,10 +19,10 @@ import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButto
 
 1. Extract the paper metadata, including file name, title, author information, abstract, and number of pages.
 
-2. Build vector embeddings for the metadata, such as the title and abstract, for semantic search.
+2. Build vector embeddings for the metadata, such as the title and abstract, for semantic search. 
 This enables better metadata-driven semantic search results. For example, you can match text queries against titles and abstracts.
 
-3. Build an index of authors and all the file names associated with each author
+3. Build an index of authors and all the file names associated with each author 
 to answer questions like "Give me all the papers by Jeff Dean."
 
 4. If you want to perform full PDF embedding for the paper, you can extend the flow.
@@ -31,13 +31,13 @@ to answer questions like "Give me all the papers by Jeff Dean."
 
 - [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres).
   CocoIndex uses PostgreSQL internally for incremental processing.
-- [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai).
+- [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai).  
   Alternatively, we have native support for Gemini, Ollama, LiteLLM. Check out the [guide](https://cocoindex.io/docs/ai/llm#ollama).
   You can choose your favorite LLM provider and work completely on-premises.
 
 ## Define Indexing Flow
 
-To better help you navigate what we will walk through, here is a flow diagram:
+To better help you navigate what we will walk through, here is a flow diagram: 
 
 1. Import a list of papers in PDF.
 2. For each file:
@@ -65,7 +65,7 @@ def paper_metadata_flow(
     )
 ```
 
-`flow_builder.add_source` will create a table with sub fields (`filename`, `content`),
+`flow_builder.add_source` will create a table with sub fields (`filename`, `content`), 
 we can refer to the [documentation](https://cocoindex.io/docs/ops/sources) for more details.
 
 ### Extract and collect metadata
@@ -108,10 +108,10 @@ After this step, you should have the basic info of each paper.
 
 ### Parse basic info
 
-We will convert the first page to Markdown using Marker.
+We will convert the first page to Markdown using Marker. 
 Alternatively, you can easily plug in your favorite PDF parser, such as Docling.
 
-Define a marker converter function and cache it, since its initialization is resource-intensive.
+Define a marker converter function and cache it, since its initialization is resource-intensive. 
 This ensures that the same converter instance is reused for different input files.
 
 ```python
@@ -140,7 +140,7 @@ def pdf_to_markdown(content: bytes) -> str:
 Pass it to your transform
 
 ```python
-with data_scope["documents"].row() as doc:
+with data_scope["documents"].row() as doc:      
     doc["first_page_md"] = doc["basic_info"]["first_page"].transform(
             pdf_to_markdown
         )
@@ -201,7 +201,7 @@ After this step, you should have the metadata of each paper.
 Just collect anything you need :)
 
 #### Collect `author` to `filename` information
-We’ve already extracted the author list. Here we want to collect Author → Papers in a separate table to build lookup functionality.
+We’ve already extracted the author list. Here we want to collect Author → Papers in a separate table to build lookup functionality. 
 Simply collect by author.
 
 ```python
@@ -230,8 +230,8 @@ doc["title_embedding"] = doc["metadata"]["title"].transform(
 
 #### Abstract
 
-Split abstract into chunks, embed each chunk and collect their embeddings.
-Sometimes the abstract could be very long.
+Split abstract into chunks, embed each chunk and collect their embeddings. 
+Sometimes the abstract could be very long. 
 
 ```python
 doc["abstract_chunks"] = doc["metadata"]["abstract"].transform(
@@ -305,7 +305,7 @@ author_papers.export(
     "author_papers",
     cocoindex.targets.Postgres(),
     primary_key_fields=["author_name", "filename"],
-)
+)    
 metadata_embeddings.export(
     "metadata_embeddings",
     cocoindex.targets.Postgres(),
@@ -325,14 +325,14 @@ We aim to standardize interfaces and make it like assembling building blocks.
 
 ## View in CocoInsight step by step
 
-You can walk through the project step by step in [CocoInsight](https://www.youtube.com/watch?v=MMrpUfUcZPk) to see
+You can walk through the project step by step in [CocoInsight](https://www.youtube.com/watch?v=MMrpUfUcZPk) to see 
 exactly how each field is constructed and what happens behind the scenes.
 
 ## Query the index
 
-You can refer to this section of [Text Embeddings](https://cocoindex.io/blogs/text-embeddings-101#3-query-the-index) about
-how to build queries against embeddings.
-For now CocoIndex doesn't provide an additional query interface. We can write SQL or rely on the query engine of the target storage.
+You can refer to this section of [Text Embeddings](https://cocoindex.io/blogs/text-embeddings-101#3-query-the-index) about 
+how to build queries against embeddings. 
+For now CocoIndex doesn't provide an additional query interface. We can write SQL or rely on the query engine of the target storage. 
 
 - Many databases already have optimized query implementations with their own best practices
 - The query space has excellent solutions for querying, reranking, and other search-related functionality.
 
@@ -15,11 +15,11 @@ import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButto
 <GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/code_embedding"/>
 <YouTubeButton url="https://youtu.be/G3WstvhHO24?si=ndYfM0XRs03_hVPR" />
 
-## Setup
+## Setup 
 
 If you don't have Postgres installed, please follow [installation guide](https://cocoindex.io/docs/getting_started/installation).
 
-## Add the codebase as a source.
+## Add the codebase as a source. 
 
 Ingest files from the CocoIndex codebase root directory.
 
@@ -39,7 +39,7 @@ def code_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoind
 - Include files with the extensions of `.py`, `.rs`, `.toml`, `.md`, `.mdx`
 - Exclude files and directories starting with `.`, `target` in the root, and `node_modules` under any directory.
 
-`flow_builder.add_source` will create a table with sub fields (`filename`, `content`).
+`flow_builder.add_source` will create a table with sub fields (`filename`, `content`). 
 See [documentation](https://cocoindex.io/docs/ops/sources) for more details.
 
 
@@ -70,23 +70,23 @@ Here we extract the extension of the filename and store it in the `extension` fi
 
 ### Split the file into chunks
 
-We will chunk the code with Tree-sitter.
-We use the `SplitRecursively` function to split the file into chunks.
+We will chunk the code with Tree-sitter. 
+We use the `SplitRecursively` function to split the file into chunks. 
 It is integrated with Tree-sitter, so you can pass in the language to the `language` parameter.
 To see all supported language names and extensions, see the documentation [here](https://cocoindex.io/docs/ops/functions#splitrecursively). All the major languages are supported, e.g., Python, Rust, JavaScript, TypeScript, Java, C++, etc. If it's unspecified or the specified language is not supported, it will be treated as plain text.
 
 ```python
 with data_scope["files"].row() as file:
     file["chunks"] = file["content"].transform(
           cocoindex.functions.SplitRecursively(),
-          language=file["extension"], chunk_size=1000, chunk_overlap=300)
+          language=file["extension"], chunk_size=1000, chunk_overlap=300) 
 ```
 
 
 ### Embed the chunks
 
-We use `SentenceTransformerEmbed` to embed the chunks.
-You can refer to the documentation [here](https://cocoindex.io/docs/ops/functions#sentencetransformerembed).
+We use `SentenceTransformerEmbed` to embed the chunks. 
+You can refer to the documentation [here](https://cocoindex.io/docs/ops/functions#sentencetransformerembed). 
 
 ```python
 @cocoindex.transform_flow()
@@ -101,7 +101,7 @@ def code_to_embedding(text: cocoindex.DataSlice[str]) -> cocoindex.DataSlice[lis
 
 Then for each chunk, we will embed it using the `code_to_embedding` function and collect the embeddings to the `code_embeddings` collector.
 
-`@cocoindex.transform_flow()` is needed to share the transformation across indexing and query. Because we build a vector index and query against it,
+`@cocoindex.transform_flow()` is needed to share the transformation across indexing and query. Because we build a vector index and query against it, 
 the embedding computation needs to be consistent between indexing and querying. See [documentation](https://cocoindex.io/docs/query#transform-flow) for more details.
 
 
@@ -126,7 +126,7 @@ code_embeddings.export(
     vector_indexes=[cocoindex.VectorIndex("embedding", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
 ```
 
-We use Cosine Similarity to measure the similarity between the query and the indexed data.
+We use Cosine Similarity to measure the similarity between the query and the indexed data. 
 To learn more about Cosine Similarity, see [Wiki](https://en.wikipedia.org/wiki/Cosine_similarity).
 
 ## Query the index
 
@@ -19,7 +19,7 @@ Let’s walk through a simple example—exporting `.md` files as `.html` using a
 Check out the full [source code](https://github.com/cocoindex-io/cocoindex/tree/main/examples/custom_output_files).
 
 The overall flow is simple:
-This example focuses on
+This example focuses on 
 - how to configure your custom target
 - how the flow effortlessly picks up changes in the source, recomputes only what has changed, and exports to the target
 
@@ -41,7 +41,7 @@ flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
 		refresh_interval=timedelta(seconds=5),
 	)
 ```
-This ingestion creates a table with `filename` and `content` fields.
+This ingestion creates a table with `filename` and `content` fields. 
 
 
 ## Process each file and collect
@@ -91,7 +91,7 @@ class LocalFileTargetConnector:
 
 ```
 
-The `describe()` method returns a human-readable string that describes the target, which is displayed in the CLI logs.
+The `describe()` method returns a human-readable string that describes the target, which is displayed in the CLI logs. 
 For example, it prints:
 
 `Target: Local directory ./data/output`
@@ -103,10 +103,10 @@ def describe(key: str) -> str:
     return f"Local directory {key}"
 ```
 
-`apply_setup_change()` applies setup changes to the backend. The previous and current specs are passed as arguments,
+`apply_setup_change()` applies setup changes to the backend. The previous and current specs are passed as arguments, 
 and the method is expected to update the backend setup to match the current state.
 
-A `None` spec indicates non-existence, so when `previous` is `None`, we need to create it,
+A `None` spec indicates non-existence, so when `previous` is `None`, we need to create it, 
 and when `current` is `None`, we need to delete it.
 
 
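The create/delete rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the connector from the example: `LocalDirectorySpec` is a hypothetical stand-in for the target spec (only its `directory` attribute is taken from the text), and the function is written as a plain function rather than a connector method.

```python
import dataclasses
import os
from typing import Optional


@dataclasses.dataclass
class LocalDirectorySpec:
    """Hypothetical spec; only the `directory` attribute comes from the text."""
    directory: str


def apply_setup_change(
    previous: Optional[LocalDirectorySpec],
    current: Optional[LocalDirectorySpec],
) -> None:
    """Bring the backend (a local directory here) in line with `current`."""
    if previous is None and current is not None:
        # The target did not exist before: create it.
        # exist_ok=True keeps a repeated call harmless.
        os.makedirs(current.directory, exist_ok=True)
    elif previous is not None and current is None:
        # The target was dropped: remove the directory if it is still there.
        # os.rmdir only removes empty directories.
        if os.path.isdir(previous.directory):
            os.rmdir(previous.directory)
```

Calling the function twice with the same arguments leaves the same state, which is the idempotency the text asks for.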
@@ -134,8 +134,8 @@ def apply_setup_change(
             os.rmdir(previous.directory)
 ```
 
-The `mutate()` method is called by CocoIndex to apply data changes to the target,
-batching mutations to potentially multiple targets of the same type.
+The `mutate()` method is called by CocoIndex to apply data changes to the target, 
+batching mutations to potentially multiple targets of the same type. 
 This allows the target connector flexibility in implementation (e.g., atomic commits, or processing items with dependencies in a specific order).
 
 Each element in the batch corresponds to a specific target and is represented by a tuple containing:
@@ -150,8 +150,8 @@ class LocalFileTargetValues:
     html: str
 ```
 
-The value type of the `dict` is `LocalFileTargetValues | None`,
-where a non-`None` value means an upsert and a `None` value means a delete. Similar to `apply_setup_change()`,
+The value type of the `dict` is `LocalFileTargetValues | None`, 
+where a non-`None` value means an upsert and a `None` value means a delete. Similar to `apply_setup_change()`, 
 idempotency is expected here.
 
 ```python
@@ -218,5 +218,7 @@ This keeps your knowledge graph continuously synchronized with your document sou
 Sometimes there may be an internal/homegrown tool or API (e.g. within a company) that's not publicly available.
 These can only be connected through custom targets.
 
-### Faster adoption of new export logic
+### Faster adoption of new export logic 
 When a new tool, database, or API joins your stack, simply define a Target Spec and Target Connector — start exporting right away, with no pipeline refactoring required.
+
+
@@ -23,7 +23,7 @@ We will generate two kinds of relationships:
 2. Mentions of entities in a document. E.g., "core/basics.mdx" mentions `CocoIndex` and `Incremental Processing`.
 
 ## Setup
-*   [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres). CocoIndex uses PostgreSQL internally for incremental processing.
+*   [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres). CocoIndex uses PostgreSQL internally for incremental processing. 
 *   [Install Neo4j](https://cocoindex.io/docs/ops/storages#Neo4j), a graph database.
 *   [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, you can switch to Ollama, which runs LLM models locally - [guide](https://cocoindex.io/docs/ai/llm#ollama).
 
@@ -36,7 +36,7 @@ You can read the official CocoIndex Documentation for Property Graph Targets [he
 
 ### Add documents as source
 
-We will process CocoIndex documentation markdown files (`.md`, `.mdx`) from the `docs/core` directory ([markdown files](https://github.com/cocoindex-io/cocoindex/tree/main/docs/docs/core), [deployed docs](https://cocoindex.io/docs/core/basics)).
+We will process CocoIndex documentation markdown files (`.md`, `.mdx`) from the `docs/core` directory ([markdown files](https://github.com/cocoindex-io/cocoindex/tree/main/docs/docs/core), [deployed docs](https://cocoindex.io/docs/core/basics)). 
 
 ```python
 @cocoindex.flow_def(name="DocsToKG")
@@ -124,7 +124,7 @@ Next, we will use `cocoindex.functions.ExtractByLlm` to extract the relationship
 doc["relationships"] = doc["content"].transform(
     cocoindex.functions.ExtractByLlm(
         llm_spec=cocoindex.LlmSpec(
-            api_type=cocoindex.LlmApiType.OPENAI,
+            api_type=cocoindex.LlmApiType.OPENAI, 
             model="gpt-4o"
         ),
         output_type=list[Relationship],
@@ -170,7 +170,7 @@ with doc["relationships"].row() as relationship:
 
 
 ### Build knowledge graph
-
+ 
 #### Basic concepts
 All nodes for Neo4j need two things:
 1. Label: The type of the node. E.g., `Document`, `Entity`.
@@ -216,10 +216,10 @@ This exports Neo4j nodes with label `Document` from the `document_node` collecto
 
 #### Export `RELATIONSHIP` and `Entity` nodes to Neo4j
 
-We don't have explicit collector for `Entity` nodes.
+We don't have explicit collector for `Entity` nodes. 
 They are part of the `entity_relationship` collector and fields are collected during the relationship extraction.
 
-To export them as Neo4j nodes, we need to first declare `Entity` nodes.
+To export them as Neo4j nodes, we need to first declare `Entity` nodes. 
 
 ```python
 flow_builder.declare(
@@ -268,7 +268,7 @@ In a relationship, there's:
 2.  A relationship connecting the source and target.
 Note that different relationships may share the same source and target nodes.
 
-`NodeFromFields` takes the fields from the `entity_relationship` collector and creates `Entity` nodes.
+`NodeFromFields` takes the fields from the `entity_relationship` collector and creates `Entity` nodes. 
 
 #### Export the `entity_mention` to Neo4j.
 
@@ -314,14 +314,14 @@ It creates relationships by:
     ```sh
     cocoindex update --setup main.py
     ```
-
+    
     You'll see the index updates state in the terminal. For example, you'll see the following output:
 
     ```
     documents: 7 added, 0 removed, 0 updated
     ```
 
-3.  (Optional) I used CocoInsight to troubleshoot the index generation and understand the data lineage of the pipeline.
+3.  (Optional) I used CocoInsight to troubleshoot the index generation and understand the data lineage of the pipeline. 
 It is in free beta now, so you can give it a try. Run the following command to start CocoInsight:
 
     ```sh
@@ -348,7 +348,8 @@ MATCH p=()-->() RETURN p
 
 
 ## Support us
-We are constantly improving, and more features and examples are coming soon.
+We are constantly improving, and more features and examples are coming soon. 
 If you love this article, please give us a star ⭐ at [GitHub repo](https://github.com/cocoindex-io/cocoindex) to help us grow.
 
 Thanks for reading!
+
@@ -211,4 +211,4 @@ Once connected, CocoIndex continuously watches for changes — new uploads, upda
 ## Support us
 
 We’re constantly adding more examples and improving our runtime.
-If you found this helpful, please ⭐ star [CocoIndex on GitHub](https://github.com/cocoindex-io/cocoindex) and share it with others.
+If you found this helpful, please ⭐ star [CocoIndex on GitHub](https://github.com/cocoindex-io/cocoindex) and share it with others.