update text embedding qdrant with its native query path (#521)

badmonster0 · web-flow · commit 915b85dc730e · 2025-05-20T11:47:29.000-07:00
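
Note for readers of this change: the new `_run` loop embeds the query with `text_to_embedding.eval` and asks Qdrant for the nearest points in the `text_embedding` named vector, in a collection created with `Cosine` distance. The sketch below only illustrates that ranking idea in plain Python. It is not Qdrant's implementation (Qdrant uses an approximate index rather than a full scan), and the helper names `cosine_similarity` and `rank_by_cosine` are hypothetical, not part of CocoIndex or qdrant-client.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vector, points, limit=10):
    """Brute-force ranking: score every point, then return the top `limit` as (score, payload).

    Each point is assumed to be a dict with a "vector" and a "payload",
    loosely mirroring what the example stores (filename and text).
    """
    scored = [
        (cosine_similarity(query_vector, point["vector"]), point["payload"])
        for point in points
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]
```

The `(score, payload)` pairs play the same role as `result.score` and `result.payload` in the loop added to `main.py`.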
diff --git a/examples/text_embedding/README.md b/examples/text_embedding/README.md
@@ -4,20 +4,20 @@
 
 In this example, we will build index flow from text embedding from local markdown files, and query the index.
 
-We appreicate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
+We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
 
-## Steps:
+## Steps
 🌱 A detailed step by step tutorial can be found here: [Get Started Documentation](https://cocoindex.io/docs/getting_started/quickstart)
 
-### Indexing Flow:
+### Indexing Flow
 <img width="461" alt="Screenshot 2025-05-19 at 5 48 28 PM" src="https://github.com/user-attachments/assets/b6825302-a0c7-4b86-9a2d-52da8286b4bd" />
 
-1. We will ingest from a list of local files.
-2. For each file, perform chunking (Recursive Split) and then embeddings. 
+1. We will ingest a list of local files.
+2. For each file, perform chunking (recursively split) and then embedding. 
 3. We will save the embeddings and the metadata in Postgres with PGVector.
    
-### Query:
-We will match against user-provided text by a SQL query, reusing the embedding operation in the indexing flow.
+### Query
+We will match against user-provided text by a SQL query, and reuse the embedding operation in the indexing flow.
 
 
 ## Prerequisite
diff --git a/examples/text_embedding_qdrant/README.md b/examples/text_embedding_qdrant/README.md
@@ -1,69 +1,87 @@
-## Description
+# Build text embedding and semantic search 🔍 with Qdrant
+
+[![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)
+
+CocoIndex supports Qdrant natively ([documentation](https://cocoindex.io/docs/ops/storages#qdrant)). In this example, we will build an index flow that embeds text from local markdown files, and then query the index. We will use **Qdrant** as the vector database.
+
+We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
+
+<img width="860" alt="CocoIndex supports Qdrant" src="https://github.com/user-attachments/assets/a9deecfa-dd94-4b97-a1b1-90488d8178df" />
+
+## Steps
+### Indexing Flow
+<img width="480" alt="Index flow for text embedding" src="https://github.com/user-attachments/assets/44d47b5e-b49b-4f05-9a00-dcb8027602a1" />
+
+1. We will ingest a list of local files.
+2. For each file, perform chunking (recursively split) and then embedding. 
+3. We will save the embeddings and the metadata in Qdrant.
+   
+### Query
+We use the Qdrant client to query the index, and reuse the embedding operation from the indexing flow.
 
-Example to build a vector index in Qdrant based on local files.
 
 ## Pre-requisites
 
-- [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+- [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. Although the target store is Qdrant, CocoIndex uses Postgres to track data lineage for incremental processing.
 
 - Run Qdrant.
 
-```bash
-docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
-```
+   ```bash
+   docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
+   ```
 
 - [Create a collection](https://qdrant.tech/documentation/concepts/vectors/#named-vectors) to export the embeddings to.
 
-```bash
-curl  -X PUT \
-  'http://localhost:6333/collections/cocoindex' \
-  --header 'Content-Type: application/json' \
-  --data-raw '{
-  "vectors": {
-    "text_embedding": {
-      "size": 384,
-      "distance": "Cosine"
-    }
-  }
-}'
-```
-
-You can view the collections and data with the Qdrant dashboard at <http://localhost:6333/dashboard>.
+   ```bash
+   curl  -X PUT \
+     'http://localhost:6333/collections/cocoindex' \
+     --header 'Content-Type: application/json' \
+     --data-raw '{
+     "vectors": {
+       "text_embedding": {
+         "size": 384,
+         "distance": "Cosine"
+       }
+     }
+   }'
+   ```
+
+   You can view the collections and data with the Qdrant dashboard at <http://localhost:6333/dashboard>.
 
 ## Run
 
-Install dependencies:
+- Install dependencies:
 
-```bash
-pip install -e .
-```
+   ```bash
+   pip install -e .
+   ```
 
-Setup:
+- Setup:
 
-```bash
-python main.py cocoindex setup
-```
+   ```bash
+   python main.py cocoindex setup
+   ```
 
-Update index:
+- Update index:
 
-```bash
-python main.py cocoindex update
-```
+   ```bash
+   python main.py cocoindex update
+   ```
 
-Run:
+- Run:
 
-```bash
-python main.py
-```
+   ```bash
+   python main.py
+   ```
 
 ## CocoInsight
-
-CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9).
-
-Run CocoInsight to understand your RAG data pipeline:
+I used CocoInsight (currently in free beta) to troubleshoot the index generation and understand the data lineage of the pipeline.
+It connects to your local CocoIndex server, with zero pipeline data retention. Run the following command to start CocoInsight:
 
 ```bash
 python main.py cocoindex server -ci
 ```
 
-Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).
+Open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).
+
+
diff --git a/examples/text_embedding_qdrant/main.py b/examples/text_embedding_qdrant/main.py
@@ -1,21 +1,26 @@
 from dotenv import load_dotenv
+from qdrant_client import QdrantClient
+from qdrant_client.http.models import Filter, FieldCondition, MatchValue
 
 import cocoindex
 
+# Define Qdrant connection constants
+QDRANT_GRPC_URL = "http://localhost:6334"
+QDRANT_COLLECTION = "cocoindex"
 
-def text_to_embedding(text: cocoindex.DataSlice) -> cocoindex.DataSlice:
+
+@cocoindex.transform_flow()
+def text_to_embedding(text: cocoindex.DataSlice[str]) -> cocoindex.DataSlice[list[float]]:
     """
     Embed the text using a SentenceTransformer model.
     This is a shared logic between indexing and querying, so extract it as a function.
     """
     return text.transform(
         cocoindex.functions.SentenceTransformerEmbed(
-            model="sentence-transformers/all-MiniLM-L6-v2"
-        )
-    )
+            model="sentence-transformers/all-MiniLM-L6-v2"))
 
 
-@cocoindex.flow_def(name="TextEmbedding")
+@cocoindex.flow_def(name="TextEmbeddingWithQdrant")
 def text_embedding_flow(
     flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
 ):
@@ -50,35 +55,39 @@ def text_embedding_flow(
     doc_embeddings.export(
         "doc_embeddings",
         cocoindex.storages.Qdrant(
-            collection_name="cocoindex", grpc_url="http://localhost:6334/"
+            collection_name=QDRANT_COLLECTION, grpc_url=QDRANT_GRPC_URL
         ),
         primary_key_fields=["id"],
         setup_by_user=True,
     )
 
 
-query_handler = cocoindex.query.SimpleSemanticsQueryHandler(
-    name="SemanticsSearch",
-    flow=text_embedding_flow,
-    target_name="doc_embeddings",
-    query_transform_flow=text_to_embedding,
-    default_similarity_metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
-)
-
-
 @cocoindex.main_fn()
 def _run():
+    # Initialize Qdrant client
+    client = QdrantClient(url=QDRANT_GRPC_URL, prefer_grpc=True)
+    
     # Run queries in a loop to demonstrate the query capabilities.
     while True:
         try:
             query = input("Enter search query (or Enter to quit): ")
             if query == "":
                 break
-            results, _ = query_handler.search(query, 10, "text_embedding")
+            
+            # Get the embedding for the query
+            query_embedding = text_to_embedding.eval(query)
+            
+            search_results = client.search(
+                collection_name=QDRANT_COLLECTION,
+                query_vector=("text_embedding", query_embedding),
+                limit=10
+            )
             print("\nSearch results:")
-            for result in results:
-                print(f"[{result.score:.3f}] {result.data['filename']}")
-                print(f"    {result.data['text']}")
+            for result in search_results:
+                score = result.score
+                payload = result.payload
+                print(f"[{score:.3f}] {payload['filename']}")
+                print(f"    {payload['text']}")
                 print("---")
             print()
         except KeyboardInterrupt:
diff --git a/examples/text_embedding_qdrant/pyproject.toml b/examples/text_embedding_qdrant/pyproject.toml
@@ -3,7 +3,7 @@ name = "text-embedding-qdrant"
 version = "0.1.0"
 description = "Simple example for cocoindex: build embedding index based on local text files."
 requires-python = ">=3.10"
-dependencies = ["cocoindex>=0.1.39", "python-dotenv>=1.0.1"]
+dependencies = ["cocoindex>=0.1.39", "python-dotenv>=1.0.1", "qdrant-client>=1.6.0"]
 
 [tool.setuptools]
 packages = []
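
Note on the collection step in the README above: the `curl` call that creates the `cocoindex` collection could also be built from Python's standard library. The following is a minimal sketch under that assumption. The helper name `build_create_collection_request` is hypothetical; the URL, vector name, size, and distance are copied from the README's `curl` example.

```python
import json
import urllib.request


def build_create_collection_request(
    base_url="http://localhost:6333",
    collection="cocoindex",
    vector_name="text_embedding",
    size=384,
    distance="Cosine",
):
    """Build (but do not send) the PUT request that mirrors the README's curl command."""
    body = {"vectors": {vector_name: {"size": size, "distance": distance}}}
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running Qdrant instance. The `size` of 384 has to match the embedding dimension of `sentence-transformers/all-MiniLM-L6-v2`, since the flow exports with `setup_by_user=True` and so does not create the collection itself.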