Commit 636ac3d

doc: End-to-end example
Signed-off-by: Anush008 <[email protected]>
1 parent 379797c commit 636ac3d

3 files changed: 76 additions, 19 deletions

docs/docs/ops/storages.md

Lines changed: 8 additions & 2 deletions
@@ -21,6 +21,12 @@ Exports data to a [Qdrant](https://qdrant.tech/) collection.

 The spec takes the following fields:

-* `qdrant_url` (type: `str`, required): The [gRPC URL](https://qdrant.tech/documentation/interfaces/#grpc-interface) of the Qdrant instance. Defaults to `http://localhost:6334/`.
+* `grpc_url` (type: `str`, required): The [gRPC URL](https://qdrant.tech/documentation/interfaces/#grpc-interface) of the Qdrant instance. Defaults to `http://localhost:6334/`.

-* `collection` (type: `str`, required): The name of the collection to export the data to.
+* `collection_name` (type: `str`, required): The name of the collection to export the data to.
+
+The field name for the vector embeddings must match the [vector name](https://qdrant.tech/documentation/concepts/vectors/) used when the collection was created.
+
+If no primary key is set during export, a random UUID is used as the Qdrant point ID.
+
+You can find an end-to-end example [here](https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding).
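
The two rules added in this hunk can be sketched in plain Python: the collected vector field must share its name with one of the collection's named vectors, and a missing primary key falls back to a random UUID. This is only an illustration under stated assumptions, not CocoIndex's implementation: `check_vector_field` and `resolve_point_id` are hypothetical helpers, and `COLLECTION_VECTORS` simply mirrors the `vectors` object used when the example collection is created.

```python
import uuid
from typing import Optional

# Named-vector configuration, mirroring the "vectors" object sent when the
# example collection is created (assumed here for illustration).
COLLECTION_VECTORS = {"text_embedding": {"size": 384, "distance": "Cosine"}}


def check_vector_field(field_name: str, vectors: dict) -> None:
    """Hypothetical helper: fail early if no named vector matches the field."""
    if field_name not in vectors:
        known = ", ".join(sorted(vectors)) or "<none>"
        raise ValueError(
            f"vector field {field_name!r} has no matching named vector "
            f"(collection defines: {known})"
        )


def resolve_point_id(primary_key: Optional[str]) -> str:
    """Hypothetical helper: keep the primary key if set, else use a random UUID.

    How a non-UUID key would be mapped to a valid Qdrant point ID is out of
    scope for this sketch.
    """
    return primary_key if primary_key is not None else str(uuid.uuid4())
```

With this sketch, `check_vector_field("embedding", COLLECTION_VECTORS)` raises, which is the mismatch the example avoids by collecting the vector under the name `text_embedding`.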

examples/text_embedding/README.md

Lines changed: 34 additions & 6 deletions
@@ -1,7 +1,32 @@
-Simple example for cocoindex: build embedding index based on local files.
+## Description

-## Prerequisite
-[Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+Example to build a vector index in Qdrant based on local files.
+
+## Pre-requisites
+
+- [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+
+- Run Qdrant.
+
+```bash
+docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
+```
+
+- [Create a collection](https://qdrant.tech/documentation/concepts/vectors/#named-vectors) to export the embeddings to.
+
+```bash
+curl -X PUT \
+  'http://localhost:6333/collections/cocoindex' \
+  --header 'Content-Type: application/json' \
+  --data-raw '{
+    "vectors": {
+      "text_embedding": {
+        "size": 384,
+        "distance": "Cosine"
+      }
+    }
+  }'
+```

 ## Run

@@ -23,19 +48,22 @@ Update index:
 python main.py cocoindex update
 ```

+You can now view the data in the Qdrant dashboard at <http://localhost:6333/dashboard>.
+
 Run:

 ```bash
 python main.py
 ```

-## CocoInsight
+## CocoInsight
+
 CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9).

 Run CocoInsight to understand your RAG data pipeline:

-```
+```bash
 python main.py cocoindex server -c https://cocoindex.io
 ```

-Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).
+Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).
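
For readers who prefer to script the collection-creation prerequisite, the `curl` request from the README can be sketched with Python's standard library. The helper name `build_create_collection_request` and the `QDRANT_REST_URL` constant are assumptions introduced for illustration; the URL, collection name, vector name, size, and distance are taken from the README. The size of 384 matches the output dimension of `all-MiniLM-L6-v2`, the model used in `main.py`.

```python
import json
import urllib.request

# REST endpoint exposed by the `docker run` command in the README (port 6333).
QDRANT_REST_URL = "http://localhost:6333"


def build_create_collection_request(
    collection: str = "cocoindex",
    vector_name: str = "text_embedding",
    size: int = 384,
    distance: str = "Cosine",
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the collection."""
    body = {"vectors": {vector_name: {"size": size, "distance": distance}}}
    return urllib.request.Request(
        url=f"{QDRANT_REST_URL}/collections/{collection}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Sending the request needs a running Qdrant instance, so it is left commented:
# with urllib.request.urlopen(build_create_collection_request()) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before anything is sent.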

examples/text_embedding/main.py

Lines changed: 34 additions & 11 deletions
@@ -2,55 +2,77 @@

 import cocoindex

+
 def text_to_embedding(text: cocoindex.DataSlice) -> cocoindex.DataSlice:
     """
     Embed the text using a SentenceTransformer model.
     This is a shared logic between indexing and querying, so extract it as a function.
     """
     return text.transform(
         cocoindex.functions.SentenceTransformerEmbed(
-            model="sentence-transformers/all-MiniLM-L6-v2"))
+            model="sentence-transformers/all-MiniLM-L6-v2"
+        )
+    )
+

 @cocoindex.flow_def(name="TextEmbedding")
-def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
+def text_embedding_flow(
+    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
+):
     """
     Define an example flow that embeds text into a vector database.
     """
     data_scope["documents"] = flow_builder.add_source(
-        cocoindex.sources.LocalFile(path="markdown_files"))
+        cocoindex.sources.LocalFile(path="markdown_files")
+    )

     doc_embeddings = data_scope.add_collector()

     with data_scope["documents"].row() as doc:
         doc["chunks"] = doc["content"].transform(
             cocoindex.functions.SplitRecursively(),
-            language="markdown", chunk_size=2000, chunk_overlap=500)
+            language="markdown",
+            chunk_size=2000,
+            chunk_overlap=500,
+        )

         with doc["chunks"].row() as chunk:
             chunk["embedding"] = text_to_embedding(chunk["text"])
-            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
-                                   text=chunk["text"], embedding=chunk["embedding"])
+            doc_embeddings.collect(
+                id=cocoindex.GeneratedField.UUID,
+                filename=doc["filename"],
+                location=chunk["location"],
+                text=chunk["text"],
+                # 'text_embedding' is the name of the vector we've created the Qdrant collection with.
+                text_embedding=chunk["embedding"],
+            )

     doc_embeddings.export(
         "doc_embeddings",
-        cocoindex.storages.Postgres(),
-        primary_key_fields=["filename", "location"],
-        vector_index=[("embedding", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
+        cocoindex.storages.Qdrant(
+            collection_name="cocoindex", grpc_url="http://localhost:6334/"
+        ),
+        primary_key_fields=["id"],
+        setup_by_user=True,
+    )
+

 query_handler = cocoindex.query.SimpleSemanticsQueryHandler(
     name="SemanticsSearch",
     flow=text_embedding_flow,
     target_name="doc_embeddings",
     query_transform_flow=text_to_embedding,
-    default_similarity_metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)
+    default_similarity_metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
+)
+

 @cocoindex.main_fn()
 def _run():
     # Run queries in a loop to demonstrate the query capabilities.
     while True:
         try:
             query = input("Enter search query (or Enter to quit): ")
-            if query == '':
+            if query == "":
                 break
             results, _ = query_handler.search(query, 10)
             print("\nSearch results:")
@@ -62,6 +84,7 @@ def _run():
         except KeyboardInterrupt:
             break

+
 if __name__ == "__main__":
     load_dotenv(override=True)
     _run()
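
To give a feel for what `chunk_size=2000` and `chunk_overlap=500` mean in the flow above, here is a deliberately naive sliding-window splitter. It is not the algorithm behind `cocoindex.functions.SplitRecursively`, which is language-aware. The function name `sliding_window_chunks` is hypothetical, and treating sizes as character counts is an assumption made only for this sketch.

```python
from typing import List, Tuple


def sliding_window_chunks(
    text: str, chunk_size: int = 2000, chunk_overlap: int = 500
) -> List[Tuple[int, str]]:
    """Split text into (start_offset, chunk) pairs with a fixed overlap.

    Naive illustration only: consecutive chunks share `chunk_overlap`
    characters, so each window starts `chunk_size - chunk_overlap`
    characters after the previous one.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks: List[Tuple[int, str]] = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means that a sentence straddling a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.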
