Commit e5ada33

feat: Qdrant storage support (#205)
* feat: Qdrant storage
* feat: Qdrant storage support
* chore: review updates
* chore: New API check_state_compatibility
* refactor: Simplify payload conversion
* refactor: No ResourceSetupStatusCheck
* refactor: Replaced SetupState with ()
* feat: Parse point ID
* chore: Support all Value::Basic values
* chore: Handle all BasicValue types in search(), doc updates
* doc: End-to-end example
* feat: Support for api_key
* fix: no process-level CryptoProvider available -- call CryptoProvider::install_default() before this point
* chore: Removed key_value_fields_iter()
* docs: examples/text_embedding_qdrant
* chore: Undo change to examples/pdf_embedding
* feat: Optionally delete points
* chore: parse BasicValueType::Date | BasicValueType::LocalDateTime | BasicValueType::OffsetDateTime | BasicValueType::Time | BasicValueType::Uuid
* refactor: Don't nest complex types

---------

Signed-off-by: Anush008 <[email protected]>
1 parent 457c710 commit e5ada33
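The commit message mentions parsing point IDs, supporting every basic value type, and not nesting complex types. A minimal sketch of how a flat collected row could map onto a Qdrant point follows, assuming only Qdrant's documented rule that point IDs are unsigned 64-bit integers or UUIDs. The helpers `to_point_id`, `to_payload_value` and `row_to_point`, and the choice to encode dates, times and UUIDs as strings, are hypothetical; this is not the Rust implementation added by the commit.

```python
from __future__ import annotations

import uuid
from datetime import date, datetime, time
from typing import Any


def to_point_id(key: Any) -> int | str:
    """Coerce a primary-key value into a Qdrant point ID (u64 int or UUID)."""
    if isinstance(key, bool):  # bool is a subclass of int; reject it explicitly
        raise ValueError("booleans are not valid point IDs")
    if isinstance(key, int):
        if 0 <= key < 2**64:
            return key
        raise ValueError(f"integer point ID out of range: {key}")
    if isinstance(key, uuid.UUID):
        return str(key)
    if isinstance(key, str):
        # uuid.UUID() raises ValueError for strings that are not UUIDs.
        return str(uuid.UUID(key))
    raise ValueError(f"unsupported point ID type: {type(key).__name__}")


def to_payload_value(value: Any) -> Any:
    """Convert a basic (non-nested) value into a JSON-compatible payload value."""
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, (datetime, date, time)):
        return value.isoformat()  # assumption: ISO-8601 strings in the payload
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    raise TypeError(f"unsupported payload type: {type(value).__name__}")


def row_to_point(row: dict[str, Any], key_field: str, vector_field: str) -> dict[str, Any]:
    """Split a flat row into an ID, one named vector, and a flat payload."""
    payload = {
        name: to_payload_value(v)
        for name, v in row.items()
        if name not in (key_field, vector_field)
    }
    return {
        "id": to_point_id(row[key_field]),
        "vector": {vector_field: list(row[vector_field])},
        "payload": payload,
    }


point = row_to_point(
    {"id": 42, "filename": "intro.md", "text": "Hello", "text_embedding": [0.1, 0.2, 0.3]},
    key_field="id",
    vector_field="text_embedding",
)
print(point)
```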

13 files changed (+948, -5 lines)

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -97,6 +97,7 @@ rustls = { version = "0.23.25" }
 http-body-util = "0.1.3"
 yaml-rust2 = "0.10.1"
 urlencoding = "2.1.3"
+qdrant-client = "1.13.0"
 uuid = { version = "1.16.0", features = ["serde", "v4", "v8"] }
 tokio-stream = "0.1.17"
 async-stream = "0.3.6"

docs/docs/ops/storages.md

Lines changed: 32 additions & 1 deletion
@@ -7,10 +7,41 @@ description: CocoIndex Built-in Storages
 
 ## Postgres
 
-`Postgres` exports data to Postgres database (with pgvector extension).
+Exports data to a Postgres database (with the pgvector extension).
 
 The spec takes the following fields:
 
 * `database_url` (type: `str`, optional): The URL of the Postgres database to use as the internal storage, e.g. `postgres://cocoindex:cocoindex@localhost/cocoindex`. If unspecified, will use the same database as the [internal storage](/docs/core/basics#internal-storage).
 
 * `table_name` (type: `str`, optional): The name of the table to store to. If unspecified, a new name will be generated automatically. We recommend specifying a name explicitly if you want to query the table directly. It can be omitted if you use CocoIndex's query handlers to query the table.
+
+## Qdrant
+
+Exports data to a [Qdrant](https://qdrant.tech/) collection.
+
+The spec takes the following fields:
+
+* `collection_name` (type: `str`, required): The name of the collection to export the data to.
+
+* `grpc_url` (type: `str`, optional): The [gRPC URL](https://qdrant.tech/documentation/interfaces/#grpc-interface) of the Qdrant instance. Defaults to `http://localhost:6334/`.
+
+* `api_key` (type: `str`, optional): The API key used to authenticate requests.
+
+Before exporting, you must create a collection with a [vector name](https://qdrant.tech/documentation/concepts/vectors/#named-vectors) that matches the vector field name in CocoIndex, and set `setup_by_user=True` during export.
+
+Example:
+
+```python
+doc_embeddings.export(
+    "doc_embeddings",
+    cocoindex.storages.Qdrant(
+        collection_name="cocoindex",
+        grpc_url="http://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6334/",
+        api_key="<your-api-key-here>",
+    ),
+    primary_key_fields=["id_field"],
+    setup_by_user=True,
+)
+```
+
+You can find an end-to-end example [here](https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant).

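The Qdrant docs above require a named vector whose name matches the vector field collected in CocoIndex. As a minimal sketch, assuming the collection body shape used by the example README's `curl` command (a `vectors` map from vector name to `size` and `distance`), a pre-flight check could look like this. `collection_body` and `check_vector_fields` are hypothetical helpers, not part of CocoIndex or the Qdrant client.

```python
from __future__ import annotations

import json
from typing import Any


def collection_body(vectors: dict[str, int], distance: str = "Cosine") -> dict[str, Any]:
    """Build a create-collection body with one named vector per entry."""
    return {
        "vectors": {
            name: {"size": size, "distance": distance}
            for name, size in vectors.items()
        }
    }


def check_vector_fields(body: dict[str, Any], collected: dict[str, list[float]]) -> None:
    """Fail early if a collected vector field has no same-sized named vector."""
    named = body["vectors"]
    for field, vec in collected.items():
        if field not in named:
            raise KeyError(f"collection has no named vector {field!r}")
        expected = named[field]["size"]
        if len(vec) != expected:
            raise ValueError(f"{field!r}: expected {expected} dims, got {len(vec)}")


# 384 dimensions matches sentence-transformers/all-MiniLM-L6-v2.
body = collection_body({"text_embedding": 384})
print(json.dumps(body, indent=2))
check_vector_fields(body, {"text_embedding": [0.0] * 384})  # passes silently
```

A name mismatch here corresponds to the situation the docs warn about: the collector's field (for example `text_embedding`) must exist as a named vector in the collection created with `setup_by_user=True`.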
Lines changed: 2 additions & 0 deletions
# Postgres database address for cocoindex
COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex

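The example's `main.py` calls `load_dotenv(override=True)`, so the `COCOINDEX_DATABASE_URL` value from this `.env` file wins over any value already set in the environment. python-dotenv supports much more syntax than this; the stdlib sketch below is a simplified, hypothetical stand-in (`load_env_text`) meant only to show the override semantics and what the URL encodes.

```python
from __future__ import annotations

import os
from urllib.parse import urlparse


def load_env_text(text: str, override: bool = True) -> dict[str, str]:
    """Parse simple KEY=VALUE lines and copy them into os.environ.

    Simplified on purpose: skips blank lines and '#' comments, strips
    surrounding quote characters, and ignores `export`, escapes and ${VAR}.
    """
    parsed: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        parsed[key] = value
        # override=True: the file wins over values already in the environment.
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed


ENV_TEXT = """\
# Postgres database address for cocoindex
COCOINDEX_DATABASE_URL=postgres://cocoindex:cocoindex@localhost/cocoindex
"""

os.environ["COCOINDEX_DATABASE_URL"] = "postgres://stale@elsewhere/old"  # pre-existing value
load_env_text(ENV_TEXT, override=True)

url = urlparse(os.environ["COCOINDEX_DATABASE_URL"])
print(url.scheme, url.username, url.hostname, url.path)
```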
Lines changed: 69 additions & 0 deletions
## Description

An example that builds a vector index in Qdrant from local files.

## Prerequisites

- [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.

- Run Qdrant.

```bash
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
```

- [Create a collection](https://qdrant.tech/documentation/concepts/vectors/#named-vectors) to export the embeddings to.

```bash
curl -X PUT \
  'http://localhost:6333/collections/cocoindex' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "vectors": {
      "text_embedding": {
        "size": 384,
        "distance": "Cosine"
      }
    }
  }'
```

You can view the collections and data with the Qdrant dashboard at <http://localhost:6333/dashboard>.

## Run

Install dependencies:

```bash
pip install -e .
```

Setup:

```bash
python main.py cocoindex setup
```

Update index:

```bash
python main.py cocoindex update
```

Run:

```bash
python main.py
```

## CocoInsight

CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3-minute video tutorial about CocoInsight: [Watch on YouTube](https://youtu.be/ZnmyoHslBSc?si=pPLXWALztkA710r9).

Run CocoInsight to understand your RAG data pipeline:

```bash
python main.py cocoindex server -c https://cocoindex.io
```

Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).

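The README creates the collection with `curl` against Qdrant's REST port (6333), while the CocoIndex export itself talks gRPC on port 6334. If you would rather script that setup step, here is a stdlib-only sketch that builds the same `PUT` request; `create_collection_request` is a hypothetical helper, and nothing is sent unless you uncomment the last lines with Qdrant running locally.

```python
import json
import urllib.request


def create_collection_request(
    base_url: str, name: str, vector_name: str, size: int, distance: str = "Cosine"
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates a collection."""
    body = {"vectors": {vector_name: {"size": size, "distance": distance}}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_collection_request("http://localhost:6333", "cocoindex", "text_embedding", 384)
print(req.get_method(), req.full_url)

# To actually send it (requires Qdrant listening on localhost:6333):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```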
Lines changed: 90 additions & 0 deletions
from dotenv import load_dotenv

import cocoindex


def text_to_embedding(text: cocoindex.DataSlice) -> cocoindex.DataSlice:
    """
    Embed the text using a SentenceTransformer model.
    This logic is shared between indexing and querying, so it is extracted into a function.
    """
    return text.transform(
        cocoindex.functions.SentenceTransformerEmbed(
            model="sentence-transformers/all-MiniLM-L6-v2"
        )
    )


@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
):
    """
    Define an example flow that embeds text into a vector database.
    """
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="markdown_files")
    )

    doc_embeddings = data_scope.add_collector()

    with data_scope["documents"].row() as doc:
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown",
            chunk_size=2000,
            chunk_overlap=500,
        )

        with doc["chunks"].row() as chunk:
            chunk["embedding"] = text_to_embedding(chunk["text"])
            doc_embeddings.collect(
                id=cocoindex.GeneratedField.UUID,
                filename=doc["filename"],
                location=chunk["location"],
                text=chunk["text"],
                # 'text_embedding' is the name of the vector we've created the Qdrant collection with.
                text_embedding=chunk["embedding"],
            )

    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Qdrant(
            collection_name="cocoindex", grpc_url="http://localhost:6334/"
        ),
        primary_key_fields=["id"],
        setup_by_user=True,
    )


query_handler = cocoindex.query.SimpleSemanticsQueryHandler(
    name="SemanticsSearch",
    flow=text_embedding_flow,
    target_name="doc_embeddings",
    query_transform_flow=text_to_embedding,
    default_similarity_metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
)


@cocoindex.main_fn()
def _run():
    # Run queries in a loop to demonstrate the query capabilities.
    while True:
        try:
            query = input("Enter search query (or Enter to quit): ")
            if query == "":
                break
            results, _ = query_handler.search(query, 10, "text_embedding")
            print("\nSearch results:")
            for result in results:
                print(f"[{result.score:.3f}] {result.data['filename']}")
                print(f" {result.data['text']}")
                print("---")
            print()
        except KeyboardInterrupt:
            break


if __name__ == "__main__":
    load_dotenv(override=True)
    _run()
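With `default_similarity_metric=COSINE_SIMILARITY`, each printed `result.score` is meant to be a cosine similarity between the query embedding and a stored `text_embedding` vector. The brute-force sketch below only illustrates what that score means on toy 2-D vectors; the real search runs inside Qdrant, not as a Python linear scan, and `top_k` and the sample points are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    points: List[Tuple[Sequence[float], Dict[str, Any]]],
    k: int,
) -> List[Tuple[float, Dict[str, Any]]]:
    """Score every point against the query and keep the k best (linear scan)."""
    scored = [(cosine_similarity(query, vec), data) for vec, data in points]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; vectors from all-MiniLM-L6-v2 have 384 dimensions.
points = [
    ([1.0, 0.0], {"filename": "a.md"}),
    ([0.0, 1.0], {"filename": "b.md"}),
    ([1.0, 1.0], {"filename": "c.md"}),
]
for score, data in top_k([1.0, 0.0], points, k=2):
    print(f"[{score:.3f}] {data['filename']}")
# [1.000] a.md
# [0.707] c.md
```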
