
Commit 5fa0771

docs(query-handler): docs and minor tweaks (#1015)
1 parent 4df3ddb commit 5fa0771


3 files changed (+124 -4 lines changed)


docs/docs/query.mdx

Lines changed: 113 additions & 1 deletion
```diff
@@ -12,7 +12,10 @@ The main functionality of CocoIndex is indexing.
 The goal of indexing is to enable efficient querying against your data.
 You can use any libraries or frameworks of your choice to perform queries.
 At the same time, CocoIndex provides seamless integration between indexing and querying workflows.
-For example, you can share transformations between indexing and querying, and easily retrieve table names when using CocoIndex's default naming conventions.
+
+* You can share transformations between indexing and querying.
+* You can define query handlers, so that you can easily run queries in tools like CocoInsight.
+* You can easily retrieve table names when using CocoIndex's default naming conventions.
 
 ## Transform Flow
 
```
````diff
@@ -76,6 +79,115 @@ print(await text_to_embedding.eval_async("Hello, world!"))
 </TabItem>
 </Tabs>
 
+## Query Handler
+
+Query handlers let you expose a simple function that takes a query string and returns structured results. Tools like CocoInsight can discover them, so you can query your indexes without writing extra glue code.
+
+- **What you write**: a plain Python function `def search(query: str) -> cocoindex.QueryOutput`.
+- **How you register**: decorate it with `@<your_flow>.query_handler(...)` or call `flow.add_query_handler(...)` directly.
+- **What you return**: a `cocoindex.QueryOutput(results=[...], query_info=...)`.
+- **Optional metadata**: `QueryHandlerResultFields` tells tools which fields contain the embedding vector and the score.
+
+### Minimal Query Handler
+
+A minimal query handler looks like this:
+
+<Tabs>
+<TabItem value="python" label="Python">
+
+```python
+@my_flow.query_handler(name="run_query")  # Optional; defaults to the function's qualified name (__qualname__)
+def run_query(query: str) -> cocoindex.QueryOutput:
+    # 1) Perform your query against the input `query`
+    ...
+
+    # 2) Return structured results
+    return cocoindex.QueryOutput(results=[{"filename": "...", "text": "..."}])
+```
+
+</TabItem>
+</Tabs>
+
+Notes about the decorator:
+
+- The handler can be sync or async.
+- The decorator registers the handler as a query handler for the flow. It doesn't change the function signature: you can still call the function directly.
+
+Your function returns a `cocoindex.QueryOutput` whose `results` field is a list of dicts (or dataclass instances).
+Each element is one query result. Any JSON-convertible data type is supported, and embeddings can be `list[float]` or NumPy arrays.
+
+Even a simple query handler like this lets CocoInsight display the query results for you to browse.
+
+### Query Handler with Additional Information
+
+You can provide additional information through extra fields:
+
+<Tabs>
+<TabItem value="python" label="Python">
+
+```python
+@my_flow.query_handler(
+    name="run_query",  # Optional; defaults to the function's qualified name (__qualname__)
+    result_fields=cocoindex.QueryHandlerResultFields(
+        embedding=["embedding"],  # path to the vector field in each result
+        score="score",  # numeric similarity score (higher is better)
+    )
+)
+def run_query(query: str) -> cocoindex.QueryOutput:
+    # 1) Compute the embedding for the input query (often via a transform flow)
+    query_vector = text_to_embedding.eval(query)
+
+    # 2) Run your database/vector store query
+    ...
+
+    # 3) Return structured results plus optional query_info
+    return cocoindex.QueryOutput(
+        results=[{"text": "...", "embedding": some_vec, "score": 0.92}],
+        query_info=cocoindex.QueryInfo(
+            embedding=query_vector,
+            similarity_metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
+        ),
+    )
+```
+
+</TabItem>
+</Tabs>
+
+- `result_fields` in `query_handler` specifies field names in the results returned by the query handler. This metadata lets tools like CocoInsight recognize the structure of the query results. It has the following fields (all optional):
+  - `embedding` is a list of keys that navigates to the embedding in each result (use multiple keys in case of multiple embeddings, e.g. from different models).
+  - `score` should point to a numeric field where larger means more relevant.
+
+- `QueryOutput.query_info` describes the query itself, with the following fields (all optional):
+  - `embedding` is the embedding of the query.
+  - `similarity_metric` is the similarity metric used to query the index.
+
+
+### Register Directly without a Decorator
+
+The example above can also be written without the decorator:
+
+```python
+def my_search(query: str) -> cocoindex.QueryOutput:
+    ...
+
+my_flow.add_query_handler(
+    name="run_query",
+    handler=my_search,
+    result_fields=cocoindex.QueryHandlerResultFields(embedding=["embedding"], score="score"),
+)
+```
+
+This form helps when decorating the function at its definition is inconvenient, for example when the function is defined in another module.
+
+### Examples
+
+See the following examples:
+
+- [Text Embedding (PostgreSQL)](https://github.com/cocoindex-io/cocoindex/blob/main/examples/text_embedding/main.py)
+- [Text Embedding (Qdrant)](https://github.com/cocoindex-io/cocoindex/blob/main/examples/text_embedding_qdrant/main.py)
+- [Code Embedding](https://github.com/cocoindex-io/cocoindex/blob/main/examples/code_embedding/main.py)
+
+
 ## Get Target Native Names
 
 In your indexing flow, when you export data to a target, you can specify the target name (e.g. a database table name, a collection name, the node label in property graph databases, etc.) explicitly,
````
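The new docs say the decorator only registers the handler and returns the function unchanged, and that `add_query_handler(...)` is the direct equivalent. As a rough illustration of that register-and-return pattern, here is a stdlib-only sketch; `HandlerRegistry` is a hypothetical stand-in, not CocoIndex's `Flow`, and it handles only sync functions and omits `result_fields`.

```python
from typing import Any, Callable, Optional

Handler = Callable[[str], Any]


class HandlerRegistry:
    """Hypothetical stand-in for a flow that collects named query handlers."""

    def __init__(self) -> None:
        self.handlers: dict[str, Handler] = {}

    def add_query_handler(self, name: str, handler: Handler) -> None:
        # Direct registration: no decorator involved.
        self.handlers[name] = handler

    def query_handler(self, name: Optional[str] = None) -> Callable[[Handler], Handler]:
        # Decorator form: register, then hand back the original function untouched.
        def _inner(handler: Handler) -> Handler:
            self.add_query_handler(name or handler.__qualname__, handler)
            return handler

        return _inner


registry = HandlerRegistry()


@registry.query_handler()  # no name given, so the function's __qualname__ is used
def run_query(query: str) -> dict[str, Any]:
    return {"results": [{"text": f"match for {query!r}"}]}


def my_search(query: str) -> dict[str, Any]:
    return {"results": []}


# Equivalent registration without the decorator.
registry.add_query_handler("search_v2", my_search)

print(sorted(registry.handlers))  # ['run_query', 'search_v2']
print(run_query("hello"))  # still callable directly
```

Because `query_handler` returns `handler` itself, `run_query` stays an ordinary function; the registry only keeps a reference to it.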

python/cocoindex/flow.py

Lines changed: 8 additions & 1 deletion
```diff
@@ -869,6 +869,9 @@ def add_query_handler(
         *,
         result_fields: QueryHandlerResultFields | None = None,
     ) -> None:
+        """
+        Add a query handler to the flow.
+        """
         async_handler = to_async_call(handler)
 
         async def _handler(query: str) -> dict[str, Any]:
```
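This hunk shows `add_query_handler` normalizing every handler with `to_async_call(handler)`, which is what lets the docs accept both sync and async handlers. The helper's implementation isn't part of this diff, so the following is only a hypothetical sketch of the idea: `to_async` (a made-up name) passes coroutine functions through unchanged and runs sync functions in a worker thread via `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable


def to_async(fn: Callable[..., Any]) -> Callable[..., Awaitable[Any]]:
    """Hypothetical helper: return an awaitable-producing callable for a sync or async fn."""
    if inspect.iscoroutinefunction(fn):
        return fn  # already async; callers await it as-is

    async def _wrapper(*args: Any, **kwargs: Any) -> Any:
        # Run the blocking function in a worker thread so the event loop stays free.
        return await asyncio.to_thread(fn, *args, **kwargs)

    return _wrapper


def sync_search(query: str) -> list[str]:
    return [query.upper()]


async def async_search(query: str) -> list[str]:
    await asyncio.sleep(0)
    return [query.lower()]


async def main() -> list[list[str]]:
    handlers = [to_async(sync_search), to_async(async_search)]
    # Every handler can now be awaited the same way.
    return [await h("MiXeD") for h in handlers]


print(asyncio.run(main()))  # [['MIXED'], ['mixed']]
```

Offloading to a thread keeps a slow sync handler from blocking the event loop; whether CocoIndex's helper does exactly this is an assumption.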
```diff
@@ -893,9 +896,13 @@ def query_handler(
         name: str | None = None,
         result_fields: QueryHandlerResultFields | None = None,
     ) -> Callable[[Callable[[str], Any]], Callable[[str], Any]]:
+        """
+        A decorator to declare a query handler.
+        """
+
         def _inner(handler: Callable[[str], Any]) -> Callable[[str], Any]:
             self.add_query_handler(
-                name or handler.__name__, handler, result_fields=result_fields
+                name or handler.__qualname__, handler, result_fields=result_fields
             )
             return handler
 
```
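The only behavioral change in `flow.py` is the default handler name, which now comes from `handler.__qualname__` rather than `handler.__name__`. The two are identical for module-level functions but differ for methods and nested functions, as this standalone check shows (all names here are made up for illustration).

```python
def top_level(query: str) -> str:
    return query


class Searcher:
    def search(self, query: str) -> str:
        return query


def make_handler():
    # A handler defined inside another function.
    def search(query: str) -> str:
        return query

    return search


for fn in (top_level, Searcher.search, make_handler()):
    print(f"{fn.__name__:10} -> {fn.__qualname__}")
# top_level  -> top_level
# search     -> Searcher.search
# search     -> make_handler.<locals>.search
```

With `__qualname__`, two handlers both named `search` but defined in different classes or enclosing functions get distinct default names.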

python/cocoindex/query_handler.py

Lines changed: 3 additions & 2 deletions
```diff
@@ -8,7 +8,8 @@
 @dataclasses.dataclass
 class QueryHandlerResultFields:
     """
-    Specify field names in the returned query handler.
+    Specify field names in query results returned by the query handler.
+    This provides metadata for tools like CocoInsight to recognize structure of the query results.
     """
 
     embedding: list[str] = dataclasses.field(default_factory=list)
```
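The updated docstring describes `QueryHandlerResultFields` as metadata that tells tools where to find the embedding and score in each result. How CocoInsight actually consumes it isn't shown in this commit; the sketch below is a hypothetical consumer built on a local stand-in dataclass (`ResultFields`) with the two documented fields, assuming each entry in `embedding` names a top-level key of the result dict.

```python
import dataclasses
from typing import Any, Optional


@dataclasses.dataclass
class ResultFields:
    """Local stand-in mirroring the documented QueryHandlerResultFields fields."""

    embedding: list[str] = dataclasses.field(default_factory=list)
    score: Optional[str] = None


def summarize(results: list[dict[str, Any]], fields: ResultFields) -> list[dict[str, Any]]:
    """Hypothetical tool-side view: separate score and embeddings from other fields."""
    rows = []
    for r in results:
        # Keep the plain fields; metadata tells us which keys are vectors or scores.
        row = {k: v for k, v in r.items() if k not in fields.embedding and k != fields.score}
        if fields.score is not None:
            row["_score"] = r[fields.score]
        row["_embedding_dims"] = {name: len(r[name]) for name in fields.embedding}
        rows.append(row)
    # Higher score means more relevant, so sort descending when a score is declared.
    if fields.score is not None:
        rows.sort(key=lambda row: row["_score"], reverse=True)
    return rows


results = [
    {"text": "b", "embedding": [0.1, 0.2, 0.3], "score": 0.41},
    {"text": "a", "embedding": [0.9, 0.0, 0.1], "score": 0.92},
]
for row in summarize(results, ResultFields(embedding=["embedding"], score="score")):
    print(row)
```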
```diff
@@ -47,4 +48,4 @@ class QueryOutput(Generic[R]):
     """
 
     results: list[R]
-    query_info: QueryInfo | None = None
+    query_info: QueryInfo = dataclasses.field(default_factory=QueryInfo)
```
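The last hunk makes `QueryOutput.query_info` always hold a `QueryInfo`, built fresh for each instance by `default_factory`, instead of defaulting to `None`. The standalone sketch below contrasts the two defaults with local stand-in classes; giving `Info` all-`None` field defaults is an assumption that mirrors the docs' "all optional" fields, and a no-argument constructor like this is what `default_factory=QueryInfo` requires.

```python
import dataclasses
from typing import Any, Optional


@dataclasses.dataclass
class Info:
    """Stand-in for QueryInfo; every field is optional."""

    embedding: Optional[list[float]] = None
    similarity_metric: Optional[str] = None


@dataclasses.dataclass
class OutputBefore:
    results: list[Any]
    query_info: Optional[Info] = None  # consumers must check for None first


@dataclasses.dataclass
class OutputAfter:
    results: list[Any]
    # default_factory builds a fresh Info per instance instead of sharing one object.
    query_info: Info = dataclasses.field(default_factory=Info)


before = OutputBefore(results=[])
after_a = OutputAfter(results=[])
after_b = OutputAfter(results=[])

print(before.query_info)  # None
print(after_a.query_info)  # Info(embedding=None, similarity_metric=None)
print(after_a.query_info is after_b.query_info)  # False
```

For consumers, the practical difference is that `output.query_info.embedding` can be read without first checking whether `query_info` is `None`.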
