docs/docs/core/custom_function.mdx
Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ Notes:
 
 * The `cocoindex.op.function()` function decorator also takes optional parameters.
 See [Parameters for custom functions](#parameters-for-custom-functions) for details.
-* Types of arugments and the return value must be annotated, so that CocoIndex will have information about data types of the operation's output fields.
+* Types of arguments and the return value must be annotated, so that CocoIndex will have information about data types of the operation's output fields.
 See [Data Types](/docs/core/data_types) for supported types.
| Json | | `cocoindex.typing.Json` | Any type convertible to JSON by `json` package |
+| LocalDatetime | Date and time without timezone | `cocoindex.LocalDateTime` | `datetime.datetime` |
+| OffsetDatetime | Date and time with a timezone offset | `cocoindex.OffsetDateTime` | `datetime.datetime` |
+| Vector[*T*, *Dim*?] | *T* must be basic type. *Dim* is a positive integer and optional. | `cocoindex.Vector[T]` or `cocoindex.Vector[T, Dim]` | `list[T]` |
+| Json | | `cocoindex.Json` | Any data convertible to JSON by `json` package |
+
+Values of all data types can be represented by values in Python's native types (as described under the Native Python Type column).
+However, the underlying execution engine and some storage system (like Postgres) has finer distinctions for some types, specifically:
 
-For some types, CocoIndex Python SDK provides annotated types with finer granularity than Python's original type, e.g.
 * *Float32* and *Float64* for `float`, with different precision.
 * *LocalDateTime* and *OffsetDateTime* for `datetime.datetime`, with different timezone awareness.
-* *Vector* has dimension information.
+* *Vector* has optional dimension information.
+* *Range* and *Json* provide a clear tag for the type, to clearly distinguish the type in CocoIndex.
 
-When defining [custom functions](/docs/core/custom_function), use the specific types as type annotations for arguments and return values.
-So CocoIndex will have information about the specific type.
+The native Python type is always more permissive and can represent a superset of possible values.
+* Only when you annotate the return type of a custom function, you should use the specific type,
+so that CocoIndex will have information about the precise type to be used in the execution engine and storage system.
+* For all other purposes, e.g. to provide annotation for argument types of a custom function, or used internally in your custom function,
+you can choose whatever to use.
+The native Python type is usually simpler.
 
 ### Struct Type
 
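The return-type rule added in the hunk above can be illustrated without the SDK. The following is a minimal sketch, assuming a hypothetical `Float32` alias built with `typing.Annotated`; it is a stand-in for the idea of a finer-grained type and is not CocoIndex's actual definition. The helper `describe_return_type` is likewise hypothetical. The sketch shows that the specific type is visible to anything inspecting the function's return annotation, while the argument and the runtime value stay plain `float`.

```python
from typing import Annotated, get_args, get_origin, get_type_hints

# Hypothetical stand-in for a finer-grained type; NOT the real cocoindex.Float32.
Float32 = Annotated[float, "Float32"]


def normalize_score(raw: float) -> Float32:
    """Argument uses the native type; the return value uses the specific type."""
    return raw / 100.0


def describe_return_type(fn) -> tuple[type, tuple]:
    """Return (native type, extra tags) read from a function's return annotation."""
    hint = get_type_hints(fn, include_extras=True)["return"]
    if get_origin(hint) is Annotated:
        native, *tags = get_args(hint)
        return native, tuple(tags)
    return hint, ()


native, tags = describe_return_type(normalize_score)
value = normalize_score(42.0)
```

Here `native` is `float` and `tags` carries the extra tag, which is the kind of information an engine could use to pick a precise storage type.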
@@ -94,9 +107,7 @@ LTable is a Table type whose row order is preserved. LTable has no key column.
 In Python, a LTable type is represented by `list[R]`, where `R` is a dataclass representing a row.
 For example, you can use `list[Person]` to represent a LTable with 3 columns: `first_name` (Str), `last_name` (Str), `dob` (Date).
 
-## Index Types
-
-### Key Types
+## Key Types
 
 Currently, the following types are key types
 
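The `list[Person]` representation mentioned in the context lines of the hunk above can be sketched with the standard library alone. This is an illustrative sketch: the field-to-type mapping (Str as `str`, Date as `datetime.date`) is an assumption, and the sample rows are made up for the example.

```python
import dataclasses
import datetime


@dataclasses.dataclass
class Person:
    """One row with three columns: first_name (Str), last_name (Str), dob (Date)."""
    first_name: str
    last_name: str
    dob: datetime.date  # assumed native Python type for the Date column


# An LTable value is an ordinary list of rows; the list order is the row order.
people: list[Person] = [
    Person("Ada", "Lovelace", datetime.date(1815, 12, 10)),
    Person("Alan", "Turing", datetime.date(1912, 6, 23)),
]

# Column names come from the dataclass fields, in declaration order.
column_names = [f.name for f in dataclasses.fields(Person)]
```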
@@ -108,16 +119,3 @@ Currently, the following types are key types
 - Uuid
 - Date
 - Struct with all fields being key types
-
-### Vector Type
-
-Users can create vector index on fields with `vector` types.
-A vector index also needs to be configured with a similarity metric, and the index is only effective when this metric is used during retrieval.
-
-Following metrics are supported:
-
-| Metric Name | Description | Similarity Order |
-|-------------|-------------|------------------|
-| CosineSimilarity | [Cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) | Larger is more similar |
-| L2Distance | [L2 distance (a.k.a. Euclidean distance)](https://en.wikipedia.org/wiki/Euclidean_distance) | Smaller is more similar |
-| InnerProduct | [Inner product](https://en.wikipedia.org/wiki/Inner_product_space) | Larger is more similar |
docs/docs/core/flow_def.mdx
Lines changed: 21 additions & 4 deletions
@@ -1,6 +1,7 @@
 ---
 title: Flow Definition
 description: Define a CocoIndex flow, by specifying source, transformations and storages, and connect input/output data of them.
+toc_max_heading_level: 4
 ---
 
 import Tabs from '@theme/Tabs';
@@ -281,16 +282,32 @@ The target storage is managed by CocoIndex, i.e. it'll be created by [CocoIndex
 The `name` for the same storage should remain stable across different runs.
 If it changes, CocoIndex will treat it as an old storage removed and a new one created, and perform setup changes and reindexing accordingly.
 
-#### Storage Indexes
+## Storage Indexes
 
 Many storage supports indexes, to boost efficiency in retrieving data.
 CocoIndex provides a common way to configure indexes for various storages.
 
-* *Primary key*. `primary_key_fields` (`Sequence[str]`): the fields to be used as primary key. Types of the fields must be supported as key fields. See [Key Types](data_types#key-types) for more details.
-* *Vector index*. `vector_indexes` (`Sequence[VectorIndexDef]`): the fields to create vector index. `VectorIndexDef` has the following fields:
+### Primary Key
+
+*Primary key* is specified by `primary_key_fields` (`Sequence[str]`).
+Types of the fields must be key types. See [Key Types](data_types#key-types) for more details.
+
+### Vector Index
+
+*Vector index* is specified by `vector_indexes` (`Sequence[VectorIndexDef]`). `VectorIndexDef` has the following fields:
+
 * `field_name`: the field to create vector index.
-* `metric`: the similarity metric to use. See [Vector Type](data_types#vector-type) for more details about supported similarity metrics.
+* `metric`: the similarity metric to use.
+
+#### Similarity Metrics
+
+Following metrics are supported:
 
+| Metric Name | Description | Similarity Order |
+|-------------|-------------|------------------|
+| CosineSimilarity | [Cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) | Larger is more similar |
+| L2Distance | [L2 distance (a.k.a. Euclidean distance)](https://en.wikipedia.org/wiki/Euclidean_distance) | Smaller is more similar |
+| InnerProduct | [Inner product](https://en.wikipedia.org/wiki/Inner_product_space) | Larger is more similar |
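The "Similarity Order" column in the table moved by this hunk can be checked with a small worked example. The sketch below uses plain-Python textbook definitions of the three metrics; the function names and sample vectors are illustrative and say nothing about how CocoIndex or any storage backend computes them.

```python
import math


def inner_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Inner product divided by the product of the vector norms."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
near = [2.0, 0.0]  # same direction as the query
far = [0.0, 3.0]   # orthogonal to the query

# Larger is more similar for cosine similarity and inner product.
cosine_prefers_near = cosine_similarity(query, near) > cosine_similarity(query, far)
inner_prefers_near = inner_product(query, near) > inner_product(query, far)
# Smaller is more similar for L2 distance.
l2_prefers_near = l2_distance(query, near) < l2_distance(query, far)
```

All three comparisons rank `near` ahead of `far`, but in opposite numeric directions, which is why the metric configured on the index has to match the one used at retrieval time.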