
Commit fe59040

Add documents to clarify indexable types and vector indexing metrics. (#52)
1 parent d6a4c25 commit fe59040


2 files changed: +29 -3 lines changed


docs/docs/core/data_types.mdx

Lines changed: 27 additions & 1 deletion
```diff
@@ -43,4 +43,30 @@ A struct has a bunch of fields, each with a name and a type.
 
 A table has a collection of rows, each of which is a struct with specified schema.
 
-The first field of a table is always the primary key.
+The first field of a table is always the primary key.
+
+## Indexable Types
+
+### Key Types
+
+Currently, the following types are supported as types for key fields:
+
+- `bytes`
+- `str`
+- `bool`
+- `int64`
+- `range`
+- Struct with all fields being key types
+
+### Vector Type
+
+Users can create vector index on fields with `vector` types.
+A vector index also needs to be configured with a similarity metric, and the index is only effective when this metric is used during retrieval.
+
+Following metrics are supported:
+
+| Metric Name | Description | Similarity Order |
+|-------------|-------------|------------------|
+| `CosineSimilarity` | [Cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) | Larger is more similar |
+| `L2Distance` | [L2 distance (a.k.a. Euclidean distance)](https://en.wikipedia.org/wiki/Euclidean_distance) | Smaller is more similar |
+| `InnerProduct` | [Inner product](https://en.wikipedia.org/wiki/Inner_product_space) | Larger is more similar |
```
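To make the "Similarity Order" column concrete, here is a minimal plain-Python sketch of the three metrics in the table. It illustrates the formulas only and is not the library's implementation: the metric names are used as plain string keys, the `rank` helper is hypothetical, and cosine similarity assumes neither vector is all zeros.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (assumes neither vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Vector, b: Vector) -> float:
    """Dot product of a and b."""
    return sum(x * y for x, y in zip(a, b))


# Metric name -> (scoring function, whether a larger score means "more similar"),
# following the "Similarity Order" column of the table.
METRICS: Dict[str, Tuple[Callable[[Vector, Vector], float], bool]] = {
    "CosineSimilarity": (cosine_similarity, True),
    "L2Distance": (l2_distance, False),
    "InnerProduct": (inner_product, True),
}


def rank(query: Vector, candidates: List[Vector], metric: str) -> List[int]:
    """Hypothetical helper: candidate indices ordered from most to least similar."""
    score_fn, larger_is_better = METRICS[metric]
    scores = [score_fn(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=larger_is_better)


if __name__ == "__main__":
    query = [1.0, 0.0]
    candidates = [[0.0, 1.0], [2.0, 0.0], [0.9, 0.1]]
    for name in METRICS:
        print(name, rank(query, candidates, name))
    # CosineSimilarity [1, 2, 0]
    # L2Distance [2, 1, 0]
    # InnerProduct [1, 2, 0]
```

With the same query, cosine similarity and inner product put `[2.0, 0.0]` first, while L2 distance puts `[0.9, 0.1]` first. Because the metrics can disagree on ordering, an index built for one metric only helps retrieval that uses that same metric.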

docs/docs/core/flow_def.mdx

Lines changed: 2 additions & 2 deletions
```diff
@@ -198,8 +198,8 @@ Export must happen at the top level of a flow, i.e. not within any child scopes
 
 * `name`: the name to identify the export target.
 * `target_spec`: the storage spec as the export target.
-* `primary_key_fields` (optional): the fields to be used as primary key.
-* `vector_index` (optional): the fields to create vector index.
+* `primary_key_fields` (optional): the fields to be used as primary key. Types of the fields must be supported as key fields. See [Key Types](data_types#key-types) for more details.
+* `vector_index` (optional): the fields to create vector index. Each item is a tuple of a field name and a similarity metric. See [Vector Type](data_types#vector-type) for more details about supported similarity metrics.
 
 <Tabs>
 <TabItem value="python" label="Python" default>
```
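The constraints these two export arguments refer to can be summarized as a small validation sketch. Everything here is hypothetical: the type model (basic types as strings, `StructType`, `VectorType`), the `check_export` helper, and representing `vector_index` as `(field name, metric name)` string pairs are illustrations, not the library's API. The key type names and metric names are taken from the "Indexable Types" section added to data_types.mdx in this commit.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical, simplified type model used only for this illustration.
# Basic types are plain strings; these names follow the "Key Types" list.
BASIC_KEY_TYPES = {"bytes", "str", "bool", "int64", "range"}
# Metric names follow the "Vector Type" table.
SUPPORTED_METRICS = {"CosineSimilarity", "L2Distance", "InnerProduct"}


@dataclass
class StructType:
    fields: Dict[str, FieldType]


@dataclass
class VectorType:
    dim: int


FieldType = Union[str, StructType, VectorType]


def is_key_type(t: FieldType) -> bool:
    """A basic key type, or a struct whose fields are all key types (checked recursively)."""
    if isinstance(t, str):
        return t in BASIC_KEY_TYPES
    if isinstance(t, StructType):
        return all(is_key_type(ft) for ft in t.fields.values())
    return False  # vectors are not key types


def check_export(
    schema: Dict[str, FieldType],
    primary_key_fields: List[str],
    vector_index: List[Tuple[str, str]],
) -> List[str]:
    """Hypothetical validator: returns a list of problems, empty if the arguments look valid."""
    errors: List[str] = []
    for name in primary_key_fields:
        if name not in schema:
            errors.append(f"unknown primary key field: {name}")
        elif not is_key_type(schema[name]):
            errors.append(f"field {name} does not have a key type")
    for name, metric in vector_index:
        if name not in schema:
            errors.append(f"unknown vector index field: {name}")
        elif not isinstance(schema[name], VectorType):
            errors.append(f"field {name} is not a vector")
        if metric not in SUPPORTED_METRICS:
            errors.append(f"unsupported similarity metric: {metric}")
    return errors


if __name__ == "__main__":
    schema = {
        "filename": "str",
        "location": "range",
        "text": "str",
        "embedding": VectorType(dim=384),
    }
    # Valid: key fields have key types; the indexed field is a vector with a supported metric.
    print(check_export(schema, ["filename", "location"], [("embedding", "CosineSimilarity")]))
    # Invalid: a vector used as a key, a non-vector indexed, and an unknown metric name.
    print(check_export(schema, ["embedding"], [("text", "DotProduct")]))
```

The recursive `is_key_type` mirrors the last bullet of the key type list: a struct qualifies only if every one of its fields is itself a key type.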
