Commit 933c600

Merge pull request #171 from HarperDB/vector-indexing
Add documentation on how to define fields with vector indexing
2 parents 8f998d6 + 7873b35 commit 933c600

2 files changed, +46 -3 lines changed

docs/developers/applications/defining-schemas.md

Lines changed: 45 additions & 1 deletion
@@ -171,7 +171,51 @@ The `@primaryKey` directive specifies that an attribute is the primary key for a

#### `@indexed`

Removed (old line 174):
The `@indexed` directive specifies that an attribute should be indexed. This is necessary if you want to execute queries using this attribute (whether that is through RESTful query parameters, SQL, or NoSQL operations).

Added (new lines 174-218):
The `@indexed` directive specifies that an attribute should be indexed. When an attribute is indexed, Harper creates a secondary index from the data in this field for fast, efficient querying by this field. This is necessary if you want to execute queries using this attribute (whether through RESTful query parameters, SQL, or NoSQL operations).

A standard index indexes the values in each field, so you can query directly by those values. If the field's value is an array, each value in the array is indexed, so you can query by any individual value.
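A conceptual sketch of a standard secondary index, written in plain JavaScript, shows how array values are handled. It maps each indexed value to the set of primary keys whose field contains that value. An array-valued field adds one entry per element. This only illustrates the idea and does not describe how Harper actually stores its indexes. `buildIndex`, `lookup`, and the `tags` sample data are hypothetical.

```javascript
// Conceptual sketch of a secondary index over one attribute.
// Not Harper's implementation: an in-memory Map stands in for the stored index.
function buildIndex(records, attribute) {
  const index = new Map(); // attribute value -> Set of primary keys
  for (const record of records) {
    const value = record[attribute];
    if (value === undefined || value === null) continue;
    // An array-valued field contributes one index entry per element.
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      if (!index.has(v)) index.set(v, new Set());
      index.get(v).add(record.id);
    }
  }
  return index;
}

// Find primary keys for a single value without scanning every record.
function lookup(index, value) {
  return [...(index.get(value) ?? [])];
}

// Hypothetical records with an array-valued `tags` field.
const products = [
  { id: 1, tags: ['red', 'sale'] },
  { id: 2, tags: ['blue'] },
  { id: 3, tags: ['red'] },
];
const tagIndex = buildIndex(products, 'tags');
console.log(lookup(tagIndex, 'red'));  // [ 1, 3 ]
console.log(lookup(tagIndex, 'sale')); // [ 1 ]
```

Record 1's `tags` array is indexed element by element, so a lookup on either `'red'` or `'sale'` finds it.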

#### Vector Indexing

The `@indexed` directive can also specify a `type`. To use vector indexing, set `type` to `"HNSW"` for Hierarchical Navigable Small World indexing, which creates a vector index for the attribute. For example:
```graphql
type Product @table {
  id: Long @primaryKey
  textEmbeddings: [Float] @indexed(type: "HNSW")
}
```

HNSW indexing finds the nearest neighbors to a search vector. To use it, query with a `sort` parameter that names the indexed attribute and gives the search vector as the `target`, for example:
```javascript
// searchVector: the query embedding, an array of numbers with the same dimensionality as the stored textEmbeddings
const results = Product.search({
  sort: { attribute: 'textEmbeddings', target: searchVector },
  limit: 5 // get the five nearest neighbors
})
```
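The query above asks for the records whose `textEmbeddings` are closest to `searchVector`. As a reference point, the sketch below computes that answer exactly by brute force, comparing the search vector with every stored vector. An HNSW index is meant to approximate this result without a full scan. This is an illustration, not Harper's implementation. `cosineDistance`, `euclideanDistance`, `nearestNeighbors`, and the sample embeddings are hypothetical. Cosine distance is computed as the negative of the cosine similarity, matching the `distance` option's description for this index.

```javascript
// Brute-force (exact) k-nearest-neighbor search: a reference for what an
// HNSW-backed `sort` + `limit` query approximates. Not Harper's implementation.

// Cosine distance as the negative of cosine similarity (smaller = closer).
function cosineDistance(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return -dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Straight-line (L2) distance.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Score every record, sort by distance, and keep the `limit` closest.
function nearestNeighbors(records, attribute, target, limit, distance = cosineDistance) {
  return records
    .map((record) => ({ record, dist: distance(record[attribute], target) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, limit)
    .map(({ record }) => record);
}

// Hypothetical 3-dimensional embeddings.
const products = [
  { id: 1, textEmbeddings: [1, 0, 0] },
  { id: 2, textEmbeddings: [0, 1, 0] },
  { id: 3, textEmbeddings: [0.9, 0.1, 0] },
  { id: 4, textEmbeddings: [0, 0, 1] },
];
const searchVector = [1, 0.05, 0];
console.log(nearestNeighbors(products, 'textEmbeddings', searchVector, 2).map((p) => p.id)); // [ 1, 3 ]
```

With these sample vectors, the two nearest products are ids 1 and 3. Passing `euclideanDistance` as the last argument ranks by straight-line distance instead. For this data, the top two results are the same.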

This can also be combined with other conditions, for example:
```javascript
// assumes Product also has an indexed `price` attribute (not part of the schema above)
const results = Product.search({
  conditions: [{ attribute: 'price', comparator: 'lt', value: 50 }],
  sort: { attribute: 'textEmbeddings', target: searchVector },
  limit: 5 // get the five nearest neighbors that cost less than 50
})
```
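Conceptually, the combined query returns the nearest neighbors among the products that satisfy the condition. The brute-force sketch below shows those semantics by filtering first and then ranking by distance. This is an assumption about the intended result, not a description of how Harper plans or executes the query. It uses Euclidean distance for brevity and supports only the `lt` comparator from the example. `nearestMatching` and the sample data are hypothetical.

```javascript
// Semantics sketch: keep the records that match every condition, then return
// the `limit` nearest of those. Brute force; not Harper's query execution.
const comparators = {
  lt: (a, b) => a < b, // only the comparator used in the example above
};

function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

function nearestMatching(records, { conditions, sort: sortBy, limit }) {
  return records
    .filter((r) => conditions.every((c) => comparators[c.comparator](r[c.attribute], c.value)))
    .map((r) => ({ r, dist: euclideanDistance(r[sortBy.attribute], sortBy.target) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, limit)
    .map(({ r }) => r);
}

// Hypothetical products with 2-dimensional embeddings and a price.
const products = [
  { id: 1, price: 80, textEmbeddings: [0, 0] },
  { id: 2, price: 20, textEmbeddings: [0, 1] },
  { id: 3, price: 90, textEmbeddings: [0.1, 0] },
  { id: 4, price: 45, textEmbeddings: [3, 3] },
];
const query = {
  conditions: [{ attribute: 'price', comparator: 'lt', value: 50 }],
  sort: { attribute: 'textEmbeddings', target: [0, 0] },
  limit: 2,
};
console.log(nearestMatching(products, query).map((p) => p.id)); // [ 2, 4 ]
```

Suppose you instead ranked every product first and then filtered the top two. Here that would return nothing, because the two nearest products (ids 1 and 3) both cost 50 or more. That is why the sketch applies the condition before taking the limit.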

HNSW supports several additional arguments to the `@indexed` directive for tuning the HNSW parameters:
* `distance` - Defines the distance function: `"euclidean"` or `"cosine"` (cosine uses the negative of the cosine similarity). The default is `"cosine"`.
* `efConstruction` - Maximum number of nodes to keep in the list for finding nearest neighbors. A higher value can yield better recall, while a lower value can yield better performance. If `efSearchConstruction` is set, this applies only to indexing. The default is 100.
* `M` - The preferred number of connections at each layer of the HNSW graph. A higher number uses more space but can help when the data has higher intrinsic dimensionality; a lower number can be more efficient. The default is 16.
* `optimizeRouting` - Uses a heuristic to avoid graph connections that duplicate existing indirect connections (connections through another node), which can yield more efficient graph traversals for the same `M` setting. The value is a number between 0 and 1: a higher value more aggressively omits connections that have alternate paths, and 0 disables route optimization so that connections are created with the traditional HNSW algorithm. The default is 0.5.
* `mL` - The normalization factor for level generation. By default, this is computed from `M`.
* `efSearchConstruction` - Maximum number of nodes to keep in the list for finding nearest neighbors for searching. The default is 50.
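The list says only that `mL` is computed from `M` by default. The HNSW paper suggests `mL = 1 / ln(M)`. Under that scheme, each new node's top layer is drawn as `floor(-ln(U) * mL)`, with `U` uniform in (0, 1]. Whether Harper uses exactly this formula is an assumption here. The sketch shows the consequence: with `M` = 16, about 1 in 16 nodes reaches layer 1 and about 1 in 256 reaches layer 2. `defaultML` and `randomLevel` are hypothetical names.

```javascript
// Level generation as described in the HNSW paper.
// ASSUMPTION: Harper's default mL follows the paper's suggestion mL = 1 / ln(M).
function defaultML(M) {
  return 1 / Math.log(M);
}

// Draw the top layer for a new node. `random` returns a value in [0, 1);
// using 1 - random() gives U in (0, 1], so Math.log(U) is never -Infinity.
function randomLevel(mL, random = Math.random) {
  const u = 1 - random();
  return Math.floor(-Math.log(u) * mL);
}

// With mL = 1 / ln(M): P(level >= l) = P(U <= M^-l) = M^-l,
// so each layer holds roughly 1/M of the nodes in the layer below it.
const M = 16;
const mL = defaultML(M);
console.log(mL.toFixed(4)); // 0.3607

const samples = 100000;
const counts = [0, 0, 0]; // counts[l] = number of nodes whose level is >= l
for (let i = 0; i < samples; i++) {
  const level = randomLevel(mL);
  for (let l = 0; l < counts.length && l <= level; l++) counts[l]++;
}
// Fractions approximate M^-l: about 1, 1/16, and 1/256.
console.log(counts.map((c) => (c / samples).toFixed(4)));
```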

For example:
```graphql
type Product @table {
  id: Long @primaryKey
  textEmbeddings: [Float] @indexed(type: "HNSW", distance: "euclidean", optimizeRouting: 0, efSearchConstruction: 100)
}
```
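The `efConstruction` and `efSearchConstruction` parameters bound the list of nearest candidates kept while walking the graph. The sketch below implements the per-layer search routine from the HNSW paper (SEARCH-LAYER) on a tiny hand-built graph with Euclidean distance. It shows why a larger list can improve recall: a list of size 1 gets stuck at a local minimum, while a list of size 2 reaches the true nearest neighbor. The graph, the name `searchLayer`, and the use of sorted arrays instead of priority queues are simplifications for illustration, not Harper's implementation.

```javascript
// One-layer greedy beam search (SEARCH-LAYER in the HNSW paper), simplified
// with sorted arrays instead of heaps. `ef` is the size of the result list.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

function searchLayer(graph, vectors, query, entryPoint, ef) {
  const start = { node: entryPoint, dist: euclideanDistance(query, vectors[entryPoint]) };
  const visited = new Set([entryPoint]);
  const candidates = [start]; // nodes whose neighbors have not been explored yet
  const results = [start];    // the `ef` closest nodes found so far, sorted ascending

  while (candidates.length > 0) {
    candidates.sort((a, b) => a.dist - b.dist);
    const current = candidates.shift(); // closest unexplored candidate
    // Stop once the closest candidate is farther than the worst kept result.
    if (current.dist > results[results.length - 1].dist) break;

    for (const neighbor of graph[current.node]) {
      if (visited.has(neighbor)) continue;
      visited.add(neighbor);
      const dist = euclideanDistance(query, vectors[neighbor]);
      if (results.length < ef || dist < results[results.length - 1].dist) {
        candidates.push({ node: neighbor, dist });
        results.push({ node: neighbor, dist });
        results.sort((a, b) => a.dist - b.dist);
        if (results.length > ef) results.pop(); // drop the farthest
      }
    }
  }
  return results.map((r) => r.node);
}

// Hand-built graph: node 1 looks closest from the entry point but is a dead end;
// the true nearest neighbor (node 3) is reachable only through node 2.
const vectors = [[0, 0], [3, 3], [1.5, -1], [5, 0]];
const graph = [[1, 2], [0], [0, 3], [2]];
const query = [5, 0];

console.log(searchLayer(graph, vectors, query, 0, 1)); // [ 1 ]
console.log(searchLayer(graph, vectors, query, 0, 2)); // [ 3, 1 ]
```

With `ef` set to 1, the search stops at node 1, a local minimum. With `ef` set to 2, node 2 stays in the list long enough for the search to reach node 3. Full HNSW runs this routine on each layer from the top down; that multi-layer descent is omitted here.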

#### `@createdTime`

docs/technical-details/release-notes/4.tucker/4.6.0.md

Lines changed: 1 addition & 2 deletions
@@ -5,7 +5,7 @@
6/13/2025

### Vector Indexing: Hierarchical Navigable Small World
Removed (old line 8):
Harper 4.6 now includes support for vector indexing, which allows for efficient and fast queries on large semantic data sets. Vector indexing is powered by the [Hierarchical Navigable Small World (HNSW) algorithm](https://arxiv.org/abs/1603.09320) and can be used to index any vector-valued property, and is particularly useful for vector text-embedding data. HNSW is a preferred algorithm for vector indexing and searching because it provides an excellent balance of recall and performance.

Added (new line 8):
Harper 4.6 now includes support for vector indexing, which enables fast, efficient queries on large semantic data sets. Vector indexing is powered by the [Hierarchical Navigable Small World (HNSW) algorithm](https://arxiv.org/abs/1603.09320), can be used to index any vector-valued property, and is particularly useful for text-embedding vectors. It provides powerful, efficient vector-based search for semantic and AI-based querying. HNSW is a preferred algorithm for vector indexing and searching because it provides an excellent balance of recall and performance.

### New Extension API with support for dynamic reloading
4.6 introduces a new extension API with significant ergonomic improvements for creating new extension components that are more robust and dynamic. The new API also provides a mechanism for dynamic reloading of some files and configuration without restarts.
@@ -21,4 +21,3 @@ An important change is that logging to standard out/error will _not_ include the

### Resource API Upgrades
4.6 includes an upgraded, opt-in form of the Resource API that significantly improves ease of use.

Removed (old line 24): a trailing blank line.