Add documentation on how to define fields with vector indexing

kriszyp · kriszyp · commit 31780185d026 · 2025-05-14T06:47:01.000-06:00
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
@@ -122,6 +122,7 @@
   * [Storage Algorithm](technical-details/reference/storage-algorithm.md)
 * [Release Notes](technical-details/release-notes/README.md)
   * [Harper Tucker (Version 4)](technical-details/release-notes/4.tucker/README.md)
+    * [4.6.0](technical-details/release-notes/4.tucker/4.6.0.md)
     * [4.5.2](technical-details/release-notes/4.tucker/4.5.2.md)
     * [4.5.1](technical-details/release-notes/4.tucker/4.5.1.md)
     * [4.5.0](technical-details/release-notes/4.tucker/4.5.0.md)
diff --git a/docs/developers/applications/defining-schemas.md b/docs/developers/applications/defining-schemas.md
@@ -169,7 +169,35 @@ The `@primaryKey` directive specifies that an attribute is the primary key for a
 
 #### `@indexed`
 
-The `@indexed` directive specifies that an attribute should be indexed. This is necessary if you want to execute queries using this attribute (whether that is through RESTful query parameters, SQL, or NoSQL operations).
+The `@indexed` directive specifies that an attribute should be indexed. When an attribute is indexed, Harper creates a secondary index from the data in this field for fast, efficient querying on that field. This is necessary if you want to execute queries using this attribute (whether that is through RESTful query parameters, SQL, or NoSQL operations).
+
+A standard index will index the values in each field, so you can query directly by those values. If the field's value is an array, each of the values in the array will be indexed (you can query by any individual value).
+
+#### Vector Indexing
+
+The `@indexed` directive can also specify a `type`. To use vector indexing, you can specify the `type` as `HNSW` for Hierarchical Navigable Small World indexing. This will create a vector index for the attribute. For example:
+```graphql
+type Product @table {
+	id: Long @primaryKey
+	textEmbeddings: [Float] @indexed(type: "HNSW")
+}
+```
+
+HNSW indexing finds the nearest neighbors to a search vector. To use this, query with a `sort` parameter that names the indexed attribute and supplies the search vector as `target`, for example:
+```javascript
+let results = Product.search({
+  sort: { attribute: 'textEmbeddings', target: searchVector },
+  limit: 5 // get the five nearest neighbors
+})
+```
+
+HNSW supports several additional arguments to the `@indexed` directive to adjust the HNSW parameters:
+* `distance` - Define the distance function. This can be set to `euclidean` or `cosine` (uses the negative of cosine similarity). The default is `cosine`.
+* `efConstruction` - Maximum number of nodes to keep in the list for finding nearest neighbors. A higher value can yield better recall, and a lower value can have better performance. If `efSearchConstruction` is set, this is only applied to indexing. The default is 100.
+* `M` - The preferred number of connections at each layer in the HNSW graph. A higher number uses more space but can be helpful when the intrinsic dimensionality of the data is higher. A lower number can be more efficient. The default is 16.
+* `optimizeRouting` - This uses a heuristic to avoid graph connections that match existing indirect connections (connections through another node). This can yield more efficient graph traversals for the same `M` setting. The value is a number between 0 and 1; a higher value more aggressively omits connections that have alternate paths. Setting this to 0 disables route optimization and follows the traditional HNSW algorithm for creating connections. The default is 0.5.
+* `mL` - The normalization factor for level generation. By default, this is computed from `M`.
+* `efSearchConstruction` - Maximum number of nodes to keep in the list for finding nearest neighbors for searching. The default is 50.
 
 #### `@createdTime`
 
diff --git a/docs/technical-details/release-notes/4.tucker/4.6.0.md b/docs/technical-details/release-notes/4.tucker/4.6.0.md
@@ -0,0 +1,8 @@
+# 4.6.0
+
+#### HarperDB 4.6.0
+
+6/13/2025
+
+### Vector Indexing
+4.6 introduces vector indexing support with the Hierarchical Navigable Small World (HNSW) algorithm. This provides powerful, efficient vector-based searching for semantic and AI-based querying. HNSW balances a high recall rate with efficient, high-performance execution.
diff --git a/docs/technical-details/release-notes/README.md b/docs/technical-details/release-notes/README.md
@@ -4,6 +4,8 @@
 
 [Meet Tucker](4.tucker/tucker.md) Our 4th Release Pup
 
+[4.6.0 Tucker](4.tucker/4.6.0.md)
+
 [4.5.2 Tucker](4.tucker/4.5.2.md)
 
 [4.5.1 Tucker](4.tucker/4.5.1.md)