@@ -16,13 +16,13 @@ You can calculate vector embeddings using [ArangoDB's GraphML](../../data-scienc
1616capabilities (available in ArangoGraph) or using external tools.
1717
1818{{< warning >}}
19- The vector index is an experimental feature that you need to enable for the
20- ArangoDB server with the `--experimental-vector-index` startup option.
19+ You need to enable the vector index feature for the
20+ ArangoDB server with the `--vector-index` startup option.
2121Once enabled for a deployment, it cannot be disabled anymore because it
2222permanently changes how the data is managed by the RocksDB storage engine
2323(it adds an additional column family).
2424
25- To restore a dump that contains vector indexes, the `--experimental-vector-index`
25+ To restore a dump that contains vector indexes, the `--vector-index`
2626startup option needs to be enabled on the deployment you want to restore to.
2727{{< /warning >}}
2828
@@ -56,21 +56,37 @@ be found depends on the data as well as the search effort (see the `nProbe` opti
5656{{< info >}}
5757- If there is more than one suitable vector index over the same attribute, it is
5858 undefined which one is selected.
59- - You cannot have any ` FILTER ` operation between ` FOR ` and ` LIMIT ` for
60- pre-filtering.
59+
60+ - In v3.12.4 and v3.12.5, you cannot have any `FILTER` operation between `FOR`
61+ and `LIMIT` for pre-filtering. From v3.12.6 onward, you can add `FILTER`
62+ operations between `FOR` and `SORT` that are then applied during the lookup in
63+ the vector index. Example:
64+
65+ ``` aql
66+ FOR doc IN coll
67+ FILTER doc.val > 3
68+ SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC
69+ LIMIT 5
70+ RETURN doc
71+ ```
72+
73+ Note that `LIMIT 5`, for example, does not guarantee 5 results: the search
74+ does not expand to as many neighboring Voronoi cells as necessary but considers
75+ only as many as configured via the `nProbe` option.
6176{{< /info >}}
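
Why a `LIMIT` can return fewer results than requested may be easier to see in a toy model. The following Python sketch is not ArangoDB's implementation. It is a generic inverted-file search under simplifying assumptions (exact L2 distances, fixed centroids), and all names in it (`ivf_search`, `n_probe`, `limit`) are hypothetical:

```python
import math

def l2(a, b):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ivf_search(points, centroids, query, n_probe, limit):
    """Toy inverted-file search that probes only the n_probe closest cells."""
    # Assign every point to the Voronoi cell of its nearest centroid.
    cells = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: l2(p, centroids[i]))
        cells[nearest].append(p)
    # Select the n_probe cells whose centroids are closest to the query.
    probed = sorted(cells, key=lambda i: l2(query, centroids[i]))[:n_probe]
    candidates = [p for i in probed for p in cells[i]]
    # Rank only the candidates from the probed cells and cut off at limit.
    return sorted(candidates, key=lambda p: l2(query, p))[:limit]

# Hypothetical data: two cells, one holding 2 points and one holding 4.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.1, 0.2), (0.3, 0.1), (9.8, 10.1), (10.2, 9.9), (10.0, 10.3), (9.9, 9.7)]
query = (0.0, 0.0)

few = ivf_search(points, centroids, query, n_probe=1, limit=5)
more = ivf_search(points, centroids, query, n_probe=2, limit=5)
print(len(few), len(more))
```

In this sketch, probing a single cell yields only the two points stored in that cell even though the limit is 5, whereas probing both cells fills the limit.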
6277
6378### APPROX_NEAR_COSINE()
6479
6580` APPROX_NEAR_COSINE(vector1, vector2, options) → similarity `
6681
67- Retrieve the approximate angular similarity using the cosine metric, accelerated
68- by a matching vector index.
6982
70- The higher the cosine similarity value is, the more similar the two vectors
71- are. The closer it is to 0, the more different they are. The value can also
72- be negative, indicating that the vectors are not similar and point in opposite
73- directions. You need to sort in descending order so that the most similar
83+ Retrieve the approximate cosine of the angle between two vectors, accelerated
84+ by a matching vector index with the ` cosine ` metric.
85+
86+ The closer the similarity value is to 1, the more similar the two vectors
87+ are. The closer it is to 0, the more different they are. The value can also be
88+ negative, down to -1, indicating that the vectors are not similar and point in opposite
89+ directions. You need to ** sort in descending order** so that the most similar
7490documents come first, which is what a vector index using the ` cosine ` metric
7591can provide.
7692
@@ -83,8 +99,8 @@ can provide.
8399 closest Voronoi cells to consider for the search results. The larger the number,
84100 the slower the search but the better the search results. If not specified, the
85101 ` defaultNProbe ` value of the vector index is used.
86- - returns ** similarity** (number): The approximate angular similarity between
87- both vectors.
102+ - returns ** similarity** (number): The approximate cosine similarity of
103+ both normalized vectors. The value range is ` [-1, 1] ` .
88104
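
For reference, the exact quantity that this function approximates can be sketched in a few lines of Python. This is only an illustration of the cosine metric, not of how the vector index computes it, and the helper name `cosine_similarity` is an assumption:

```python
import math

def cosine_similarity(a, b):
    """Exact cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different length: similarity close to 1.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: similarity of 0.
orth = cosine_similarity([1.0, 0.0], [0.0, 3.0])
# Opposite direction: similarity close to -1.
opp = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
print(same, orth, opp)
```

Because both vectors are divided by their lengths, only the direction matters, which is why the result stays within `[-1, 1]`.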
89105** Examples**
90106
@@ -126,15 +142,83 @@ FOR docOuter IN coll
126142 RETURN { key: docOuter._key, neighbors }
127143```
128144
145+ ### APPROX_NEAR_INNER_PRODUCT()
146+
147+ <small>Introduced in: v3.12.6</small>
148+
149+ ` APPROX_NEAR_INNER_PRODUCT(vector1, vector2, options) → similarity `
150+
151+ Retrieve the approximate dot product of two vectors, accelerated by a matching
152+ vector index with the ` innerProduct ` metric.
153+
154+ The higher the similarity value is, the more similar the two vectors
155+ are. The closer it is to 0, the more different they are. The value can also
156+ be negative, indicating that the vectors are not similar and point in opposite
157+ directions. You need to ** sort in descending order** so that the most similar
158+ documents come first, which is what a vector index using the ` innerProduct `
159+ metric can provide.
160+
161+ - ** vector1** (array of numbers): The first vector. Either this parameter or
162+ ` vector2 ` needs to reference a stored attribute holding the vector embedding.
163+ - ** vector2** (array of numbers): The second vector. Either this parameter or
164+ ` vector1 ` needs to reference a stored attribute holding the vector embedding.
165+ - ** options** (object, _ optional_ ):
166+ - **nProbe** (number, _optional_): How many neighboring centroids, that is,
167+ the closest Voronoi cells, to consider for the search results. The larger the number,
168+ the slower the search but the better the search results. If not specified, the
169+ ` defaultNProbe ` value of the vector index is used.
170+ - returns ** similarity** (number): The approximate dot product
171+ of both vectors without normalization. The value range is unbounded.
172+
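
To illustrate what "without normalization" means, here is a minimal Python sketch of the exact dot product. It is not how the index evaluates the metric, and the names `inner_product`, `short_vec`, and `long_vec` are hypothetical:

```python
def inner_product(a, b):
    """Exact dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 1.0]
short_vec = [1.0, 1.0]  # same direction as q
long_vec = [3.0, 3.0]   # same direction as q, three times the length

print(inner_product(q, short_vec))
print(inner_product(q, long_vec))
```

Both stored vectors point in the same direction as `q`, but the longer one scores higher. With this metric, vector length influences the ranking, and the value has no fixed upper or lower bound.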
173+ ** Examples**
174+
175+ Return up to ` 10 ` similar documents based on their closeness to the vector
176+ ` @q ` according to the inner product metric:
177+
178+ ``` aql
179+ FOR doc IN coll
180+ SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC
181+ LIMIT 10
182+ RETURN doc
183+ ```
184+
185+ Return up to `5` similar documents as well as the similarity value,
186+ considering the `20` closest Voronoi cells (neighboring centroids):
187+
188+ ``` aql
189+ FOR doc IN coll
190+ LET similarity = APPROX_NEAR_INNER_PRODUCT(doc.vector, @q, { nProbe: 20 })
191+ SORT similarity DESC
192+ LIMIT 5
193+ RETURN MERGE( { similarity }, doc)
194+ ```
195+
196+ Return the similarity value and the document keys of up to ` 3 ` similar documents
197+ for multiple input vectors using a subquery. In this example, the input vectors
198+ are taken from ten random documents of the same collection:
199+
200+ ``` aql
201+ FOR docOuter IN coll
202+ LIMIT 10
203+ LET neighbors = (
204+ FOR docInner IN coll
205+ LET similarity = APPROX_NEAR_INNER_PRODUCT(docInner.vector, docOuter.vector)
206+ SORT similarity DESC
207+ LIMIT 3
208+ RETURN { key: docInner._key, similarity }
209+ )
210+ RETURN { key: docOuter._key, neighbors }
211+ ```
212+
129213### APPROX_NEAR_L2()
130214
131- ` APPROX_NEAR_L2(vector1, vector2, options) → similarity `
215+ ` APPROX_NEAR_L2(vector1, vector2, options) → distance `
132216
133217Retrieve the approximate distance using the L2 (Euclidean) metric, accelerated
134- by a matching vector index.
218+ by a matching vector index with the `l2` metric.
135219
136220The closer the distance is to 0, the more similar the two vectors are. The higher
137- the value, the more different the they are. You need to sort in ascending order
221+ the value, the more different they are. You need to **sort in ascending order**
138222so that the most similar documents come first, which is what a vector index using
139223the ` l2 ` metric can provide.
140224
@@ -147,7 +231,7 @@ the `l2` metric can provide.
147231 for the search results. The larger the number, the slower the search but the
148232 better the search results. If not specified, the ` defaultNProbe ` value of
149233 the vector index is used.
150- - returns ** similarity ** (number): The approximate L2 (Euclidean) distance between
234+ - returns **distance** (number): The approximate L2 (Euclidean) distance between
151235 both vectors.
152236
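
The ascending sort order follows from the metric itself. As a minimal sketch, assuming exact arithmetic rather than the index's approximation, the following Python snippet ranks a few hypothetical documents by their L2 distance to a query vector (`l2_distance` and the sample data are illustrative only):

```python
import math

def l2_distance(a, b):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical stored vectors, keyed by document key.
docs = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 1.0]}
q = [0.0, 0.0]

# Ascending order puts the smallest distance (most similar) first.
ranked = sorted(docs, key=lambda key: l2_distance(docs[key], q))
print(ranked)
```

A distance of 0 means the vectors are identical, so the closest documents come first when sorting in ascending order.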
153237** Examples**