docs/reference/how-to/knn-search.asciidoc (21 additions, 9 deletions)
@@ -16,10 +16,11 @@ structures. So these same recommendations also help with indexing speed.
 The default <<dense-vector-element-type,`element_type`>> is `float`. But this
 can be automatically quantized during index time through
 <<dense-vector-quantization,`quantization`>>. Quantization will reduce the
-required memory by 4x, but it will also reduce the precision of the vectors and
-increase disk usage for the field (by up to 25%). Increased disk usage is a
+required memory by 4x, 8x, or as much as 32x, but it will also reduce the precision of the vectors and
+increase disk usage for the field (by up to 25%, 12.5%, or 3.125%, respectively). Increased disk usage is a
 result of {es} storing both the quantized and the unquantized vectors.
-For example, when quantizing 40GB of floating point vectors an extra 10GB of data will be stored for the quantized vectors. The total disk usage amounts to 50GB, but the memory usage for fast search will be reduced to 10GB.
+For example, when int8 quantizing 40GB of floating point vectors, an extra 10GB of data will be stored for the quantized vectors.
+The total disk usage amounts to 50GB, but the memory usage for fast search will be reduced to 10GB.

 For `float` vectors with `dim` greater than or equal to `384`, using a
 <<dense-vector-quantization,`quantized`>> index is highly recommended.
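To make the arithmetic in the updated paragraph easier to check, here is a small Python sketch (not part of the patch). The 4x/8x/32x reduction factors, the 25%/12.5%/3.125% disk overheads, and the 40GB int8 example come from the text above; the function name and dictionary layout are assumptions made for this illustration.

[source,python]
----
# Illustrative only, not part of the patch. The reduction factors (4x, 8x, 32x) and
# disk-overhead fractions (25%, 12.5%, 3.125%) come from the documentation text above;
# the function name and dictionary layout are assumptions made for this sketch.

EXTRA_DISK_FRACTION = {
    # memory-reduction factor -> extra disk usage as a fraction of the raw float data
    4: 0.25,
    8: 0.125,
    32: 0.03125,
}

def quantized_footprint(raw_float_gb: float, reduction: int) -> dict:
    """Estimate disk and search-memory footprint after quantizing float vectors."""
    quantized_gb = raw_float_gb / reduction  # quantized copy; also what fast search keeps in memory
    extra_disk_gb = raw_float_gb * EXTRA_DISK_FRACTION[reduction]
    return {
        "memory_for_fast_search_gb": quantized_gb,
        "extra_disk_gb": extra_disk_gb,
        "total_disk_gb": raw_float_gb + extra_disk_gb,  # raw and quantized vectors are both stored
    }

# Worked example from the text: int8 (4x) quantization of 40GB of float vectors
# -> 10GB extra disk, 50GB total, memory for fast search reduced to ~10GB.
print(quantized_footprint(40.0, 4))
----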
@@ -68,12 +69,23 @@ Another option is to use <<synthetic-source,synthetic `_source`>>.
 kNN search. HNSW is a graph-based algorithm which only works efficiently when
 most vector data is held in memory. You should ensure that data nodes have at
 least enough RAM to hold the vector data and index structures. To check the
-size of the vector data, you can use the <<indices-disk-usage>> API. As a
-loose rule of thumb, and assuming the default HNSW options, the bytes used will
-be `num_vectors * 4 * (num_dimensions + 12)`. When using the `byte` <<dense-vector-element-type,`element_type`>>
-the space required will be closer to `num_vectors * (num_dimensions + 12)`. Note that
-the required RAM is for the filesystem cache, which is separate from the Java
-heap.
+size of the vector data, you can use the <<indices-disk-usage>> API.
+
+Here are estimates for different element types and quantization levels:
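The new estimates list itself is not captured in this excerpt, so it is not reproduced here. As a reference point, the rule-of-thumb formulas from the removed lines can be sketched in Python as follows (default HNSW options assumed; the function name and sample numbers are illustrative).

[source,python]
----
# Rule-of-thumb RAM estimate for HNSW vector data, taken from the lines this change
# removes (default HNSW options assumed). The newer per-quantization estimates are
# not shown in this excerpt, so they are not reproduced here.

def hnsw_ram_bytes(num_vectors: int, num_dimensions: int, element_type: str = "float") -> int:
    """Loose estimate of filesystem-cache RAM needed (separate from the Java heap)."""
    if element_type == "float":
        return num_vectors * 4 * (num_dimensions + 12)
    if element_type == "byte":
        return num_vectors * (num_dimensions + 12)
    raise ValueError(f"no rule of thumb in this excerpt for element_type={element_type!r}")

# Example with hypothetical numbers: one million 384-dimension float vectors ~= 1.58 GB.
print(hnsw_ram_bytes(1_000_000, 384) / 1e9)
----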
docs/reference/inference/inference-apis.asciidoc (12 additions, 12 deletions)
@@ -35,21 +35,21 @@ Elastic –, then create an {infer} endpoint by the <<put-inference-api>>.
 Now use <<semantic-search-semantic-text, semantic text>> to perform
 <<semantic-search, semantic search>> on your data.

-[discrete]
-[[default-enpoints]]
-=== Default {infer} endpoints
+//[discrete]
+//[[default-enpoints]]
+//=== Default {infer} endpoints

-Your {es} deployment contains some preconfigured {infer} endpoints that makes it easier for you to use them when defining `semantic_text` fields or {infer} processors.
-The following list contains the default {infer} endpoints listed by `inference_id`:
+//Your {es} deployment contains some preconfigured {infer} endpoints that makes it easier for you to use them when defining `semantic_text` fields or {infer} processors.
+//The following list contains the default {infer} endpoints listed by `inference_id`:

-* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
-* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)
+//* `.elser-2-elasticsearch`: uses the {ml-docs}/ml-nlp-elser.html[ELSER] built-in trained model for `sparse_embedding` tasks (recommended for English language texts)
+//* `.multilingual-e5-small-elasticsearch`: uses the {ml-docs}/ml-nlp-e5.html[E5] built-in trained model for `text_embedding` tasks (recommended for non-English language texts)

-Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
-The API call will automatically download and deploy the model which might take a couple of minutes.
-Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
-For these models, the minimum number of allocations is `0`.
-If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
+//Use the `inference_id` of the endpoint in a <<semantic-text,`semantic_text`>> field definition or when creating an <<inference-processor,{infer} processor>>.
+//The API call will automatically download and deploy the model which might take a couple of minutes.
+//Default {infer} enpoints have {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[adaptive allocations] enabled.
+//For these models, the minimum number of allocations is `0`.
+//If there is no {infer} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.
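For context on what the commented-out paragraphs describe, here is a minimal Python sketch of referencing a default endpoint's `inference_id` in a `semantic_text` field definition. It assumes the elasticsearch-py 8.x client, a cluster at a placeholder address, and a hypothetical index name; it illustrates the removed prose and is not part of the patch.

[source,python]
----
# Minimal sketch of what the commented-out paragraph describes: referencing a default
# endpoint's `inference_id` in a `semantic_text` field. Assumptions: elasticsearch-py
# 8.x client, a cluster at a placeholder address, and a hypothetical index name.
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # placeholder address

client.indices.create(
    index="my-semantic-index",  # hypothetical index name
    mappings={
        "properties": {
            "content": {
                "type": "semantic_text",
                "inference_id": ".elser-2-elasticsearch",
            }
        }
    },
)
----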