Commit 3164c6c

Merge branch 'main' into eis-text-embedding-task-type
2 parents b7d10b8 + 870d581 commit 3164c6c

37 files changed, +1540 -189 lines changed

build-tools-internal/src/main/resources/forbidden/es-all-signatures.txt

Lines changed: 4 additions & 0 deletions
@@ -61,3 +61,7 @@ org.apache.logging.log4j.message.ParameterizedMessage#<init>(java.lang.String, j
 
 @defaultMessage Use WriteLoadForecaster#getForecastedWriteLoad instead
 org.elasticsearch.cluster.metadata.IndexMetadata#getForecastedWriteLoad()
+
+@defaultMessage Use org.elasticsearch.index.codec.vectors.OptimizedScalarQuantizer instead
+org.apache.lucene.util.quantization.OptimizedScalarQuantizer#<init>(org.apache.lucene.index.VectorSimilarityFunction, float, int)
+org.apache.lucene.util.quantization.OptimizedScalarQuantizer#<init>(org.apache.lucene.index.VectorSimilarityFunction)

docs/reference/elasticsearch/mapping-reference/dense-vector.md

Lines changed: 4 additions & 4 deletions
@@ -55,7 +55,7 @@ In many cases, a brute-force kNN search is not efficient enough. For this reason
 
 Unmapped array fields of float elements with size between 128 and 4096 are dynamically mapped as `dense_vector` with a default similariy of `cosine`. You can override the default similarity by explicitly mapping the field as `dense_vector` with the desired similarity.
 
-Indexing is enabled by default for dense vector fields and indexed as `int8_hnsw`. When indexing is enabled, you can define the vector similarity to use in kNN search:
+Indexing is enabled by default for dense vector fields and indexed as `bbq_hnsw` if dimensions are greater than or equal to 384, otherwise they are indexed as `int8_hnsw`. When indexing is enabled, you can define the vector similarity to use in kNN search:
 
 ```console
 PUT my-index-2
@@ -105,7 +105,7 @@ The `dense_vector` type supports quantization to reduce the memory footprint req
 
 When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See [oversampling and rescoring](docs-content://solutions/search/vector/knn.md#dense-vector-knn-search-rescoring) for more information.
 
-To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default index type is `int8_hnsw`.
+To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default index type is `bbq_hnsw` for vectors with greater than or equal to 384 dimensions, otherwise it's `int8_hnsw`.
 
 Quantized vectors can use [oversampling and rescoring](docs-content://solutions/search/vector/knn.md#dense-vector-knn-search-rescoring) to improve accuracy on approximate kNN search results.
 
@@ -255,9 +255,9 @@ $$$dense-vector-index-options$$$
 `type`
 : (Required, string) The type of kNN algorithm to use. Can be either any of:
 * `hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) for scalable approximate kNN search. This supports all `element_type` values.
-* `int8_hnsw` - The default index type for float vectors. This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatically scalar quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 4x at the cost of some accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
+* `int8_hnsw` - The default index type for float vectors with less than 384 dimensions. This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatically scalar quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 4x at the cost of some accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
 * `int4_hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatically scalar quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 8x at the cost of some accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
-* `bbq_hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatically binary quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 32x at the cost of accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
+* `bbq_hnsw` - The default index type for float vectors with greater than or equal to 384 dimensions. This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatically binary quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 32x at the cost of accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
 * `flat` - This utilizes a brute-force search algorithm for exact kNN search. This supports all `element_type` values.
 * `int8_flat` - This utilizes a brute-force search algorithm in addition to automatically scalar quantization. Only supports `element_type` of `float`.
 * `int4_flat` - This utilizes a brute-force search algorithm in addition to automatically half-byte scalar quantization. Only supports `element_type` of `float`.
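
Reviewer note: the documentation change above describes a dimension-based default (`bbq_hnsw` at 384 dimensions or more, `int8_hnsw` below that) alongside nominal memory reductions of 4x, 8x, and 32x. The Python sketch below restates that rule and the arithmetic so the effect of the new default is easier to see. It is not Elasticsearch code: the function names, the constant names, and the factor of 1 for unquantized `hnsw` are assumptions made for illustration. The estimate covers raw vector storage only and ignores HNSW graph overhead and any per-vector quantization metadata.

```python
# Hypothetical helpers illustrating the documented default; not part of the commit.
FLOAT32_BYTES = 4           # `float` elements are stored as 4-byte floats
BBQ_DEFAULT_MIN_DIMS = 384  # threshold stated in the updated dense-vector.md

# Nominal reduction factors quoted in the docs; `hnsw` = 1 is an assumption
# (unquantized float vectors).
REDUCTION_FACTOR = {
    "hnsw": 1,
    "int8_hnsw": 4,
    "int4_hnsw": 8,
    "bbq_hnsw": 32,
}


def default_index_type(dims: int) -> str:
    """Return the default index type the updated docs describe for float vectors."""
    if dims <= 0:
        raise ValueError("dims must be positive")
    return "bbq_hnsw" if dims >= BBQ_DEFAULT_MIN_DIMS else "int8_hnsw"


def approx_vector_bytes(dims: int, index_type: str) -> float:
    """Rough per-vector size in bytes after the nominal quantization reduction."""
    return dims * FLOAT32_BYTES / REDUCTION_FACTOR[index_type]


if __name__ == "__main__":
    for dims in (128, 383, 384, 1024):
        index_type = default_index_type(dims)
        print(dims, index_type, approx_vector_bytes(dims, index_type))
```

Under these assumptions, a 1024-dimension float vector that previously defaulted to `int8_hnsw` (about 1024 bytes of quantized data) would now default to `bbq_hnsw` (about 128 bytes), which is why the surrounding docs point to oversampling and rescoring to recover accuracy.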
