Commit e86a8a1

Authored by kosabogi, Mikep86, and szabosteve

Improves visibility of vector index options and inference configuration (elastic#141653)

* Improve visibility of vector index options and inference configuration
* Fixes link
* Removes incorrect note
* Update docs/reference/elasticsearch/mapping-reference/semantic-text.md (Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>)
* Addresses suggestions
* Syntax fix
* Fixes syntax, adds list
* Applies suggestions
* Update docs/reference/elasticsearch/mapping-reference/dense-vector.md (Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>)
* Update docs/reference/elasticsearch/mapping-reference/semantic-text-setup-configuration.md (Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>)
* Update docs/reference/elasticsearch/mapping-reference/semantic-text.md (Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>)
* Update docs/reference/elasticsearch/mapping-reference/semantic-text.md (Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>)
* Fixes syntax and code example

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
1 parent 1c17a0f commit e86a8a1

File tree

4 files changed: +181 additions, −28 deletions


docs/reference/elasticsearch/mapping-reference/dense-vector.md

Lines changed: 69 additions & 19 deletions
````diff
@@ -29,10 +29,13 @@ PUT my-index
     "properties": {
       "my_vector": {
         "type": "dense_vector",
-        "dims": 3
+        "dims": 3,
+        "index_options": {
+          "type": "bbq_disk" <1>
+        }
       },
-      "my_text" : {
-        "type" : "keyword"
+      "my_text": {
+        "type": "keyword"
       }
     }
   }
````
````diff
@@ -50,6 +53,8 @@ PUT my-index/_doc/2
   "my_vector" : [-0.5, 10, 10]
 }
 ```
+1. (Optional) Controls how vectors are indexed internally for kNN search. In this example, `bbq_disk` enables [disk-based binary quantization](/reference/elasticsearch/mapping-reference/bbq.md#bbq-disk), which can significantly reduce memory usage for large vector datasets. If you don't specify `index_options`, {{es}} automatically selects a default indexing strategy based on the vector type and dimensions. To learn more about the available index options and how they affect vector quantization, refer to [Automatically quantize vectors for kNN search](#dense-vector-quantization).
+
 :::
 :::{tab-item} Base64-encoded string
 ```console
````
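The footnote above notes that {{es}} falls back to a default indexing strategy when `index_options` is omitted. As a rough sketch of the dimension-based rule documented later in this diff, here is a hypothetical helper (not Elasticsearch source code) for `float` vectors:

```python
# Hypothetical helper mirroring the documented default index type
# selection for float vectors; not Elasticsearch internals.
def default_index_type(dims: int, stack_version: tuple = (9, 1)) -> str:
    if stack_version < (9, 1):
        # In stack 9.0, float vectors always default to int8_hnsw.
        return "int8_hnsw"
    # From 9.1 on, high-dimensional vectors default to bbq_hnsw.
    return "bbq_hnsw" if dims >= 384 else "int8_hnsw"

print(default_index_type(3))    # int8_hnsw (the 3-dim example above)
print(default_index_type(768))  # bbq_hnsw
```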
````diff
@@ -338,34 +343,60 @@ This configuration is appropriate when full source fidelity is required, such as
 
 ## Automatically quantize vectors for kNN search [dense-vector-quantization]
 
-The `dense_vector` type supports quantization to reduce the memory footprint required when [searching](docs-content://solutions/search/vector/knn.md#approximate-knn) `float` vectors. The three following quantization strategies are supported:
+The `dense_vector` field type supports quantization to reduce the memory footprint required when [searching](docs-content://solutions/search/vector/knn.md#approximate-knn) `float` vectors. The supported vector quantization strategies for `dense_vector` kNN indexing are:
+- [`int8`](#dense-vector-quantization-int8)
+- [`int4`](#dense-vector-quantization-int4)
+- [`bbq`](#dense-vector-quantization-bbq), available as:
+  - [`bbq_hnsw`](/reference/elasticsearch/mapping-reference/bbq.md#bbq-hnsw)
+  - [`bbq_flat`](/reference/elasticsearch/mapping-reference/bbq.md#bbq-flat)
+  - [`bbq_disk`](/reference/elasticsearch/mapping-reference/bbq.md#bbq-disk)
 
-* `int8` - Quantizes each dimension of the vector to 1-byte integers. This reduces the memory footprint by 75% (or 4x) at the cost of some accuracy.
-* `int4` - Quantizes each dimension of the vector to half-byte integers. This reduces the memory footprint by 87% (or 8x) at the cost of accuracy.
-* `bbq` - [Better binary quantization](/reference/elasticsearch/mapping-reference/bbq.md) which reduces each dimension to a single bit precision. This reduces the memory footprint by 96% (or 32x) at a larger cost of accuracy. Generally, oversampling during query time and reranking can help mitigate the accuracy loss.
+Here is an example of configuring disk-based binary quantization using `bbq_disk`:
 
-When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See [oversampling and rescoring](docs-content://solutions/search/vector/knn.md#dense-vector-knn-search-rescoring) for more information.
-
-To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default index type is `bbq_hnsw` for vectors with greater than or equal to 384 dimensions, otherwise it's `int8_hnsw`.
-
-:::{note}
-In {{stack}} 9.0, dense vector fields are always indexed as `int8_hnsw`.
-:::
+```console
+PUT my-bbq-disk-index
+{
+  "mappings": {
+    "properties": {
+      "my_vector": {
+        "type": "dense_vector",
+        "dims": 384,
+        "index": true,
+        "index_options": {
+          "type": "bbq_disk"
+        }
+      }
+    }
+  }
+}
+```
 
 Quantized vectors can use [oversampling and rescoring](docs-content://solutions/search/vector/knn.md#dense-vector-knn-search-rescoring) to improve accuracy on approximate kNN search results.
 
 ::::{note}
 Quantization will continue to keep the raw float vector values on disk for reranking, reindexing, and quantization improvements over the lifetime of the data. This means disk usage will increase by ~25% for `int8`, ~12.5% for `int4`, and ~3.1% for `bbq` due to the overhead of storing the quantized and raw vectors.
 ::::
 
-::::{note}
-`int4` quantization requires an even number of vector dimensions.
-::::
+### Default quantization types
+
+::::{applies-switch}
+
+:::{applies-item} stack: ga 9.0
+When indexing `float` vectors, the default index type is `int8_hnsw`.
+:::
+
+:::{applies-item} stack: ga 9.1+
+When indexing `float` vectors, the default index type is:
+- `bbq_hnsw` for vectors with greater than or equal to 384 dimensions
+- `int8_hnsw` for vectors with less than 384 dimensions
+:::
 
-::::{note}
-`bbq` quantization only supports vector dimensions that are greater than 64.
 ::::
 
+### int8 [dense-vector-quantization-int8]
+
+Quantizes each dimension of the vector to 1-byte integers. This reduces the memory footprint by 75% (or 4x) at the cost of some accuracy.
+
 Here is an example of how to create a byte-quantized index:
 
 ```console
````
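The savings quoted for each strategy follow directly from the per-dimension bit widths. A quick back-of-envelope check in Python (assuming raw `float32` vectors at 32 bits per dimension; real on-disk index sizes also include graph and metadata overhead, so treat these as rough lower bounds):

```python
# Bits needed per dimension for raw floats and each quantized format.
BITS_PER_DIM = {"float": 32, "int8": 8, "int4": 4, "bbq": 1}

def vector_bytes(dims: int, strategy: str) -> float:
    """Approximate in-memory size of one quantized vector, in bytes."""
    return dims * BITS_PER_DIM[strategy] / 8

dims = 384
raw = vector_bytes(dims, "float")  # 1536 bytes
for strategy in ("int8", "int4", "bbq"):
    q = vector_bytes(dims, strategy)
    print(f"{strategy}: {q:.0f} bytes/vector, "
          f"{100 * (1 - q / raw):.1f}% smaller ({raw / q:.0f}x)")
```

For 384 dimensions this reproduces the figures above: `int8` is 4x smaller (75%), `int4` 8x (87.5%), and `bbq` 32x (~96.9%).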
````diff
@@ -386,6 +417,10 @@ PUT my-byte-quantized-index
 }
 ```
 
+### int4 [dense-vector-quantization-int4]
+
+Quantizes each dimension of the vector to half-byte integers. This reduces the memory footprint by 87% (or 8x) at the cost of accuracy.
+
 Here is an example of how to create a half-byte-quantized index:
 
 ```console
````
````diff
@@ -406,6 +441,17 @@ PUT my-byte-quantized-index
 }
 ```
 
+::::{note}
+`int4` quantization requires an even number of vector dimensions.
+::::
+
+### bbq [dense-vector-quantization-bbq]
+
+`bbq` or [Better binary quantization](/reference/elasticsearch/mapping-reference/bbq.md) reduces each dimension to a single bit precision. This reduces the memory footprint by 96% (or 32x) at a larger cost of accuracy. Generally, [oversampling](/reference/elasticsearch/mapping-reference/bbq.md#bbq-oversampling) during query time and reranking can help mitigate the accuracy loss. You can choose one of the following BBQ index types:
+* [`bbq_hnsw`](/reference/elasticsearch/mapping-reference/bbq.md#bbq-hnsw)
+* [`bbq_flat`](/reference/elasticsearch/mapping-reference/bbq.md#bbq-flat)
+* [`bbq_disk`](/reference/elasticsearch/mapping-reference/bbq.md#bbq-disk)
+
 Here is an example of how to create a binary quantized index:
 
 ```console
````
````diff
@@ -426,6 +472,10 @@ PUT my-byte-quantized-index
 }
 ```
 
+::::{note}
+`bbq` quantization only supports vector dimensions that are greater than 64.
+::::
+
 ## Parameters for dense vector fields [dense-vector-params]
 
 The following mapping parameters are accepted:
````
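To make the bit-level idea behind `bbq` concrete, here is a toy 1-bit quantizer in Python. This is emphatically *not* Elasticsearch's BBQ implementation (which adds corrective terms to preserve accuracy); it only illustrates how keeping a single sign bit per dimension yields the ~32x memory reduction:

```python
# Toy 1-bit quantization: keep only the sign of each dimension and
# pack 8 dimensions into each byte. Illustration only, not BBQ itself.
def binarize(vector: list) -> bytes:
    out = bytearray()
    for i in range(0, len(vector), 8):
        b = 0
        for j, v in enumerate(vector[i:i + 8]):
            if v > 0:
                b |= 1 << j  # set bit j if this dimension is positive
        out.append(b)
    return bytes(out)

vec = [0.4, -1.2, 3.0, -0.5] * 24  # 96 dims (> 64, as bbq requires)
packed = binarize(vec)
print(len(vec) * 4, "raw float32 bytes ->", len(packed), "packed bytes")
# 96 dims: 384 raw bytes -> 12 packed bytes, a 32x reduction
```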

docs/reference/elasticsearch/mapping-reference/semantic-text-setup-configuration.md

Lines changed: 71 additions & 1 deletion
````diff
@@ -214,4 +214,74 @@ PUT my-index-000003
   }
 }
 ```
-% TEST[skip:Requires {{infer}} endpoint]
+
+## Set `index_options` for `sparse_vectors` [index-options-sparse_vectors]
+
+```{applies_to}
+stack: ga 9.2
+```
+
+Configuring `index_options` for [sparse vector fields](/reference/elasticsearch/mapping-reference/sparse-vector.md) lets you control [token pruning](/reference/elasticsearch/mapping-reference/sparse-vector.md#token-pruning), which determines whether non-significant or overly frequent tokens are omitted to improve query performance.
+
+The following example enables token pruning and sets pruning thresholds for a `sparse_vector` field:
+
+```console
+PUT semantic-embeddings
+{
+  "mappings": {
+    "properties": {
+      "content": {
+        "type": "semantic_text",
+        "index_options": {
+          "sparse_vector": {
+            "prune": true, <1>
+            "pruning_config": {
+              "tokens_freq_ratio_threshold": 10, <2>
+              "tokens_weight_threshold": 0.5 <3>
+            }
+          }
+        }
+      }
+    }
+  }
+}
+```
+1. (Optional) Enables pruning. Default is `true`.
+2. (Optional) Prunes tokens whose frequency is more than 10 times the average token frequency in the field. Default is `5`.
+3. (Optional) Prunes tokens whose weight is lower than 0.5. Default is `0.4`.
+
+Learn more about [sparse_vector index options](/reference/elasticsearch/mapping-reference/sparse-vector.md#sparse-vector-index-options) settings and [token pruning](/reference/elasticsearch/mapping-reference/sparse-vector.md#token-pruning).
+
+## Set `index_options` for `dense_vectors` [index-options-dense_vectors]
+
+Configuring `index_options` for [dense vector fields](/reference/elasticsearch/mapping-reference/dense-vector.md) lets you control how dense vectors are indexed for kNN search. You can select the indexing algorithm, such as `int8_hnsw`, `int4_hnsw`, or `bbq_disk`, among [other available index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
+
+The following example shows how to configure `index_options` for a dense vector field using the `int8_hnsw` indexing algorithm:
+
+```console
+PUT semantic-embeddings
+{
+  "mappings": {
+    "properties": {
+      "content": {
+        "type": "semantic_text",
+        "index_options": {
+          "dense_vector": {
+            "type": "int8_hnsw", <1>
+            "m": 15, <2>
+            "ef_construction": 90, <3>
+            "confidence_interval": 0.95 <4>
+          }
+        }
+      }
+    }
+  }
+}
+```
+1. (Optional) Selects the `int8_hnsw` vector quantization strategy. Learn about [default quantization types](/reference/elasticsearch/mapping-reference/dense-vector.md#default-quantization-types).
+2. (Optional) Sets `m` to 15 to control how many neighbors each node connects to in the HNSW graph. Default is `16`.
+3. (Optional) Sets `ef_construction` to 90 to control how many candidate neighbors are considered during graph construction. Default is `100`.
+4. (Optional) Sets `confidence_interval` to 0.95 to limit the value range used during quantization and balance accuracy with memory efficiency.
````
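To illustrate the pruning thresholds described above, here is a small Python sketch. This is a hypothetical helper, not Elasticsearch internals; it assumes a token must be both a frequency outlier and below the weight threshold to be pruned — check the `sparse_vector` token pruning reference for the authoritative combination of the two conditions:

```python
# Hypothetical token-pruning sketch; not Elasticsearch code.
def prune_tokens(weights: dict, freqs: dict,
                 freq_ratio_threshold: float = 5.0,
                 weight_threshold: float = 0.4) -> dict:
    avg_freq = sum(freqs.values()) / len(freqs)
    kept = {}
    for token, weight in weights.items():
        outlier = freqs[token] > freq_ratio_threshold * avg_freq
        insignificant = weight < weight_threshold
        # Assumed rule: prune only tokens that trip BOTH thresholds.
        if not (outlier and insignificant):
            kept[token] = weight
    return kept

# "the" is very frequent and low-weight; the other tokens are kept.
weights = {"the": 0.1, **{f"tok{i}": 0.8 for i in range(9)}}
freqs = {"the": 100, **{f"tok{i}": 5 for i in range(9)}}
print(prune_tokens(weights, freqs))  # "the" is pruned
```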

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 39 additions & 8 deletions
````diff
@@ -9,35 +9,66 @@ applies_to:
 
 # Semantic text field type [semantic-text]
 
+:::::{warning}
+The `semantic_text` field mapping can be added regardless of license state. However, it typically calls the [{{infer-cap}} API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference), which requires an [appropriate license](https://www.elastic.co/subscriptions). In these cases, using `semantic_text` in a cluster without the appropriate license causes operations such as indexing and reindexing to fail.
+:::::
+
 The `semantic_text` field type simplifies [semantic search](docs-content://solutions/search/semantic-search.md) by providing sensible defaults that automate most of the manual work typically required for vector search. Using `semantic_text`, you don't have to manually configure mappings, set up ingestion pipelines, or handle chunking. The field type automatically:
 
 - Configures index mappings: Chooses the correct field type (`sparse_vector` or `dense_vector`), dimensions, similarity functions, and storage optimizations based on the {{infer}} endpoint.
 - Generates embeddings during indexing: Automatically generates embeddings when you index documents, without requiring ingestion pipelines or {{infer}} processors.
 - Handles chunking: Automatically chunks long text documents during indexing.
 
-:::::{warning}
-The `semantic_text` field mapping can be added regardless of license state. However, it typically calls the [{{infer-cap}} API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference), which requires an [appropriate license](https://www.elastic.co/subscriptions). In these cases, using `semantic_text` in a cluster without the appropriate license causes operations such as indexing and reindexing to fail.
-:::::
-
 ## Basic `semantic_text` mapping example
 
-The following example creates an index mapping with a `semantic_text` field:
+The following example creates an index mapping with a `semantic_text` field, using default values:
 
 ```console
 PUT semantic-embeddings
 {
   "mappings": {
     "properties": {
       "content": {
-        "type": "semantic_text" <1>
+        "type": "semantic_text"
+      }
+    }
+  }
+}
+```
+
+## Extended `semantic_text` mapping example
+
+The following example creates an index mapping with a `semantic_text` field that uses dense vectors:
+
+```console
+PUT semantic-embeddings
+{
+  "mappings": {
+    "properties": {
+      "content": {
+        "type": "semantic_text",
+        "inference_id": "my-inference-endpoint", <1>
+        "search_inference_id": "my-search-inference-endpoint", <2>
+        "index_options": { <3>
+          "dense_vector": {
+            "type": "bbq_disk"
+          }
+        },
+        "chunking_settings": { <4>
+          "strategy": "word",
+          "max_chunk_size": 120,
+          "overlap": 40
+        }
       }
     }
   }
 }
 ```
-% TEST[skip:Requires {{infer}} endpoint]
 
-1. In this example, the `semantic_text` field uses a [default {{infer}} endpoint](./semantic-text-setup-configuration.md#default-and-preconfigured-endpoints) because the `inference_id` parameter isn't specified.
+1. (Optional) Specifies the [{{infer}} endpoint](/reference/elasticsearch/mapping-reference/semantic-text-reference.md#configuring-inference-endpoints) used to generate embeddings at index time. If you don't specify an `inference_id`, the `semantic_text` field uses a [default {{infer}} endpoint](/reference/elasticsearch/mapping-reference/semantic-text-setup-configuration.md#default-and-preconfigured-endpoints).
+2. (Optional) The {{infer}} endpoint used to generate embeddings at query time. If not specified, the endpoint defined by `inference_id` is used at both index and query time.
+3. (Optional) Configures how the underlying vector representation is indexed. In this example, [`bbq_disk`](/reference/elasticsearch/mapping-reference/bbq.md#bbq-disk) is selected for dense vectors. You can configure different index options depending on whether the field uses dense or sparse vectors. Learn how to [set `index_options` for `sparse_vectors`](/reference/elasticsearch/mapping-reference/semantic-text-setup-configuration.md#index-options-sparse_vectors) and how to [set `index_options` for `dense_vectors`](/reference/elasticsearch/mapping-reference/semantic-text-setup-configuration.md#index-options-dense_vectors).
+4. (Optional) Overrides the [chunking settings](/reference/elasticsearch/mapping-reference/semantic-text-reference.md#chunking-behavior) from the {{infer}} endpoint. In this example, the `word` strategy splits text on individual words with a maximum of 120 words per chunk and an overlap of 40 words between chunks. The default chunking strategy is `sentence`.
 
 :::{tip}
 For a complete example, refer to the [Semantic search with `semantic_text`](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md) tutorial.
````
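The `word` chunking strategy in the extended example above can be pictured as a sliding window over the token stream. A minimal sketch, illustrative only — Elasticsearch's chunker also handles sentence boundaries and edge cases differently:

```python
# Sliding-window word chunking: windows of at most max_chunk_size words,
# with `overlap` words shared between consecutive chunks.
def word_chunks(text: str, max_chunk_size: int = 120, overlap: int = 40):
    words = text.split()
    step = max_chunk_size - overlap  # each new chunk starts `step` words later
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_chunk_size]))
        if start + max_chunk_size >= len(words):
            break  # the final window already reached the end of the text
    return chunks

doc = " ".join(f"w{i}" for i in range(200))
chunks = word_chunks(doc, max_chunk_size=120, overlap=40)
print(len(chunks))           # 2 chunks for 200 words
print(chunks[1].split()[0])  # second chunk starts at word 80: w80
```

With 200 words, the first chunk covers words 0-119 and the second covers words 80-199, so consecutive chunks share exactly 40 words.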

docs/reference/elasticsearch/mapping-reference/sparse-vector.md

Lines changed: 2 additions & 0 deletions
````diff
@@ -62,6 +62,8 @@ See [semantic search with ELSER](docs-content://solutions/search/semantic-search
 
 The following parameters are accepted by `sparse_vector` fields:
 
+$$$sparse-vector-index-options$$$
+
 index_options {applies_to}`stack: ga 9.1`
 : (Optional, object) You can set index options for your `sparse_vector` field to determine if you should prune tokens, and the parameter configurations for the token pruning. If pruning options are not set in your [`sparse_vector` query](/reference/query-languages/query-dsl/query-dsl-sparse-vector-query.md), Elasticsearch will use the default options configured for the field, if any.
````
