Commit f39b78a

add changelog and docs for index_options
1 parent e24ab76 commit f39b78a

2 files changed: 35 additions, 1 deletion

docs/changelog/126739.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 126739
+summary: Add pruning index options to sparse vector field
+area: Inference
+type: enhancement
+issues: []

docs/reference/elasticsearch/mapping-reference/sparse-vector.md

Lines changed: 30 additions & 1 deletion
@@ -17,7 +17,14 @@ PUT my-index
   "mappings": {
     "properties": {
       "text.tokens": {
-        "type": "sparse_vector"
+        "type": "sparse_vector",
+        "index_options": {
+          "prune": true,
+          "pruning_config": {
+            "tokens_freq_ratio_threshold": 5,
+            "tokens_weight_threshold": 0.4
+          }
+        }
       }
     }
   }
@@ -36,6 +43,28 @@ The following parameters are accepted by `sparse_vector` fields:
 * Exclude the field from [_source](/reference/elasticsearch/rest-apis/retrieve-selected-fields.md#source-filtering).
 * Use [synthetic `_source`](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source).

+[index_options](...)
+: (Optional, object) Index options for the `sparse_vector` field. They determine whether tokens are pruned and how token pruning is configured. If a `sparse_vector` query on the field does not set its own pruning options, Elasticsearch uses the defaults set here, if any.
+
+Parameters for `index_options` are:
+
+`prune`
+: (Optional, boolean) [preview] Whether to perform pruning, omitting non-significant tokens from the query to improve query performance. If `prune` is `true` but `pruning_config` is not specified, pruning occurs with the default values. Default: `false`.
+
+`pruning_config`
+: (Optional, object) [preview] Pruning configuration, which controls which non-significant tokens are omitted from the query to improve query performance. It is only used if `prune` is set to `true`. If `prune` is `true` but `pruning_config` is not specified, the default values are used.
+
+Parameters for `pruning_config` include:
+
+`tokens_freq_ratio_threshold`
+: (Optional, integer) [preview] Tokens whose frequency is more than `tokens_freq_ratio_threshold` times the average frequency of all tokens in the specified field are considered outliers and pruned. This value must be between 1 and 100. Default: `5`.
+
+`tokens_weight_threshold`
+: (Optional, float) [preview] Tokens whose weight is less than `tokens_weight_threshold` are considered insignificant and pruned. This value must be between 0 and 1. Default: `0.4`.
+
+::::{note}
+The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` were chosen based on tests with ELSERv2, where they gave the best results.
+::::


 ## Multi-value sparse vectors [index-multi-value-sparse-vectors]
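
Reviewer note on the `index_options` docs added in this commit: the defaulting rules are spread across several parameter descriptions, so a small sketch may help when checking them. The Python below is not Elasticsearch code. `PruningConfig` and `resolve_pruning` are hypothetical names, the documented ranges are assumed to be inclusive, and a `pruning_config` supplied while `prune` is false is assumed to be ignored rather than rejected.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Defaults as stated in the documentation text of this commit.
DEFAULT_FREQ_RATIO_THRESHOLD = 5
DEFAULT_WEIGHT_THRESHOLD = 0.4


@dataclass(frozen=True)
class PruningConfig:
    """Hypothetical container for the effective pruning settings."""
    tokens_freq_ratio_threshold: int
    tokens_weight_threshold: float


def resolve_pruning(index_options: Optional[Mapping[str, Any]]) -> Optional[PruningConfig]:
    """Return the effective pruning config, or None when pruning is off."""
    options = index_options or {}
    # Documented default for `prune` is false.
    if not options.get("prune", False):
        # Assumption: a pruning_config given without prune=true is ignored.
        return None

    config = options.get("pruning_config") or {}
    ratio = config.get("tokens_freq_ratio_threshold", DEFAULT_FREQ_RATIO_THRESHOLD)
    weight = config.get("tokens_weight_threshold", DEFAULT_WEIGHT_THRESHOLD)

    # Assumption: both documented ranges are inclusive at each end.
    if not 1 <= ratio <= 100:
        raise ValueError("tokens_freq_ratio_threshold must be between 1 and 100")
    if not 0 <= weight <= 1:
        raise ValueError("tokens_weight_threshold must be between 0 and 1")
    return PruningConfig(ratio, weight)
```

Under these assumptions, `resolve_pruning({"prune": True})` falls back to both defaults, which matches the statement that pruning occurs with default values when `pruning_config` is omitted.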
