Commit aa24341

Merge branch 'main' into eis-text-embedding-task-type
2 parents cd3e116 + d3049e0 commit aa24341

113 files changed

+4868
-2151
lines changed

docs/changelog/129089.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
pr: 129089
2+
summary: Update `sparse_vector` field mapping to include default setting for token pruning
3+
area: Mapping
4+
type: enhancement
5+
issues: []

docs/changelog/129413.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
pr: 129413
2+
summary: '`SageMaker` Elastic Payload'
3+
area: Machine Learning
4+
type: enhancement
5+
issues: []

docs/changelog/129557.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
pr: 129557
2+
summary: Pushdown for LIKE (LIST)
3+
area: ES|QL
4+
type: enhancement
5+
issues: []

docs/reference/elasticsearch/mapping-reference/sparse-vector.md

Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,33 @@ PUT my-index
2424
}
2525
```
2626

27+
## Token pruning
28+
```{applies_to}
29+
stack: preview 9.1
30+
```
31+
32+
Token pruning is turned on by default for newly created indices, using the default pruning settings. You can control this behavior with the optional `index_options` parameter for the field:
33+
34+
```console
35+
PUT my-index
36+
{
37+
"mappings": {
38+
"properties": {
39+
"text.tokens": {
40+
"type": "sparse_vector",
41+
"index_options": {
42+
"prune": true,
43+
"pruning_config": {
44+
"tokens_freq_ratio_threshold": 5,
45+
"tokens_weight_threshold": 0.4
46+
}
47+
}
48+
}
49+
}
50+
}
51+
}
52+
```
53+
2754
See [semantic search with ELSER](docs-content://solutions/search/semantic-search/semantic-search-elser-ingest-pipelines.md) for a complete example on adding documents to a `sparse_vector` mapped field using ELSER.
2855

2956
## Parameters for `sparse_vector` fields [sparse-vectors-params]
@@ -36,6 +63,38 @@ The following parameters are accepted by `sparse_vector` fields:
3663
* Exclude the field from [_source](/reference/elasticsearch/rest-apis/retrieve-selected-fields.md#source-filtering).
3764
* Use [synthetic `_source`](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source).
3865

66+
index_options {applies_to}`stack: preview 9.1`
67+
: (Optional, object) Index options for the `sparse_vector` field. They determine whether tokens are pruned and how token pruning is configured. If pruning options are not set in your [`sparse_vector` query](/reference/query-languages/query-dsl/query-dsl-sparse-vector-query.md), Elasticsearch uses the default options configured for the field, if any.
68+
69+
Parameters for `index_options` are:
70+
71+
`prune` {applies_to}`stack: preview 9.1`
72+
: (Optional, boolean) Whether to perform pruning, omitting the non-significant tokens from the query to improve query performance. If `prune` is true but the `pruning_config` is not specified, pruning will occur but default values will be used. Default: true.
73+
74+
`pruning_config` {applies_to}`stack: preview 9.1`
75+
: (Optional, object) Pruning configuration, used to omit non-significant tokens from the query in order to improve query performance. It is only used if `prune` is set to `true`. If `prune` is set to `true` but `pruning_config` is not specified, default values are used. If `prune` is set to `false` but `pruning_config` is specified, an exception occurs.
76+
77+
Parameters for `pruning_config` include:
78+
79+
`tokens_freq_ratio_threshold` {applies_to}`stack: preview 9.1`
80+
: (Optional, integer) Tokens whose frequency is more than `tokens_freq_ratio_threshold` times the average frequency of all tokens in the specified field are considered outliers and pruned. This value must be between 1 and 100. Default: `5`.
81+
82+
`tokens_weight_threshold` {applies_to}`stack: preview 9.1`
83+
: (Optional, float) Tokens whose weight is less than `tokens_weight_threshold` are considered insignificant and pruned. This value must be between 0 and 1. Default: `0.4`.
84+
85+
::::{note}
86+
The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` were chosen because they gave the best results in tests using ELSERv2.
87+
::::
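
The constraints described for `prune` and `pruning_config` can be sketched as a small client-side check. This is only an illustration of the rules as worded above, not Elasticsearch's own validation: the function name `validate_index_options` is hypothetical, `ValueError` stands in for the unspecified exception, and treating both ranges as inclusive is an assumption.

```python
def validate_index_options(index_options: dict) -> None:
    """Raise ValueError if the options break a rule stated in the text above.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    prune = index_options.get("prune", True)  # documented default: true
    pruning_config = index_options.get("pruning_config")

    if pruning_config is None:
        # Nothing else to check: the documented defaults would apply.
        return

    if not prune:
        # The text says an exception occurs in this case.
        raise ValueError("pruning_config was specified but prune is false")

    ratio = pruning_config.get("tokens_freq_ratio_threshold", 5)
    weight = pruning_config.get("tokens_weight_threshold", 0.4)

    # Assumption: both documented ranges are inclusive.
    if not 1 <= ratio <= 100:
        raise ValueError("tokens_freq_ratio_threshold must be between 1 and 100")
    if not 0 <= weight <= 1:
        raise ValueError("tokens_weight_threshold must be between 0 and 1")
```

Running such a check before sending a mapping request may surface a misconfiguration earlier, but the server's response remains the authority on what is accepted.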
88+
89+
When token pruning is applied, non-significant tokens will be pruned from the query.
90+
Non-significant tokens can be defined as tokens that meet both of the following criteria:
91+
* The token appears much more frequently than most tokens, indicating that it is a very common word and may not benefit the overall search results much.
92+
* The weight/score is so low that the token is likely not very relevant to the original term.
93+
94+
Both the token frequency threshold and weight threshold must show the token is non-significant in order for the token to be pruned.
95+
This ensures that:
96+
* The tokens that are kept are frequent enough and have significant scoring.
97+
* Very infrequent tokens that may not have as high of a score are removed.
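
The "both criteria" rule can be sketched in a few lines of Python. This follows only the wording above and is not Elasticsearch's implementation: the names `PruningConfig`, `is_pruned`, and `prune_tokens` are hypothetical, and the per-token frequencies and the average frequency are assumed to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PruningConfig:
    # Defaults mirror the documented defaults for the two thresholds.
    tokens_freq_ratio_threshold: int = 5
    tokens_weight_threshold: float = 0.4


def is_pruned(freq: float, avg_freq: float, weight: float,
              config: PruningConfig = PruningConfig()) -> bool:
    """Return True only when BOTH criteria mark the token as non-significant."""
    # Criterion 1: the token is far more frequent than the average token.
    too_frequent = freq > config.tokens_freq_ratio_threshold * avg_freq
    # Criterion 2: the token's weight falls below the weight threshold.
    too_light = weight < config.tokens_weight_threshold
    return too_frequent and too_light


def prune_tokens(tokens: dict, frequencies: dict, avg_freq: float,
                 config: PruningConfig = PruningConfig()) -> dict:
    """Keep every token that is not pruned; tokens map token -> weight."""
    return {
        token: weight
        for token, weight in tokens.items()
        if not is_pruned(frequencies.get(token, 0), avg_freq, weight, config)
    }
```

With an average frequency of 100 and the default thresholds, a token seen 600 times with weight 0.1 is dropped, while a token seen 700 times with weight 0.8 is kept because only one of the two criteria applies to it.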
3998

4099

41100
## Multi-value sparse vectors [index-multi-value-sparse-vectors]

docs/reference/query-languages/esql/_snippets/functions/layout/kql.md

Lines changed: 1 addition & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

docs/reference/query-languages/esql/_snippets/functions/layout/match.md

Lines changed: 1 addition & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

docs/reference/query-languages/esql/_snippets/functions/layout/match_phrase.md

Lines changed: 1 addition & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

docs/reference/query-languages/esql/_snippets/functions/layout/qstr.md

Lines changed: 1 addition & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
1-
* [preview] [`KQL`](../../functions-operators/search-functions.md#esql-kql)
2-
* [preview] [`MATCH`](../../functions-operators/search-functions.md#esql-match)
3-
* [preview] [`MATCH_PHRASE`](../../functions-operators/search-functions.md#esql-match_phrase)
4-
* [preview] [`QSTR`](../../functions-operators/search-functions.md#esql-qstr)
1+
* [`KQL`](../../functions-operators/search-functions.md#esql-kql)
2+
* [`MATCH`](../../functions-operators/search-functions.md#esql-match)
3+
* [`MATCH_PHRASE`](../../functions-operators/search-functions.md#esql-match_phrase)
4+
* [`QSTR`](../../functions-operators/search-functions.md#esql-qstr)
55
% * [preview] [`TERM`](../../functions-operators/search-functions.md#esql-term)
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
2+
::::{note}
3+
If a field is only in some documents it will be `NULL` in the documents that did not contain it.
4+
::::
5+
6+
