Commit d14cb0b

add clarification for token pruning behaviour
1 parent 41b8c39 commit d14cb0b

1 file changed: +9 -1 lines changed

docs/reference/query-languages/query-dsl/query-dsl-sparse-vector-query.md

Lines changed: 9 additions & 1 deletion
@@ -83,7 +83,15 @@ GET _search
 The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` were chosen based on tests using ELSERv2 that provided the most optimal results.
 ::::
 
-
+When token pruning is applied, non-significant tokens are pruned from the query.
+A token is non-significant when it meets both of the following criteria:
+* The token appears much more frequently than most tokens, which indicates a very common word that is unlikely to improve the search results.
+* The token's weight is so low that the token is unlikely to be relevant to the original term.
+
+A token is pruned only when both the token frequency threshold and the weight threshold show that it is non-significant.
+This ensures that:
+* Tokens are kept if they are not unusually frequent or if they have a significant weight.
+* Only very frequent tokens with low weights are removed.
 
 ## Example ELSER query [sparse-vector-query-example]
 
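
The pruning rule described in the added lines can be illustrated with a minimal Python sketch. This is not the Elasticsearch implementation: the `Token` class, the precomputed `freq_ratio` field, the comparison of each weight against a fraction of the highest weight in the query, and the threshold values used as defaults are all assumptions made for illustration. The only behaviour taken from the text is that a token is pruned when it is both unusually frequent and low in weight.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    weight: float      # weight of the token in the query
    freq_ratio: float  # hypothetical: token frequency relative to the average token frequency


def is_non_significant(token, best_weight, freq_ratio_threshold, weight_threshold):
    """Return True only when BOTH criteria mark the token as non-significant."""
    too_frequent = token.freq_ratio > freq_ratio_threshold
    # Assumption: "low weight" is judged relative to the best weight in the query.
    low_weight = token.weight < weight_threshold * best_weight
    return too_frequent and low_weight


def prune(tokens, freq_ratio_threshold=5.0, weight_threshold=0.4):
    """Drop non-significant tokens; threshold defaults are illustrative assumptions."""
    if not tokens:
        return []
    best_weight = max(t.weight for t in tokens)
    return [
        t for t in tokens
        if not is_non_significant(t, best_weight, freq_ratio_threshold, weight_threshold)
    ]


example_tokens = [
    Token("the", weight=0.1, freq_ratio=12.0),    # frequent and low weight: pruned
    Token("search", weight=1.5, freq_ratio=8.0),  # frequent but significant weight: kept
    Token("rare", weight=0.2, freq_ratio=0.3),    # low weight but infrequent: kept
    Token("vector", weight=2.0, freq_ratio=1.0),  # neither criterion met: kept
]

print([t.text for t in prune(example_tokens)])
# ['search', 'rare', 'vector']
```

In this sketch, only `the` is removed, because it is the only token that fails both checks. A token that fails just one check, such as the infrequent but low-weight `rare`, stays in the query.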
