It is very common for an LTR model to leverage raw term statistics as features.
+To extract this information, you can use the {ref}/modules-scripting-fields.html#scripting-term-statistics[term statistics feature] provided as part of the <<query-dsl-script-score-query,`script_score`>> query.
docs/reference/reranking/learning-to-rank-search-usage.asciidoc
-7 lines changed: 0 additions & 7 deletions
@@ -61,10 +61,3 @@ When exposing pagination to users, `window_size` should remain constant as each
====== Negative scores

Depending on how your model is trained, it’s possible that the model will return negative scores for documents. While negative scores are not allowed from first-stage retrieval and ranking, it is possible to use them in the LTR rescorer.
docs/reference/scripting/fields.asciidoc
+75 lines changed: 75 additions & 0 deletions
@@ -80,6 +80,81 @@ GET my-index-000001/_search
}
-------------------------------------

+[discrete]
+[[scripting-term-statistics]]
+=== Accessing term statistics of a document within a script
+
+Scripts used in a <<query-dsl-script-score-query,`script_score`>> query have access to the `_termStats` variable, which provides statistical information about the terms in the child query.
+
+In the following example, `_termStats` is used within a <<query-dsl-script-score-query,`script_score`>> query to retrieve the average term frequency for the terms `quick`, `brown`, and `fox` in the `text` field:
<1> Child query used to infer the field and the terms considered in term statistics.
+
+<2> The script calculates the average term frequency for the terms in the query using `_termStats`.
+
+`_termStats` provides access to the following functions for working with term statistics:
+
+- `uniqueTermsCount`: Returns the total number of unique terms in the query. This value is the same across all documents.
+- `matchedTermsCount`: Returns the count of query terms that matched within the current document.
+- `docFreq`: Provides document frequency statistics for the terms in the query, indicating how many documents contain each term. This value is consistent across all documents.
+- `totalTermFreq`: Provides the total frequency of terms across all documents, representing how often each term appears in the entire corpus. This value is consistent across all documents.
+- `termFreq`: Returns the frequency of query terms within the current document, showing how often each term appears in that document.
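
As a rough illustration of what these five statistics measure, the following Python sketch recomputes them over a small in-memory corpus. It is not the Elasticsearch implementation: the corpus, the whitespace tokenizer, the snake_case function names, and the per-term dictionaries returned here are all assumptions made for the example, as is reporting a frequency of 0 for a query term that is absent from a document.

```python
from collections import Counter

# Hypothetical toy corpus standing in for the `text` field of an index.
CORPUS = {
    "doc1": "the quick brown fox",
    "doc2": "the quick quick dog",
    "doc3": "a lazy brown dog",
}
QUERY_TERMS = ["quick", "brown", "fox"]

# Assumption: naive lowercase + whitespace tokenization, not a real analyzer.
TOKENS = {doc_id: text.lower().split() for doc_id, text in CORPUS.items()}


def unique_terms_count(terms):
    """Number of distinct query terms; independent of the document."""
    return len(set(terms))


def matched_terms_count(terms, doc_id):
    """Number of distinct query terms that occur in the given document."""
    present = set(TOKENS[doc_id])
    return sum(1 for term in set(terms) if term in present)


def doc_freq(terms):
    """For each query term, how many documents contain it."""
    return {
        term: sum(1 for tokens in TOKENS.values() if term in tokens)
        for term in set(terms)
    }


def total_term_freq(terms):
    """For each query term, how often it occurs across the whole corpus."""
    return {
        term: sum(tokens.count(term) for tokens in TOKENS.values())
        for term in set(terms)
    }


def term_freq(terms, doc_id):
    """For each query term, how often it occurs in the given document."""
    counts = Counter(TOKENS[doc_id])  # missing terms count as 0
    return {term: counts[term] for term in set(terms)}
```

With this corpus, `doc_freq` and `total_term_freq` return the same result whichever document is being scored, while `term_freq` and `matched_terms_count` depend on `doc_id`, which mirrors the distinction drawn in the list above.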