13 changes: 10 additions & 3 deletions docs/reference/query-dsl/script-score-query.asciidoc
@@ -62,10 +62,17 @@ multiplied by `boost` to produce final documents' scores. Defaults to `1.0`.
===== Use relevance scores in a script

Within a script, you can
{ref}/modules-scripting-fields.html#scripting-score[access]
the `_score` variable which represents the current relevance score of a
document.

[[script-score-access-term-statistics]]
===== Use term statistics in a script

Within a script, you can
{ref}/modules-scripting-fields.html#scripting-term-statistics[access]
the `_termStats` variable, which provides statistical information about the terms used in the child query of the `script_score` query.

[[script-score-predefined-functions]]
===== Predefined functions
You can use any of the available {painless}/painless-contexts.html[painless
@@ -147,7 +154,7 @@ updated since update operations also update the value of the `_seq_no` field.

[[decay-functions-numeric-fields]]
====== Decay functions for numeric fields
You can read more about decay functions
{ref}/query-dsl-function-score-query.html#function-decay[here].

* `double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)`
@@ -233,7 +240,7 @@ The `script_score` query calculates the score for
every matching document, or hit. There are faster alternative query types that
can efficiently skip non-competitive hits:

* If you want to boost documents on some static fields, use the
<<query-dsl-rank-feature-query, `rank_feature`>> query.
* If you want to boost documents closer to a date or geographic point, use the
<<query-dsl-distance-feature-query, `distance_feature`>> query.
37 changes: 25 additions & 12 deletions docs/reference/reranking/learning-to-rank-model-training.asciidoc
@@ -38,11 +38,21 @@ Feature extractors are defined using templated queries. https://eland.readthedoc
from eland.ml.ltr import QueryFeatureExtractor

feature_extractors=[
    # We want to use the BM25 score of the match query for the title field as a feature:
    QueryFeatureExtractor(
        feature_name="title_bm25",
        query={"match": {"title": "{{query}}"}}
    ),
    # We want to use the number of matched terms in the title field as a feature:
    QueryFeatureExtractor(
        feature_name="title_matched_term_count",
        query={
            "script_score": {
                "query": {"match": {"title": "{{query}}"}},
                "script": {"source": "return _termStats.matchedTermsCount();"},
            }
        },
    ),
    # We can use a script_score query to get the value
    # of the field rating directly as a feature:
    QueryFeatureExtractor(
@@ -54,26 +64,29 @@ feature_extractors=[
            }
        },
    ),
    # We extract the number of terms in the query as a feature:
    QueryFeatureExtractor(
        feature_name="query_term_count",
        query={
            "script_score": {
                "query": {"match": {"title": "{{query}}"}},
                "script": {"source": "return _termStats.uniqueTermsCount();"},
            }
        },
    ),
]
----
// NOTCONSOLE

[NOTE]
.Term statistics as features
===================================================

It is very common for an LTR model to leverage raw term statistics as features.
To extract this information, you can use the {ref}/modules-scripting-fields.html#scripting-term-statistics[term statistics feature] provided as part of the <<query-dsl-script-score-query,`script_score`>> query.

===================================================
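
As a rough illustration of how such feature queries could be generated, the sketch below builds the `script_score` query body for a given `_termStats` function. The helper name `term_stats_query` and its parameters are hypothetical and not part of Eland or Elasticsearch; only the `_termStats` function and method names come from the reference text.

```python
# Function and method names as listed in the `_termStats` reference text.
COUNT_FUNCTIONS = {"uniqueTermsCount", "matchedTermsCount"}
STATS_FUNCTIONS = {"docFreq", "totalTermFreq", "termFreq"}
STATS_GETTERS = {"getAverage", "getMin", "getMax", "getSum", "getCount"}


def term_stats_query(field, function, getter=None):
    """Build a templated script_score query (hypothetical helper).

    `field` is the text field matched by the child query, `function` is a
    `_termStats` function name and `getter` is the statistics method to
    call on the result, for the functions that return aggregated statistics.
    """
    if function in COUNT_FUNCTIONS:
        if getter is not None:
            raise ValueError(f"{function} returns a single value, not statistics")
        expression = f"_termStats.{function}()"
    elif function in STATS_FUNCTIONS:
        if getter not in STATS_GETTERS:
            raise ValueError(f"unknown statistics method: {getter}")
        expression = f"_termStats.{function}().{getter}()"
    else:
        raise ValueError(f"unknown _termStats function: {function}")

    return {
        "script_score": {
            # The child query determines the field and terms being analyzed.
            "query": {"match": {field: "{{query}}"}},
            "script": {"source": f"return {expression};"},
        }
    }


matched_terms = term_stats_query("title", "matchedTermsCount")
avg_term_freq = term_stats_query("title", "termFreq", "getAverage")
```

The returned dictionaries have the same shape as the `query` arguments passed to `QueryFeatureExtractor` in the listing above, so they could presumably be used in the same place.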

Once the feature extractors have been defined, they are wrapped in an `eland.ml.ltr.LTRModelConfig` object for use in later training steps:

[source,python]
@@ -61,10 +61,3 @@ When exposing pagination to users, `window_size` should remain constant as each
====== Negative scores

Depending on how your model is trained, it’s possible that the model will return negative scores for documents. While negative scores are not allowed from first-stage retrieval and ranking, it is possible to use them in the LTR rescorer.

[discrete]
[[learning-to-rank-rescorer-limitations-term-statistics]]
====== Term statistics as features

We do not currently support term statistics as features, however future releases will introduce this capability.

73 changes: 73 additions & 0 deletions docs/reference/scripting/fields.asciidoc
@@ -80,6 +80,79 @@ GET my-index-000001/_search
}
-------------------------------------

[discrete]
[[scripting-term-statistics]]
=== Accessing term statistics of a document within a script

Scripts used in a <<query-dsl-script-score-query,`script_score`>> query have access to the `_termStats` variable, which provides statistical information about the terms in the child query.

In the following example, `_termStats` is used within a <<query-dsl-script-score-query,`script_score`>> query to retrieve the average term frequency for the terms `quick`, `brown`, and `fox` in the `text` field:

[source,console]
-------------------------------------
PUT my-index-000001/_doc/1?refresh
{
  "text": "quick brown fox"
}

PUT my-index-000001/_doc/2?refresh
{
  "text": "quick fox"
}

GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query": { <1>
        "match": {
          "text": "quick brown fox"
        }
      },
      "script": {
        "source": "_termStats.termFreq().getAverage()" <2>
      }
    }
  }
}
-------------------------------------

<1> Child query used to infer the field and the terms considered in term statistics.

<2> The script calculates the average term frequency for the terms in the query using `_termStats`.

`_termStats` provides access to the following functions for working with term statistics:

- `uniqueTermsCount`: Returns the total number of unique terms in the query. This value is the same across all documents.
- `matchedTermsCount`: Returns the count of query terms that matched within the current document.
- `docFreq`: Provides document frequency statistics for the terms in the query, indicating how many documents contain each term. This value is consistent across all documents.
- `totalTermFreq`: Provides the total frequency of terms across all documents, representing how often each term appears in the entire corpus. This value is consistent across all documents.
- `termFreq`: Returns the frequency of query terms within the current document, showing how often each term appears in that document.

[NOTE]
.Functions returning aggregated statistics
===================================================

The `docFreq`, `termFreq` and `totalTermFreq` functions return objects that represent statistics across all terms of the child query.

The returned statistics objects support the following methods:

- `getAverage()`: Returns the average value of the metric.
- `getMin()`: Returns the minimum value of the metric.
- `getMax()`: Returns the maximum value of the metric.
- `getSum()`: Returns the sum of the metric values.
- `getCount()`: Returns the count of terms included in the metric calculation.

===================================================
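
To make the semantics of these functions more concrete, the following plain-Python sketch recomputes them by hand for the two example documents indexed above. It is only a model of the descriptions in this section, not the Elasticsearch implementation: the `StatsSummary` class, the whitespace tokenizer and the snake_case function names are illustrative assumptions, and the sketch assumes that a query term absent from a document contributes a frequency of `0` to `termFreq`, which may differ from the actual behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatsSummary:
    """Illustrative stand-in for the aggregated statistics object."""
    values: List[int]

    def get_count(self) -> int:
        return len(self.values)

    def get_sum(self) -> int:
        return sum(self.values)

    def get_min(self) -> int:
        return min(self.values)

    def get_max(self) -> int:
        return max(self.values)

    def get_average(self) -> float:
        return self.get_sum() / self.get_count()


def tokenize(text: str) -> List[str]:
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


# The two documents and the child query from the example above.
corpus = {1: "quick brown fox", 2: "quick fox"}
query_terms = sorted(set(tokenize("quick brown fox")))


def unique_terms_count() -> int:
    # Number of distinct terms in the query; independent of the document.
    return len(query_terms)


def matched_terms_count(doc_id: int) -> int:
    # Number of query terms that occur in the given document.
    doc_tokens = tokenize(corpus[doc_id])
    return sum(1 for term in query_terms if term in doc_tokens)


def doc_freq() -> StatsSummary:
    # For each query term, the number of documents containing it.
    return StatsSummary(
        [sum(1 for text in corpus.values() if term in tokenize(text))
         for term in query_terms]
    )


def total_term_freq() -> StatsSummary:
    # For each query term, its number of occurrences in the whole corpus.
    return StatsSummary(
        [sum(tokenize(text).count(term) for text in corpus.values())
         for term in query_terms]
    )


def term_freq(doc_id: int) -> StatsSummary:
    # For each query term, its number of occurrences in the given document.
    # Assumption: absent terms are counted as 0.
    doc_tokens = tokenize(corpus[doc_id])
    return StatsSummary([doc_tokens.count(term) for term in query_terms])


print(unique_terms_count())          # 3
print(matched_terms_count(2))        # 2
print(term_freq(1).get_average())    # 1.0
print(doc_freq().get_sum())          # 5
```

In this model, document 1 contains each of the three query terms once, so its average term frequency is `1.0`, while `doc_freq()` and `total_term_freq()` return the same values regardless of the document being scored.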


[NOTE]
.Painless language required
===================================================

The `_termStats` variable is only available when using the <<modules-scripting-painless, Painless>> scripting language.

===================================================

[discrete]
[[modules-scripting-doc-vals]]