5 changes: 5 additions & 0 deletions docs/changelog/125259.yaml
@@ -0,0 +1,5 @@
+pr: 125259
+summary: Leverage scorer supplier in `QueryFeatureExtractor`
+area: Ranking
+type: enhancement
+issues: []
@@ -47,9 +47,12 @@ public void setNextReader(LeafReaderContext segmentContext) throws IOException {
             if (weight == null) {
                 continue;
             }
-            Scorer scorer = weight.scorer(segmentContext);
-            if (scorer != null) {
-                subScorers.add(new FeatureDisiWrapper(scorer, featureNames.get(i)));
+            var scorerSupplier = weight.scorerSupplier(segmentContext);
+            if (scorerSupplier != null) {
+                var scorer = scorerSupplier.get(0L);
Member
I don't understand this "leadCost" thing.

Reading the docs:

 * @param leadCost Cost of the scorer that will be used in order to lead iteration. This can be
 *     interpreted as an upper bound of the number of times that {@link DocIdSetIterator#nextDoc},
 *     {@link DocIdSetIterator#advance} and {@link TwoPhaseIterator#matches} will be called. Under
 *     doubt, pass {@link Long#MAX_VALUE}, which will produce a {@link Scorer} that has good
 *     iteration capabilities.

So, shouldn't this be "num of docs that we will actually score", e.g. the rank_window? But that number seems really small anyways. Maybe 0 is just fine here.

Contributor Author
This helps optimize scorers based on the estimated number of matches, which is derived from the cost of the lead iterator in the query. For example, IndexOrDocValuesQuery uses this to decide between a points/term query and a doc values query. Setting the leading cost to 0 forces all queries to use a scorer optimized for a small set of selected documents (the top N). In the case of IndexOrDocValuesQuery, this means selecting the doc values query.

+                if (scorer != null) {
+                    subScorers.add(new FeatureDisiWrapper(scorer, featureNames.get(i)));
+                }
             }
         }
         approximation = subScorers.size() > 0 ? new DisjunctionDISIApproximation(subScorers) : null;
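
To make the leadCost discussion in the review thread above concrete, here is a minimal sketch of a cost-aware ScorerSupplier in the spirit of Lucene's IndexOrDocValuesQuery. The class name CostAwareScorerSupplier, the two wrapped suppliers, and the cost() / 8 threshold are illustrative assumptions, not the actual Lucene or Elasticsearch implementation.

    // Hypothetical sketch, not Lucene's IndexOrDocValuesQuery code: picks between two
    // scorer implementations based on the leadCost hint provided by the caller.
    import java.io.IOException;

    import org.apache.lucene.search.Scorer;
    import org.apache.lucene.search.ScorerSupplier;

    final class CostAwareScorerSupplier extends ScorerSupplier {

        private final ScorerSupplier indexSupplier;     // efficient when iterating most matches
        private final ScorerSupplier docValuesSupplier; // efficient when checking only a few docs

        CostAwareScorerSupplier(ScorerSupplier indexSupplier, ScorerSupplier docValuesSupplier) {
            this.indexSupplier = indexSupplier;
            this.docValuesSupplier = docValuesSupplier;
        }

        @Override
        public Scorer get(long leadCost) throws IOException {
            // leadCost is an upper bound on how many documents the caller will actually
            // visit. When it is far below this query's own match count, building a full
            // iterator is wasted work, so the per-document (doc values) scorer is cheaper.
            // A leadCost of 0, as passed by QueryFeatureExtractor, always takes this branch.
            if (leadCost < cost() / 8) { // illustrative threshold, not Lucene's exact heuristic
                return docValuesSupplier.get(leadCost);
            }
            return indexSupplier.get(leadCost);
        }

        @Override
        public long cost() {
            // Estimated number of documents this query matches in the segment.
            return indexSupplier.cost();
        }
    }

With the scorerSupplier.get(0L) call introduced in this diff, a supplier like this would always take its doc-values branch, matching the author's point about forcing scorers that are optimized for a small set of selected documents.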