Conversation

@jimczi (Contributor) commented Mar 19, 2025

Follow up of #125103 that leverages scorer supplier to create queries optimised to run on top docs only.

@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Mar 19, 2025
@elasticsearchmachine (Collaborator) commented:

Hi @jimczi, I've created a changelog YAML for you.

subScorers.add(new FeatureDisiWrapper(scorer, featureNames.get(i)));
var scorerSupplier = weight.scorerSupplier(segmentContext);
if (scorerSupplier != null) {
    var scorer = scorerSupplier.get(0L);
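For context, the hunk above follows Lucene's two-step pattern: `Weight#scorerSupplier` may return null when nothing in the segment can match, and `ScorerSupplier#get(leadCost)` then builds the actual `Scorer`. A minimal self-contained sketch of that null-check-then-get flow, using hypothetical stand-in types (not the real Lucene classes):

```java
// Hypothetical stand-ins mirroring the Lucene Weight/ScorerSupplier contract;
// this is a sketch of the pattern in the hunk above, not Lucene code.
interface Scorer {}

interface ScorerSupplier {
    // leadCost: upper bound on how many docs the lead iterator will visit.
    Scorer get(long leadCost);
}

interface Weight {
    // May return null when no documents in the segment can match.
    ScorerSupplier scorerSupplier();
}

class ScorerSupplierDemo {
    // Mirrors the reviewed code: guard against a null supplier, then ask for
    // a scorer optimised for a tiny lead set by passing leadCost = 0.
    static Scorer createScorer(Weight weight) {
        ScorerSupplier supplier = weight.scorerSupplier();
        if (supplier != null) {
            return supplier.get(0L);
        }
        return null;
    }

    public static void main(String[] args) {
        Weight matching = () -> leadCost -> new Scorer() {};
        Weight empty = () -> null;
        System.out.println(createScorer(matching) != null); // true
        System.out.println(createScorer(empty) == null);    // true
    }
}
```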
A Member commented:

I don't understand this "leadCost" thing.

Reading the docs:

 * @param leadCost Cost of the scorer that will be used in order to lead iteration. This can be
 *     interpreted as an upper bound of the number of times that {@link DocIdSetIterator#nextDoc},
 *     {@link DocIdSetIterator#advance} and {@link TwoPhaseIterator#matches} will be called. Under
 *     doubt, pass {@link Long#MAX_VALUE}, which will produce a {@link Scorer} that has good
 *     iteration capabilities.

So, shouldn't this be the number of docs that we will actually score, e.g. the rank_window? But that number seems really small anyway. Maybe 0 is just fine here.

@jimczi (Contributor, Author) replied:

This helps optimize scorers based on the estimated number of matches, which is derived from the cost of the lead iterator in the query. For example, IndexOrDocValuesQuery uses this to decide between a points/term query and a doc values query. Setting the leading cost to 0 forces all queries to use a scorer optimized for a small set of selected documents (the top N). In the case of IndexOrDocValuesQuery, this means selecting the doc values query.
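As an illustration of that decision, here is a toy model of how a query in the style of IndexOrDocValuesQuery might pick a scorer from the lead cost: when leadCost is small relative to the query's own cost, checking doc values per candidate doc is cheaper than building the full index-based iterator. The threshold and names below are illustrative assumptions, not the real Lucene heuristic:

```java
// Toy model of the IndexOrDocValuesQuery-style choice; the comparison used
// here is an illustrative assumption, not Lucene's actual cost heuristic.
class LeadCostDemo {
    // Picks the doc-values strategy when the lead iterator is expected to
    // visit fewer docs than this query would match on its own.
    static String chooseScorer(long leadCost, long queryCost) {
        return leadCost < queryCost ? "docValues" : "points";
    }

    public static void main(String[] args) {
        // leadCost = 0 (scoring only the top N): prefer doc values.
        System.out.println(chooseScorer(0L, 1_000_000L));            // docValues
        // leadCost = Long.MAX_VALUE (full iteration): prefer the index.
        System.out.println(chooseScorer(Long.MAX_VALUE, 1_000_000L)); // points
    }
}
```

This matches the reply above: forcing leadCost to 0 steers every such query toward the scorer that is cheap to evaluate on a small selected set of documents.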

@jimczi jimczi added the auto-backport Automatically create backport pull requests when merged label Mar 20, 2025
@jimczi jimczi merged commit 22be0d9 into elastic:main Mar 20, 2025
17 checks passed
@jimczi jimczi deleted the query_feature_scorer_supplier branch March 20, 2025 11:22
@elasticsearchmachine (Collaborator) commented:

💚 Backport successful

Branch 8.x: success

elasticsearchmachine pushed a commit that referenced this pull request Mar 20, 2025
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Mar 21, 2025
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025

Labels

auto-backport (Automatically create backport pull requests when merged)
>enhancement
:Search Relevance/Ranking (Scoring, rescoring, rank evaluation.)
Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
v8.19.0
v9.1.0
