Leverage scorer supplier in QueryFeatureExtractor #125259
Follow up of elastic#125103 that leverages scorer supplier to create queries optimised to run on top docs only.
Pinging @elastic/es-search-relevance (Team:Search Relevance)

Hi @jimczi, I've created a changelog YAML for you.

subScorers.add(new FeatureDisiWrapper(scorer, featureNames.get(i)));
var scorerSupplier = weight.scorerSupplier(segmentContext);
if (scorerSupplier != null) {
    var scorer = scorerSupplier.get(0L);
I don't understand this "leadCost" thing.
Reading the docs:
   * @param leadCost Cost of the scorer that will be used in order to lead iteration. This can be
   *     interpreted as an upper bound of the number of times that {@link DocIdSetIterator#nextDoc},
   *     {@link DocIdSetIterator#advance} and {@link TwoPhaseIterator#matches} will be called. Under
   *     doubt, pass {@link Long#MAX_VALUE}, which will produce a {@link Scorer} that has good
   *     iteration capabilities.
So shouldn't this be the number of docs that we will actually score, e.g. the rank_window? That number is really small anyway, though, so maybe 0 is just fine here.
The lead cost lets a scorer supplier pick an implementation based on the estimated number of matches, which is derived from the cost of the lead iterator in the query. For example, IndexOrDocValuesQuery uses it to decide between a points/term query and a doc values query. Setting the lead cost to 0 forces all queries to use a scorer optimized for a small set of selected documents (the top N). In the case of IndexOrDocValuesQuery, this means selecting the doc values query.
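To make the reply above concrete, here is a minimal, self-contained sketch of how a cost-aware supplier might use the lead cost. It is a toy model, not Lucene's actual code: `LeadCostSketch` and `chooseStrategy` are hypothetical names, and the "divide the index query cost by 8" threshold is an assumption about the kind of heuristic IndexOrDocValuesQuery applies, not a quote of its implementation.

```java
/** Toy model of a cost-aware scorer supplier; these are not Lucene classes. */
final class LeadCostSketch {

    /**
     * Hypothetical stand-in for ScorerSupplier#get(leadCost): returns which
     * execution strategy would be picked for a given lead cost.
     *
     * @param indexQueryCost estimated number of docs matched by the index (points/term) query
     * @param leadCost upper bound on how many docs the caller will ask about
     */
    static String chooseStrategy(long indexQueryCost, long leadCost) {
        // Assumed heuristic: verifying a doc through doc values is penalised,
        // so the index structure is preferred unless the caller only checks
        // far fewer docs than the index query would match.
        long threshold = indexQueryCost >>> 3; // indexQueryCost / 8
        return threshold <= leadCost ? "index" : "doc-values";
    }

    public static void main(String[] args) {
        long indexQueryCost = 1_000_000L;
        // "Under doubt" value from the Javadoc: good iteration capabilities.
        System.out.println(chooseStrategy(indexQueryCost, Long.MAX_VALUE));
        // What the diff passes: only a handful of top docs will be checked.
        System.out.println(chooseStrategy(indexQueryCost, 0L));
    }
}
```

Under these assumptions, passing `Long.MAX_VALUE` selects the index-based strategy, while passing `0L` selects the doc values strategy whenever the index query is not trivially cheap, which matches the intent of running feature queries on the top docs only.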
💚 Backport successful