@romseygeek
Contributor

This allows client code that is querying sorted indexes to ensure that segments most likely to produce entries in a top-k search are queried first, allowing efficient skipping of other segments.
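
Here is a minimal sketch of why visiting segments in a chosen order helps top-k skipping. It assumes a descending sort on a single long field and that each segment's maximum value is known before the segment is searched. `SegmentOrderSketch`, `Segment` and `topK` are hypothetical names used only for illustration. They are not Lucene APIs and not the code in this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SegmentOrderSketch {

    // Hypothetical stand-in for a segment (not a Lucene class): a name plus
    // the values of the sort field it holds.
    record Segment(String name, long[] values) {
        // Largest sort value in the segment. A real index would read this
        // from per-segment metadata instead of scanning.
        long max() {
            long m = Long.MIN_VALUE;
            for (long v : values) {
                m = Math.max(m, v);
            }
            return m;
        }
    }

    // Collects the k largest values (descending) and records which segments were visited.
    static List<Long> topK(List<Segment> segments, int k, List<String> visited) {
        // Visit the segments with the largest maximum value first.
        List<Segment> ordered = new ArrayList<>(segments);
        ordered.sort(Comparator.comparingLong(Segment::max).reversed());

        // Min-heap holding the current best k values; peek() is the k-th best.
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (Segment s : ordered) {
            // Once the heap is full and this segment's max cannot beat the k-th
            // best value, no later segment can either, so skip the rest.
            if (heap.size() == k && s.max() <= heap.peek()) {
                break;
            }
            visited.add(s.name());
            for (long v : s.values()) {
                if (heap.size() < k) {
                    heap.add(v);
                } else if (v > heap.peek()) {
                    heap.poll();
                    heap.add(v);
                }
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment("_0", new long[] {1, 2, 3}),
                new Segment("_1", new long[] {10, 11, 12}),
                new Segment("_2", new long[] {4, 5, 6}));
        List<String> visited = new ArrayList<>();
        // prints [12, 11] visited=[_1]
        System.out.println(topK(segments, 2, visited) + " visited=" + visited);
    }
}
```

Because segments are visited in descending order of their maximum, the first segment whose maximum cannot beat the current k-th best value ends the loop. Every later segment has a maximum that is no larger.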

@Override
public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
    String segmentId = context.toString().substring(19, 20);
Contributor Author

This is something of a hack, but I couldn't think of an easier way to work out what the 'original' ordinal for a segment is. Less hacky solution ideas are welcome!

Contributor

Isn't ord a public field in LeafReaderContext?

Contributor Author

It is, but it is generated by BaseCompositeReader after the segment sorter has been applied. So the ord of the first segment in the leaves array is always 0, and we can't use it to see if the segments have been re-ordered.
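
To make the point about ord concrete, here is a minimal sketch. It assumes, as described in the comment, that ords are assigned from each leaf's position after the sorter has run. `LeafOrdSketch`, `Leaf` and `buildLeaves` are hypothetical stand-ins, not `BaseCompositeReader` internals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LeafOrdSketch {

    // Hypothetical leaf (not a Lucene class): a segment name plus the ord
    // assigned by its parent reader.
    record Leaf(String segment, int ord) {}

    // Sorts the segments first, then assigns each one an ord equal to its
    // position in the sorted list, mirroring the behaviour described above.
    static List<Leaf> buildLeaves(List<String> segments, Comparator<String> sorter) {
        List<String> sorted = new ArrayList<>(segments);
        sorted.sort(sorter);
        List<Leaf> leaves = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            leaves.add(new Leaf(sorted.get(i), i));
        }
        return leaves;
    }

    public static void main(String[] args) {
        List<String> segments = List.of("_0", "_1", "_2");
        List<Leaf> natural = buildLeaves(segments, Comparator.naturalOrder());
        List<Leaf> reversed = buildLeaves(segments, Comparator.reverseOrder());
        // prints Leaf[segment=_0, ord=0] Leaf[segment=_2, ord=0]
        System.out.println(natural.get(0) + " " + reversed.get(0));
    }
}
```

Whichever order the sorter chooses, the first leaf reports ord 0. Detecting a re-ordering therefore needs something that survives the sort, such as the segment itself.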

Contributor

Ok, segment id - never mind. Don't know. The substring hack seems... very fragile.

Contributor Author

I updated the test to just compare the order of leaves before and after the re-ordering. The collector checks were necessary when I was wiring everything through IndexSearcher but aren't needed now.
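
Here is a minimal sketch of that style of check. It assumes each leaf can be identified by its segment name and that the re-sorter orders leaves by descending maximum sort value. `LeafOrderCheckSketch`, `resort` and `order` are hypothetical helpers, not the test code in this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LeafOrderCheckSketch {

    // Hypothetical leaf (not a Lucene class) carrying its segment name and
    // the largest sort value it contains.
    record Leaf(String segment, long maxValue) {}

    // Hypothetical re-sorter: returns a new list ordered by descending max
    // value and leaves the input untouched.
    static List<Leaf> resort(List<Leaf> leaves) {
        List<Leaf> copy = new ArrayList<>(leaves);
        copy.sort(Comparator.comparingLong(Leaf::maxValue).reversed());
        return copy;
    }

    // Segment names in iteration order. The check compares these rather than
    // positional ords, which are reassigned after sorting.
    static List<String> order(List<Leaf> leaves) {
        return leaves.stream().map(Leaf::segment).toList();
    }

    public static void main(String[] args) {
        List<Leaf> before = List.of(new Leaf("_0", 3), new Leaf("_1", 12), new Leaf("_2", 6));
        List<Leaf> after = resort(before);
        // prints [_0, _1, _2] -> [_1, _2, _0]
        System.out.println(order(before) + " -> " + order(after));
    }
}
```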

@romseygeek romseygeek merged commit d455fef into apache:main Jan 23, 2026
12 checks passed
@romseygeek romseygeek deleted the sort/reader-resorter branch January 23, 2026 10:34
romseygeek added a commit that referenced this pull request Jan 23, 2026
finnroblin pushed a commit to finnroblin/lucene that referenced this pull request Feb 2, 2026
…che#15591)