Support DocIdSetBuilder with partition bounds #15383
Signed-off-by: Prudhvi Godithi <[email protected]>
Hey all, I still need to add some tests and validations and clean up the code on my end, but before that I would like some early feedback on the approach to see whether the idea makes sense.
Adding @jainankitk @getsaurabh02 to the conversation.
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
OK, the existing checks and tests are now green; let me add some tests in
public sealed interface BulkAdder
    permits FixedBitSetAdder,
        BufferAdder,
        PartitionAwareFixedBitSetAdder,
        PartitionAwareBufferAdder {
This is now megamorphic :(
That's a good point. We should run the benchmark to quantify the impact of virtual calls and megamorphism. Also, assuming the impact is significant, I am wondering if we can use PartitionAwareFixedBitSetAdder directly instead of FixedBitSetAdder?
Description
Coming from #14485 and #13745 (Initial implementation of intra-segment search concurrency #13542): when splitting a segment into partitions for intra-segment search, each partition would create a DocIdSetBuilder that allocates memory based on the entire segment size, even though it only collects documents within a small partition range. This PR adds partition-aware support to DocIdSetBuilder, which creates bitsets and buffers scoped to its doc ID range instead of the entire segment size. This change improves memory efficiency during intra-segment search.

Example: for a segment with 1M documents split into 4 partitions of 250K docs each, each partition currently creates a FixedBitSet(1M), which is not required.
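The 1M-document example above can be turned into a rough size estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes a bitset backed by 64-bit words, ignores array and object headers, and assumes the worst case where every partition's builder ends up holding a bitset. The class BitSetMemorySketch and its methods are hypothetical helpers, not part of Lucene or of this PR.

```java
/** Rough memory estimate for a bitset backed by 64-bit words (hypothetical helper). */
final class BitSetMemorySketch {

  private BitSetMemorySketch() {}

  /** Number of 64-bit words needed to hold numBits bits, rounded up. */
  static long wordsFor(long numBits) {
    return (numBits + 63) / 64;
  }

  /** Bytes used by the backing words alone (headers are ignored). */
  static long bytesFor(long numBits) {
    return wordsFor(numBits) * Long.BYTES;
  }

  /** Total bytes when each of the partitions allocates one bitset of bitsPerBitSet bits. */
  static long totalBytes(int partitions, long bitsPerBitSet) {
    return partitions * bytesFor(bitsPerBitSet);
  }
}
```

Under these assumptions, four segment-sized bitsets cost `totalBytes(4, 1_000_000)` = 500,000 bytes, while four partition-sized bitsets cost `totalBytes(4, 250_000)` = 125,024 bytes, roughly a fourfold reduction.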
PartitionAwareBufferAdder:
PartitionAwareFixedBitSetAdder
OffsetBitDocIdSet & OffsetDocIdSetIterator
FixedBitSet uses the doc ID parameter directly as an array index. When we create partition-sized bitsets to save memory, we store documents using relative indices (0 to partitionSize-1) internally, but the Lucene API requires iterators to return absolute doc IDs. These wrapper classes handle the conversion automatically.
PartitionAwareFixedBitSetAdder is used). This is to convert partition-relative indices back to absolute doc IDs.

Without Optimization (Old Way):
With Optimization (New Way):
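Separately from the PR's own snippets, the relative-to-absolute conversion described in the Description can be illustrated with a small standalone sketch. It uses java.util.BitSet rather than Lucene's FixedBitSet so that it has no dependencies, and the class OffsetBitSetSketch, its method names, and the choice to silently ignore out-of-range doc IDs are all assumptions made for illustration; they are not the OffsetBitDocIdSet or OffsetDocIdSetIterator classes added by this PR.

```java
import java.util.BitSet;

/** Partition-sized bitset that stores relative indices but exposes absolute doc IDs (illustrative only). */
final class OffsetBitSetSketch {
  private final int minDocId; // inclusive lower bound of the partition
  private final int maxDocId; // exclusive upper bound of the partition
  private final BitSet bits;  // sized to the partition, not to the whole segment

  OffsetBitSetSketch(int minDocId, int maxDocId) {
    if (minDocId < 0 || maxDocId <= minDocId) {
      throw new IllegalArgumentException("invalid partition bounds");
    }
    this.minDocId = minDocId;
    this.maxDocId = maxDocId;
    this.bits = new BitSet(maxDocId - minDocId);
  }

  /** Records an absolute doc ID; IDs outside the partition are ignored (an assumption of this sketch). */
  void add(int docId) {
    if (docId < minDocId || docId >= maxDocId) {
      return;
    }
    bits.set(docId - minDocId); // store the partition-relative index
  }

  /** Returns the first stored absolute doc ID that is >= target, or -1 if there is none. */
  int nextDoc(int target) {
    int relativeTarget = Math.max(target - minDocId, 0);
    int relative = bits.nextSetBit(relativeTarget);
    return relative < 0 ? -1 : relative + minDocId; // convert back to an absolute doc ID
  }

  /** Number of doc IDs stored. */
  int cardinality() {
    return bits.cardinality();
  }
}
```

For a partition covering doc IDs 250,000 to 499,999, `new OffsetBitSetSketch(250_000, 500_000)` allocates about 250K bits instead of 1M, and callers still add and read absolute doc IDs.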