Optimize DirectIO prefetch for monotonically increasing access #136946
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Moved to draft because there is something odd. In places we align the data to the block size, whereas my change aligns the data to the prefetch block size. I guess that's the real change in approach. What makes it very difficult to maintain slots in the current approach is that we align data using the block size but then try to fit it into prefetch blocks of a different size, so a byte can be loaded at a different offset depending on the alignment. Aligning the data to the prefetch block size ensures that a position in the file always maps to the same position in a prefetch block. I guess everything works because on my machine the prefetch block size = 2 * block size.
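To make the alignment issue concrete, here is a minimal sketch. The class, method names, and sizes (`AlignmentSketch`, `alignDown`, `offsetInBuffer`, a 4096-byte block, a prefetch block twice that size) are assumptions for illustration and are not taken from the PR's code. It shows that when buffer starts are aligned to the block size, the same file byte can land at different buffer offsets depending on which read loaded it, while aligning to the prefetch block size gives a single fixed offset.

```java
// Hypothetical illustration only; names and sizes are assumptions, not the PR's code.
class AlignmentSketch {
    // Round pos down to a multiple of size; size must be a power of two.
    static long alignDown(long pos, int size) {
        return pos & ~(size - 1L);
    }

    // Offset at which file byte `target` lands in a buffer filled by a read
    // triggered at `readPos`, when the buffer start is aligned to `alignment`.
    static long offsetInBuffer(long readPos, long target, int alignment) {
        return target - alignDown(readPos, alignment);
    }

    public static void main(String[] args) {
        int blockSize = 4096;
        int prefetchBlockSize = 2 * blockSize; // assumed ratio, as on the author's machine
        long target = 9000;

        // Aligned to blockSize: the offset depends on which read loaded the byte.
        System.out.println(offsetInBuffer(5000, target, blockSize)); // 4904
        System.out.println(offsetInBuffer(9000, target, blockSize)); // 808

        // Aligned to prefetchBlockSize: every read that covers the byte
        // places it at target % prefetchBlockSize.
        System.out.println(offsetInBuffer(8500, target, prefetchBlockSize)); // 808
        System.out.println(offsetInBuffer(9000, target, prefetchBlockSize)); // 808
    }
}
```

With prefetch-block alignment the slot for a position can be computed from the position alone, which is what makes slot bookkeeping simpler in this sketch.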
I wonder if it's better to just check the 'floor' of the already requested blocks when potentially adding a position slot. I think the "grabbing oldest" logic is likely OK if you want to split it out of this PR.
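One possible reading of the 'floor' suggestion, sketched under assumptions: keep the start offsets of requested blocks in a sorted set and, before adding a slot for a position, look up the greatest start at or below it. `PrefetchFloorSketch` and `requestIfAbsent` are hypothetical names, and the sketch ignores eviction and the actual I/O; it is not the PR's implementation.

```java
import java.util.TreeSet;

// Hypothetical sketch; not the PR's implementation.
class PrefetchFloorSketch {
    private final int blockSize;          // alignment of block starts (power of two)
    private final int prefetchBlockSize;  // length of each requested block
    private final TreeSet<Long> requestedStarts = new TreeSet<>();

    PrefetchFloorSketch(int blockSize, int prefetchBlockSize) {
        this.blockSize = blockSize;
        this.prefetchBlockSize = prefetchBlockSize;
    }

    // Returns true if a new block was requested, false if pos is already covered.
    boolean requestIfAbsent(long pos) {
        // Greatest requested start <= pos, or null if there is none.
        Long floor = requestedStarts.floor(pos);
        if (floor != null && pos < floor + prefetchBlockSize) {
            return false; // an already requested block covers this position
        }
        requestedStarts.add(pos & ~(blockSize - 1L)); // align the start down to blockSize
        return true;
    }
}
```

For example, with a 4096-byte block and an 8192-byte prefetch block, requesting position 5000 adds a block starting at 4096; a later request for 9000 is already covered by it, and a request for 13000 adds a new block starting at 12288.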
I made the data blockSize-aligned in 3f32209.
benwtrent
left a comment
I like this. Have you benchmarked it to see if it helps?
I tried it with some local examples and it helped quite a lot; of course, those were cases where the current code was doing very badly. I will try to get a realistic example.
@iverase I tried this with vector rescoring, and this PR seems to make things slower. Could you confirm? Maybe I have benchmarking bias here :)
I ran more benchmarks, and this PR does seem faster than baseline. If you agree, I think it's good to merge.
This PR proposes a new implementation of DirectIO prefetching that is optimised for access in monotonically increasing order, which is the typical access pattern when doing vector rescoring.