Inline consolidate fast-path #629
Conversation
Gives the compiler the opportunity to inline consolidate calls for small amounts of data by separating the complex consolidate logic for more than one update into a different function. We don't mark that function as `cold` or `inline(never)` because we want to leave the decision to the compiler.

In local testing, this shaved a few percent off the spines example. With the patch:

```
Running ["new"]
arrangement 5.682871841s
loading complete 14.032609636s
queries complete 14.037398497s
shut down
```

With current master:

```
Running ["new"]
arrangement 6.010566878s
loading complete 14.673966926s
queries complete 14.678923984s
shut down
```

Signed-off-by: Moritz Hoffmann <[email protected]>
Pull Request Overview
This PR optimizes the consolidation functions by separating fast-path logic for small datasets from the complex consolidation logic for larger datasets. The optimization enables the compiler to inline the fast-path consolidation calls, resulting in improved performance for common cases with small amounts of data.
Key changes:
- Added inline hints to public consolidation functions to encourage inlining
- Extracted complex consolidation logic into separate "slow" functions for slices with more than one element
- Restructured control flow to prioritize the simple case handling for single-element or empty slices
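A rough sketch of the resulting shape, for orientation only: this is not the PR's exact code, it uses a simplified stand-in for the crate's `Semigroup` trait, and the general path below is a safe, illustrative version of the compaction loop rather than the crate's tuned implementation.

```rust
/// Simplified stand-in for the crate's `Semigroup` trait, just for this sketch.
pub trait Semigroup {
    fn plus_equals(&mut self, rhs: &Self);
    fn is_zero(&self) -> bool;
}

impl Semigroup for i64 {
    fn plus_equals(&mut self, rhs: &Self) { *self += *rhs; }
    fn is_zero(&self) -> bool { *self == 0 }
}

/// Small, inlinable wrapper: empty and single-element slices are handled
/// directly; everything else goes to the general function.
#[inline]
pub fn consolidate_slice<T: Ord, R: Semigroup>(slice: &mut [(T, R)]) -> usize {
    if slice.len() > 1 {
        consolidate_slice_slow(slice)
    } else {
        // At most one element: count it only if its difference is non-zero.
        slice.iter().filter(|x| !x.1.is_zero()).count()
    }
}

/// General path for slices of length greater than 1: sort by key, merge
/// adjacent equal keys, and compact the non-zero entries to the front.
fn consolidate_slice_slow<T: Ord, R: Semigroup>(slice: &mut [(T, R)]) -> usize {
    slice.sort_by(|x, y| x.0.cmp(&y.0));
    let mut offset = 0;
    for index in 1..slice.len() {
        if slice[offset].0 == slice[index].0 {
            // Same key: fold this difference into the entry at `offset`.
            let (lo, hi) = slice.split_at_mut(index);
            lo[offset].1.plus_equals(&hi[0].1);
        } else {
            // New key: keep the previous entry if it accumulated to non-zero.
            if !slice[offset].1.is_zero() {
                offset += 1;
            }
            slice.swap(offset, index);
        }
    }
    if offset < slice.len() && !slice[offset].1.is_zero() {
        offset += 1;
    }
    offset
}
```

The point of the split is that the wrapper body is tiny, so the compiler is free to inline it at call sites where slices are usually empty or single-element, while the sort-and-merge work stays out of line.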
// In a world where there are not many results, we may never even need to call in to merge sort.
slice.sort_by(|x,y| x.0.cmp(&y.0));
/// Part of `consolidate_slice` that handles slices of length greater than 1.
fn consolidate_slice_slow<T: Ord, R: Semigroup>(slice: &mut [(T, R)]) -> usize {
Copilot AI · Jul 23, 2025
The function consolidate_slice_slow should be marked as pub(crate) or remain private. Making it public exposes an implementation detail that users shouldn't call directly.
Naming nit, but it's not that it's slow, it's just general. There isn't a fast version; there's just a less general version.
// In a world where there are not many results, we may never even need to call in to merge sort.
slice.sort_unstable_by(|x,y| (&x.0, &x.1).cmp(&(&y.0, &y.1)));
/// Part of `consolidate_updates_slice` that handles slices of length greater than 1.
pub fn consolidate_updates_slice_slow<D: Ord, T: Ord, R: Semigroup>(slice: &mut [(D, T, R)]) -> usize {
Copilot AI · Jul 23, 2025
The function consolidate_updates_slice_slow is marked as public but appears to be an implementation detail. Consider making it pub(crate) or private to avoid exposing internal implementation.
Suggested change:
- pub fn consolidate_updates_slice_slow<D: Ord, T: Ord, R: Semigroup>(slice: &mut [(D, T, R)]) -> usize {
+ fn consolidate_updates_slice_slow<D: Ord, T: Ord, R: Semigroup>(slice: &mut [(D, T, R)]) -> usize {
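With that suggestion applied, the updates variant would end up shaped much like the pairwise one. A brief illustrative sketch, again reusing the simplified `Semigroup` stand-in from the earlier sketch and a safe stand-in for the merge loop rather than the crate's actual code:

```rust
/// Inlinable wrapper for update triples `(data, time, diff)`; at most one
/// update is handled here, everything else goes to the private general path.
#[inline]
pub fn consolidate_updates_slice<D: Ord, T: Ord, R: Semigroup>(slice: &mut [(D, T, R)]) -> usize {
    if slice.len() > 1 {
        consolidate_updates_slice_slow(slice)
    } else {
        // Zero or one update: count the non-zero diffs directly.
        slice.iter().filter(|x| !x.2.is_zero()).count()
    }
}

/// General path: sort by `(data, time)`, merge equal pairs, compact non-zeros.
fn consolidate_updates_slice_slow<D: Ord, T: Ord, R: Semigroup>(slice: &mut [(D, T, R)]) -> usize {
    slice.sort_unstable_by(|x, y| (&x.0, &x.1).cmp(&(&y.0, &y.1)));
    let mut offset = 0;
    for index in 1..slice.len() {
        if slice[offset].0 == slice[index].0 && slice[offset].1 == slice[index].1 {
            let (lo, hi) = slice.split_at_mut(index);
            lo[offset].2.plus_equals(&hi[0].2);
        } else {
            if !slice[offset].2.is_zero() {
                offset += 1;
            }
            slice.swap(offset, index);
        }
    }
    if offset < slice.len() && !slice[offset].2.is_zero() {
        offset += 1;
    }
    offset
}
```

Here the helper is private, so the only public surface is the small inlinable wrapper.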
Signed-off-by: Moritz Hoffmann <[email protected]>
frankmcsherry left a comment
Seems good! I had some nits that we can follow up with async.
        consolidate_slice_slow(slice)
    }
    else {
        slice.iter().filter(|x| !x.1.is_zero()).count()
Probably not the most important thing, but I wonder if there is a way to write this that is more helpful to Rust. Like, rather than `iter` and hoping Rust realizes the slices are either len 0 or len 1, we could `get(0)` and `unwrap_or(0)`?
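A minimal sketch of what that suggestion might look like, purely illustrative and not code from the PR; `count_nonzero_short` is a hypothetical helper name, and `Semigroup` is the simplified stand-in from the sketch earlier on this page.

```rust
/// Hypothetical phrasing of the length <= 1 fast path using `get(0)`,
/// so the compiler doesn't have to reason about an iterator pipeline.
fn count_nonzero_short<T, R: Semigroup>(slice: &[(T, R)]) -> usize {
    slice
        .get(0)
        .map(|x| if x.1.is_zero() { 0 } else { 1 })
        .unwrap_or(0)
}
```

Whether this actually helps codegen would need to be checked; for a slice the optimizer can already prove has at most one element, both forms may compile to the same thing.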
Ah, I see these were inherited from the previous implementation! We can explore this in follow-up work, or ignore it if there are no improvements!