-
Notifications
You must be signed in to change notification settings - Fork 85
perf: pre-allocate hash maps #732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughThe PR switches to FxHashMap with an explicit Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
CodSpeed Performance ReportMerging #732 will improve performance by ×2Comparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces performance optimizations by pre-allocating hash maps with estimated capacities to reduce reallocations during insertions. The changes focus on the separate_components and components functions in the undirected connected components module.
- Pre-allocates
HashMapinseparate_componentsbased on total element count across all groups - Pre-allocates
FxHashMapincomponentsbased on the count of non-empty components
| // Pre-size the hash map to reduce reallocations | ||
| let estimated_capacity = groups.iter().map(|g| g.into_iter().count()).sum(); | ||
| let mut indices = HashMap::with_capacity(estimated_capacity); |
Copilot
AI
Jan 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The capacity estimation here iterates through all groups twice - once to count the elements for pre-allocation (line 64) and again in the main loop (line 66). The into_iter().count() call on line 64 will consume or clone each iterator, potentially causing performance overhead that negates the benefit of pre-allocation.
Consider storing the group sizes during the first pass or accepting that the pre-allocation might not be worth the extra iteration cost in all cases. Alternatively, if groups are cheap to iterate, document that this optimization assumes iteration is inexpensive.
| // Pre-size the hash map to reduce reallocations | |
| let estimated_capacity = groups.iter().map(|g| g.into_iter().count()).sum(); | |
| let mut indices = HashMap::with_capacity(estimated_capacity); | |
| let mut indices = HashMap::new(); |
| // Pre-size the hash map to reduce reallocations | ||
| let estimated_capacity = gindices.iter().filter(|&&n| n != usize::MAX).count(); | ||
| let mut gb: FxHashMap<usize, FxHashSet<N>> = | ||
| FxHashMap::with_capacity_and_hasher(estimated_capacity, FxBuildHasher); |
Copilot
AI
Jan 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FxHashMap type is a type alias that already includes FxBuildHasher as its default hasher, so you can simply use FxHashMap::with_capacity(estimated_capacity) instead of the more verbose FxHashMap::with_capacity_and_hasher(estimated_capacity, FxBuildHasher). This follows the same pattern as the standard HashMap::with_capacity() used in the separate_components function above.
| FxHashMap::with_capacity_and_hasher(estimated_capacity, FxBuildHasher); | |
| FxHashMap::with_capacity(estimated_capacity); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/undirected/connected_components.rs (1)
108-111: Simplify to usewith_capacityinstead ofwith_capacity_and_hasher.Since
FxHashMapis a type alias forHashMap<K, V, FxBuildHasher>, callingwith_capacity_and_hasherwith an explicitFxBuildHasheris redundant. The simplerwith_capacitymethod already uses the correct hasher.🔎 Proposed simplification
- let mut gb: FxHashMap<usize, FxHashSet<N>> = - FxHashMap::with_capacity_and_hasher(estimated_capacity, FxBuildHasher); + let mut gb: FxHashMap<usize, FxHashSet<N>> = + FxHashMap::with_capacity(estimated_capacity);
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/undirected/connected_components.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Test suite (msrv)
- GitHub Check: Test suite (stable)
- GitHub Check: Test suite (beta)
- GitHub Check: Test suite (nightly)
- GitHub Check: Extra tests in release mode
- GitHub Check: Test with minimal versions
- GitHub Check: Agent
- GitHub Check: benchmarks
🔇 Additional comments (2)
src/undirected/connected_components.rs (2)
8-8: LGTM!The
FxBuildHasherimport is necessary for the explicit hasher configuration on line 111.
63-65: Verify that pre-allocation benefits outweigh double iteration cost.The optimization iterates through all groups twice: once to count elements (line 64) and again to process them (lines 66-79). While pre-allocation reduces reallocations, the upfront counting pass adds O(n) overhead.
Please confirm with benchmarks that this optimization improves performance for the typical use cases, especially when groups are small or sparse.
e76809c to
80556b3
Compare
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.