Conversation

@samueltardieu (Member) commented Jan 2, 2026

Summary by CodeRabbit

  • Refactor
    • Improved internal data structure initialization and pre-sizing to reduce memory reallocations and improve performance during component processing, with no changes to public behavior or interfaces.


@coderabbitai bot commented Jan 2, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

The PR switches to FxHashMap with an explicit FxBuildHasher and precomputes capacity estimates in separate_components and components, initializing hash maps with those capacities to reduce reallocations; no public signatures changed.

Changes

Cohort / File(s): Connected components optimizations (src/undirected/connected_components.rs)
Summary: Add the FxBuildHasher import. In separate_components, sum element counts to precompute estimated_capacity and initialize the indices map with that capacity. In components, compute estimated_capacity from gindices (excluding usize::MAX) and construct an FxHashMap with an explicit capacity and hasher instead of the default.
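The components-side change described in the walkthrough can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: std's HashMap/HashSet stand in for the Fx variants, and estimated_components, group_by_component, and the sample data are hypothetical names for illustration.

```rust
use std::collections::{HashMap, HashSet};

// Upper bound on the number of map keys: count entries that belong to
// some component. usize::MAX is the sentinel for "no component".
fn estimated_components(gindices: &[usize]) -> usize {
    gindices.iter().filter(|&&n| n != usize::MAX).count()
}

// Pre-size the map once, then group nodes by component index.
fn group_by_component(gindices: &[usize]) -> HashMap<usize, HashSet<usize>> {
    let mut gb = HashMap::with_capacity(estimated_components(gindices));
    for (node, &g) in gindices.iter().enumerate() {
        if g != usize::MAX {
            gb.entry(g).or_insert_with(HashSet::new).insert(node);
        }
    }
    gb
}

fn main() {
    let gindices = [0, 0, usize::MAX, 1];
    let gb = group_by_component(&gindices);
    println!("{} components", gb.len()); // two components: {0, 1} and {3}
}
```

Note the estimate counts assigned nodes, not distinct components, so it over-allocates when components hold many nodes; it is still a valid capacity hint because the map can never have more keys than assigned nodes.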

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰
I nibbled bytes and counted seeds,
Pre-sized baskets meet our needs.
Fx hashes, snug and keen,
Fewer hops, the graph runs clean.
A tiny rabbit's optimization scene!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The PR title 'perf: pre-allocate hash maps' directly and clearly describes the main change: optimizing performance by pre-allocating hash maps in connected_components.rs.
Docstring coverage | ✅ Passed | Docstring coverage is 100.00%, above the required threshold of 80.00%.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e76809c and 80556b3.

📒 Files selected for processing (1)
  • src/undirected/connected_components.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/undirected/connected_components.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: benchmarks


@codspeed-hq bot commented Jan 2, 2026

CodSpeed Performance Report

Merging #732 will improve performance by ×2

Comparing push-zuzvxtrytpyn (80556b3) with main (d5aceb5)

Summary

⚡ 1 improvement
✅ 37 untouched

Benchmarks breakdown

Benchmark | BASE | HEAD | Efficiency
separate_components | 2.2 ms | 1.1 ms | ×2

Copilot AI left a comment

Pull request overview

This PR introduces performance optimizations by pre-allocating hash maps with estimated capacities to reduce reallocations during insertions. The changes focus on the separate_components and components functions in the undirected connected components module.

  • Pre-allocates HashMap in separate_components based on total element count across all groups
  • Pre-allocates FxHashMap in components based on the count of non-empty components

Comment on lines +63 to +65
// Pre-size the hash map to reduce reallocations
let estimated_capacity = groups.iter().map(|g| g.into_iter().count()).sum();
let mut indices = HashMap::with_capacity(estimated_capacity);
Copilot AI commented Jan 2, 2026

The capacity estimation here iterates through all groups twice - once to count the elements for pre-allocation (line 64) and again in the main loop (line 66). The into_iter().count() call on line 64 will consume or clone each iterator, potentially causing performance overhead that negates the benefit of pre-allocation.

Consider storing the group sizes during the first pass or accepting that the pre-allocation might not be worth the extra iteration cost in all cases. Alternatively, if groups are cheap to iterate, document that this optimization assumes iteration is inexpensive.

Suggested change
-// Pre-size the hash map to reduce reallocations
-let estimated_capacity = groups.iter().map(|g| g.into_iter().count()).sum();
-let mut indices = HashMap::with_capacity(estimated_capacity);
+let mut indices = HashMap::new();
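The trade-off this comment raises hinges on how cheap the counting pass is. A minimal sketch of the pattern under one favorable assumption (estimated_capacity and index_nodes are hypothetical helpers, and each group is a Vec, so len() is O(1) and the extra pass costs almost nothing; with lazy iterators the count would be a full traversal):

```rust
use std::collections::HashMap;

// Counting pass: cheap when each group is a slice or Vec (len() is O(1)),
// but a full extra traversal when groups are lazy iterators.
fn estimated_capacity(groups: &[Vec<u32>]) -> usize {
    groups.iter().map(|g| g.len()).sum()
}

// Pre-size the map once, then fill it without intermediate rehashing.
fn index_nodes(groups: &[Vec<u32>]) -> HashMap<u32, usize> {
    let mut indices = HashMap::with_capacity(estimated_capacity(groups));
    for (i, group) in groups.iter().enumerate() {
        for &node in group {
            indices.insert(node, i);
        }
    }
    indices
}

fn main() {
    let groups = vec![vec![1, 2, 3], vec![4, 5], vec![6]];
    let indices = index_nodes(&groups);
    println!("{} nodes indexed", indices.len()); // 6 nodes across 3 groups
}
```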

// Pre-size the hash map to reduce reallocations
let estimated_capacity = gindices.iter().filter(|&&n| n != usize::MAX).count();
let mut gb: FxHashMap<usize, FxHashSet<N>> =
FxHashMap::with_capacity_and_hasher(estimated_capacity, FxBuildHasher);
Copilot AI commented Jan 2, 2026

The FxHashMap type is a type alias that already includes FxBuildHasher as its default hasher, so you can simply use FxHashMap::with_capacity(estimated_capacity) instead of the more verbose FxHashMap::with_capacity_and_hasher(estimated_capacity, FxBuildHasher). This follows the same pattern as the standard HashMap::with_capacity() used in the separate_components function above.

Suggested change
-FxHashMap::with_capacity_and_hasher(estimated_capacity, FxBuildHasher);
+FxHashMap::with_capacity(estimated_capacity);
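One caveat on this suggestion: std's inherent HashMap::with_capacity is defined only for the default RandomState hasher, so whether the shorter with_capacity form compiles depends on the exact FxHashMap alias and crate version in use; with_capacity_and_hasher is the form that always works for a custom hasher. A stdlib-only sketch of the pattern (FastMap and make_map are hypothetical names, and BuildHasherDefault<DefaultHasher> stands in for FxBuildHasher):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Stand-in for an FxHashMap-style alias: a std HashMap parameterized
// over a zero-sized, Default-constructible build hasher.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn make_map(capacity: usize) -> FastMap<usize, u32> {
    // with_capacity_and_hasher pre-sizes the map while supplying the
    // non-default hasher explicitly; for a custom S this constructor
    // is always available, unlike the RandomState-only with_capacity.
    FastMap::with_capacity_and_hasher(capacity, BuildHasherDefault::default())
}

fn main() {
    let mut gb = make_map(8);
    gb.insert(0, 42);
    println!("{:?}", gb.get(&0)); // Some(42)
}
```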

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/undirected/connected_components.rs (1)

108-111: Simplify to use with_capacity instead of with_capacity_and_hasher.

Since FxHashMap is a type alias for HashMap<K, V, FxBuildHasher>, calling with_capacity_and_hasher with an explicit FxBuildHasher is redundant. The simpler with_capacity method already uses the correct hasher.

🔎 Proposed simplification
-        let mut gb: FxHashMap<usize, FxHashSet<N>> =
-            FxHashMap::with_capacity_and_hasher(estimated_capacity, FxBuildHasher);
+        let mut gb: FxHashMap<usize, FxHashSet<N>> =
+            FxHashMap::with_capacity(estimated_capacity);
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5aceb5 and e76809c.

📒 Files selected for processing (1)
  • src/undirected/connected_components.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Test suite (msrv)
  • GitHub Check: Test suite (stable)
  • GitHub Check: Test suite (beta)
  • GitHub Check: Test suite (nightly)
  • GitHub Check: Extra tests in release mode
  • GitHub Check: Test with minimal versions
  • GitHub Check: Agent
  • GitHub Check: benchmarks
🔇 Additional comments (2)
src/undirected/connected_components.rs (2)

8-8: LGTM!

The FxBuildHasher import is necessary for the explicit hasher configuration on line 111.


63-65: Verify that pre-allocation benefits outweigh double iteration cost.

The optimization iterates through all groups twice: once to count elements (line 64) and again to process them (lines 66-79). While pre-allocation reduces reallocations, the upfront counting pass adds O(n) overhead.

Please confirm with benchmarks that this optimization improves performance for the typical use cases, especially when groups are small or sparse.

@samueltardieu samueltardieu added this pull request to the merge queue Jan 3, 2026
Merged via the queue into main with commit 857616f Jan 3, 2026
14 checks passed
@samueltardieu samueltardieu deleted the push-zuzvxtrytpyn branch January 3, 2026 01:10