
perf: misc micro-optimizations #553

Merged: Qumeric merged 4 commits into main from val/test-opt (Feb 3, 2026)

@Qumeric (Contributor) commented Jan 19, 2026

  1. Use Cell instead of RefCell: slightly uglier but faster. I tried this previously and it did not work; now it works, though I'm not sure what the issue was. This is the main optimization.
  2. Allocate exact path size instead of maximum possible
  3. Skip nibble decode on common paths
    Previously, insert/delete always called prefix_to_nibs to expand the compact path into a SmallVec, even when the path matched the key exactly or was clearly a prefix of it. That allocation and expansion is now avoided on the hot path:
    • Leaf: if the encoded path already matches the key nibs, update in place without decoding.
    • Extension: if the encoded path is a prefix of the key nibs, descend without decoding.
    • Only when we need to split/merge do we decode to nibs and compute lcp.
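A minimal sketch of the skip-decode idea, assuming the standard hex-prefix (compact) layout: the high nibble of the first byte holds the flags (0x2 = leaf, 0x1 = odd length), and an odd path stores its first nibble in the low nibble of that byte. The helper names mirror those mentioned in the review below, but these signatures and bodies are hypothetical, not the code in hp.rs. Nibbles are read straight out of the encoded bytes, so nothing is expanded into a SmallVec on the match path.

```rust
/// Number of nibbles in a compact-encoded path (assumes at least 1 byte).
fn encoded_path_nibble_count(encoded: &[u8]) -> usize {
    let odd = (encoded[0] >> 4) & 1 == 1;
    (encoded.len() - 1) * 2 + odd as usize
}

/// The i-th path nibble, read directly from the compact bytes.
fn encoded_path_nib(encoded: &[u8], i: usize) -> u8 {
    let odd = (encoded[0] >> 4) & 1 == 1;
    // Position counted from the flag nibble (position 0). Even paths have a
    // padding nibble at position 1, so their data starts at position 2.
    let pos = if odd { i + 1 } else { i + 2 };
    let byte = encoded[pos / 2];
    if pos % 2 == 0 { byte >> 4 } else { byte & 0x0f }
}

/// Leaf fast path: true if the encoded path equals `nibs`, without decoding.
fn encoded_path_eq_nibs(encoded: &[u8], nibs: &[u8]) -> bool {
    encoded_path_nibble_count(encoded) == nibs.len()
        && nibs.iter().enumerate().all(|(i, &n)| encoded_path_nib(encoded, i) == n)
}

/// Extension fast path: if the encoded path is a prefix of `nibs`,
/// return the remaining key nibbles to descend with.
fn encoded_path_strip_prefix<'k>(encoded: &[u8], nibs: &'k [u8]) -> Option<&'k [u8]> {
    let len = encoded_path_nibble_count(encoded);
    if len > nibs.len() {
        return None;
    }
    (0..len)
        .all(|i| encoded_path_nib(encoded, i) == nibs[i])
        .then(|| &nibs[len..])
}

fn main() {
    // Leaf path [1, 2, 3] (odd): flag nibble 0x3 plus first nibble -> 0x31, then 0x23.
    let leaf = [0x31, 0x23];
    assert!(encoded_path_eq_nibs(&leaf, &[1, 2, 3]));
    assert!(!encoded_path_eq_nibs(&leaf, &[1, 2, 4]));

    // Extension path [0xa, 0xb] (even): flag byte 0x00, then 0xab.
    let ext = [0x00, 0xab];
    assert_eq!(encoded_path_strip_prefix(&ext, &[0xa, 0xb, 0x5, 0x6]), Some(&[0x5u8, 0x6][..]));
    assert_eq!(encoded_path_strip_prefix(&ext, &[0xa, 0xc]), None);
}
```

Only when these return `false` or `None` would the caller need to fall back to decoding and computing the longest common prefix.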

Bench links:

Comparison
┌─────────────────────────┬────────────┬────────────┬──────────────┬────────────────────────────┐
│         Metric          │  Baseline  │ Cell Only  │ Cell + Alloc │ Cell + Alloc + Skip Decode │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Overall                 │            │            │              │                            │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Total Time (seq)        │ 531.1s     │ 527.27s    │ 531.54s      │ 528.81s                    │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Total Time (parallel)   │ 16.71s     │ 15.82s     │ 15.81s       │ 15.92s                     │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Total Time (32 provers) │ 28.13s     │ 28.06s     │ 28.16s       │ 28.16s                     │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ reth.prove_stark        │            │            │              │                            │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Proof Time (seq)        │ 156.43s    │ 156.77s    │ 157.43s      │ 155.27s                    │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Proof Time (parallel)   │ 6.80s      │ 6.66s      │ 6.61s        │ 6.67s                      │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Proof Time (32 provers) │ 10.29s     │ 10.52s     │ 10.26s       │ 10.63s                     │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Main Cells (avg)        │ 55,946,302 │ 55,471,353 │ 55,467,351   │ 55,461,070                 │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Total Cells (avg)       │ 94,140,625 │ 93,576,545 │ 93,572,016   │ 93,564,068                 │
└─────────────────────────┴────────────┴────────────┴──────────────┴────────────────────────────┘

Key Observations

  • Cell-count reduction: ~0.9% fewer main cells and ~0.6% fewer total cells than baseline. Nearly all of it (~98%) comes from the RefCell -> Cell change alone.
  • Parallel time: a slight improvement (~5%) over baseline (16.71s → 15.81s at best). However, most of it is the improvement in leaf (2.80s → ~2.10s), which I don't expect to be reproducible on other blocks.
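For reference, the percentages above can be recomputed from the table (final column against baseline; the share attributed to the Cell change compares the Cell Only reduction against the total reduction in main cells):

$$
\begin{aligned}
\text{main cells:}\quad & \frac{55{,}946{,}302 - 55{,}461{,}070}{55{,}946{,}302} = \frac{485{,}232}{55{,}946{,}302} \approx 0.87\% \\
\text{total cells:}\quad & \frac{94{,}140{,}625 - 93{,}564{,}068}{94{,}140{,}625} = \frac{576{,}557}{94{,}140{,}625} \approx 0.61\% \\
\text{share from Cell only:}\quad & \frac{55{,}946{,}302 - 55{,}471{,}353}{485{,}232} = \frac{474{,}949}{485{,}232} \approx 97.9\% \\
\text{parallel time:}\quad & \frac{16.71 - 15.81}{16.71} = \frac{0.90}{16.71} \approx 5.4\%
\end{aligned}
$$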


@claude (bot) commented Jan 19, 2026

Code Review: perf: misc micro-optimizations

Summary

This PR introduces three micro-optimizations to the MPT (Merkle Patricia Trie) implementation. The changes are well-targeted and demonstrate a good understanding of performance hotspots. Overall, this is a solid performance improvement with a clean implementation.

✅ Strengths

1. Cell vs RefCell Change (trie.rs:63)

Excellent optimization. Replacing RefCell<Option<NodeRef<'a>>> with Cell<Option<NodeRef<'a>>> is valid here because:

  • NodeRef is now Copy (correctly added in node.rs:30)
  • The cache access pattern only requires replacing the entire value, not mutating it
  • Cell::get() and Cell::set() are cheaper than RefCell::borrow_mut() (no runtime borrow checking overhead)

The migration is implemented consistently across all usage sites (reference_encode, reference_length, hash, add_node, invalidate_ref_cache).
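To illustrate why the swap is valid, here is a hypothetical side-by-side of the two cache shapes. `NodeRef` is a stand-in enum, not the trie's real type; the key point is that `Cell<T>::get` is only available when `T: Copy`, which is why deriving `Copy` on `NodeRef` makes the change possible.

```rust
use std::cell::{Cell, RefCell};

// Hypothetical stand-in for the trie's NodeRef. Deriving Copy is what makes
// Cell::get usable, since Cell<T>::get requires T: Copy.
#[derive(Clone, Copy, Debug, PartialEq)]
enum NodeRef<'a> {
    Inline(&'a [u8]),
    Digest([u8; 32]),
}

// Before: every access goes through RefCell's runtime borrow flag.
struct RefCellCache<'a> {
    slots: Vec<RefCell<Option<NodeRef<'a>>>>,
}

impl<'a> RefCellCache<'a> {
    fn get(&self, id: usize) -> Option<NodeRef<'a>> {
        *self.slots[id].borrow() // checks and updates the borrow flag
    }
    fn set(&self, id: usize, r: NodeRef<'a>) {
        *self.slots[id].borrow_mut() = Some(r); // panics if already borrowed
    }
}

// After: the whole Option is copied in and out; there is no borrow flag.
struct CellCache<'a> {
    slots: Vec<Cell<Option<NodeRef<'a>>>>,
}

impl<'a> CellCache<'a> {
    fn get(&self, id: usize) -> Option<NodeRef<'a>> {
        self.slots[id].get()
    }
    fn set(&self, id: usize, r: NodeRef<'a>) {
        self.slots[id].set(Some(r));
    }
    fn invalidate(&self, id: usize) {
        self.slots[id].set(None);
    }
}

fn main() {
    let data = [0xc2u8, 0x01, 0x02];
    let refcell_cache = RefCellCache { slots: (0..4).map(|_| RefCell::new(None)).collect() };
    let cell_cache = CellCache { slots: (0..4).map(|_| Cell::new(None)).collect() };

    refcell_cache.set(1, NodeRef::Inline(&data));
    cell_cache.set(1, NodeRef::Inline(&data));
    cell_cache.set(2, NodeRef::Digest([0u8; 32]));
    assert_eq!(refcell_cache.get(1), cell_cache.get(1)); // same observable behavior

    cell_cache.invalidate(1);
    assert_eq!(cell_cache.get(1), None);
    assert!(matches!(cell_cache.get(2), Some(NodeRef::Digest(_))));
}
```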

2. Exact Capacity Allocation (hp.rs:176, 201)

Good improvement. The calculation 1 + (nibs.len() / 2) correctly computes the exact encoded path size:

  • 1 byte for the prefix
  • Each pair of nibbles becomes 1 byte
  • For odd-length paths, the formula still works (e.g., 3 nibbles → 1 + 3/2 = 1 + 1 = 2 bytes)

This eliminates over-allocation from the previous fixed 33-byte capacity.
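A sketch of hex-prefix encoding with the exact-size allocation described above. `encode_path` is a hypothetical name and signature, and the flag layout assumes the usual MPT convention (0x2 = leaf, 0x1 = odd length).

```rust
/// Hypothetical hex-prefix encoder with an exact-size allocation.
/// Assumes the usual flag nibble: 0x2 = leaf, 0x1 = odd length.
fn encode_path(nibs: &[u8], is_leaf: bool) -> Vec<u8> {
    // 1 prefix byte + one byte per remaining pair of nibbles. For odd
    // lengths the extra nibble rides in the prefix byte, so integer
    // division gives the exact size (e.g. 3 nibbles -> 1 + 1 = 2 bytes).
    let encoded_len = 1 + nibs.len() / 2;
    let mut encoded = Vec::with_capacity(encoded_len);
    let flag: u8 = if is_leaf { 0x20 } else { 0x00 };
    let rest = if nibs.len() % 2 == 1 {
        encoded.push(flag | 0x10 | nibs[0]); // odd: first nibble shares byte 0
        &nibs[1..]
    } else {
        encoded.push(flag); // even: low nibble of byte 0 is padding
        nibs
    };
    for pair in rest.chunks_exact(2) {
        encoded.push((pair[0] << 4) | pair[1]);
    }
    debug_assert_eq!(encoded.len(), encoded_len);
    encoded
}

fn main() {
    assert_eq!(encode_path(&[1, 2, 3], true), vec![0x31, 0x23]);
    assert_eq!(encode_path(&[0xa, 0xb], false), vec![0x00, 0xab]);
    // Longest path: 64 nibbles -> 33 bytes, the old fixed capacity.
    assert_eq!(encode_path(&[0xf; 64], true).len(), 33);
}
```

Note that `Vec::with_capacity(n)` only guarantees room for at least `n` elements, so the sketch asserts on the length rather than the capacity.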

3. Skip Nibble Decode on Hot Paths (trie.rs:653, 699, 819, 826)

Smart optimization. The new helper functions encoded_path_eq_nibs and encoded_path_strip_prefix avoid expensive prefix_to_nibs allocation when:

  • Leaf path exactly matches the key (trie.rs:653)
  • Extension path is a prefix of the key (trie.rs:699)
  • Deletion operations (trie.rs:819, 826)

The implementations in hp.rs (lines 77-166) are careful and correct, handling both odd- and even-length encoded paths.

🔍 Observations & Minor Concerns

1. Code Duplication in Cache Access Pattern

The cache lookup pattern is repeated 3 times (reference_encode, reference_length, hash):

```rust
let cached = self.cached_references[node_id as usize].get();
let node_ref = match cached {
    Some(node_ref) => node_ref,
    None => {
        let node_ref = self.calc_reference(node_id);
        self.cached_references[node_id as usize].set(Some(node_ref));
        node_ref
    }
};
```

Suggestion: Consider extracting this into a helper method to reduce duplication:

```rust
#[inline]
fn get_or_calc_reference(&self, node_id: NodeId) -> NodeRef<'a> {
    self.cached_references[node_id as usize]
        .get()
        .unwrap_or_else(|| {
            let node_ref = self.calc_reference(node_id);
            self.cached_references[node_id as usize].set(Some(node_ref));
            node_ref
        })
}
```

This would simplify the three call sites and make future changes easier.
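A self-contained version of the suggested helper, with a hypothetical `Trie`, a simplified lifetime-free `NodeRef`, and a miss counter added purely to demonstrate the memoization. It also binds the slot once, so the vector is indexed (and bounds-checked) once per call instead of twice.

```rust
use std::cell::Cell;

type NodeId = u32;

// Hypothetical Copy stand-in for the trie's NodeRef.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeRef(u64);

struct Trie {
    cached_references: Vec<Cell<Option<NodeRef>>>,
    calc_calls: Cell<usize>, // counts cache misses, for illustration only
}

impl Trie {
    fn new(nodes: usize) -> Self {
        Trie {
            cached_references: (0..nodes).map(|_| Cell::new(None)).collect(),
            calc_calls: Cell::new(0),
        }
    }

    // Stand-in for the expensive reference computation.
    fn calc_reference(&self, node_id: NodeId) -> NodeRef {
        self.calc_calls.set(self.calc_calls.get() + 1);
        NodeRef(u64::from(node_id) * 1000)
    }

    #[inline]
    fn get_or_calc_reference(&self, node_id: NodeId) -> NodeRef {
        let slot = &self.cached_references[node_id as usize]; // one bounds check
        slot.get().unwrap_or_else(|| {
            let node_ref = self.calc_reference(node_id);
            slot.set(Some(node_ref));
            node_ref
        })
    }

    fn invalidate_ref_cache(&self, node_id: NodeId) {
        self.cached_references[node_id as usize].set(None);
    }
}

fn main() {
    let trie = Trie::new(4);
    assert_eq!(trie.get_or_calc_reference(2), NodeRef(2000)); // miss
    assert_eq!(trie.get_or_calc_reference(2), NodeRef(2000)); // hit
    assert_eq!(trie.calc_calls.get(), 1);

    trie.invalidate_ref_cache(2);
    trie.get_or_calc_reference(2); // recomputed after invalidation
    assert_eq!(trie.calc_calls.get(), 2);
}
```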

2. Bounds Checking in Hot Path

The array indexing self.cached_references[node_id as usize].get() performs bounds checking. The compiler might not eliminate these checks even though node_id should always be valid.

Suggestion (optional): If profiling shows bounds checking overhead, consider using get_unchecked with appropriate safety comments, though the current approach is safer and likely fine.

3. nibble/nibs Naming Inconsistency

The code uses both "nibble count" and "nib count" terminology (e.g., encoded_path_nibble_count vs nib_count variable). Minor inconsistency but not critical.

🧪 Testing

Good coverage: The existing tests in hp.rs (test_encoded_path_nibble_count, test_eq_and_strip_prefix) cover the new helper functions. The high-level MPT tests should catch any regressions in the insert/delete hot paths.

Recommendation: Verify that the test suite runs successfully and covers edge cases like:

  • Empty paths
  • Single-nibble paths
  • Maximum-length paths (64 nibbles)
  • Paths that differ at the first nibble vs. last nibble
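One way to cover these edge cases is a differential test: encode each path, then check that the non-allocating comparison agrees with the decode-and-compare slow path. The helpers below (`encode`, `decode`, `eq_nibs`) are hypothetical stand-ins written for this sketch under the same hex-prefix layout assumption (0x2 = leaf, 0x1 = odd), not the functions in hp.rs.

```rust
// Minimal hex-prefix encoder used to build test inputs.
fn encode(nibs: &[u8], leaf: bool) -> Vec<u8> {
    let flag: u8 = if leaf { 0x20 } else { 0x00 };
    let mut out = Vec::with_capacity(1 + nibs.len() / 2);
    let rest = if nibs.len() % 2 == 1 {
        out.push(flag | 0x10 | nibs[0]);
        &nibs[1..]
    } else {
        out.push(flag);
        nibs
    };
    out.extend(rest.chunks_exact(2).map(|p| (p[0] << 4) | p[1]));
    out
}

// Slow path: expand into a nibble vector (conceptually what prefix_to_nibs does).
fn decode(enc: &[u8]) -> Vec<u8> {
    let mut nibs = Vec::new();
    if enc[0] & 0x10 != 0 {
        nibs.push(enc[0] & 0x0f);
    }
    for b in &enc[1..] {
        nibs.push(b >> 4);
        nibs.push(b & 0x0f);
    }
    nibs
}

// Fast path under test: compare without allocating.
fn eq_nibs(enc: &[u8], nibs: &[u8]) -> bool {
    let odd = enc[0] & 0x10 != 0;
    let count = (enc.len() - 1) * 2 + odd as usize;
    count == nibs.len()
        && nibs.iter().enumerate().all(|(i, &n)| {
            let pos = if odd { i + 1 } else { i + 2 };
            let b = enc[pos / 2];
            let got = if pos % 2 == 0 { b >> 4 } else { b & 0x0f };
            got == n
        })
}

fn main() {
    let full: Vec<u8> = (0..64).map(|i| (i % 16) as u8).collect();
    // Empty, single-nibble, maximum-length (64), and an ordinary even path.
    let cases: Vec<Vec<u8>> = vec![vec![], vec![0x7], full, vec![1, 2, 3, 4]];
    for path in &cases {
        for leaf in [false, true] {
            let enc = encode(path, leaf);
            assert_eq!(enc.len(), 1 + path.len() / 2);
            assert_eq!(&decode(&enc), path); // round trip
            assert!(eq_nibs(&enc, path)); // fast path agrees on a match
            if !path.is_empty() {
                // Differ at the first nibble, then at the last nibble.
                let mut first = path.clone();
                first[0] ^= 1;
                let mut last = path.clone();
                *last.last_mut().unwrap() ^= 1;
                assert!(!eq_nibs(&enc, &first));
                assert!(!eq_nibs(&enc, &last));
            }
        }
    }
}
```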

🔒 Security

No security concerns identified. The changes:

  • Don't introduce unsafe code
  • Maintain correct bounds checking
  • Don't change the external API or serialization format

📊 Performance

The benchmark results look promising:

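  • Total (seq): 531.1s → 528.81s (0.4% improvement)
  • Parallel: 16.71s → 15.92s (4.7% improvement)

The parallel improvement is larger, which may suggest the optimizations help more in concurrent workloads where cache/allocation contention matters, though the PR description attributes most of the parallel gain to the leaf stage and does not expect it to reproduce on other blocks.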

📝 Minor Nits

  1. .gitignore change: Adding bench-results/ and bench-results-large/ is good hygiene. ✅

  2. Comment removal in hp.rs:173, 198: The removed "Max path is 64 nibs" comment was informative. Consider keeping a comment explaining the exact capacity calculation, as it's not immediately obvious:

    ```rust
    // Exact capacity: 1 prefix byte + (nibs.len() / 2) data bytes
    let encoded_len = 1 + (nibs.len() / 2);
    ```

Final Verdict

LGTM with minor suggestions

This is a well-executed micro-optimization PR. The changes are:

  • Correct and safe
  • Well-tested by existing test suite
  • Show measurable performance improvement
  • Don't compromise code clarity significantly

The only improvement would be reducing the code duplication in the cache access pattern, but that's a minor refactoring opportunity rather than a blocker.

Recommendation: Approve and merge. Consider the helper method extraction as a follow-up cleanup if desired.

@Qumeric Qumeric marked this pull request as ready for review January 23, 2026 12:46

@shayanh (Contributor) left a comment

LGTM. I'm not sure the second optimization is necessarily better. Maybe it's better to do 1+3, or only 1.

Comment on lines +201 to +202
```rust
let encoded_len = 1 + (nibs.len() / 2);
let mut encoded = Vec::with_capacity(encoded_len);
```
Contributor

I'm not sure if this is necessarily better.

@Qumeric (Contributor, Author), Feb 3, 2026

Well, the bench improved slightly. It's n=1, but that's better than nothing. Given that it's a one-line change, I'm keeping it.

@Qumeric Qumeric merged commit 6a7433c into main Feb 3, 2026
3 checks passed
@Qumeric Qumeric deleted the val/test-opt branch February 3, 2026 12:24

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional flags.

