perf: misc micro-optimizations by Qumeric · Pull Request #553 · axiom-crypto/openvm-eth

Qumeric · 2026-01-19T21:44:53Z

Use Cell instead of RefCell -- slightly uglier but faster. I tried it previously and it did not work; now it works, and I am not sure what the issue was. This is the main optimization.
Allocate exact path size instead of maximum possible
Skip nibble decode on common paths
Before, insert/delete would always call prefix_to_nibs to expand the compact path into a SmallVec, even when the key matched or the prefix was clearly a match. That allocation/expansion is now avoided on the hot path:
- Leaf: if the encoded path already matches the key nibs, update in place without decoding.
- Extension: if the encoded path is a prefix of the key nibs, descend without decoding.
- Only when we need to split/merge do we decode to nibs and compute lcp.
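The leaf fast path above can be sketched as a direct comparison between the compact path and the key nibbles. This is a minimal illustration, not the PR's code: the function name `path_eq_nibs_sketch` is hypothetical, and it assumes the standard Ethereum hex-prefix layout (flag `0x10` in the first byte marks an odd nibble count, in which case the low nibble of that byte holds the first path nibble). The leaf/extension flag is ignored here.

```rust
/// Compare a hex-prefix encoded path with unpacked key nibbles without
/// expanding the path into a temporary buffer.
/// Assumption: standard hex-prefix layout, one nibble per `nibs` element.
fn path_eq_nibs_sketch(encoded: &[u8], nibs: &[u8]) -> bool {
    // A valid encoding always has at least the prefix byte.
    let Some((&first, rest)) = encoded.split_first() else {
        return false;
    };
    let odd = first & 0x10 != 0;
    // Each data byte carries two nibbles; an odd path has one more in the prefix.
    let count = rest.len() * 2 + usize::from(odd);
    if count != nibs.len() {
        return false;
    }
    let mut key = nibs.iter().copied();
    if odd && key.next() != Some(first & 0x0f) {
        return false;
    }
    // Walk the packed bytes and the key nibbles in lockstep.
    rest.iter()
        .all(|&byte| key.next() == Some(byte >> 4) && key.next() == Some(byte & 0x0f))
}
```

Only when this check fails would a caller need to decode the path and compute the longest common prefix for a split.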

Bench links:

Comparison
┌─────────────────────────┬────────────┬────────────┬──────────────┬────────────────────────────┐
│         Metric          │  Baseline  │ Cell Only  │ Cell + Alloc │ Cell + Alloc + Skip Decode │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Overall                 │            │            │              │                            │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Total Time (seq)        │ 531.1s     │ 527.27s    │ 531.54s      │ 528.81s                    │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Total Time (parallel)   │ 16.71s     │ 15.82s     │ 15.81s       │ 15.92s                     │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Total Time (32 provers) │ 28.13s     │ 28.06s     │ 28.16s       │ 28.16s                     │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ reth.prove_stark        │            │            │              │                            │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Proof Time (seq)        │ 156.43s    │ 156.77s    │ 157.43s      │ 155.27s                    │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Proof Time (parallel)   │ 6.80s      │ 6.66s      │ 6.61s        │ 6.67s                      │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Proof Time (32 provers) │ 10.29s     │ 10.52s     │ 10.26s       │ 10.63s                     │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Main Cells (avg)        │ 55,946,302 │ 55,471,353 │ 55,467,351   │ 55,461,070                 │
├─────────────────────────┼────────────┼────────────┼──────────────┼────────────────────────────┤
│ Total Cells (avg)       │ 94,140,625 │ 93,576,545 │ 93,572,016   │ 93,564,068                 │
└─────────────────────────┴────────────┴────────────┴──────────────┴────────────────────────────┘

Key Observations

Cell reduction: ~0.9% fewer main cells and ~0.6% fewer total cells from baseline. Nearly all of it (~98% by the table above) comes from RefCell -> Cell alone.
Parallel time: slight improvement (~5%) from baseline (16.71s → 15.81s best). However, most of it is an improvement in leaf (2.80s -> ~2.10s), and I don't think it will be reproducible on other blocks.
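For readers who want to re-derive the percentages from the comparison table, the arithmetic can be sketched as below. The helper names are hypothetical; the inputs are the table values. On these figures the reductions come out at roughly 0.87% (main cells), 0.61% (total cells) and 5.4% (best parallel time), with the Cell-only column accounting for about 98% of the main-cell reduction.

```rust
/// Relative reduction from `baseline` to `new`, in percent.
fn reduction_pct(baseline: f64, new: f64) -> f64 {
    (baseline - new) / baseline * 100.0
}

/// Share (in percent) of the overall reduction already reached at an
/// intermediate step, e.g. the "Cell Only" column.
fn share_of_reduction_pct(baseline: f64, step: f64, final_value: f64) -> f64 {
    (baseline - step) / (baseline - final_value) * 100.0
}
```

For example, `reduction_pct(55_946_302.0, 55_461_070.0)` gives the main-cell figure, and `share_of_reduction_pct(55_946_302.0, 55_471_353.0, 55_461_070.0)` gives the Cell-only share.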

claude · 2026-01-19T21:54:40Z

Code Review: perf: misc micro-optimizations

Summary

This PR introduces three micro-optimizations to the MPT (Merkle Patricia Trie) implementation. The changes are well-targeted and demonstrate a good understanding of performance hotspots. Overall, this is a solid performance improvement with clean implementation.

✅ Strengths

1. Cell vs RefCell Change (trie.rs:63)

Excellent optimization. Replacing RefCell<Option<NodeRef<'a>>> with Cell<Option<NodeRef<'a>>> is valid here because:

NodeRef is now Copy (correctly added in node.rs:30)
The cache access pattern only requires replacing the entire value, not mutating it
Cell::get() and Cell::set() are cheaper than RefCell::borrow_mut() (no runtime borrow checking overhead)

The migration is implemented consistently across all usage sites (reference_encode, reference_length, hash, add_node, invalidate_ref_cache).
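As a rough illustration of why `Cell` suffices here, the sketch below caches a small `Copy` value per slot and only ever replaces the whole value. `NodeRefSketch` and `RefCache` are hypothetical stand-ins, not the PR's types; the real `NodeRef<'a>` carries a lifetime and different contents.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for `NodeRef`; it must be `Copy` so `Cell::get` works.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeRefSketch(u32);

/// One cache slot per node id, updated through a shared reference.
struct RefCache {
    slots: Vec<Cell<Option<NodeRefSketch>>>,
}

impl RefCache {
    fn new(len: usize) -> Self {
        Self { slots: vec![Cell::new(None); len] }
    }

    /// Return the cached value, or compute, store and return it.
    /// `Cell::get`/`Cell::set` copy the value in and out, so there is no
    /// borrow flag to check at runtime as there is with `RefCell`.
    fn get_or_insert_with(
        &self,
        id: usize,
        calc: impl FnOnce() -> NodeRefSketch,
    ) -> NodeRefSketch {
        match self.slots[id].get() {
            Some(cached) => cached,
            None => {
                let fresh = calc();
                self.slots[id].set(Some(fresh));
                fresh
            }
        }
    }

    /// Drop the cached value so the next lookup recomputes it.
    fn invalidate(&self, id: usize) {
        self.slots[id].set(None);
    }
}
```

The trade-off is that `Cell` cannot hand out references into the slot, which is acceptable only because the cached value is small and `Copy`.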

2. Exact Capacity Allocation (hp.rs:176, 201)

Good improvement. The calculation 1 + (nibs.len() / 2) correctly computes the exact encoded path size:

1 byte for the prefix
Each pair of nibbles becomes 1 byte
For odd-length paths, the formula still works (e.g., 3 nibbles → 1 + 3/2 = 1 + 1 = 2 bytes)

This eliminates over-allocation from the previous fixed 33-byte capacity.
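To make the size formula concrete, here is a minimal hex-prefix encoder that allocates `1 + nibs.len() / 2` bytes up front. It is a sketch under the standard Ethereum hex-prefix layout (flag `0x20` for leaf, `0x10` for odd length); the function name and signature are hypothetical and may differ from what hp.rs does.

```rust
/// Encode unpacked nibbles (one per byte, each < 16) into a hex-prefix path.
fn encode_path_sketch(nibs: &[u8], is_leaf: bool) -> Vec<u8> {
    // One prefix byte plus one byte per full pair of nibbles.
    let encoded_len = 1 + nibs.len() / 2;
    let mut encoded = Vec::with_capacity(encoded_len);

    let flag: u8 = if is_leaf { 0x20 } else { 0x00 };
    let rest = if nibs.len() % 2 == 1 {
        // Odd length: the first nibble rides in the low half of the prefix byte.
        encoded.push(flag | 0x10 | nibs[0]);
        &nibs[1..]
    } else {
        encoded.push(flag);
        nibs
    };
    // The remaining nibbles always form whole pairs.
    for pair in rest.chunks_exact(2) {
        encoded.push((pair[0] << 4) | pair[1]);
    }
    debug_assert_eq!(encoded.len(), encoded_len);
    encoded
}
```

A 64-nibble path still produces 33 bytes, which matches the old fixed capacity; shorter paths simply request less.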

3. Skip Nibble Decode on Hot Paths (trie.rs:653, 699, 819, 826)

Smart optimization. The new helper functions encoded_path_eq_nibs and encoded_path_strip_prefix avoid expensive prefix_to_nibs allocation when:

Leaf path exactly matches the key (trie.rs:653)
Extension path is a prefix of the key (trie.rs:699)
Deletion operations (trie.rs:819, 826)

The implementations in hp.rs (77-166) are careful and correct, handling both odd and even-length encoded paths.
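The extension case can be illustrated with a prefix check that returns the unmatched tail of the key instead of a decoded path. This is a hedged sketch with a hypothetical name and signature, assuming the standard hex-prefix layout (flag `0x10` marks an odd nibble count); it is not the PR's `encoded_path_strip_prefix`.

```rust
/// If the hex-prefix encoded path is a prefix of `nibs`, return the rest of
/// `nibs` after that prefix; otherwise return `None`. Nothing is allocated.
fn strip_encoded_prefix_sketch<'k>(encoded: &[u8], nibs: &'k [u8]) -> Option<&'k [u8]> {
    let (&first, rest) = encoded.split_first()?;
    let odd = first & 0x10 != 0;
    let count = rest.len() * 2 + usize::from(odd);
    if count > nibs.len() {
        return None; // the path is longer than the key
    }
    let (head, tail) = nibs.split_at(count);
    let mut key = head.iter().copied();
    if odd && key.next() != Some(first & 0x0f) {
        return None;
    }
    // Compare two nibbles per packed byte against the head of the key.
    let matches = rest
        .iter()
        .all(|&byte| key.next() == Some(byte >> 4) && key.next() == Some(byte & 0x0f));
    matches.then_some(tail)
}
```

On `Some(tail)` a caller can descend into the child with `tail` as the remaining key; on `None` it falls back to decoding and splitting.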

🔍 Observations & Minor Concerns

1. Code Duplication in Cache Access Pattern

The cache lookup pattern is repeated 3 times (reference_encode, reference_length, hash):

let cached = self.cached_references[node_id as usize].get();
let node_ref = match cached {
    Some(node_ref) => node_ref,
    None => {
        let node_ref = self.calc_reference(node_id);
        self.cached_references[node_id as usize].set(Some(node_ref));
        node_ref
    }
};

Suggestion: Consider extracting this into a helper method to reduce duplication:

#[inline]
fn get_or_calc_reference(&self, node_id: NodeId) -> NodeRef<'a> {
    self.cached_references[node_id as usize]
        .get()
        .unwrap_or_else(|| {
            let node_ref = self.calc_reference(node_id);
            self.cached_references[node_id as usize].set(Some(node_ref));
            node_ref
        })
}

This would simplify the three call sites and make future changes easier.

2. Bounds Checking in Hot Path

The array indexing self.cached_references[node_id as usize].get() performs bounds checking. The compiler might not eliminate these checks even though node_id should always be valid.

Suggestion (optional): If profiling shows bounds checking overhead, consider using get_unchecked with appropriate safety comments, though the current approach is safer and likely fine.

3. nibble/nibs Naming Inconsistency

The code uses both "nibble count" and "nib count" terminology (e.g., encoded_path_nibble_count vs nib_count variable). Minor inconsistency but not critical.

🧪 Testing

Good coverage: The existing tests in hp.rs (test_encoded_path_nibble_count, test_eq_and_strip_prefix) cover the new helper functions. The high-level MPT tests should catch any regressions in the insert/delete hot paths.

Recommendation: Verify that the test suite runs successfully and covers edge cases like:

Empty paths
Single-nibble paths
Maximum-length paths (64 nibbles)
Paths that differ at the first nibble vs. last nibble

🔒 Security

No security concerns identified. The changes:

Don't introduce unsafe code
Maintain correct bounds checking
Don't change the external API or serialization format

📊 Performance

The benchmark results look promising:

Total (seq): 531.1s → 528.81s (0.4% improvement)
Parallel: 16.71s → 15.92s (4.7% improvement)

The parallel improvement is more significant, suggesting the optimizations help with concurrent workloads where cache/allocation contention matters more.

📝 Minor Nits

.gitignore change: Adding bench-results/ and bench-results-large/ is good hygiene. ✅
Comment removal in hp.rs:173, 198: The removed "Max path is 64 nibs" comment was informative. Consider keeping a comment explaining the exact capacity calculation, as it's not immediately obvious:
```
// Exact capacity: 1 prefix byte + (nibs.len() / 2) data bytes
let encoded_len = 1 + (nibs.len() / 2);
```

Final Verdict

✅ LGTM with minor suggestions

This is a well-executed micro-optimization PR. The changes are:

Correct and safe
Well-tested by existing test suite
Show measurable performance improvement
Don't compromise code clarity significantly

The only improvement would be reducing the code duplication in the cache access pattern, but that's a minor refactoring opportunity rather than a blocker.

Recommendation: Approve and merge. Consider the helper method extraction as a follow-up cleanup if desired.

shayanh

LGTM. I'm not sure the second optimization is necessarily better. Maybe better to do 1+3 or only 1.

shayanh · 2026-02-02T21:16:05Z

crates/mpt/src/hp.rs

+    let encoded_len = 1 + (nibs.len() / 2);
+    let mut encoded = Vec::with_capacity(encoded_len);


I'm not sure if this is necessarily better.

Well, the bench improved slightly. It's n=1, but that's better than nothing. Given that it's a one-line change, I am keeping it.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional flags.

Qumeric force-pushed the val/test-opt branch from 0d0cedc to 56fc20f Compare January 19, 2026 21:53

Qumeric requested review from jonathanpwang and shayanh January 22, 2026 20:55

Qumeric marked this pull request as ready for review January 23, 2026 12:46

shayanh approved these changes Feb 2, 2026

Qumeric added 4 commits February 3, 2026 12:16

perf(mpt): use Cell for cached refs

6aba58b

perf(mpt): size encoded path buffers precisely

a345c63

perf(mpt): skip nibble decode on common paths

a87849b

chore: ignore bench-results dir

37947af

Qumeric force-pushed the val/test-opt branch from 56fc20f to 37947af Compare February 3, 2026 12:22

Qumeric merged commit 6a7433c into main Feb 3, 2026
3 checks passed

Qumeric deleted the val/test-opt branch February 3, 2026 12:24

devin-ai-integration bot reviewed Feb 3, 2026
