Conversation

@matkt (Contributor) commented Dec 16, 2025

Description

This PR introduces parallel processing capabilities for Merkle Patricia Trie operations during state root computation in the Bonsai storage format, significantly improving block validation performance.

Changes

Core Implementation

  • ParallelStoredMerklePatriciaTrie: New parallel implementation of StoredMerklePatriciaTrie
    • Batches pending updates (puts/removes) before processing
    • Recursively processes branch children in parallel using ForkJoinPool
    • Handles branch, extension, leaf, and null node scenarios
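The batching step could be sketched as follows. This is a minimal illustration only: `PendingUpdates`, `flush`, and the String-keyed map are hypothetical names, not Besu's actual API; the real implementation keys by nibble path and carries typed values.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch of batching puts/removes before one flush: a later operation on the
// same key supersedes the earlier one, so only the final state is processed.
class PendingUpdates {
  private final Map<String, Optional<String>> pending = new LinkedHashMap<>();

  void put(String keyNibbles, String value) {
    pending.put(keyNibbles, Optional.of(value));
  }

  void remove(String keyNibbles) {
    pending.put(keyNibbles, Optional.empty()); // empty value marks a removal
  }

  // Drains the batch; entries with an empty value are removals.
  List<Map.Entry<String, Optional<String>>> flush() {
    List<Map.Entry<String, Optional<String>>> entries = new ArrayList<>();
    for (Map.Entry<String, Optional<String>> e : pending.entrySet()) {
      entries.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
    }
    pending.clear();
    return entries;
  }
}
```

Coalescing before the flush means a key that is written and then deleted in the same batch costs only one trie traversal.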

Configuration

  • WorldStateConfig: Added isParallelStateRootComputationEnabled flag (default: true)
  • PathBasedExtraStorageConfiguration: Added parallel state root computation configuration
  • CLI Option: --bonsai-parallel-state-root-computation-enabled enables/disables the feature

Key Features

  1. Parallel Branch Processing: When a branch node has multiple children with updates, they are processed concurrently
  2. Extension Node Expansion: Extensions are temporarily expanded into branches when beneficial for parallel processing
  3. Leaf/Null Node Handling: Intelligent expansion into branch structures when multiple diverging updates exist
  4. Smart Partitioning: Updates are grouped by size - large groups processed in parallel, small groups sequentially
  5. Commit Cache: Thread-safe caching of node updates during parallel processing
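The parallel-branch idea (feature 1 above) can be sketched like this. All names are illustrative: `processChild` stands in for the recursive subtrie update, and `CompletableFuture` is used for brevity where the PR's implementation uses ForkJoinPool.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

// Sketch: updates are grouped by the first nibble of their path, each group
// (a future subtrie) is processed concurrently, then results are joined in
// nibble order.
class ParallelBranchSketch {
  // Stand-in for recursively applying a child's updates and hashing its subtrie.
  static String processChild(List<String> childUpdates) {
    return Integer.toHexString(childUpdates.hashCode());
  }

  static String processBranch(Map<Byte, List<String>> groupsByNibble) {
    // TreeMap gives a deterministic nibble order for the final combination.
    Map<Byte, CompletableFuture<String>> futures = new TreeMap<>();
    for (Map.Entry<Byte, List<String>> e : groupsByNibble.entrySet()) {
      futures.put(e.getKey(),
          CompletableFuture.supplyAsync(() -> processChild(e.getValue())));
    }
    StringBuilder combined = new StringBuilder();
    for (CompletableFuture<String> f : futures.values()) {
      combined.append(f.join()); // blocks until that child's result is ready
    }
    return combined.toString();
  }
}
```

Joining in nibble order keeps the combined result deterministic even though the children may complete out of order, which is what makes a parallel trie update produce the same root hash as the sequential one.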

Backward Compatibility

  • Feature is controlled by a configuration flag (enabled by default, so effectively opt-out)
  • Falls back to sequential StoredMerklePatriciaTrie when disabled
  • No breaking changes to existing APIs
  • Fully compatible with existing Bonsai storage format

Configuration Examples

# Enable (default)
besu --data-storage-format=BONSAI --bonsai-parallel-state-root-computation-enabled=true

# Disable for comparison/debugging
besu --data-storage-format=BONSAI --bonsai-parallel-state-root-computation-enabled=false

matkt added 30 commits December 4, 2025 10:59
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
@matkt matkt changed the title Merkle trie optimisation feat: Add parallel state root computation support for Bonsai trie Jan 7, 2026
matkt added 5 commits January 7, 2026 19:00
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
@matkt matkt marked this pull request as ready for review January 9, 2026 13:53
matkt added 2 commits January 9, 2026 18:09
Signed-off-by: Karim Taam <karim.t2am@gmail.com>
this.root = loadNode(root);

// Convert pending updates to UpdateEntry objects with nibble paths
final List<UpdateEntry<V>> entries =

Contributor:

I suggest using simple for loops to avoid stream overhead in memory allocations and latency.
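For illustration only (a hypothetical pipeline, not the PR's actual code), this is the kind of rewrite being suggested:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// A hypothetical stream pipeline and its simple-loop equivalent.
class LoopVsStream {
  // Stream version: allocates pipeline objects and lambdas on every call.
  static List<String> keysWithUpdatesStream(Map<String, Integer> updateCounts) {
    return updateCounts.entrySet().stream()
        .filter(e -> e.getValue() > 0)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  // Loop version: one pre-sized ArrayList, no intermediate allocations.
  static List<String> keysWithUpdatesLoop(Map<String, Integer> updateCounts) {
    List<String> out = new ArrayList<>(updateCounts.size());
    for (Map.Entry<String, Integer> e : updateCounts.entrySet()) {
      if (e.getValue() > 0) {
        out.add(e.getKey());
      }
    }
    return out;
  }
}
```

Both return the same list; on a hot path like state root computation the loop form avoids the per-call pipeline setup.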


Contributor:

Check this branch, which contains the change replacing streams with simple for loops.

for (final Map.Entry<Byte, List<UpdateEntry<V>>> entry : largeGroups.entrySet()) {
final byte nibble = entry.getKey();
final List<UpdateEntry<V>> childUpdates = entry.getValue();
final Bytes childLocation = Bytes.concatenate(location, Bytes.of(nibble));

Contributor:

Suggested change
final Bytes childLocation = Bytes.concatenate(location, Bytes.of(nibble));
final int pathDepth = location.size();
final byte[] out = new byte[pathDepth + 1];
final byte[] in = location.toArrayUnsafe();
System.arraycopy(in, 0, out, 0, pathDepth);
out[pathDepth] = nibble;
final Bytes childLocation = Bytes.wrap(out);


Contributor:

This is the JMH benchmark that shows the difference in performance; arraycopy_newArray_wrap is the new suggested implementation.

Benchmark                                          (locationSize)  Mode  Cnt   Score   Error  Units
BytesConcatenateBenchmark.arraycopy_newArray_wrap               8  avgt   16   6.313 ± 0.321  ns/op
BytesConcatenateBenchmark.arraycopy_newArray_wrap              16  avgt   16   6.150 ± 0.140  ns/op
BytesConcatenateBenchmark.arraycopy_newArray_wrap              32  avgt   16   6.383 ± 0.291  ns/op
BytesConcatenateBenchmark.concat_bytesOf                        8  avgt   16  37.773 ± 6.029  ns/op
BytesConcatenateBenchmark.concat_bytesOf                       16  avgt   16  35.169 ± 2.331  ns/op
BytesConcatenateBenchmark.concat_bytesOf                       32  avgt   16  36.688 ± 1.568  ns/op
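The same technique shown with plain byte[] so the sketch is self-contained without Tuweni's Bytes on the classpath; the helper produces the same bytes as a naive concatenation:

```java
import java.util.Arrays;

// Appends a single nibble to a path via one pre-sized array copy, mirroring
// the suggested replacement for Bytes.concatenate(location, Bytes.of(nibble)).
class NibbleConcat {
  static byte[] appendNibble(byte[] location, byte nibble) {
    final byte[] out = new byte[location.length + 1];
    System.arraycopy(location, 0, out, 0, location.length);
    out[location.length] = nibble;
    return out;
  }

  // Naive reference version: extra allocation for the one-byte suffix.
  static byte[] appendNibbleNaive(byte[] location, byte nibble) {
    byte[] suffix = new byte[] {nibble};
    byte[] out = Arrays.copyOf(location, location.length + suffix.length);
    System.arraycopy(suffix, 0, out, location.length, suffix.length);
    return out;
  }
}
```

The win in the benchmark comes from sizing the destination exactly once and skipping the generic concatenation machinery.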


Contributor:

Check this branch for a cleaner implementation.

for (final Map.Entry<Byte, List<UpdateEntry<V>>> entry : smallGroups.entrySet()) {
final byte nibble = entry.getKey();
final List<UpdateEntry<V>> childUpdates = entry.getValue();
final Bytes childLocation = Bytes.concatenate(location, Bytes.of(nibble));

Contributor:

Same as above: create a method that concatenates based on the underlying array.


@ahamlat (Contributor) left a comment:

The proposed changes can be addressed in a separate PR as this PR is about state root calculation.
For the instance type (8 cores / 8 threads) we used in the screenshot below, it shows up to a 40% improvement in block processing time. We've seen less improvement on VMs with fewer cores (4 cores / 8 threads).
[Screenshot: block processing time improvement]

@matkt matkt enabled auto-merge (squash) January 23, 2026 06:20
@matkt matkt merged commit 2f6d7f2 into hyperledger:main Jan 23, 2026
46 checks passed