Conversation

@wqfish
Contributor

@wqfish wqfish commented Dec 27, 2025

This commit implements the logic to compute root hashes for the hot state. It's an initial version with a number of TODO items, but it's a reasonable start. Most of the logic is gated behind a flag, so we're not enabling this on mainnet yet; we'll first run it on testnet for some time and gain more confidence.

How this roughly works:

  • `State::update` computes the changes to the hot state. They are saved in `ExecutionOutput`.
  • Those changes are passed to `StateSummary::update` and used to compute the in-memory `SparseMerkleTree`.
  • The resulting Merkle trees are committed to the persistent database (`hot_state_merkle_db`, introduced in #18385, "[Hot State] Add a separate StateMerkleDb for hot state") so the proofs can in turn be used by future `StateSummary::update` calls.
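
The three steps above can be sketched with highly simplified stand-in types (the real `State`, `StateSummary`, and `SparseMerkleTree` in aptos-core are sharded, versioned, and proof-aware; everything below is illustrative only):

```rust
use std::collections::HashMap;

// Illustrative stand-ins; the real aptos-core types are far richer.
type StateKey = String;
type StateValue = Vec<u8>;

/// Step 1: hot-state changes computed by `State::update` and carried
/// in `ExecutionOutput` (hypothetical simplified shape).
#[derive(Default)]
pub struct HotStateUpdates {
    pub insertions: HashMap<StateKey, StateValue>,
    pub evictions: Vec<StateKey>,
}

/// Step 2: stand-in for the in-memory sparse Merkle tree that
/// `StateSummary::update` rebuilds from those changes.
#[derive(Default, Clone)]
pub struct SparseMerkleTree {
    pub leaves: HashMap<StateKey, StateValue>,
}

impl SparseMerkleTree {
    /// Apply hot-state updates, returning the new tree; the old tree
    /// stays usable, mirroring the persistent-tree style of the real SMT.
    pub fn update(&self, updates: &HotStateUpdates) -> Self {
        let mut next = self.clone();
        for (k, v) in &updates.insertions {
            next.leaves.insert(k.clone(), v.clone());
        }
        for k in &updates.evictions {
            next.leaves.remove(k);
        }
        next
    }
}
```

Step 3 would then serialize the new tree into node batches for `hot_state_merkle_db`, so later `StateSummary::update` calls can read proofs back from it.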

Note

Implements hot state Merkle root computation and persistence, plus plumbs a new config across storage and execution.

  • Introduces HotStateConfig { max_items_per_shard, delete_on_restart, compute_root_hash } and replaces reset_hot_state param in AptosDB::open; updates all call sites and tools/tests
  • Execution pipeline now emits HotStateUpdates; State::update produces hot-state changes; StateSummary::update computes both hot/global SMTs; state checkpointing commits hot and cold JMT batches (to hot_state_merkle_db and main JMT) and validates usage
  • Adds gating: compute_root_hash default true but auto-disabled on Mainnet unless explicitly enabled; readonly/debug paths set delete_on_restart=false
  • Storage init/restore and proofs updated (e.g., null node handling, hot/cold proof selection); API/types extended accordingly

Written by Cursor Bugbot for commit 389074b.
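
The config and gating described in the note can be sketched as follows. The field names come from the note itself, while the default shard size, the `sanitize` method name, and the mainnet check are hypothetical stand-ins:

```rust
/// Sketch of the new config; fields per the note above.
#[derive(Debug, Clone)]
pub struct HotStateConfig {
    pub max_items_per_shard: usize,
    pub delete_on_restart: bool,
    pub compute_root_hash: bool,
}

impl Default for HotStateConfig {
    fn default() -> Self {
        Self {
            max_items_per_shard: 100_000, // hypothetical default
            delete_on_restart: true,
            compute_root_hash: true, // defaults to true per the note
        }
    }
}

impl HotStateConfig {
    /// Root-hash computation is auto-disabled on Mainnet unless the
    /// operator explicitly opted in (names here are illustrative).
    pub fn sanitize(mut self, is_mainnet: bool, explicitly_enabled: bool) -> Self {
        if is_mainnet && !explicitly_enabled {
            self.compute_root_hash = false;
        }
        self
    }
}
```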

@wqfish wqfish added the `CICD:run-e2e-tests` label (when this label is present, github actions will run all land-blocking e2e tests from the PR) Dec 27, 2025
@wqfish wqfish changed the title [Hot State] Save insertions and evictions in ExecutionOutput [Hot State] Compute root hash for hot state Dec 27, 2025
@wqfish wqfish force-pushed the pr18390 branch 3 times, most recently from f1dac58 to dd9df3a Compare January 1, 2026 19:28

wqfish added a commit that referenced this pull request Jan 12, 2026
In #18390, we simply set the format of the hot state Merkle tree to
`Map<StateKey, StateValue>`. This means that `HotVacant` entries, i.e., entries
that were recently read and determined not to exist in storage, are not hashed
into the tree. (And if they existed before, they are removed from the Merkle
tree.) This is inaccurate.

This commit introduces the `HotStateValue` struct and changes the format to
`Map<StateKey, HotStateValue>`. This way, both `HotOccupied` and `HotVacant`
entries are summarized into the root hash.
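
To see why the format change matters: under `Map<StateKey, StateValue>` a `HotVacant` entry has no value to hash, so it leaves no trace in the root; wrapping both variants in a `HotStateValue` gives every hot entry a leaf. A toy sketch follows — the enum shape and the hash function are illustrative stand-ins, not the aptos-core definitions:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for the new leaf type: both variants hash into the tree.
#[derive(Hash)]
pub enum HotStateValue {
    /// Key is hot and present in storage.
    HotOccupied { value: Vec<u8> },
    /// Key was read recently and confirmed absent in storage.
    HotVacant,
}

/// Illustrative leaf hash; the real tree uses a cryptographic hash.
pub fn leaf_hash(key: &str, value: &HotStateValue) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    value.hash(&mut h);
    h.finish()
}
```

With this, evicting a `HotVacant` entry (or flipping it to `HotOccupied`) changes the root hash, which the old format could not express.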
wqfish added a commit that referenced this pull request Jan 13, 2026
In #18390, we simply set the format of the hot state Merkle tree to
`Map<StateKey, StateValue>`. This means that `HotVacant` entries, i.e., entries
that were recently read and determined not to exist in storage, are not hashed
into the tree. (And if they existed before, they are removed from the Merkle
tree.) This is inaccurate.

This commit introduces the `HotStateValue` struct and changes the format to
`Map<StateKey, HotStateValue>`. This way, both `HotOccupied` and `HotVacant`
entries are summarized into the root hash.

In addition, we now check the refresh interval and avoid refreshing read-only
keys on every block.
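
The refresh-interval check mentioned above amounts to a version-distance guard; the function name, parameters, and the version-based interval are assumptions for illustration:

```rust
/// Only refresh a read-only key's hot-state slot if it was last
/// refreshed at least `refresh_interval` versions ago (hypothetical
/// names; the real check lives in the hot-state update path).
pub fn should_refresh(last_refreshed: u64, current: u64, refresh_interval: u64) -> bool {
    // saturating_sub guards against a stale `current` below `last_refreshed`.
    current.saturating_sub(last_refreshed) >= refresh_interval
}
```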
@wqfish wqfish enabled auto-merge (rebase) January 14, 2026 19:38
@github-actions
Contributor

✅ Forge suite framework_upgrade success on a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f

Compatibility test results for a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f (PR)
Upgrade the nodes to version: 389074b56ad2337eb573df66c448fb0c7002239f
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 542.22 txn/s, submitted: 543.81 txn/s, failed submission: 1.59 txn/s, expired: 1.59 txn/s, latency: 8199.04 ms, (p50: 11100 ms, p70: 12000, p90: 12600 ms, p99: 22600 ms), latency samples: 47761
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 837.65 txn/s, submitted: 840.96 txn/s, failed submission: 3.32 txn/s, expired: 3.32 txn/s, latency: 4885.53 ms, (p50: 1200 ms, p70: 11100, p90: 12000 ms, p99: 19800 ms), latency samples: 65640
5. check swarm health
Compatibility test for a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f passed
Upgrade the remaining nodes to version: 389074b56ad2337eb573df66c448fb0c7002239f
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1442.55 txn/s, submitted: 1446.07 txn/s, failed submission: 3.52 txn/s, expired: 3.52 txn/s, latency: 2078.48 ms, (p50: 1400 ms, p70: 1500, p90: 2000 ms, p99: 11800 ms), latency samples: 131260
Test Ok

@github-actions
Contributor

✅ Forge suite realistic_env_max_load success on 389074b56ad2337eb573df66c448fb0c7002239f

two traffics test: inner traffic : committed: 13628.90 txn/s, latency: 2764.96 ms, (p50: 2700 ms, p70: 2900, p90: 3100 ms, p99: 3600 ms), latency samples: 5075740
two traffics test : committed: 100.02 txn/s, latency: 760.59 ms, (p50: 700 ms, p70: 800, p90: 900 ms, p99: 1200 ms), latency samples: 1640
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.275, avg: 2.171", "ConsensusProposalToOrdered: max: 0.168, avg: 0.165", "ConsensusOrderedToCommit: max: 0.045, avg: 0.042", "ConsensusProposalToCommit: max: 0.213, avg: 0.207"]
Max non-epoch-change gap was: 1 rounds at version 39046 (avg 0.00) [limit 4], 1.16s no progress at version 39046 (avg 0.07s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.31s no progress at version 4827365 (avg 0.29s) [limit 16].
Test Ok


@github-actions
Contributor

✅ Forge suite compat success on a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f

Forge report malformed: Expecting ',' delimiter: line 18 column 6 (char 463)
Recovered report text:
Compatibility test results for a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f (PR)
1. Check liveness of validators at old version: a3ff6eef75d8e0a24caea52de8522a4e28bd1873
compatibility::simple-validator-upgrade::liveness-check : committed: 12556.45 txn/s, latency: 2762.84 ms, (p50: 2800 ms, p70: 3000, p90: 3600 ms, p99: 4000 ms), latency samples: 414720
2. Upgrading first Validator to new version: 389074b56ad2337eb573df66c448fb0c7002239f
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 5796.23 txn/s, latency: 5892.97 ms, (p50: 6400 ms, p70: 6700, p90: 6900 ms, p99: 6900 ms), latency samples: 197120
3. Upgrading rest of first batch to new version: 389074b56ad2337eb573df66c448fb0c7002239f
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 5825.25 txn/s, latency: 5812.29 ms, (p50: 6500 ms, p70: 6600, p90: 6700 ms, p99: 6800 ms), latency samples: 200320
4. Upgrading second batch to new version: 389074b56ad2337eb573df66c448fb0c7002239f
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 9744.58 txn/s, latency: 3359.54 ms, (p50: 3300 ms, p70: 3700, p90: 4600 ms, p99: 5100 ms), latency samples: 322780
5. Check swarm health
Compatibility test for a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f passed
Test Ok

=== BEGIN JUNIT ===
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="forge" tests="1" failures="0" errors="0" uuid="664e9602-2f92-4d45-995a-ae52585ce2d1">
    <testsuite name="compat" tests="1" disabled="0" errors="0" failures="0">
        <testcase name="compatibility::simple-validator-upgrade">
        </testcase>
    </testsuite>
</testsuites>
=== END JUNIT ===
[2026-01-14T20:33:18Z INFO  aptos_forge::backend::k8s::cluster_helper] Deleting namespace forge-compat-pr-18390: Some(NamespaceStatus { conditions: None, phase: Some("Terminating") })
[2026-01-14T20:33:18Z INFO  aptos_forge::backend::k8s::cluster_helper] aptos-node resources for Forge removed in namespace: forge-compat-pr-18390

test result: ok. 1 passed; 0 soft failed; 0 hard failed; 0 filtered out

Debugging output:
NAME                                         READY   STATUS      RESTARTS   AGE
aptos-node-0-validator-0                     1/1     Running     0          4m28s
aptos-node-1-validator-0                     1/1     Running     0          6m2s
aptos-node-2-validator-0                     1/1     Running     0          2m43s
aptos-node-3-validator-0                     1/1     Running     0          104s
forge-testnet-deployer-28js7                 0/1     Completed   0          9m3s
genesis-aptos-genesis-eforgefc98eeab-mxkpm   0/1     Completed   0          8m25s

@wqfish wqfish merged commit eac80f9 into main Jan 14, 2026
123 of 151 checks passed
@wqfish wqfish deleted the pr18390 branch January 14, 2026 20:33