Conversation

@wqfish
Contributor

@wqfish wqfish commented Dec 27, 2025

This commit implements the logic to compute root hashes for the hot state. It's an initial version with a number of TODO items, but it's a reasonable start. Most of the logic is gated behind a flag, so we're not enabling this on mainnet yet; we'll first run it on testnet for some time and gain more confidence.

How this roughly works:

  • `State::update` computes the changes to the hot state. They are saved in `ExecutionOutput`.
  • Those changes are passed to `StateSummary::update` and used to compute the in-memory `SparseMerkleTree`.
  • The resulting Merkle trees are committed to the persistent database (`hot_state_merkle_db`, introduced in #18385, "[Hot State] Add a separate StateMerkleDb for hot state") so the proofs can in turn be used by future `StateSummary::update` calls.
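
The three steps above can be sketched with highly simplified stand-in types (the real `State`, `StateSummary`, and `SparseMerkleTree` in aptos-core are sharded, versioned, and proof-aware; everything below is illustrative only):

```rust
use std::collections::HashMap;

// Illustrative stand-ins; the real aptos-core types are far richer.
type StateKey = String;
type StateValue = Vec<u8>;

/// Step 1: hot-state changes computed by `State::update` and carried
/// in `ExecutionOutput` (hypothetical simplified shape).
#[derive(Default)]
pub struct HotStateUpdates {
    pub insertions: HashMap<StateKey, StateValue>,
    pub evictions: Vec<StateKey>,
}

/// Step 2: stand-in for the in-memory sparse Merkle tree that
/// `StateSummary::update` rebuilds from those changes.
#[derive(Default, Clone)]
pub struct SparseMerkleTree {
    pub leaves: HashMap<StateKey, StateValue>,
}

impl SparseMerkleTree {
    /// Apply hot-state updates, returning the new tree; the old tree
    /// stays usable, mirroring the persistent-tree style of the real SMT.
    pub fn update(&self, updates: &HotStateUpdates) -> Self {
        let mut next = self.clone();
        for (k, v) in &updates.insertions {
            next.leaves.insert(k.clone(), v.clone());
        }
        for k in &updates.evictions {
            next.leaves.remove(k);
        }
        next
    }
}
```

Step 3 would then serialize the new tree into node batches for `hot_state_merkle_db`, so later `StateSummary::update` calls can read proofs back from it.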

Note

Implements hot state Merkle root computation and persistence, plus plumbs a new config across storage and execution.

  • Introduces HotStateConfig { max_items_per_shard, delete_on_restart, compute_root_hash } and replaces reset_hot_state param in AptosDB::open; updates all call sites and tools/tests
  • Execution pipeline now emits HotStateUpdates; State::update produces hot-state changes; StateSummary::update computes both hot/global SMTs; state checkpointing commits hot and cold JMT batches (to hot_state_merkle_db and main JMT) and validates usage
  • Adds gating: compute_root_hash default true but auto-disabled on Mainnet unless explicitly enabled; readonly/debug paths set delete_on_restart=false
  • Storage init/restore and proofs updated (e.g., null node handling, hot/cold proof selection); API/types extended accordingly

Written by Cursor Bugbot for commit 389074b.
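
The config and gating described in the note can be sketched as follows. The field names come from the note itself, while the default shard size, the `sanitize` method name, and the mainnet check are hypothetical stand-ins:

```rust
/// Sketch of the new config; fields per the note above.
#[derive(Debug, Clone)]
pub struct HotStateConfig {
    pub max_items_per_shard: usize,
    pub delete_on_restart: bool,
    pub compute_root_hash: bool,
}

impl Default for HotStateConfig {
    fn default() -> Self {
        Self {
            max_items_per_shard: 100_000, // hypothetical default
            delete_on_restart: true,
            compute_root_hash: true, // defaults to true per the note
        }
    }
}

impl HotStateConfig {
    /// Root-hash computation is auto-disabled on Mainnet unless the
    /// operator explicitly opted in (names here are illustrative).
    pub fn sanitize(mut self, is_mainnet: bool, explicitly_enabled: bool) -> Self {
        if is_mainnet && !explicitly_enabled {
            self.compute_root_hash = false;
        }
        self
    }
}
```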

@wqfish wqfish added the `CICD:run-e2e-tests` label (when this label is present, github actions will run all land-blocking e2e tests from the PR) Dec 27, 2025
@wqfish wqfish changed the title [Hot State] Save insertions and evictions in ExecutionOutput [Hot State] Compute root hash for hot state Dec 27, 2025
@wqfish wqfish force-pushed the pr18390 branch 3 times, most recently from f1dac58 to dd9df3a Compare January 1, 2026 19:28

wqfish added a commit that referenced this pull request Jan 12, 2026
In #18390, we simply set the format of the hot state Merkle tree to
`Map<StateKey, StateValue>`. This means that `HotVacant` entries, i.e., entries
that were recently read and determined not to exist in storage, are not hashed
into the tree. (And if they existed before, they are removed from the Merkle
tree.) This is inaccurate.

This commit introduces the `HotStateValue` struct and changes the format to
`Map<StateKey, HotStateValue>`. This way, both `HotOccupied` and `HotVacant`
entries are summarized into the root hash.
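
To see why the format change matters: under `Map<StateKey, StateValue>` a `HotVacant` entry has no value to hash, so it leaves no trace in the root; wrapping both variants in a `HotStateValue` gives every hot entry a leaf. A toy sketch follows — the enum shape and the hash function are illustrative stand-ins, not the aptos-core definitions:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for the new leaf type: both variants hash into the tree.
#[derive(Hash)]
pub enum HotStateValue {
    /// Key is hot and present in storage.
    HotOccupied { value: Vec<u8> },
    /// Key was read recently and confirmed absent in storage.
    HotVacant,
}

/// Illustrative leaf hash; the real tree uses a cryptographic hash.
pub fn leaf_hash(key: &str, value: &HotStateValue) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    value.hash(&mut h);
    h.finish()
}
```

With this, evicting a `HotVacant` entry (or flipping it to `HotOccupied`) changes the root hash, which the old format could not express.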
wqfish added a commit that referenced this pull request Jan 13, 2026
In #18390, we simply set the format of the hot state Merkle tree to
`Map<StateKey, StateValue>`. This means that `HotVacant` entries, i.e., entries
that were recently read and determined not to exist in storage, are not hashed
into the tree. (And if they existed before, they are removed from the Merkle
tree.) This is inaccurate.

This commit introduces the `HotStateValue` struct and changes the format to
`Map<StateKey, HotStateValue>`. This way, both `HotOccupied` and `HotVacant`
entries are summarized into the root hash.

In addition, we now check the refresh interval and avoid refreshing read-only
keys on every block.
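
The refresh-interval check mentioned above amounts to a version-distance guard; the function name, parameters, and the version-based interval are assumptions for illustration:

```rust
/// Only refresh a read-only key's hot-state slot if it was last
/// refreshed at least `refresh_interval` versions ago (hypothetical
/// names; the real check lives in the hot-state update path).
pub fn should_refresh(last_refreshed: u64, current: u64, refresh_interval: u64) -> bool {
    // saturating_sub guards against a stale `current` below `last_refreshed`.
    current.saturating_sub(last_refreshed) >= refresh_interval
}
```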
@wqfish wqfish enabled auto-merge (rebase) January 14, 2026 19:38
@github-actions
Contributor

✅ Forge suite framework_upgrade success on a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f

Compatibility test results for a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f (PR)
Upgrade the nodes to version: 389074b56ad2337eb573df66c448fb0c7002239f
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 542.22 txn/s, submitted: 543.81 txn/s, failed submission: 1.59 txn/s, expired: 1.59 txn/s, latency: 8199.04 ms, (p50: 11100 ms, p70: 12000, p90: 12600 ms, p99: 22600 ms), latency samples: 47761
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 837.65 txn/s, submitted: 840.96 txn/s, failed submission: 3.32 txn/s, expired: 3.32 txn/s, latency: 4885.53 ms, (p50: 1200 ms, p70: 11100, p90: 12000 ms, p99: 19800 ms), latency samples: 65640
5. check swarm health
Compatibility test for a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f passed
Upgrade the remaining nodes to version: 389074b56ad2337eb573df66c448fb0c7002239f
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1442.55 txn/s, submitted: 1446.07 txn/s, failed submission: 3.52 txn/s, expired: 3.52 txn/s, latency: 2078.48 ms, (p50: 1400 ms, p70: 1500, p90: 2000 ms, p99: 11800 ms), latency samples: 131260
Test Ok

@github-actions
Contributor

✅ Forge suite realistic_env_max_load success on 389074b56ad2337eb573df66c448fb0c7002239f

two traffics test: inner traffic : committed: 13628.90 txn/s, latency: 2764.96 ms, (p50: 2700 ms, p70: 2900, p90: 3100 ms, p99: 3600 ms), latency samples: 5075740
two traffics test : committed: 100.02 txn/s, latency: 760.59 ms, (p50: 700 ms, p70: 800, p90: 900 ms, p99: 1200 ms), latency samples: 1640
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.275, avg: 2.171", "ConsensusProposalToOrdered: max: 0.168, avg: 0.165", "ConsensusOrderedToCommit: max: 0.045, avg: 0.042", "ConsensusProposalToCommit: max: 0.213, avg: 0.207"]
Max non-epoch-change gap was: 1 rounds at version 39046 (avg 0.00) [limit 4], 1.16s no progress at version 39046 (avg 0.07s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.31s no progress at version 4827365 (avg 0.29s) [limit 16].
Test Ok


@github-actions
Contributor

✅ Forge suite compat success on a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f

Forge report malformed: Expecting ',' delimiter: line 18 column 6 (char 463)
Recovered report text:
Compatibility test results for a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f (PR)
1. Check liveness of validators at old version: a3ff6eef75d8e0a24caea52de8522a4e28bd1873
compatibility::simple-validator-upgrade::liveness-check : committed: 12556.45 txn/s, latency: 2762.84 ms, (p50: 2800 ms, p70: 3000, p90: 3600 ms, p99: 4000 ms), latency samples: 414720
2. Upgrading first Validator to new version: 389074b56ad2337eb573df66c448fb0c7002239f
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 5796.23 txn/s, latency: 5892.97 ms, (p50: 6400 ms, p70: 6700, p90: 6900 ms, p99: 6900 ms), latency samples: 197120
3. Upgrading rest of first batch to new version: 389074b56ad2337eb573df66c448fb0c7002239f
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 5825.25 txn/s, latency: 5812.29 ms, (p50: 6500 ms, p70: 6600, p90: 6700 ms, p99: 6800 ms), latency samples: 200320
4. Upgrading second batch to new version: 389074b56ad2337eb573df66c448fb0c7002239f
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 9744.58 txn/s, latency: 3359.54 ms, (p50: 3300 ms, p70: 3700, p90: 4600 ms, p99: 5100 ms), latency samples: 322780
5. Check swarm health
Compatibility test for a3ff6eef75d8e0a24caea52de8522a4e28bd1873 ==> 389074b56ad2337eb573df66c448fb0c7002239f passed
Test Ok

=== BEGIN JUNIT ===
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="forge" tests="1" failures="0" errors="0" uuid="664e9602-2f92-4d45-995a-ae52585ce2d1">
    <testsuite name="compat" tests="1" disabled="0" errors="0" failures="0">
        <testcase name="compatibility::simple-validator-upgrade">
        </testcase>
    </testsuite>
</testsuites>
=== END JUNIT ===
[2026-01-14T20:33:18Z INFO  aptos_forge::backend::k8s::cluster_helper] Deleting namespace forge-compat-pr-18390: Some(NamespaceStatus { conditions: None, phase: Some("Terminating") })
[2026-01-14T20:33:18Z INFO  aptos_forge::backend::k8s::cluster_helper] aptos-node resources for Forge removed in namespace: forge-compat-pr-18390

test result: ok. 1 passed; 0 soft failed; 0 hard failed; 0 filtered out

Debugging output:
NAME                                         READY   STATUS      RESTARTS   AGE
aptos-node-0-validator-0                     1/1     Running     0          4m28s
aptos-node-1-validator-0                     1/1     Running     0          6m2s
aptos-node-2-validator-0                     1/1     Running     0          2m43s
aptos-node-3-validator-0                     1/1     Running     0          104s
forge-testnet-deployer-28js7                 0/1     Completed   0          9m3s
genesis-aptos-genesis-eforgefc98eeab-mxkpm   0/1     Completed   0          8m25s

@wqfish wqfish merged commit eac80f9 into main Jan 14, 2026
123 of 151 checks passed
@wqfish wqfish deleted the pr18390 branch January 14, 2026 20:33