@mkurnikov mkurnikov commented Oct 9, 2025

No PR:

"move_vm_runtime::interpreter::<impl move_vm_runtime::frame::Frame>::execute_code_impl" [21551, 21500, 21260, 21358, 20950]
"move_vm_runtime::interpreter::InterpreterImpl<LoaderImpl>::execute_main" [5613, 5966, 5909]
"<move_vm_runtime::runtime_type_checks::FullRuntimeTypeCheck as move_vm_runtime::runtime_type_checks::RuntimeTypeCheck>::post_execution_type_stack_transition" [6065]
"<move_vm_runtime::runtime_type_checks::FullRuntimeTypeCheck as move_vm_runtime::runtime_type_checks::RuntimeTypeCheck>::pre_execution_type_stack_transition" [382]

Time: 211.330649ms

No PR + force-inline:

"move_vm_runtime::interpreter::<impl move_vm_runtime::frame::Frame>::execute_code_impl" [50737, 78432, 76243, 78095, 50323]
"move_vm_runtime::interpreter::InterpreterImpl<LoaderImpl>::execute_main" [12476, 14612, 16349, 10668, 15971]
(can't measure other ones, they are inlined)

Time: 193.759057ms

With this PR (first two commits):

"move_vm_runtime::interpreter::<impl move_vm_runtime::frame::Frame>::execute_code_impl" [22544, 22267, 22899, 22171, 22457]
"move_vm_runtime::interpreter::InterpreterImpl<LoaderImpl>::execute_main" [6374, 6694, 6671]

With this PR (first two commits) + fully enabled remaining force-inline:

"move_vm_runtime::interpreter::<impl move_vm_runtime::frame::Frame>::execute_code_impl" [35691, 35798, 32380, 33664, 35465]
"move_vm_runtime::interpreter::InterpreterImpl<LoaderImpl>::execute_main" [11612, 13760, 15406, 9882, 15171]
(execute_main gets 66% more monomorphizations)
"<move_vm_runtime::runtime_type_checks::FullRuntimeTypeCheck as move_vm_runtime::runtime_type_checks::RuntimeTypeCheck>::post_execution_type_stack_transition" [20084]
"<move_vm_runtime::runtime_type_checks::FullRuntimeTypeCheck as move_vm_runtime::runtime_type_checks::RuntimeTypeCheck>::pre_execution_type_stack_transition" [1274]

With this PR (all commits):

"move_vm_runtime::interpreter::<impl move_vm_runtime::frame::Frame>::execute_code_impl" [22544, 22267, 22899, 22171, 22457]
"move_vm_runtime::interpreter::InterpreterImpl<LoaderImpl>::execute_main" [6265, 6094, 7452, 7775, 7722]
"<move_vm_runtime::runtime_type_checks::FullRuntimeTypeCheck as move_vm_runtime::runtime_type_checks::RuntimeTypeCheck>::post_execution_type_stack_transition" [6065]
"<move_vm_runtime::runtime_type_checks::FullRuntimeTypeCheck as move_vm_runtime::runtime_type_checks::RuntimeTypeCheck>::pre_execution_type_stack_transition" [504]

Time: 195.773863ms

With this PR (all commits) + force_inline:

"move_vm_runtime::interpreter::<impl move_vm_runtime::frame::Frame>::execute_code_impl" [35691, 35798, 32380, 33664, 35465]
"move_vm_runtime::interpreter::InterpreterImpl<LoaderImpl>::execute_main" [8391, 8437, 10081, 6875, 9863]

Time: 194.416079ms

The numbers in square brackets are counts of ASM instructions. (Strictly, they are ASM instructions plus debug symbols, but the difference is irrelevant here.)
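To compare two configurations at a glance, the entries in each bracketed list can be averaged and the relative change computed. A minimal sketch, using the execute_code_impl figures from the "No PR" and "With this PR (all commits)" lists; the helper names `mean` and `percent_change` are illustrative and not part of the codebase:

```rust
/// Average of the instruction counts in one bracketed list.
fn mean(counts: &[u64]) -> f64 {
    counts.iter().sum::<u64>() as f64 / counts.len() as f64
}

/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // execute_code_impl counts copied from the lists above.
    let no_pr: [u64; 5] = [21551, 21500, 21260, 21358, 20950];
    let with_pr: [u64; 5] = [22544, 22267, 22899, 22171, 22457];

    let growth = percent_change(mean(&no_pr), mean(&with_pr));
    println!("execute_code_impl grows by {growth:.1}% on average");
}
```

For these two lists the average grows by roughly 5.4%.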

UPD.
cargo build --bin aptos compilation times:

Without: 3.16m
With: 3.39m

(So I guess the number of ASM lines is a good proxy for compilation time.)

So in the end, I achieve 80-90% of the original performance improvement for 10% of the compilation time.

@mkurnikov mkurnikov requested a review from vgao1996 October 9, 2025 16:19
Contributor

@vgao1996 vgao1996 left a comment

Love the overall direction!

As we discussed, let me double check the compilation time in release mode.

@mkurnikov mkurnikov force-pushed the disable-some-force-inline branch from ba3d5f3 to 8b6df7c Compare October 10, 2025 17:24
@mkurnikov
Collaborator Author

I moved most of the inlines that weren't necessary (they didn't contribute much) back behind your feature flag. I need to re-profile now, since the perf results could have changed a bit.
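For readers unfamiliar with the pattern, gating an inline hint behind a feature flag typically looks like the sketch below. This is an illustration under assumptions, not the code from this PR: `force-inline` is an assumed Cargo feature name, and `pop_operand` and `push_operand` are hypothetical helpers.

```rust
// Hypothetical example: `force-inline` is an assumed feature name, and the
// helpers below are illustrative, not functions from move_vm_runtime.

/// Force-inlined only when the feature is enabled; otherwise the compiler's
/// normal inlining heuristics decide.
#[cfg_attr(feature = "force-inline", inline(always))]
fn pop_operand(stack: &mut Vec<u64>) -> Option<u64> {
    stack.pop()
}

/// Always force-inlined, regardless of enabled features.
#[inline(always)]
fn push_operand(stack: &mut Vec<u64>, value: u64) {
    stack.push(value);
}

fn main() {
    let mut stack = Vec::new();
    push_operand(&mut stack, 7);
    assert_eq!(pop_operand(&mut stack), Some(7));
}
```

With this shape, only the helpers that measurably help keep an unconditional `#[inline(always)]`, while the rest pay the compile-time cost only when the feature is switched on.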

@mkurnikov
Collaborator Author

mkurnikov commented Oct 11, 2025

Re-profiled: this achieves a 6-7% improvement for a 5.5% increase in the size of the execute_code_impl function, and the cargo build --release --bin aptos time changes by about 20% if inlining is enabled by default for all usages of the packages.

@mkurnikov mkurnikov force-pushed the disable-some-force-inline branch from 8b6df7c to 453d350 Compare October 14, 2025 21:48
Contributor

@vgao1996 vgao1996 left a comment

Looks mostly good to me! Please address the comments from both @georgemitenkov and me.

@mkurnikov mkurnikov force-pushed the disable-some-force-inline branch from 453d350 to cadb963 Compare October 17, 2025 15:42
@mkurnikov
Collaborator Author

OK, I addressed all comments.

The perf results are 2.5-3% improvement:

  • baseline - 477 tps
  • with PR - 490 tps

Enabling force-inline gives me 3-3.5%, so this is 70-80% of the original.
Note that benchmark noise is ±0.7%, so the bottom line is that the PR is a consistent 2% improvement.
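The arithmetic behind these figures can be checked with a short sketch. The function name `gain_percent` is illustrative, and treating the noise as a flat 0.7 percentage points subtracted from the measured gain is a simplifying assumption:

```rust
/// Relative throughput gain of the candidate over the baseline, in percent.
fn gain_percent(baseline_tps: f64, candidate_tps: f64) -> f64 {
    (candidate_tps - baseline_tps) / baseline_tps * 100.0
}

fn main() {
    // Throughput figures from the list above.
    let gain = gain_percent(477.0, 490.0);
    // Reported benchmark noise, in percentage points.
    let noise = 0.7;

    println!("measured gain: {gain:.2}%");
    println!("gain after subtracting noise: {:.2}%", gain - noise);
}
```

The measured gain comes out at about 2.7%, and about 2.0% after subtracting the noise margin.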

The compilation time difference is only around 5%, which might be just noise. It differs from the previous results because I disabled inlining for set_new_call_frame: inlining it only decreased performance, and it was the biggest culprit in compilation time.

@mkurnikov mkurnikov force-pushed the disable-some-force-inline branch from cadb963 to a514c7b Compare October 17, 2025 16:27
@mkurnikov mkurnikov force-pushed the disable-some-force-inline branch from a514c7b to e15edfe Compare October 17, 2025 16:28
@mkurnikov mkurnikov enabled auto-merge (squash) October 17, 2025 16:32

Contributor

✅ Forge suite compat success on cef91202017aab6af466f948737fcd7d97dc7d13 ==> e15edfe79372c45eda22ad00662751a76d45d294

Compatibility test results for cef91202017aab6af466f948737fcd7d97dc7d13 ==> e15edfe79372c45eda22ad00662751a76d45d294 (PR)
1. Check liveness of validators at old version: cef91202017aab6af466f948737fcd7d97dc7d13
compatibility::simple-validator-upgrade::liveness-check : committed: 14310.38 txn/s, latency: 2419.19 ms, (p50: 2500 ms, p70: 2600, p90: 2800 ms, p99: 3800 ms), latency samples: 465500
2. Upgrading first Validator to new version: e15edfe79372c45eda22ad00662751a76d45d294
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 5314.11 txn/s, latency: 6489.07 ms, (p50: 7200 ms, p70: 7200, p90: 7300 ms, p99: 7400 ms), latency samples: 182640
3. Upgrading rest of first batch to new version: e15edfe79372c45eda22ad00662751a76d45d294
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 5347.76 txn/s, latency: 6460.85 ms, (p50: 7200 ms, p70: 7300, p90: 7300 ms, p99: 7400 ms), latency samples: 183780
4. upgrading second batch to new version: e15edfe79372c45eda22ad00662751a76d45d294
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 8328.65 txn/s, latency: 4104.38 ms, (p50: 4500 ms, p70: 4600, p90: 4600 ms, p99: 4800 ms), latency samples: 276100
5. check swarm health
Compatibility test for cef91202017aab6af466f948737fcd7d97dc7d13 ==> e15edfe79372c45eda22ad00662751a76d45d294 passed
Test Ok

Contributor

✅ Forge suite realistic_env_max_load success on e15edfe79372c45eda22ad00662751a76d45d294

two traffics test: inner traffic : committed: 13770.31 txn/s, latency: 2734.64 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3600 ms), latency samples: 5235920
two traffics test : committed: 99.99 txn/s, latency: 804.71 ms, (p50: 700 ms, p70: 800, p90: 900 ms, p99: 2500 ms), latency samples: 1760
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.216, avg: 2.103", "ConsensusProposalToOrdered: max: 0.165, avg: 0.163", "ConsensusOrderedToCommit: max: 0.090, avg: 0.071", "ConsensusProposalToCommit: max: 0.250, avg: 0.235"]
Max non-epoch-change gap was: 1 rounds at version 62095 (avg 0.00) [limit 4], 1.16s no progress at version 62095 (avg 0.07s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.29s no progress at version 2210946 (avg 0.29s) [limit 16].
Test Ok

Contributor

✅ Forge suite framework_upgrade success on cef91202017aab6af466f948737fcd7d97dc7d13 ==> e15edfe79372c45eda22ad00662751a76d45d294

Compatibility test results for cef91202017aab6af466f948737fcd7d97dc7d13 ==> e15edfe79372c45eda22ad00662751a76d45d294 (PR)
Upgrade the nodes to version: e15edfe79372c45eda22ad00662751a76d45d294
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2070.95 txn/s, submitted: 2078.89 txn/s, failed submission: 7.94 txn/s, expired: 7.94 txn/s, latency: 1409.46 ms, (p50: 1500 ms, p70: 1500, p90: 1800 ms, p99: 2100 ms), latency samples: 187761
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1923.70 txn/s, submitted: 1930.31 txn/s, failed submission: 6.61 txn/s, expired: 6.61 txn/s, latency: 1500.97 ms, (p50: 1500 ms, p70: 1800, p90: 2100 ms, p99: 2800 ms), latency samples: 174582
5. check swarm health
Compatibility test for cef91202017aab6af466f948737fcd7d97dc7d13 ==> e15edfe79372c45eda22ad00662751a76d45d294 passed
Upgrade the remaining nodes to version: e15edfe79372c45eda22ad00662751a76d45d294
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2210.38 txn/s, submitted: 2218.11 txn/s, failed submission: 7.73 txn/s, expired: 7.73 txn/s, latency: 1347.47 ms, (p50: 1200 ms, p70: 1500, p90: 1700 ms, p99: 2100 ms), latency samples: 200081
Test Ok

3 participants