Skip to content

Conversation

@jsign
Copy link
Collaborator

@jsign jsign commented Dec 5, 2025

🗒️ Description

This PR fixes the test_xcall benchmarks to work with Osaka. test_xcall has a CALL-like variant of the bytecode attack exploiting unchunkified code, highly relevant for zkVM worst-cases.

I think the solution should feel quite natural — I’ll add some review comments.

I checked this is doing what we want with two independent checks:

  1. Hack a bit the Geth codebase to inspect how many distinct bytecodes were accessed during the block execution.
  2. Actually running the worst-case block for zkVM execution and inspecting that indeed the witness to be sent to the guest program indeed is blowed up — since bytecode must be part of the input to the guest program.

For 1. and --block-gas-limit 30M, my Geth reports accessing ~260MiB of bytecodes were the upper limit is ~280MiB (30M/2600 — but this isn’t accounting for glue opcodes, intrinsic costs of splitting in txs, etc). So this looks fine.

For 2. and --block-gas-limit 30M, I also confirmed it really blows up the guest program witness, which is prob the best ultimate confirmation this is doing what we wanted. i.e: the json-encoded files have ~524MiB size which makes sense since bytes are hex encoded (with other things).

🔗 Related Issues or PRs

#1695

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

@jsign jsign added the A-test-benchmark Area: Tests Benchmarks—Performance measurement (eg. `tests/benchmark/*`, `p/t/s/e/benchmark/*`) label Dec 5, 2025
@jsign jsign force-pushed the jsign-osaka-test_xcall branch from b757b8e to 3aa28b0 Compare December 5, 2025 15:35
Comment on lines +112 to +114
initcode, factory_address, factory_caller_address = (
_deploy_max_contract_factory(pre, fork)
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted the deployment of the factory contract to a separate function, mainly to avoid such a big method here which hurts readability, but mostly because I think for the upcoming test_extcode_ops I'll fix, that code is duplicated so I can re-use the function and we can delete code.

_deploy_max_contract_factory(pre, fork)
)

# Deploy num_contracts via multiple txs (each capped by tx gas limit).
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the setup phase, we can't create all the required bytecodes in a single tx because this is disallowed by Osaka (unfortuantely for us here).

This means we have to split the creation in multiple transactions. This will have an unavoidable hit in filling performance, since we actually used the previous strategy for that reason.

For 60M it takes ~2m in my machine, which maybe isn't as bad -- but still I'm not sure we can workaround this with Osaka limitations.

Comment on lines 118 to 140
# Rough estimate (rounded down) of contracts per tx based on dominant
# cost factor only. E.g., 17M gas limit + 24KiB contracts = ~3 per tx.
# The goal is to involve the minimum amount of gas pricing to avoid
# complexity and potential brittleness.
# If this estimation is incorrect in the future (i.e. tx gas limit cap)
# is increased or cost per byte, the post-state check will detect it
# and can be adjusted with a more complex formula.
num_contracts_per_tx = fork.transaction_gas_limit_cap() // (
gas_costs.G_CODE_DEPOSIT_BYTE * max_contract_size
)
attack_txs = math.ceil(num_contracts / num_contracts_per_tx)

contracts_deployment_txs = []
for _ in range(attack_txs):
contracts_deployment_txs.append(
Transaction(
to=factory_caller_address,
gas_limit=fork.transaction_gas_limit_cap(),
gas_price=10**6,
data=Hash(num_contracts_per_tx),
sender=pre.fund_eoa(),
)
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I preferred to go heavy in comments for future maintainability.

But basically we do a rough estimation of how many 24KiB contracts we can create per tx, and then do that. Note I do ceil in attack_txs instead of trying to do precise creation, just because it doesn't hurt plus it avoids complexity in this code. Since this is the setup phase and the tx is gas capped, I think it is okay.

The factory contract has a new CALLDATA parameter which is how many contracts to create. Although we might only be interested in this being always the max, I think it is better to leave it generic so we can try reusing it as much as possible. It just changes the contract creation loop count to whatever CALLDATALOAD says.

Comment on lines +142 to +159
post = {}
for i in range(num_contracts):
deployed_contract_address = compute_create2_address(
address=factory_address,
salt=i,
initcode=initcode,
)
post[deployed_contract_address] = Account(nonce=1)

attack_call = Bytecode()
if opcode == Op.EXTCODECOPY:
attack_call = Op.EXTCODECOPY(
address=Op.SHA3(32 - 20 - 1, 85), dest_offset=96, size=1000
)
else:
# For the rest of the opcodes, we can use the same generic attack call
# since all only minimally need the `address` of the target.
attack_call = Op.POP(opcode(address=Op.SHA3(32 - 20 - 1, 85)))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this code is the same as before -- just that the diff is a bit unfortunate.

Something I noticed is that this has this opcode == Op.EXTCODECOPY which shouldn't be relevant for this method (look at test parametrization). Originally when I created this benchmark, the same test name was used for CALL-like opcodes and also EXTCODESIZE... but I think somebody split them after and maybe this was left here?

I'll leave this as is for now, and when I fix test_extcode_ops I'll try to re-use the logic since I'm pretty sure it was repeated there when it was splitted.

Comment on lines +160 to +171
attack_code = (
# Setup memory for later CREATE2 address generation loop.
# 0xFF+[Address(20bytes)]+[seed(32bytes)]+[initcode keccak(32bytes)]
Op.MSTORE(0, factory_address)
+ Op.MSTORE8(32 - 20 - 1, 0xFF)
+ Op.MSTORE(32, Op.CALLDATALOAD(0))
+ Op.MSTORE(64, initcode.keccak256())
# Main loop
+ While(
body=attack_call + Op.MSTORE(32, Op.ADD(Op.MLOAD(32), 1)),
)
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same attack code as before but with a twist. The previous version started with seed=0 and simply ran the loop until it ran out of gas, since we could do all in a singe tx.

Now we have to split the attack in multiple txs, thus I allow reading the seed from CALLDATALOAD(0) so we can create the attack txs in a way that makes sense.

Comment on lines +173 to +180
if len(attack_code) > max_contract_size:
# TODO: A workaround could be to split the opcode code into multiple
# contracts and call them in sequence.
raise ValueError(
f"Code size {len(attack_code)} exceeds maximum "
f"code size {max_contract_size}"
)
attack_address = pre.deploy_contract(code=attack_code)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unchanged code.

)


def _deploy_max_contract_factory(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the contract factory was created remains unchanged.

Note that the factory_code already used storage slot 0 as the internal seed. This was quite nice since it means that when we call the factory mutliple times now, it keeps creating the next bytecode in a coherent way from the previous tx.

Comment on lines +71 to +74
tx_gas_limit_cap = fork.transaction_gas_limit_cap()
assert tx_gas_limit_cap is not None, (
"This benchmark requires a tx gas limit cap"
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we mark this test for Osaka forward only? (Or maybe that already happens since AFAIK in all the benchmark releases only Prague was filled, since I'm not interested we want backwards performance runs. For zkVM also makes sense that previous forks aren't interesting for this attack).

I think we could make it work backwards, it would just blow up a the logic below reg tx preparations for setup and attack. This test is already quite complex so not sure it is worth it.

Comment on lines +187 to +219
full_txs = attack_gas_limit // tx_gas_limit_cap
remainder = attack_gas_limit % tx_gas_limit_cap

num_targeted_contracts_per_full_tx = (
# Base available gas:
# TX_GAS_LIMIT - intrinsic - (out of loop MSTOREs)
tx_gas_limit_cap
- intrinsic_gas_cost_calc()
- gas_costs.G_VERY_LOW * 4
) // loop_cost
contract_start_index = 0
opcode_txs = []
for _ in range(full_txs):
opcode_txs.append(
Transaction(
to=attack_address,
gas_limit=tx_gas_limit_cap,
gas_price=10**9,
data=Hash(contract_start_index),
sender=pre.fund_eoa(),
)
)
contract_start_index += num_targeted_contracts_per_full_tx
if remainder > 0:
opcode_txs.append(
Transaction(
to=attack_address,
gas_limit=remainder,
gas_price=10**9,
data=Hash(contract_start_index),
sender=pre.fund_eoa(),
)
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: creating the attack txs to execute in the same block, such that none surpas the transaction gas limit cap. In contract_start_index we keep the offset we pass as CALLDATALOAD for the attack contract that I explained in my previous comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to mention this (did not check other logic): this looks almost similar to the split_transactions logic (

remaining_gas if i == num_splits - 1 else gas_limit_cap
) of the BenchmarkTestFiller, however - that split txs would just split the tx (without calldata) up in multiple. Which we cannot do here due to the test logic.
@LouisTsai-Csie we could maybe think of adding cases to the split txs logic here? I feel that inserting calldata is something we will see in other benchmark tests often (for instance as a "pointer" to where a certain tx should pick up where the other tx has exited - we could also do this in the contract code itself, but this means spending gas on this logic, which we want to spend on the target (the scenario/opcode we are benching))

@jsign jsign marked this pull request as ready for review December 5, 2025 16:36
@codecov
Copy link

codecov bot commented Dec 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.31%. Comparing base (519eb26) to head (5dee307).
⚠️ Report is 14 commits behind head on forks/osaka.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/osaka    #1850       +/-   ##
================================================
+ Coverage        53.45%   87.31%   +33.86%     
================================================
  Files              743      541      -202     
  Lines            44076    32832    -11244     
  Branches          3891     3015      -876     
================================================
+ Hits             23559    28668     +5109     
+ Misses           20306     3557    -16749     
- Partials           211      607      +396     
Flag Coverage Δ
unittests 87.31% <ø> (+33.86%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-test-benchmark Area: Tests Benchmarks—Performance measurement (eg. `tests/benchmark/*`, `p/t/s/e/benchmark/*`)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants