feat(tests): enhance eip7883 test coverage #1929

Open
wants to merge 17 commits into
base: main
from

Conversation

LouisTsai-Csie
Collaborator

@LouisTsai-Csie LouisTsai-Csie commented Jul 21, 2025

🗒️ Description

EIP-7883: ModExp Gas Cost Increase

This EIP changes the gas cost, so it falls under this category in the checklist. While this category is meant specifically for gas cost changes, I’ve added a broader set of scenarios, similar to what would be done for a new precompile, since the existing tests from Byzantium are incomplete.

Call contexts

  • CALL / DELEGATECALL / STATICCALL / CALLCODE
  • Transaction Entry-point
  • Initcode call: IMO this is unnecessary. We still need to use *CALL to trigger the precompile from create/create2 initcode, so it is exactly the same testing scenario as the call-context one.
  • Precompile as Set-code Delegated Address: please check this in eip-7702 test, it is not located in eip-7883 test.

Inputs

  • precompile/test/inputs/valid
  • precompile/test/inputs/valid/boundary
  • precompile/test/inputs/valid/crypto
  • precompile/test/inputs/all_zeros
  • precompile/test/inputs/max_values
  • precompile/test/inputs/invalid
  • precompile/test/inputs/invalid/crypto
  • precompile/test/inputs/invalid/corrupted

Value Transfer

  • Minimum Fee Precompile
  • No-Fee Precompile: Not needed, as ModExp is not a no-fee precompile

Out-of-bounds checks

  • precompile/test/out_of_bounds/max
  • precompile/test/out_of_bounds/max_plus_one

Input Lengths

  • Zero-length Input
  • Static Required Input Length: The input length is dynamic
  • Dynamic Required Input Length

Gas usage

  • Constant Gas Cost: We do not need this one, as the gas cost is dynamic
  • Variable Gas Cost: See analysis below
  • Excessive Gas: Already have this in benchmark test

Fork transition

  • Pre-fork Block Call
  • Cold/Warm Precompile Address State: Not needed, as ModExp is not a new precompile.

🔗 Related Issues or PRs

Issue #1791, #1790, #1971

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@LouisTsai-Csie LouisTsai-Csie changed the title refactor(eip7883): update vector input structure feat(tests): enhance eip7883 test coverage Jul 21, 2025
@LouisTsai-Csie LouisTsai-Csie force-pushed the enhance-eip7823-coverage branch from 016bf91 to 4d57f68 Compare July 21, 2025 07:45
@LouisTsai-Csie LouisTsai-Csie self-assigned this Jul 21, 2025
code += Op.RETURNDATACOPY(0, i * 32, 32)
code += Op.SSTORE(
call_contract_post_storage.store_next(modexp_expected[i * 32 : (i + 1) * 32]),
Op.MLOAD(0),
Collaborator Author

I tried to use a hash-based method (see the comment) here for simplicity but failed, so I simply store all the data in storage for comparison. I still prefer the hash-based method as it reduces the SSTORE operation count

Contributor

@spencer-tb spencer-tb Aug 11, 2025

I tried to do this today but had issues with guido-4 and geth-fail vectors. I then realised we have a bug in the chunking method for cases where len(modexp_expected) // 32 = 0. For guido-4 this equates to 8 // 32 = 0. So then we get for i in range(0), meaning we don't store the result for guido-4.

We should try and get the hash method working nonetheless!
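
The chunking problem described in this comment can be reproduced in isolation. This is a minimal sketch, assuming the test loops over `len(modexp_expected) // 32` chunks as stated; `chunks_floor`, `chunks_ceil` and the sample 8-byte value are hypothetical and not taken from the test suite.

```python
def chunks_floor(data: bytes, size: int = 32) -> list[bytes]:
    """Split data using floor division, as the loop described above does."""
    return [data[i * size : (i + 1) * size] for i in range(len(data) // size)]


def chunks_ceil(data: bytes, size: int = 32) -> list[bytes]:
    """Split data using ceiling division so a trailing partial chunk is kept."""
    count = (len(data) + size - 1) // size
    return [data[i * size : (i + 1) * size] for i in range(count)]


# Hypothetical 8-byte expected output, the same length as guido-4.
expected = bytes(7) + b"\x01"
print(chunks_floor(expected))  # empty list: nothing would be stored or compared
print(chunks_ceil(expected))   # a single 8-byte chunk
```

With floor division any output shorter than 32 bytes produces zero iterations, so its contents are never checked; ceiling division covers the partial chunk.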

Collaborator Author

I’ve updated the implementation to a hash-based method, but some issues have come up.

From EIP-198:

Consumes floor(mult_complexity(max(length_of_MODULUS, length_of_BASE)) * max(ADJUSTED_EXPONENT_LENGTH, 1) / GQUADDIVISOR) gas, and if there is enough gas, returns (BASE**EXPONENT) % MODULUS as a byte array with the same length as the modulus.

Given a ModExp input where (base, exponent, modulus) yields the value 0x01 and the modulus length is 4, what should the output be? Should it be 0x00000001 (left-padded) or 0x01000000 (right-padded)?

Referring to Mario’s vector update (check vector.json change in this PR), I initially thought it should be the latter.

However, after reviewing the relevant EIPs, such as eip-198, eip-2565, eip-7883, and eip-7823, and implementing the hashing comparison, I now think the correct format might be the former (left-padded).

The previous test cases did not fail because, as you mentioned, for some cases the length was smaller than 32, so len(modexp_expected) // 32 evaluated to 0, resulting in no loop execution. Even if the calculated result was non-zero, we should be checking the range from 0 to len(modexp_expected) // 32 + 1. Otherwise, we miss trailing bytes.

Short summary: the legacy approach never fully validated the output format. After switching to the updated verification method, I think these vectors might have an issue:

geth-fail-length

Input:  000000000000000000000000000000000000000000000000000000000000000000
Output: 000000000000000000000000000000000000000000000000000000000000000001
  • base_len = 0 → base = 0
  • exp_len = 34 → exponent = 34 zero bytes (value 0)
  • mod_len = 33 → modulus = 0x...0002 (value 2)
    Per EIP-198, exponent = 0 and modulus > 1 → result is 1, left-padded to modulus length.

guido-4-even

Input:  0001000000000000
Output: 0000000000000001

The result might need to be left-padded, not right-padded.
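
The padding question can be checked with plain Python integers. This is only a sketch, assuming the big-endian, left-padded interpretation that the comment arrives at; `modexp_output` is a hypothetical reference helper, not part of the test framework or any client.

```python
def modexp_output(base: int, exponent: int, modulus: int, mod_len: int) -> bytes:
    """Return (base ** exponent) % modulus as big-endian bytes of length mod_len."""
    if modulus == 0:
        # A zero modulus yields an all-zero result of the modulus length.
        return bytes(mod_len)
    # int.to_bytes(..., "big") places the value in the low-order (rightmost)
    # bytes, which is the same as left-padding with zeros.
    return pow(base, exponent, modulus).to_bytes(mod_len, "big")


# geth-fail-length as described above: empty base, zero exponent, modulus 2, mod_len 33.
out = modexp_output(0, 0, 2, 33)
print(out.hex())
```

Under this interpretation a result of 0x01 with a 4-byte modulus comes out as 0x00000001.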

Collaborator Author

I updated the implementation and then ran the consume engine command on the server; it now works fine.

Screenshot 2025-08-12 at 4 30 39 PM

Contributor

Amazing catch! I struggled to find the issue last night. Could you please update the vectors?
It would be good to run these against the clients!

Contributor

So we never properly tested the expectation of these cases, from what I can see:

  • marius-1-even - 12 bytes expected
  • guido-1-even - 16 bytes expected
  • guido-2-even - 16 bytes expected
  • guido-4-even - 8 bytes expected
  • marcin-1-exp-heavy - 8 bytes expected
  • marcin-2-exp-heavy - 16 bytes expected
  • marcin-3-exp-heavy - 24 bytes expected

Only the call, the gas calculation and the return length!

Contributor

I updated the implementation and then ran the consume engine command on the server; it now works fine.
Screenshot 2025-08-12 at 4 30 39 PM

Nice! Thanks for taking care of this

Collaborator Author

I've already updated the test vectors. My understanding is also that we never actually checked the contents of the return data. For geth-fail-length this is critical, as the expected output provided in the vector was itself incorrect.

@pytest.mark.parametrize(
"modexp_input,modexp_expected,gas_old,gas_new",
[
pytest.param(Spec.modexp_input, Spec.modexp_expected, 200, 500), # Should be 1200
Collaborator Author

@LouisTsai-Csie LouisTsai-Csie Jul 22, 2025

For this test case, it should take 1200 gas cost after Osaka is activated, but in this test case, it only takes 500. Still looking into the root cause here.

This one is "marcin-1-balanced"; you can find it in the vectors.json file.

Collaborator Author

I spent the entire afternoon debugging this issue without understanding what went wrong. So I decided to write down every step I took: traces, opcode sequences, the LLM conversation history, and all the tricks I tried.
It turned into a lengthy and tedious comment. But after finishing it, I realized the root cause was simply that I forgot to pass the calldata to the transaction. So, it’s fixed now.


senders = [pre.fund_eoa() for _ in range(3)]
contracts = [pre.deploy_contract(code) for _ in range(3)]
timestamps = [14_999, 15_000, 15_001]
Collaborator Author

To be honest, I do not know why these timestamp values are 14_999 to 15_001. I used this trick because it is how Spencer tested the CLZ opcode.

Member

All transition forks that are defined in our tests have a transition time hard-coded to 15k seconds.
So on 14,999 we should still see Prague rules, and Osaka rules in the rest.
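
The effect of the three timestamps can be sketched as follows, assuming a Prague-to-Osaka transition at exactly 15,000 seconds as described in this comment; `TRANSITION_TIMESTAMP` and `active_fork` are hypothetical names used for illustration only.

```python
TRANSITION_TIMESTAMP = 15_000  # assumed transition time, taken from the comment above


def active_fork(timestamp: int) -> str:
    """Return the fork whose rules apply to a block with the given timestamp."""
    return "Osaka" if timestamp >= TRANSITION_TIMESTAMP else "Prague"


for ts in (14_999, 15_000, 15_001):
    print(ts, active_fork(ts))
```

The first block therefore exercises the old gas rules, and the remaining two exercise the new ones.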

@LouisTsai-Csie LouisTsai-Csie added fork:osaka Osaka hardfork type:test Type: Add/refactor fw unit tests; no fw or el client test case changes labels Jul 22, 2025
@LouisTsai-Csie LouisTsai-Csie marked this pull request as ready for review July 22, 2025 15:09
@LouisTsai-Csie LouisTsai-Csie force-pushed the enhance-eip7823-coverage branch from 256b98c to ed33e8d Compare July 23, 2025 06:48
exponent="FF" * (Spec.MAX_LENGTH_BYTES + 1),
modulus="FF" * (Spec.MAX_LENGTH_BYTES + 1),
case_id="all-too-long",
),
Collaborator Author

For these test cases, the input exceeds the boundary for the ModExp precompile. However, it’s unclear whether the failure is due to exceeding the transaction gas limit or violating the input boundary constraints.

Member

We should double check this and do the actual calculation of the gas required for this operation. It might even be higher than the tx gas limit cap introduced in Osaka, and in that case the test might be unnecessary.

Collaborator Author

I tested with different input values and evaluated the gas cost. Based on the results, I removed some of the test cases.

If the exponent length exceeds the limit, it’s highly unlikely that the base or modulus could also exceed it. In such cases, the total gas cost will break the transaction gas cap.

For combinations like (base, exponent), (exponent, modulus), or (base, exponent, modulus) that exceed the limit, the gas cost is greater than 500 million.

However, I did not evaluate this using fuzzing or formal verification, so there might be some corner case that supports such a combination. I am wondering how I can prove this property.
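
The gas estimate can be reproduced with a short calculation. This is a sketch based on my reading of the EIP-7883 pricing pseudocode, not the framework's implementation; `eip7883_gas` is a hypothetical helper, and `TX_GAS_LIMIT_CAP = 2**24` is an assumption about the Osaka per-transaction gas cap.

```python
TX_GAS_LIMIT_CAP = 2**24  # assumed Osaka transaction gas limit cap
MAX_LENGTH_BYTES = 1024   # upper bound on each length field


def eip7883_gas(base_len: int, mod_len: int, exp_len: int, exponent: int) -> int:
    """Approximate ModExp gas cost under EIP-7883 (sketch, see assumptions above)."""
    max_len = max(base_len, mod_len)
    words = (max_len + 7) // 8
    complexity = 16 if max_len <= 32 else 2 * words**2
    if exp_len <= 32:
        iterations = exponent.bit_length() - 1 if exponent else 0
    else:
        low_256 = exponent & (2**256 - 1)
        iterations = 16 * (exp_len - 32) + low_256.bit_length() - 1
    return max(500, complexity * max(iterations, 1))


too_long = MAX_LENGTH_BYTES + 1
exponent = int.from_bytes(b"\xff" * too_long, "big")

# All three fields one byte over the limit, as in the "all-too-long" case.
gas_all = eip7883_gas(too_long, too_long, too_long, exponent)
print(gas_all, gas_all > TX_GAS_LIMIT_CAP)

# Only the exponent over the limit, with a 1-byte base and modulus.
gas_exp_only = eip7883_gas(1, 1, too_long, exponent)
print(gas_exp_only, gas_exp_only > TX_GAS_LIMIT_CAP)
```

Under these assumptions the all-too-long combination costs more than 500 million gas, while an oversized exponent alone stays below the cap. This is a spot check of two points, not a proof over the whole input space.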

@LouisTsai-Csie LouisTsai-Csie force-pushed the enhance-eip7823-coverage branch 3 times, most recently from 5c7c640 to 87c8cac Compare July 30, 2025 04:03
@marioevz marioevz requested a review from spencer-tb July 31, 2025 00:12
@LouisTsai-Csie LouisTsai-Csie force-pushed the enhance-eip7823-coverage branch from fce1649 to 13ab385 Compare July 31, 2025 03:40
Comment on lines +342 to +338
# Test case coverage table:
# ┌─────┬──────┬─────┬──────┬───────┬───────┬─────────────────────────────────────────┐
# │ ID  │ Comp │ Rel │ Iter │ Clamp │ Gas   │ Description                             │
# ├─────┼──────┼─────┼──────┼───────┼───────┼─────────────────────────────────────────┤
# │ Z0  │ -    │ -   │ -    │ -     │ 500   │ Zero case - empty inputs                │
# │ S0  │ S    │ =   │ A    │ True  │ 500   │ Small, equal, zero exponent, clamped    │
# │ S1  │ S    │ =   │ B    │ True  │ 500   │ Small, equal, small exp, clamped        │
# │ S2  │ S    │ =   │ B    │ False │ 4080  │ Small, equal, large exp, unclamped      │
# │ S3  │ S    │ =   │ C    │ False │ 2032  │ Small, equal, large exp+zero low256     │
# │ S4  │ S    │ =   │ D    │ False │ 2048  │ Small, equal, large exp+non-zero low256 │
# │ S5  │ S    │ >   │ A    │ True  │ 500   │ Small, base>mod, zero exp, clamped      │
# │ S6  │ S    │ <   │ B    │ True  │ 500   │ Small, base<mod, small exp, clamped     │
# │ L0  │ L    │ =   │ A    │ True  │ 500   │ Large, equal, zero exp, clamped         │
# │ L1  │ L    │ =   │ B    │ False │ 12750 │ Large, equal, large exp, unclamped      │
# │ L2  │ L    │ =   │ C    │ False │ 6350  │ Large, equal, large exp+zero low256     │
# │ L3  │ L    │ =   │ D    │ False │ 6400  │ Large, equal, large exp+non-zero low256 │
# │ L4  │ L    │ >   │ B    │ True  │ 500   │ Large, base>mod, small exp, clamped     │
# │ L5  │ L    │ <   │ C    │ False │ 9144  │ Large, base<mod, large exp+zero low256  │
# └─────┴──────┴─────┴──────┴───────┴───────┴─────────────────────────────────────────┘
Collaborator Author

@LouisTsai-Csie LouisTsai-Csie Jul 31, 2025

To verify the table, you can check this script, compare it to eip-7883, and use it to generate the test cases as well as the labels: https://gist.github.com/1c8fd82ac1e75e9cfd1c79e5a2f5fbe6.git

I was inspired by this paper: https://arxiv.org/pdf/2504.12034
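
Independently of the linked script, several rows of the gas column can be cross-checked with a short sketch of the EIP-7883 pricing rules as I understand them. The function names follow the EIP's pseudocode style, and the concrete lengths and exponents below are my own assumptions chosen to match each row's description; they are not the inputs used by the tests.

```python
def multiplication_complexity(base_len: int, mod_len: int) -> int:
    """16 for operands up to 32 bytes, otherwise 2 * words**2 (8-byte words)."""
    max_len = max(base_len, mod_len)
    words = (max_len + 7) // 8
    return 16 if max_len <= 32 else 2 * words**2


def iteration_count(exp_len: int, exponent: int) -> int:
    """Iteration count with a 16x multiplier for exponent bytes beyond the first 32."""
    if exp_len <= 32:
        count = exponent.bit_length() - 1 if exponent else 0
    else:
        low_256 = exponent & (2**256 - 1)
        count = 16 * (exp_len - 32) + low_256.bit_length() - 1
    return max(count, 1)


def gas_cost(base_len: int, mod_len: int, exp_len: int, exponent: int) -> int:
    """Total cost, clamped to the 500 gas minimum."""
    return max(500, multiplication_complexity(base_len, mod_len) * iteration_count(exp_len, exponent))


FULL_32 = 2**256 - 1   # 32-byte exponent with every bit set
HIGH_ONLY = 1 << 256   # exponent longer than 32 bytes, low 256 bits zero

# (row ID, assumed base_len, mod_len, exp_len, exponent, gas from the table)
rows = [
    ("Z0", 0, 0, 0, 0, 500),
    ("S2", 8, 8, 32, FULL_32, 4080),
    ("S3", 8, 8, 40, HIGH_ONLY, 2032),
    ("S4", 8, 8, 40, HIGH_ONLY | 1, 2048),
    ("L1", 40, 40, 32, FULL_32, 12750),
    ("L2", 40, 40, 40, HIGH_ONLY, 6350),
    ("L3", 40, 40, 40, HIGH_ONLY | 1, 6400),
    ("L5", 40, 48, 40, HIGH_ONLY, 9144),
]
for row_id, b, m, e, exp, expected in rows:
    print(row_id, gas_cost(b, m, e, exp), expected)
```

Rows marked as clamped fall below the 500 minimum before `max` is applied, which is why they all report 500.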

@@ -73,6 +73,8 @@ def from_bytes(cls, input_data: Bytes | str) -> "ModExpInput":

modulus = input_data[current_index : current_index + modulus_length]

modulus = modulus.ljust(min(1024, modulus_length), b"\x00")
Collaborator Author

This change is necessary but requires additional refactoring.

For case 4 in modexpFiller.json, the test description is:

4 - Would also parse as a base of 3, an exponent of 65535, and a modulus of 2**255. It attempts to read 32 bytes for the modulus starting from 0x80, but since there’s no further data, it right-pads the result with 31 zero bytes.

The slice modulus = input_data[current_index : current_index + modulus_length] is technically valid, but Python silently truncates when slicing beyond the end of the data rather than padding with null bytes. We therefore need to pad explicitly with \x00; ljust pads on the right, which matches the test description above.

Example: https://onecompiler.com/python/43smfe6dm

Regarding the 1024-byte length restriction, in case 2, the exponent length is 2**256 - 1, which can cause current_index + modulus_length to overflow. Therefore, I added this condition to prevent overflow. It aligns with the logic in EIP-7883, although such a restriction does not exist in EIP-198.

An idea for refactoring ModExpInput: we should consider passing raw bytes to the test rather than the ModExpInput data type, which would be much more flexible for unusual testing scenarios.
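
The slicing and padding behaviour for case 4 can be checked with plain `bytes`. This is a minimal sketch in which the single byte 0x80 stands in for the modulus data actually present at the end of the input; the variable names are illustrative.

```python
data = bytes.fromhex("80")  # only one modulus byte is present in the input
modulus_length = 32         # but the length header claims 32 bytes

# Slicing past the end of a bytes object returns what is available.
modulus = data[0 : 0 + modulus_length]
print(len(modulus))

# ljust pads on the right with zero bytes up to the (capped) declared length.
padded = modulus.ljust(min(1024, modulus_length), b"\x00")
print(padded.hex())
print(int.from_bytes(padded, "big") == 2**255)
```

Capping the target length at 1024 also keeps `ljust` from attempting an enormous allocation when the declared length is close to 2**256.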

@LouisTsai-Csie
Collaborator Author

I didn’t create a separate PR for issue #1971, as it depends on the infrastructure introduced in this PR.

There are 37 test cases in the legacy modexpFiller.json test. Most of them have been ported, but the following cases are still pending and require further analysis:

  • Case 2: Would parse a base length of 0, a modulus length of 32, and an exponent length of 2**256 - 1, where the base is empty, the modulus is 2**256 - 2 and the exponent is (2**256 - 3) * 256**(2**256 - 33) (yes, that's a really big number). It would then immediately fail, as it's not possible to provide enough gas to make that computation.
  • Case 28: base length 4TiB
  • Case 29: exp length 4TiB; returns 0 because mod is zero
  • Case 30: base and mod have zero-length. exp's length is 2^255. Since mod is zero, the result should be zero.
  • Case 36: the input found on 10 Oct. 2017 that overflows the gas calculation
  • Case 37: input found in July 2022, overflows the gas calculation
  1. Based on the descriptions of cases 2, 28, 29 and 30, they will fail because eip7823 introduces an upper bound for each length field; we already have similar tests in test_modexp_invalid_inputs.
  2. For cases 36 and 37, I am checking whether they are still valid after Fusaka.

Contributor

@spencer-tb spencer-tb left a comment

Thanks for working on this. Just some small comments for now.

Could you double check my understanding for my comment here:
#1929 (comment)

@@ -0,0 +1,162 @@
[
Contributor

Nice! I had no idea we had these vectors

Collaborator Author

This is from here, could you help check if there is anything missing?

Labels
fork:osaka Osaka hardfork type:test Type: Add/refactor fw unit tests; no fw or el client test case changes
3 participants