-
Notifications
You must be signed in to change notification settings - Fork 167
feat(tests): enhance eip7883 test coverage #1929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(tests): enhance eip7883 test coverage #1929
Conversation
016bf91
to
4d57f68
Compare
code += Op.RETURNDATACOPY(0, i * 32, 32) | ||
code += Op.SSTORE( | ||
call_contract_post_storage.store_next(modexp_expected[i * 32 : (i + 1) * 32]), | ||
Op.MLOAD(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to use a hash-based method (see the comment) here for simplicity but failed, so I simply store all the data in storage for comparison. I still prefer the hash-based method as it reduces the SSTORE
operation count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do this today but had issues with guido-4
and geth-fail
vectors. I then realised we have a bug in the chunking method for cases where len(modexp_expected) // 32 = 0
. For guido-4
this equates to 8 // 32 = 0
. So then we get for i in range(0)
, meaning we don't store the result for guido-4
.
We should try and get the hash method working nonetheless!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve updated the implementation to a hash-based method, but some issues have come up.
From EIP-198:
Consumes floor(mult_complexity(max(length_of_MODULUS, length_of_BASE)) * max(ADJUSTED_EXPONENT_LENGTH, 1) / GQUADDIVISOR) gas, and if there is enough gas, returns (BASE**EXPONENT) % MODULUS as a byte array with the same length as the modulus.
Given a ModExp
input where (base, exponent, modulus) yields the value 0x01
and the modulus length is 4, what should the output be? Should it be 0x00000001
(left-padded) or 0x01000000
(right-padded)?
Referring to Mario’s vector update (check vector.json
change in this PR), I initially thought it should be the latter.
However, after reviewing the relevant EIPs, such as eip-198, eip-2565, eip-7883, and eip-7823, and implementing the hashing comparison, I now think the correct format might be the former (left-padded).
The previous test cases did not fail because, as you mentioned, for some cases the length was smaller than 32, so len(modexp_expected) // 32
evaluated to 0, resulting in no loop execution. Even if the calculated result was non-zero, we should be checking the range from 0 to len(modexp_expected) // 32 + 1
. Otherwise, we miss trailing bytes.
Short summary: the legacy approach never fully validated the output format. After switching to the updated verification method, I think these vectors might have an issue:
geth-fail-length
Input: 000000000000000000000000000000000000000000000000000000000000000000
Output: 000000000000000000000000000000000000000000000000000000000000000001
- base_len = 0 → base = 0
- exp_len = 34 → exponent = 34 zero bytes (value 0)
- mod_len = 33 → modulus = 0x...0002 (value 2)
Per EIP-198, exponent = 0 and modulus > 1 → result is 1, left-padded to modulus length.
guido-4-even
Input: 0001000000000000
Output: 0000000000000001
The result might need to be left-padded, not right-padded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing catch! I struggled to find the issue last night. Could you please update the vectors?
It would be good to run these against the clients!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we never properly tested the expectation of these cases, from what I can see:
- marius-1-even - 12 bytes expected
- guido-1-even - 16 bytes expected
- guido-2-even - 16 bytes expected
- guido-4-even - 8 bytes expected
- marcin-1-exp-heavy - 8 bytes expected
- marcin-2-exp-heavy - 16 bytes expected
- marcin-3-exp-heavy - 24 bytes expected
Only the call, the gas calculation and the return length!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already updated the test vector. My current understanding also, is that we never really test the actual data for the return data! For the geth-fail-length
, it would be critical as the output provided is even incorrect.
@pytest.mark.parametrize( | ||
"modexp_input,modexp_expected,gas_old,gas_new", | ||
[ | ||
pytest.param(Spec.modexp_input, Spec.modexp_expected, 200, 500), # Should be 1200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this test case, it should take 1200 gas cost after Osaka is activated, but in this test case, it only takes 500. Still looking into the root cause here.
This is one is the "marcin-1-balanced", you can find it in the vectors.json
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent the entire afternoon debugging this issue without understanding what went wrong. So I decided towrite down every step I took, traces, opcode sequences, the LLM conversation history, and all the tricks I tried.
It turned into a lengthy and tedious comment. But after finishing it, I realized the root cause was simply that I forgot to pass the calldata to the transaction. So, it’s fixed now.
|
||
senders = [pre.fund_eoa() for _ in range(3)] | ||
contracts = [pre.deploy_contract(code) for _ in range(3)] | ||
timestamps = [14_999, 15_000, 15_001] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I do not know why these timestamp values are 14_999-15_001, I use this trick as this is how Spencer test CLZ opcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All transition forks that are defined in our tests have a transition time hard-coded to 15k seconds.
So on 14,999 we should still see Prague rules, and Osaka rules in the rest.
256b98c
to
ed33e8d
Compare
exponent="FF" * (Spec.MAX_LENGTH_BYTES + 1), | ||
modulus="FF" * (Spec.MAX_LENGTH_BYTES + 1), | ||
case_id="all-too-long", | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these test cases, the input exceeds the boundary for the ModExp precompile. However, it’s unclear whether the failure is due to exceeding the transaction gas limit or violating the input boundary constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should double check this and do the actual calculation of the gas required to do this operation, It might even be higher than the tx gas limit cap introduced in Osaka, and in that case the test might be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested with different input values and evaluated the gas cost. Based on the results, I removed some of the test cases.
If the exponent length exceeds the limit, it’s highly unlikely that the base or modulus could also exceed it. In such cases, the total gas cost will break the transaction gas cap.
For combinations like (base, exponent), (exponent, modulus), or (base, exponent, modulus) that exceed the limit, the gas cost is greater than 500 million.
But I do not evaluate it using fuzzing or formal verification, so there might be some corner case that support such combination. I am wondering how can I prove this attribute.
5c7c640
to
87c8cac
Compare
fce1649
to
13ab385
Compare
# Test case coverage table: | ||
# ┌─────┬──────┬─────┬──────┬───────┬─────────┬─────────────────────────────────────────────┐ | ||
# │ ID │ Comp │ Rel │ Iter │ Clamp │ Gas │ Description │ | ||
# ├─────┼──────┼─────┼──────┼───────┼─────────┼─────────────────────────────────────────────┤ | ||
# │ Z0 │ - │ - │ - │ - │ 500 │ Zero case - empty inputs │ | ||
# │ S0 │ S │ = │ A │ True │ 500 │ Small, equal, zero exponent, clamped │ | ||
# │ S1 │ S │ = │ B │ True │ 500 │ Small, equal, small exp, clamped │ | ||
# │ S2 │ S │ = │ B │ False │ 4080 │ Small, equal, large exp, unclamped │ | ||
# │ S3 │ S │ = │ C │ False │ 2032 │ Small, equal, large exp+zero low256 │ | ||
# │ S4 │ S │ = │ D │ False │ 2048 │ Small, equal, large exp+non-zero low256 │ | ||
# │ S5 │ S │ > │ A │ True │ 500 │ Small, base>mod, zero exp, clamped │ | ||
# │ S6 │ S │ < │ B │ True │ 500 │ Small, base<mod, small exp, clamped │ | ||
# │ L0 │ L │ = │ A │ True │ 500 │ Large, equal, zero exp, clamped │ | ||
# │ L1 │ L │ = │ B │ False │ 12750 │ Large, equal, large exp, unclamped │ | ||
# │ L2 │ L │ = │ C │ False │ 6350 │ Large, equal, large exp+zero low256 │ | ||
# │ L3 │ L │ = │ D │ False │ 6400 │ Large, equal, large exp+non-zero low256 │ | ||
# │ L4 │ L │ > │ B │ True │ 500 │ Large, base>mod, small exp, clamped │ | ||
# │ L5 │ L │ < │ C │ False │ 9144 │ Large, base<mod, large exp+zero low256 │ | ||
# └─────┴──────┴─────┴──────┴───────┴─────────┴─────────────────────────────────────────────┘ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To verify the table, you can check this script, compare to eip-7883, and use it ot generate the test case as well as the labels: https://gist.github.com/1c8fd82ac1e75e9cfd1c79e5a2f5fbe6.git
I was inspired by this paper: https://arxiv.org/pdf/2504.12034
3d0391f
to
47ba51b
Compare
@@ -73,6 +73,8 @@ def from_bytes(cls, input_data: Bytes | str) -> "ModExpInput": | |||
|
|||
modulus = input_data[current_index : current_index + modulus_length] | |||
|
|||
modulus = modulus.ljust(min(1024, modulus_length), b"\x00") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is necessary but requires additional refactoring.
For case 4 in modexpFiller.json
, the test description is:
4 - Would also parse as a base of 3, an exponent of 65535, and a modulus of 2**255. It attempts to read 32 bytes for the modulus starting from 0x80, but since there’s no further data, it right-pads the result with 31 zero bytes.
Although the operation modulus = input_data[current_index : current_index + modulus_length]
is technically valid, as Python silently pads with null bytes when slicing beyond the end, we actually need to left-pad with \x00
instead.
Example: https://onecompiler.com/python/43smfe6dm
Regarding the 1024-byte length restriction, in case 2, the exponent length is 2**256 - 1
, which can cause current_index + modulus_length
to overflow. Therefore, I added this condition to prevent overflow. It aligns with the logic in EIP-7883, although such a restriction does not exist in EIP-198.
Some idea for refactoring in ModexpInput
: We should consider not passing ModexpInput
data type to the test, but bytes
data instead, this would be much more flexible for strange testing scenario.
I didn’t create a separate PR for issue #1971, as it depends on the infrastructure introduced in this PR. There are 37 test cases in the legacy
|
47ba51b
to
55d8fec
Compare
efb5a00
to
6e7819f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. Just some small comments for now.
Could you double check my understanding for my comment here:
#1929 (comment)
@@ -0,0 +1,162 @@ | |||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I had no idea we had these vectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is from here, could you help check if there is anything missing?
code += Op.RETURNDATACOPY(0, i * 32, 32) | ||
code += Op.SSTORE( | ||
call_contract_post_storage.store_next(modexp_expected[i * 32 : (i + 1) * 32]), | ||
Op.MLOAD(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do this today but had issues with guido-4
and geth-fail
vectors. I then realised we have a bug in the chunking method for cases where len(modexp_expected) // 32 = 0
. For guido-4
this equates to 8 // 32 = 0
. So then we get for i in range(0)
, meaning we don't store the result for guido-4
.
We should try and get the hash method working nonetheless!
6e7819f
to
ec39341
Compare
🗒️ Description
EIP-7883: ModExp Gas Cost Increase
This EIP changes the gas cost, so it falls under this category in the checklist. While this category is meant specifically for gas cost changes, I’ve added a broader set of scenarios, similar to what would be done for a new precompile, since the existing tests from
Byzantium
are incomplete.Call contexts
Inputs
Value Transfer
Modexp
is not no-fee precompileOut-of-bounds checks
Input Lengths
Gas usage
Fork transition
🔗 Related Issues or PRs
Issue #1791, #1790, #1971
✅ Checklist
tox
checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
type(scope):
.mkdocs serve
locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.@ported_from
marker.