External calls returning only value types no longer allocate persistent memory in IR and legacy pipelines #16444
Conversation
// ====
// compileViaYul: true
// ----
// test() -> 0x20
This test was removed since it was actually testing the wrong behaviour. Its expectation (`0x20`) relied on the bug this PR fixes. The test declared `bytes32` as the return type (a value type that decodes directly to the stack). The old behavior unnecessarily allocated 32 bytes of memory. The correct behavior is 0 bytes, now covered by `testBytes32() -> 0` in the new test file.
The test was fine. What it tests for is that when a function says it returns some type of a known size we allocate only that much memory for the result, even if it actually returns more data. That was the whole point of #12684.
Now, your optimization made it allocate even less than that, so for the test to still make sense you need to make `ShortReturn` return something that is not a value type, say `uint[10]`. And adjust the expectation, because it will now be bigger than a single slot.
BTW, I would not call what you're doing here a bugfix. What the codegen was doing was not wrong and I don't think it was done by mistake. It was just not as optimal as it could be, so it's an optimization.
While working on the fix for the IR codegen, I wondered how the evmasm codegen handles this case. Looking into it, I found that the evmasm codegen has the same issue. I'm not sure why this allocation was added back then. You can use this script to demonstrate the bug on the legacy pipeline:

#!/bin/bash
SOLC_BIN1="${SOLC_BIN1:-./solc-0.8.33}"
SOLC_BIN2="${SOLC_BIN2:-./build/solc/solc}"
CONTRACT=$(mktemp /tmp/repro_XXXXXX.sol)
ASM1=$(mktemp /tmp/asm1_XXXXXX.txt)
ASM2=$(mktemp /tmp/asm2_XXXXXX.txt)
cat > "$CONTRACT" << 'EOF'
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.0;
interface IERC20 {
function transfer(address to, uint256 amount) external returns (bool);
}
contract C {
function batch(address[] calldata to) external {
IERC20 token = IERC20(address(0xdeadbeef));
for (uint256 i = 0; i < to.length; i++) {
require(token.transfer(to[i], 1 ether));
}
}
}
EOF
echo "=== Contract code ==="
cat "$CONTRACT"
echo ""
# Capture both stdout and stderr in the dump files.
"$SOLC_BIN1" --asm "$CONTRACT" > "$ASM1" 2>&1
"$SOLC_BIN2" --asm "$CONTRACT" > "$ASM2" 2>&1
echo "=== solc-0.8.33: After 'call' - has '0x40 mstore' ==="
grep -A 30 "call$" "$ASM1" | head -32
echo ""
echo "=== solc-fix: After 'call' - NO '0x40 mstore' ==="
grep -A 30 "call$" "$ASM2" | head -32
echo ""
echo "=== Summary ==="
echo -n "solc-0.8.33 '0x40 mstore' after call: "
grep -A 30 "call$" "$ASM1" | grep -c "mstore"
echo -n "solc-fix '0x40 mstore' after call: "
grep -A 30 "call$" "$ASM2" | grep -c "mstore"
echo "=== Full assembly diff ==="
diff --color=always -u "$ASM1" "$ASM2" || true

Below is the output difference between the two binaries.
// compileViaYul: false
// ----
// testStaticArray() -> 64
// testMemoryGrowsInLoop() -> 640
This test documents a behavioral difference in the legacy pipeline: the V1 decoder decodes static arrays in place, without allocating separate memory for the decoded copy. This distinction is specific to the legacy pipeline, since via-ir always uses the V2 decoder. For V2 behavior, see the test `functionCall/external_call_reference_type_allocates_memory.sol`. I am not aware of this difference being documented elsewhere.
Why would this matter? From the caller's perspective, they still get a fresh piece of memory. The fact that it's not allocated on its own but only as a part of a bigger block is not visible to the user.
There is one way it could matter, but only for dynamic types (or static types nested in dynamic ones) so it does not apply here. If the data is encoded in such a way that multiple offsets point at the same tail (say you have two identical arrays so you save space by reusing the same data), doing this would result in the user getting two decoded arrays also sharing the same memory. And since those are not read-only, changing one would also change the other.
I don't think this is something that can happen here. But would also be good to have a semantic test proving it, for both decoders.
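To illustrate the hazard, here is a toy model in plain C++ (not compiler code, and the function names are made up for this sketch): if an in-place decoder hands out views into a shared tail, a write through one decoded array is visible through the other, whereas a copying decoder keeps them independent.

```cpp
#include <cstdint>
#include <vector>

// Two head offsets in the encoding point at the same tail; an in-place
// decoder returns two "arrays" backed by the same storage.
bool inPlaceDecodingAliases()
{
    std::vector<uint64_t> returndata = {1, 2, 3}; // the shared tail
    uint64_t* a = returndata.data();              // first decoded array (a view)
    uint64_t* b = returndata.data();              // second offset reuses the same tail
    a[0] = 42;                                    // writing through one array...
    return b[0] == 42;                            // ...is visible through the other
}

// A copying (V2-style) decoder materializes an independent array per offset.
bool copyingDecodingAliases()
{
    std::vector<uint64_t> returndata = {1, 2, 3};
    std::vector<uint64_t> a(returndata);          // fresh allocation per offset
    std::vector<uint64_t> b(returndata);
    a[0] = 42;
    return b[0] == 42;                            // copies stay independent
}
```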
| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | -0.23% ✅ | | |
| colony | -0.53% ✅ | | |
| elementfi | -0.26% ✅ | | |
| ens | -0.34% ✅ | | |
| euler | | | |
| gnosis | | | |
| gp2 | -0.16% ✅ | | |
| pool-together | -0.2% ✅ | | |
| uniswap | -0.36% ✅ | -0.44% ✅ | -0.12% ✅ |
| yield_liquidator | -0.38% ✅ | -0.34% ✅ | -0.1% ✅ |
| zeppelin | | | |
ir-optimize-evm+yul
| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | -0.73% ✅ | | |
| colony | -1.02% ✅ | | |
| elementfi | -0.52% ✅ | | |
| ens | -0.7% ✅ | -1.06% ✅ | -0.14% ✅ |
| euler | -0.62% ✅ | -0.58% ✅ | -0.06% ✅ |
| gnosis | | | |
| gp2 | -0.26% ✅ | | |
| pool-together | -0.34% ✅ | | |
| uniswap | -0.48% ✅ | -0.54% ✅ | -0.26% ✅ |
| yield_liquidator | -0.85% ✅ | -0.71% ✅ | -0.09% ✅ |
| zeppelin | -0.22% ✅ | -0.2% ✅ | -0.2% ✅ |
ir-optimize-evm-only
| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | -0.34% ✅ | | |
| colony | -0.92% ✅ | | |
| elementfi | -0.29% ✅ | | |
| ens | -0.34% ✅ | -0.49% ✅ | -0.12% ✅ |
| euler | | | |
| gnosis | | | |
| gp2 | -0.16% ✅ | | |
| pool-together | -0.18% ✅ | | |
| uniswap | -0.31% ✅ | -0.36% ✅ | -0.12% ✅ |
| yield_liquidator | -0.36% ✅ | -0.31% ✅ | -0.07% ✅ |
| zeppelin | -0.12% ✅ | | |
legacy-no-optimize
| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | 0% | | |
| colony | -0.53% ✅ | | |
| elementfi | -0.33% ✅ | | |
| ens | -0.35% ✅ | | |
| euler | -0.55% ✅ | -0.58% ✅ | -0.02% ✅ |
| gnosis | -0.16% ✅ | | |
| gp2 | -0.12% ✅ | | |
| pool-together | -0.19% ✅ | | |
| uniswap | -0.36% ✅ | -0.52% ✅ | -0.07% ✅ |
| yield_liquidator | -0.48% ✅ | -0.43% ✅ | -0.03% ✅ |
| zeppelin | | | |
legacy-optimize-evm+yul
| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | 0% | | |
| colony | -1.02% ✅ | | |
| elementfi | -0.58% ✅ | | |
| ens | -0.62% ✅ | -1.1% ✅ | -0.04% ✅ |
| euler | -0.94% ✅ | -0.97% ✅ | -0.02% ✅ |
| gnosis | -0.34% ✅ | | |
| gp2 | -0.22% ✅ | | |
| pool-together | -0.36% ✅ | | |
| uniswap | -0.58% ✅ | -0.8% ✅ | -0.12% ✅ |
| yield_liquidator | -0.97% ✅ | -0.84% ✅ | -0.03% ✅ |
| zeppelin | -0.22% ✅ | -0.21% ✅ | -0.16% ✅ |
legacy-optimize-evm-only
| project | bytecode_size | deployment_gas | method_gas |
|---|---|---|---|
| brink | 0% | | |
| colony | -0.92% ✅ | | |
| elementfi | -0.53% ✅ | | |
| ens | -0.56% ✅ | -1.02% ✅ | -0.04% ✅ |
| euler | -0.85% ✅ | -0.88% ✅ | -0.02% ✅ |
| gnosis | -0.29% ✅ | | |
| gp2 | -0.2% ✅ | | |
| pool-together | -0.32% ✅ | | |
| uniswap | -0.54% ✅ | -0.74% ✅ | -0.11% ✅ |
| yield_liquidator | -0.87% ✅ | -0.77% ✅ | -0.03% ✅ |
| zeppelin | | | |
!V = version mismatch
!B = no value in the "before" version
!A = no value in the "after" version
!T = one or both values were not numeric and could not be compared
-0 = very small negative value rounded to zero
+0 = very small positive value rounded to zero
templ("finalizeAllocation", m_utils.finalizeAllocationFunction());

// Only update free memory pointer if any return type needs memory (reference types).
// Value types are decoded directly to the stack via mload.
While this may be true now, I think that outright hard-coding this assumption here is fragile and can lead to bugs.
The assumption here is actually a bit broader than just that we're only dealing with value types. We can safely free this memory if and only if we know there will be no other memory allocations before we're done using this piece of memory. There is no guarantee that the decoder won't in the future allocate any extra memory for other purposes, even when it's only dealing with value types.
A safer approach would be to modify tupleDecoder() to return a value that tells us whether the function it generated needs any allocations. It would at least clearly signal to anyone modifying it that the downstream code relies on whether memory was allocated. It would also be more thorough - if there are any other cases in which we don't allocate, we'll avoid allocating in those too.
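A minimal sketch of that suggestion, with hypothetical names (`GeneratedDecoder`, `allocatesMemory`, the simplified `tupleDecoder()` signature) standing in for whatever the real refactor would use:

```cpp
#include <string>

// The generator reports, alongside the generated Yul, whether that code
// allocates memory, instead of the caller re-deriving it from the types.
struct GeneratedDecoder
{
    std::string yulCode;
    bool allocatesMemory; // true iff the generated code touches the free memory pointer
};

// Stand-in for ABIFunctions::tupleDecoder(): a decoder for value types only
// loads returndata slots straight onto the stack and reports no allocation.
GeneratedDecoder tupleDecoder(bool hasReferenceTypeComponents)
{
    if (!hasReferenceTypeComponents)
        return {"// mload returndata slots directly onto the stack", false};
    return {"// copy tail data into freshly allocated memory", true};
}

// Downstream call-site codegen keys off the reported flag, so any future
// decoder change that starts (or stops) allocating is picked up automatically.
bool needsFinalizeAllocation(GeneratedDecoder const& decoder)
{
    return decoder.allocatesMemory;
}
```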
Also, checking tupleDecoder() I noticed that this is not the only case where we can do this optimization. We're doing pretty much the same thing in YulUtilFunctions::copyConstructorArgumentsToMemoryFunction() when we're decoding arguments in a constructor.
needToUpdateFreeMemoryPtr = true;
else
    for (auto const& retType: returnTypes)
        if (dynamic_cast<ReferenceType const*>(retType))
I see that we've been making the same assumption here all along, just the opposite way, by checking if we have a ReferenceType. I think we should eliminate it while we're at it and have abiDecode() tell us if it allocated as well. In this case it's actually even more straightforward, because that function itself manages the free mem ptr, so it directly knows if it has moved the value.
// The old decoder did not allocate any memory (i.e. did not touch the free
// memory pointer), but kept references to the return data for
// (statically-sized) arrays
Why are you removing this? Isn't it still true? I don't see any allocation for static types in CompilerUtils::abiDecode().
Though I also do not see any decoding for such types there, but I guess it's because it does not support nested arrays and assumes that for other types the memory layout matches calldata layout? What about arrays of structs though?
Also, if that's the case then the logic checking ReferenceType below is also wrong...
let newFreePtr := add(_4, and(add(_6, 31), not(31)))
if or(gt(newFreePtr, 0xffffffffffffffff), lt(newFreePtr, _4))
{
    mstore(0, shl(224, 0x4e487b71))
    mstore(4, 0x41)
    revert(0, 0x24)
}
mstore(64, newFreePtr)
I'm surprised how much code this simple change removes. Looks like finalize_allocation() and its panic_error_0x41() are getting inlined quite a lot, which bloats the bytecode.
Please give your branches more meaningful names. I'll have no idea what this is when I see it later in git output :) The name is visible in the merge commit and it's convenient when it tells you at a glance what the branch was about. Please don't break that.
Why allocate new memory if you already allocated a nice big chunk of it and can just return pointers to what's inside?
Fixes #16440
Value types are ABI-decoded via `mload` directly onto the stack, so the memory used for `returndata` doesn't need to be preserved. Reference types decode to memory pointers, so we must call `finalizeAllocation` to preserve that memory. Skipping `finalizeAllocation` for value-only returns avoids unbounded memory growth in loops, as pointed out in #16440.

Note: This PR addresses the case where return types are value types. A further optimization could skip allocation when the return value is unused (even for reference types), but that is left for future work.
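The decision described above can be sketched in plain C++ (simplified stand-in types, not the actual solc class hierarchy): the free memory pointer only needs to be bumped past the returndata copy if some return type decodes to a memory pointer.

```cpp
#include <vector>

// Simplified stand-in for the type hierarchy: value types decode to the
// stack, reference types decode to pointers into memory.
enum class TypeKind { Value, Reference };

// Mirrors the check this PR adds: memory holding the decoded returndata must
// stay allocated only if some return type is a reference type.
bool returnDataNeedsPersistentMemory(std::vector<TypeKind> const& returnTypes)
{
    for (TypeKind t : returnTypes)
        if (t == TypeKind::Reference)
            return true; // decoded result points into this memory; keep it
    return false;        // all results live on the stack; memory can be reused
}
```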