@r0qs r0qs commented Feb 4, 2026

Fixes #16440

Value types are ABI-decoded via mload directly onto the stack, so the memory used for returndata doesn't need to be preserved. Reference types decode to memory pointers, so we must call finalizeAllocation to preserve that memory. Skipping finalizeAllocation for value-only returns avoids unbounded memory growth in loops, as pointed out in #16440.

Note: This PR addresses the case where return types are value types. A further optimization could skip allocation when the return value is unused (even for reference types), but that is left for future work.
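
A minimal sketch of the rule described above, written as a standalone C++ model rather than the actual codegen. `TypeKind`, `needsPersistentAllocation` and `freePointerAfterLoop` are hypothetical names introduced only for illustration, and the returndata size is assumed to already be padded to a multiple of 32 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the compiler's type hierarchy.
enum class TypeKind { Value, Reference };

// True if at least one return type decodes to a memory pointer, in which case
// the returndata buffer has to stay allocated after the call.
bool needsPersistentAllocation(std::vector<TypeKind> const& returnTypes)
{
    for (TypeKind kind: returnTypes)
        if (kind == TypeKind::Reference)
            return true;
    return false;
}

// Models the free memory pointer after `iterations` external calls in a loop.
std::uint64_t freePointerAfterLoop(
    std::uint64_t initialFreePointer,
    std::uint64_t iterations,
    std::uint64_t paddedReturndataSize,
    bool finalizeEachCall
)
{
    assert(paddedReturndataSize % 32 == 0); // assumption of this model
    std::uint64_t freePointer = initialFreePointer;
    for (std::uint64_t i = 0; i < iterations; ++i)
    {
        // Finalizing keeps the buffer alive, so the pointer only ever grows.
        // Without it, the buffer is scratch space reused by the next call.
        if (finalizeEachCall)
            freePointer += paddedReturndataSize;
    }
    return freePointer;
}
```

In this model, ten calls returning a single 32-byte value move the free pointer by 320 bytes when every call finalizes its allocation, and not at all when finalization is skipped.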

@r0qs r0qs self-assigned this Feb 4, 2026
// ====
// compileViaYul: true
// ----
// test() -> 0x20
Member Author

This test was removed since it was actually testing the wrong behaviour. Its expectation (0x20) relied on the bug this PR fixes. The test declared bytes32 as return type (a value type that decodes directly to the stack). The old behavior unnecessarily allocated 32 bytes of memory. The correct behavior is 0 bytes, now covered by testBytes32() -> 0 in the new test file.

Collaborator

The test was fine. What it tests for is that when a function says it returns some type of a known size we allocate only that much memory for the result, even if it actually returns more data. That was the whole point of #12684.

Now, your optimization made it allocate even less than that, so for the test to still make sense you need to make ShortReturn return something that is not a value type, say uint[10]. And adjust the expectation, because it will now be bigger than a single slot.
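
The sizing rule from #12684 described in this comment can be sketched as follows. This is a simplified model, not the compiler's code: `reservedReturnBytes` is a hypothetical name, and capping the result at the actual returndata size is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Bytes reserved for the result of an external call whose return types have a
// statically known ABI-encoded size. Extra returndata beyond the declared size
// is ignored. The cap at the actual returndata size is an assumption here.
std::uint64_t reservedReturnBytes(std::uint64_t declaredStaticSize, std::uint64_t actualReturndataSize)
{
    assert(declaredStaticSize % 32 == 0); // static ABI encodings are whole 32-byte words
    return std::min(declaredStaticSize, actualReturndataSize);
}
```

Under this model a declared bytes32 reserves 32 bytes no matter how much data comes back, while a declared uint[10] reserves 10 * 32 = 320 bytes, which is why an expectation based on a static array would be larger than a single slot.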

Collaborator

BTW, I would not call what you're doing here a bugfix. What the codegen was doing was not wrong and I don't think it was done by mistake. It was just not as optimal as it could be, so it's an optimization.

@r0qs r0qs force-pushed the fix-16440 branch 7 times, most recently from 7ff937f to f76d5c2 Compare February 4, 2026 20:25

r0qs commented Feb 4, 2026

While working on the fix for the IR codegen, I wondered how the evmasm codegen handles this case. Looking into it, I found that evmasm codegen has the same issue when ABICoderV2 is used (the default since 0.8.0): it always updates the free memory pointer after external call returns, even for value types (https://github.com/argotorg/solidity/blob/develop/libsolidity/codegen/ExpressionCompiler.cpp#L2949).

I'm not sure why this was added back then, but the ABICoderV2 decoder for value types doesn't actually allocate memory. It just loads values onto the stack via mload (https://github.com/argotorg/solidity/blob/develop/libsolidity/codegen/ABIFunctions.cpp#L1106).

You can use this script to demonstrate the bug on the legacy pipeline:

#!/bin/bash

SOLC_BIN1="${SOLC_BIN1:-./solc-0.8.33}"
SOLC_BIN2="${SOLC_BIN2:-./build/solc/solc}"
CONTRACT=$(mktemp /tmp/repro_XXXXXX.sol)
ASM1=$(mktemp /tmp/asm1_XXXXXX.txt)
ASM2=$(mktemp /tmp/asm2_XXXXXX.txt)

cat > "$CONTRACT" << 'EOF'
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.0;

interface IERC20 {
    function transfer(address to, uint256 amount) external returns (bool);
}

contract C {
    function batch(address[] calldata to) external {
        IERC20 token = IERC20(address(0xdeadbeef));
        for (uint256 i = 0; i < to.length; i++) {
            require(token.transfer(to[i], 1 ether));
        }
    }
}
EOF

echo "=== Contract code ==="
cat "$CONTRACT"
echo ""

"$SOLC_BIN1" --asm "$CONTRACT" 2>&1 > "$ASM1"
"$SOLC_BIN2" --asm "$CONTRACT" 2>&1 > "$ASM2"

echo "=== solc-0.8.33: After 'call' - has '0x40 mstore' ==="
grep -A 30 "call$" "$ASM1" | head -32
echo ""

echo "=== solc-fix: After 'call' - NO '0x40 mstore' ==="
grep -A 30 "call$" "$ASM2" | head -32
echo ""

echo "=== Summary ==="
echo -n "solc-0.8.33 '0x40 mstore' after call: "
grep -A 30 "call$" "$ASM1" | grep -c "mstore"
echo -n "solc-fix '0x40 mstore' after call: "
grep -A 30 "call$" "$ASM2" | grep -c "mstore"

echo "=== Full assembly diff ==="
diff --color=always -u "$ASM1" "$ASM2" || true

Below is the output difference between solc-0.8.33 and the binary from this branch:

=== Contract code ===
// SPDX-License-Identifier: GPL-3.0
pragma solidity ^0.8.0;

interface IERC20 {
    function transfer(address to, uint256 amount) external returns (bool);
}

contract C {
    function batch(address[] calldata to) external {
        IERC20 token = IERC20(address(0xdeadbeef));
        for (uint256 i = 0; i < to.length; i++) {
            require(token.transfer(to[i], 1 ether));
        }
    }
}

=== solc-0.8.33: After 'call' - has '0x40 mstore' ===
      call
      iszero
      dup1
      iszero
      tag_20
      jumpi
      returndatacopy(0x00, 0x00, returndatasize)
      revert(0x00, returndatasize)
    tag_20:
      pop
      pop
      pop
      pop
      mload(0x40)
      returndatasize
      not(0x1f)
      0x1f
      dup3
      add
      and
      dup3
      add
      dup1
      0x40
      mstore
      pop
      dup2
      add
      swap1
      tag_21
      swap2

=== solc-fix: After 'call' - NO '0x40 mstore' ===
      call
      iszero
      dup1
      iszero
      tag_20
      jumpi
      returndatacopy(0x00, 0x00, returndatasize)
      revert(0x00, returndatasize)
    tag_20:
      pop
      pop
      pop
      pop
      mload(0x40)
      returndatasize
      dup2
      add
      swap1
      tag_21
      swap2
      swap1
      tag_22
      jump	// in
    tag_21:
        /* "/tmp/repro_sYrNrr.sol":338:377  require(token.transfer(to[i], 1 ether)) */
      tag_23
      jumpi
      revert(0x00, 0x00)
    tag_23:
        /* "/tmp/repro_sYrNrr.sol":319:322  i++ */
      dup1

=== Summary ===
solc-0.8.33 '0x40 mstore' after call: 1
solc-fix '0x40 mstore' after call: 0
=== Full assembly diff ===
--- /tmp/asm1_bo3eOS.txt	2026-02-05 02:44:24.431799228 +0100
+++ /tmp/asm2_eAS7VB.txt	2026-02-05 02:44:24.437799228 +0100
@@ -172,17 +172,6 @@
       pop
       mload(0x40)
       returndatasize
-      not(0x1f)
-      0x1f
-      dup3
-      add
-      and
-      dup3
-      add
-      dup1
-      0x40
-      mstore
-      pop
       dup2
       add
       swap1
@@ -871,7 +860,7 @@
       pop
       jump	// out

-    auxdata: 0xa264697066735822122085e9c5f03553bb1994ea8743afffeac4aa07f091c829a12afc87a843bf50318564736f6c63430008210033
+    auxdata: 0xa2646970667358221220d9c4845dd4821f965af9924299056cbc3bc4309708157ab785c6d7f67ccb565f64736f6c63430008220033
 }
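
For readers less used to stack assembly, the removed sequence (not(0x1f), 0x1f, dup3, add, and, dup3, add, dup1, 0x40, mstore) appears to compute the following. This is a sketch in a 64-bit model, whereas the EVM works on 256-bit words; `roundUpTo32` and `bumpedFreePointer` are names made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// and(add(size, 0x1f), not(0x1f)): round up to the next multiple of 32.
constexpr std::uint64_t roundUpTo32(std::uint64_t size)
{
    return (size + 0x1f) & ~std::uint64_t{0x1f};
}

// The value that the removed sequence stored at 0x40: the old free memory
// pointer advanced by the padded returndata size.
std::uint64_t bumpedFreePointer(std::uint64_t freePointer, std::uint64_t returndataSize)
{
    std::uint64_t const newFreePointer = freePointer + roundUpTo32(returndataSize);
    assert(newFreePointer >= freePointer); // this 64-bit model does not handle wrap-around
    return newFreePointer;
}
```

With the store to 0x40 gone, the free memory pointer read by the next mload(0x40) is unchanged, so the next call writes its returndata to the same location.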

@r0qs r0qs force-pushed the fix-16440 branch 2 times, most recently from b076e02 to 596eac5 Compare February 4, 2026 23:48
@r0qs r0qs changed the title External calls returning only value types no longer allocate persistent memory in via-IR codegen External calls returning only value types no longer allocate persistent memory in via-ir and legacy pipelines Feb 4, 2026
@r0qs r0qs changed the title External calls returning only value types no longer allocate persistent memory in via-ir and legacy pipelines External calls returning only value types no longer allocate persistent memory in IR and legacy pipelines Feb 4, 2026
@r0qs r0qs force-pushed the fix-16440 branch 2 times, most recently from e4717cd to a837c17 Compare February 5, 2026 00:51
// compileViaYul: false
// ----
// testStaticArray() -> 64
// testMemoryGrowsInLoop() -> 640
Member Author
@r0qs r0qs Feb 5, 2026

This test documents a behavioral difference in the legacy pipeline: the V1 decoder decodes static arrays in place, without allocating separate memory for the decoded copy. This distinction is specific to the legacy pipeline since via-ir always uses the V2 decoder. For V2 behavior, see the test functionCall/external_call_reference_type_allocates_memory.sol. I am not aware of this difference being documented anywhere else.

Collaborator

Why would this matter? From the caller's perspective, they still get a fresh piece of memory. The fact that it's not allocated on its own but only as a part of a bigger block is not visible to the user.

Collaborator

There is one way it could matter, but only for dynamic types (or static types nested in dynamic ones) so it does not apply here. If the data is encoded in such a way that multiple offsets point at the same tail (say you have two identical arrays so you save space by reusing the same data), doing this would result in the user getting two decoded arrays also sharing the same memory. And since those are not read-only, changing one would also change the other.

I don't think this is something that can happen here. But it would also be good to have a semantic test proving it, for both decoders.
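
The aliasing scenario described in this comment can be illustrated with a toy model. It is not the real ABI layout: memory is a vector with one element per 32-byte word, words 0 and 1 are head slots holding the word index of each array's tail, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Memory = std::vector<std::uint64_t>; // one element per 32-byte word, simplified

// Positions (word indices) of the two decoded arrays.
struct DecodedPair { std::size_t first; std::size_t second; };

// In-place decoding: hand out the tail positions found in the head as-is.
DecodedPair decodeInPlace(Memory const& memory)
{
    return {static_cast<std::size_t>(memory[0]), static_cast<std::size_t>(memory[1])};
}

// Copying decoding: append a fresh copy of each tail to the end of memory.
DecodedPair decodeWithCopy(Memory& memory, std::size_t length)
{
    auto copyTail = [&](std::size_t tail) {
        assert(tail + length <= memory.size());
        std::size_t const fresh = memory.size();
        for (std::size_t i = 0; i < length; ++i)
        {
            std::uint64_t const word = memory[tail + i];
            memory.push_back(word);
        }
        return fresh;
    };
    std::size_t const firstTail = static_cast<std::size_t>(memory[0]);
    std::size_t const secondTail = static_cast<std::size_t>(memory[1]);
    std::size_t const first = copyTail(firstTail);
    std::size_t const second = copyTail(secondTail);
    return {first, second};
}

// Returns true if a write through `first` is visible through `second`.
// Takes memory by value so the caller's buffer is left untouched.
bool writesAreShared(Memory memory, DecodedPair arrays)
{
    memory[arrays.first] = 999;
    return memory[arrays.second] == 999;
}
```

When both head slots point at the same tail, the in-place variant returns two arrays backed by the same words, so a write to one is visible through the other. The copying variant gives each array its own storage.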

r0qs commented Feb 5, 2026

ir-no-optimize

project bytecode_size deployment_gas method_gas
brink -0.23% ✅
colony -0.53% ✅
elementfi -0.26% ✅
ens -0.34% ✅
euler
gnosis
gp2 -0.16% ✅
pool-together -0.2% ✅
uniswap -0.36% ✅ -0.44% ✅ -0.12% ✅
yield_liquidator -0.38% ✅ -0.34% ✅ -0.1% ✅
zeppelin

ir-optimize-evm+yul

project bytecode_size deployment_gas method_gas
brink -0.73% ✅
colony -1.02% ✅
elementfi -0.52% ✅
ens -0.7% ✅ -1.06% ✅ -0.14% ✅
euler -0.62% ✅ -0.58% ✅ -0.06% ✅
gnosis
gp2 -0.26% ✅
pool-together -0.34% ✅
uniswap -0.48% ✅ -0.54% ✅ -0.26% ✅
yield_liquidator -0.85% ✅ -0.71% ✅ -0.09% ✅
zeppelin -0.22% ✅ -0.2% ✅ -0.2% ✅

ir-optimize-evm-only

project bytecode_size deployment_gas method_gas
brink -0.34% ✅
colony -0.92% ✅
elementfi -0.29% ✅
ens -0.34% ✅ -0.49% ✅ -0.12% ✅
euler
gnosis
gp2 -0.16% ✅
pool-together -0.18% ✅
uniswap -0.31% ✅ -0.36% ✅ -0.12% ✅
yield_liquidator -0.36% ✅ -0.31% ✅ -0.07% ✅
zeppelin -0.12% ✅

legacy-no-optimize

project bytecode_size deployment_gas method_gas
brink 0%
colony -0.53% ✅
elementfi -0.33% ✅
ens -0.35% ✅
euler -0.55% ✅ -0.58% ✅ -0.02% ✅
gnosis -0.16% ✅
gp2 -0.12% ✅
pool-together -0.19% ✅
uniswap -0.36% ✅ -0.52% ✅ -0.07% ✅
yield_liquidator -0.48% ✅ -0.43% ✅ -0.03% ✅
zeppelin

legacy-optimize-evm+yul

project bytecode_size deployment_gas method_gas
brink 0%
colony -1.02% ✅
elementfi -0.58% ✅
ens -0.62% ✅ -1.1% ✅ -0.04% ✅
euler -0.94% ✅ -0.97% ✅ -0.02% ✅
gnosis -0.34% ✅
gp2 -0.22% ✅
pool-together -0.36% ✅
uniswap -0.58% ✅ -0.8% ✅ -0.12% ✅
yield_liquidator -0.97% ✅ -0.84% ✅ -0.03% ✅
zeppelin -0.22% ✅ -0.21% ✅ -0.16% ✅

legacy-optimize-evm-only

project bytecode_size deployment_gas method_gas
brink 0%
colony -0.92% ✅
elementfi -0.53% ✅
ens -0.56% ✅ -1.02% ✅ -0.04% ✅
euler -0.85% ✅ -0.88% ✅ -0.02% ✅
gnosis -0.29% ✅
gp2 -0.2% ✅
pool-together -0.32% ✅
uniswap -0.54% ✅ -0.74% ✅ -0.11% ✅
yield_liquidator -0.87% ✅ -0.77% ✅ -0.03% ✅
zeppelin

!V = version mismatch
!B = no value in the "before" version
!A = no value in the "after" version
!T = one or both values were not numeric and could not be compared
-0 = very small negative value rounded to zero
+0 = very small positive value rounded to zero

templ("finalizeAllocation", m_utils.finalizeAllocationFunction());

// Only update free memory pointer if any return type needs memory (reference types).
// Value types are decoded directly to the stack via mload.
Collaborator

While this may be true now, I think that just straight out hard-coding this assumption here is fragile and can lead to bugs.

The assumption here is actually a bit broader than just that we're only dealing with value types. We can safely free this memory if and only if we know there will be no other memory allocations before we're done using this piece of memory. There is no guarantee that the decoder won't in the future allocate any extra memory for other purposes, even when it's only dealing with value types.

A safer approach would be to modify tupleDecoder() to return a value that tells us whether the function it generated needs any allocations. It would at least clearly signal to anyone modifying it that the downstream code relies on whether memory was allocated. It would also be more thorough - if there are any other cases in which we don't allocate, we'll avoid allocating in those too.
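
One possible shape for this suggestion, as a standalone sketch. `DecoderInfo`, `makeTupleDecoder` and `shouldFinalizeAllocation` are hypothetical names and do not reflect the actual tupleDecoder() signature; the point is only that the decoder generator, not the call site, decides whether memory must stay allocated.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class TypeKind { Value, Reference }; // hypothetical simplification

// Hypothetical result type: the generated function's name plus a flag saying
// whether that function relies on memory staying allocated.
struct DecoderInfo
{
    std::string functionName;
    bool allocatesMemory;
};

// Stand-in for a decoder generator. It is the only place that knows how each
// type is decoded, so it is the one that sets the flag.
DecoderInfo makeTupleDecoder(std::vector<TypeKind> const& types)
{
    DecoderInfo info{"abi_decode_tuple", false};
    for (TypeKind kind: types)
    {
        info.functionName += kind == TypeKind::Value ? "_value" : "_reference";
        if (kind == TypeKind::Reference)
            info.allocatesMemory = true;
    }
    return info;
}

// Call-site logic: no assumptions about type categories, only the flag.
bool shouldFinalizeAllocation(DecoderInfo const& decoder)
{
    assert(!decoder.functionName.empty()); // a decoder must have been generated
    return decoder.allocatesMemory;
}
```

If the decoder later starts allocating for some value type, only the generator has to change, and every caller that consults the flag picks up the new behavior.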

Collaborator

Also, checking tupleDecoder() I noticed that this is not the only case where we can do this optimization. We're doing pretty much the same thing in YulUtilFunctions::copyConstructorArgumentsToMemoryFunction() when we're decoding arguments in a constructor.

needToUpdateFreeMemoryPtr = true;
else
for (auto const& retType: returnTypes)
if (dynamic_cast<ReferenceType const*>(retType))
Collaborator

I see that we've been making the same assumption here all along, just the opposite way, by checking if we have a ReferenceType. I think we should eliminate it while we're at it and have abiDecode() tell us if it allocated as well. In this case it's actually even more straightforward because that function itself manages the free mem ptr, so it directly knows if it has moved the value.

Comment on lines -2945 to -2947
// The old decoder did not allocate any memory (i.e. did not touch the free
// memory pointer), but kept references to the return data for
// (statically-sized) arrays
Collaborator

Why are you removing this? Isn't it still true? I don't see any allocation for static types in CompilerUtils::abiDecode().

I also do not see any decoding for such types there, though. I guess that's because it does not support nested arrays and assumes that for other types the memory layout matches the calldata layout? What about arrays of structs though?

Also, if that's the case then the logic checking ReferenceType below is also wrong...

Comment on lines -717 to -724
let newFreePtr := add(_4, and(add(_6, 31), not(31)))
if or(gt(newFreePtr, 0xffffffffffffffff), lt(newFreePtr, _4))
{
mstore(0, shl(224, 0x4e487b71))
mstore(4, 0x41)
revert(0, 0x24)
}
mstore(64, newFreePtr)
Collaborator

I'm surprised how much code this simple change removes. Looks like finalize_allocation() and its panic_error_0x41() are getting inlined quite a lot, which bloats the bytecode.

cameel commented Feb 5, 2026

r0qs wants to merge 2 commits into develop from fix-16440

Please give your branches more meaningful names. I'll have no idea what this is when I see it later in git output :) The name is visible in the merge commit and it's convenient when these tell you at a glance what the branch was about. Please don't break that.

cameel commented Feb 5, 2026

I'm not sure why this was added back then, but the ABICoderV2 decoder for value types doesn't actually allocate memory.

Why allocate new memory if you already allocated a nice big chunk of it and can just return pointers to what's inside?


Development

Successfully merging this pull request may close these issues.

Memory allocation on non-memory returndata
