
Conversation

@akiramenai
Collaborator

No description provided.

nikic and others added 30 commits September 29, 2025 16:21
If the uint64_t constructor is used, assert that the value is actually a
signed or unsigned N-bit integer depending on whether the isSigned flag
is set. Provide an implicitTrunc flag to restore the previous behavior,
where the argument is silently truncated instead.

In this commit, implicitTrunc is enabled by default, which means that
the new assertions are disabled and no actual change in behavior occurs.
The plan is to flip the default once all places violating the assertion
have been fixed. See #80309 for the scope of the necessary changes.

The primary motivation for this change is to avoid incorrectly specified
isSigned flags. A recurring problem we have is that people write
something like `APInt(BW, -1)` and this works perfectly fine -- until
the code path is hit with `BW > 64`. Most of our i128-specific
miscompilations are caused by variants of this issue.

The cost of the change is that we have to specify the correct isSigned
flag (and make sure there are no excess bits) for uses where BW is
always <= 64 as well.
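
For illustration, a minimal sketch of the pitfall this guards against (the `isSigned` handling is the real `APInt` constructor behavior; the wrapper function is just a stand-in):

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

APInt allOnesExample(unsigned BW) {
  // Pitfall: -1 converts to uint64_t with all 64 bits set. Without isSigned
  // the value is zero-extended, so for BW > 64 the high bits come out as 0
  // instead of 1, and for BW < 64 the new assertion fires because the value
  // has excess bits.
  //   APInt Bad(BW, -1);
  // Correct: sign-extend to all ones at any bit width.
  return APInt(BW, -1, /*isSigned=*/true);
  // Where silent truncation really is intended, the implicitTrunc flag
  // described above keeps the old behavior.
}
```
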
This makes sure that APInt::getAllOnes() keeps working after the APInt
constructor assertions are enabled.

I'm relaxing the requirement for the signed case to allow either an
all-zeros or an all-ones integer. This is basically saying that we can
interpret the zero-width integer as either positive or negative.
…CI) (#80309)

This fixes all the places that hit the new assertion added in
llvm/llvm-project#106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.

The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.

Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
Currently it only deals with the case where we're subtracting adds with
at most one non-constant operand. This patch extends it to cancel out
common operands for the subtraction of arbitrary add expressions.

The background here is that I want to replace a getMinusSCEV() call in
LAA with computeConstantDifference():

https://github.com/llvm/llvm-project/blob/93fecc2577ece0329f3bbe2719bbc5b4b9b30010/llvm/lib/Analysis/LoopAccessAnalysis.cpp#L1602-L1603

This particular call is very expensive in some cases (e.g. lencod with
LTO) and computeConstantDifference() could achieve this much more
cheaply, because it does not need to construct new SCEV expressions.

However, the current computeConstantDifference() implementation is too
weak for this and misses many basic cases. This is a step towards making
it more powerful while still keeping it pretty fast.
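
Roughly, the idea is the one sketched below (illustrative only, not the ScalarEvolution implementation): collect the operands of both add expressions with multiplicities, cancel the common ones, and succeed only when nothing but a constant remains.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Term = std::string; // stand-in for a non-constant SCEV operand

// Difference of (sum(LHS) + LC) and (sum(RHS) + RC), if it is a constant.
std::optional<int64_t> constantDifference(const std::vector<Term> &LHS, int64_t LC,
                                          const std::vector<Term> &RHS, int64_t RC) {
  std::map<Term, int> Count;
  for (const Term &T : LHS) ++Count[T];
  for (const Term &T : RHS) --Count[T];
  for (const auto &[T, N] : Count)
    if (N != 0)
      return std::nullopt; // a non-constant operand does not cancel
  return LC - RC;
}
// e.g. constantDifference({"a", "b"}, 8, {"a", "b"}, 0) yields 8
```
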
Produces -Wrange-loop-construct on some buildbots.
The Mul factor was zero-extended here, resulting in incorrect
results for integers larger than 64-bit.

As we currently only multiply by 1 or -1, just split this into
two cases -- there's no need for a full multiplication here.

Fixes llvm/llvm-project#102597.
This transform works on signed integers, so this is the
logically correct API.

Split off from llvm/llvm-project#80309.
The -1 constant should be sign extended, not zero extended.

Split out from llvm/llvm-project#80309.
We were missing the signed flag on the negative value, so the
range was incorrectly interpreted for integers larger than 64-bit.

Split out from llvm/llvm-project#80309.
… a few places. (#104555)

PR #80309 proposes to have users of APInt's uint64_t
constructor opt in to implicit truncation. Currently, that patch
requires SelectionDAG::getConstant to opt-in.

This patch adds getSignedConstant so we can start fixing some of the
cases that require implicit truncation.
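
A hedged before/after sketch (assuming a `SelectionDAG &DAG`, `SDLoc DL`, and `EVT VT` in scope; the helper name is the one this patch adds):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

SDValue makeAllOnes(SelectionDAG &DAG, const SDLoc &DL, EVT VT) {
  // Old spelling: -1 goes through the uint64_t path as 0xFFFF'FFFF'FFFF'FFFF
  // and relies on implicit truncation/extension for widths other than 64.
  //   SDValue Old = DAG.getConstant(-1, DL, VT);
  // New helper: the immediate is interpreted as a signed value of VT's width.
  return DAG.getSignedConstant(-1, DL, VT);
}
```
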
Split out from llvm/llvm-project#80309 to
avoid assertion failures in the future.
Split out from llvm/llvm-project#80309 to
avoid assertion failures in the future.
This handles the edge case where BitWidth is 1 and the increment
produces a value that is not valid in that width, while we
just want wrap-around.

Split out of llvm/llvm-project#80309.
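
A minimal sketch of the kind of rewrite this describes (not the exact code touched by the patch):

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

APInt incrementWrapping(unsigned BW, uint64_t Val) {
  // APInt(BW, Val + 1) would trip the new assertion for BW == 1, Val == 1,
  // since 2 is not a valid 1-bit value. Doing the addition on the APInt
  // itself wraps modulo 2^BW, which is the behavior we actually want.
  return APInt(BW, Val) + 1;
}
```
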
The result here may require truncation. Fix this by removing the
calculateOffsetDiff() helper entirely. As far as I can tell, this
code does not actually have to deal with different bitwidths.

findBaseConstants() will produce ranges of constants with equal
types, which is what maximizeConstantsInRange() will then work
on.

Fixes assertion reported at:
llvm/llvm-project#114539 (comment)
This is creating an APInt with value 2, which may not be well-defined
for i1 values. Fix this by replacing the Y.umul_sat(2) with
Y.uadd_sat(Y).
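
Equivalently, a sketch using the real `APInt` saturating helpers:

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

APInt doubleSaturating(const APInt &Y) {
  // Multiplying by a constant 2 requires materializing APInt(BitWidth, 2),
  // which does not exist for i1; adding Y to itself with saturation computes
  // the same result without that constant.
  return Y.uadd_sat(Y);
}
```
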
…878)

Add `ICmpInst::compare()` overload accepting `KnownBits`, similar to the
existing one accepting `APInt`. This is not directly part of KnownBits
(or APInt) for layering reasons.
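
A usage sketch of the new overload (the `std::optional<bool>` result type is an assumption based on the description, since a comparison of known bits can be indeterminate):

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/Support/KnownBits.h"
#include <optional>
using namespace llvm;

// Fold an integer comparison when the known bits of both sides already
// decide it; return std::nullopt when the result is still unknown.
std::optional<bool> foldKnownCompare(const KnownBits &L, const KnownBits &R,
                                     ICmpInst::Predicate Pred) {
  return ICmpInst::compare(L, R, Pred);
}
```
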
The (extended) bit width might not fit into the (non-extended)
type, resulting in an incorrect truncation of the compared value.

Fix this by using m_SpecificInt(), which is both simpler and
handles this correctly.

Fixes the assertion failure reported in:
llvm/llvm-project#114539 (comment)
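
For reference, the matcher in question compares against a full-width `APInt` directly (a minimal sketch, not the code from the patch):

```cpp
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace PatternMatch;

// Check whether V is exactly the constant C, letting the matcher compare the
// APInt at V's own bit width instead of extending or truncating it by hand.
bool isSpecificConstant(Value *V, const APInt &C) {
  return match(V, m_SpecificInt(C));
}
```
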
vladimirradosavljevic and others added 29 commits October 17, 2025 21:57
…module bytecode exceeds the 24KB limit

Key changes introduced:

 - Transitioned from a MachineFunction pass to a Module-level pass
 - Implemented module-wide code size estimation
   (not 100% exact but sufficiently accurate for our purpose)
 - Applied transformations by iterating over functions within the module.
   For each function:
   - Handled both loop and non-loop basic blocks using the default heuristic
     (i.e., based on the presence of the OptSize attribute)
   - If the module size exceeds the defined limit, switched to a fallback mode
     that treats all functions as if they had the OptSize attribute.
 - Repeatedly traversed all functions, targeting only basic blocks at a specific
   loop nesting depth (starting from depth 0 for non-loop blocks), and
   re-evaluated the module size. The process stops once the size drops below the
   limit or the loop depth limit is reached.

This balances the runtime gas regressions caused by aggressive constant
unfolding across the functions within the module.
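
A hypothetical sketch of the iteration strategy (all names — `estimateModuleSize`, `runDefaultHeuristic`, `runOptSizeAtLoopDepth`, `MaxLoopDepth` — are illustrative, not the actual pass):

```cpp
#include <cstdint>

namespace llvm { class Module; }
using llvm::Module;

uint64_t estimateModuleSize(Module &M);            // module-wide size estimate
void runDefaultHeuristic(Module &M);               // honor existing OptSize attributes
void runOptSizeAtLoopDepth(Module &M, unsigned D); // treat all functions as OptSize,
                                                   // touching only blocks at loop depth D
constexpr uint64_t CodeSizeLimit = 24 * 1024;      // 24KB bytecode limit
constexpr unsigned MaxLoopDepth = 8;               // illustrative cap

void balanceCodeSize(Module &M) {
  runDefaultHeuristic(M);
  if (estimateModuleSize(M) <= CodeSizeLimit)
    return;
  // Fallback mode: walk loop depths from the outside in (depth 0 = non-loop
  // blocks), shrinking a little everywhere, until the module fits or the
  // depth limit is reached.
  for (unsigned Depth = 0; Depth <= MaxLoopDepth; ++Depth) {
    runOptSizeAtLoopDepth(M, Depth);
    if (estimateModuleSize(M) <= CodeSizeLimit)
      return;
  }
}
```
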
…uracy

EVM opcodes can be classified based on how their behavior
or output depends on the transaction state:
 - Readnone (Pure)
 - Volatile (State-Dependent)
 - Side-Effecting (State-Changing)
(Reference: EVM opcodes categorization)

This patch adjusts the memory attributes of LLVM intrinsics corresponding
to these opcodes. At the LLVM IR level, the transaction-scoped EVM
state is modeled as reads/writes to inaccessible memory. This state does
not include the heap, which is modeled separately via regular LLVM pointer
parameters. State-dependent intrinsics are now marked as reading from
inaccessible memory. State-changing intrinsics are marked as both reading
from and writing to it.
To capture memory dependencies between plain loads/stores to storage
(or transient storage) and context intrinsics (CALL*- or CREATE*-like), we
extended EVM alias analysis to determine aliasing between the call and
the memory location in a custom way.
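
As a sketch of how the three categories map onto LLVM's memory-effect modeling (the `MemoryEffects` API is real; the mapping functions themselves are illustrative):

```cpp
#include "llvm/Support/ModRef.h"
using namespace llvm;

// Readnone (pure): no memory access at all.
MemoryEffects pureOpcodeEffects() { return MemoryEffects::none(); }

// Volatile (state-dependent): reads the transaction-scoped EVM state,
// modeled at the IR level as inaccessible memory.
MemoryEffects stateDependentEffects() {
  return MemoryEffects::inaccessibleMemOnly(ModRefInfo::Ref);
}

// Side-effecting (state-changing): both reads and writes that state.
MemoryEffects stateChangingEffects() {
  return MemoryEffects::inaccessibleMemOnly(ModRefInfo::ModRef);
}
```
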
… signextend(b, x)

Signed-off-by: Vladimir Radosavljevic <[email protected]>
Implement instCombineIntrinsic in TTI and do the folding there.
By doing this early in the pipeline, we give LLVM the opportunity
to perform more optimizations.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
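
The hook has the standard TTI shape; a sketch is below (the fold named in the comment refers to the signextend combine above, and the intrinsic name is an assumption):

```cpp
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include <optional>
using namespace llvm;

// Returning std::nullopt hands the intrinsic back to generic InstCombine.
std::optional<Instruction *> instCombineEVMIntrinsic(InstCombiner &IC,
                                                     IntrinsicInst &II) {
  switch (II.getIntrinsicID()) {
  default:
    return std::nullopt;
  // case Intrinsic::evm_signextend:   // hypothetical intrinsic ID
  //   return foldNestedSignExtend(IC, II);
  }
}
```
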
In this change, NVPTX AA is moved before Basic AA to potentially improve
compile time. Additionally, it introduces a flag in the
`ExternalAAWrapper` that allows other backends to run their
target-specific AA passes before Basic AA, if desired.

The change works for both New Pass Manager and Legacy Pass Manager.

Original implementation by Princeton Ferro <[email protected]>
The patch introduces a few SelectionDAG patterns to improve codegen for the
`br (brcond (setcc))` case. Additionally, due to the late expansion of
JUMP_UNLESS, the patch introduces a peephole pass that optimizes JUMPI
predicates:

* `ISZERO ISZERO` is folded to nothing
* `EQ ISZERO` is folded to `SUB`
* `SUB ISZERO` is folded to `EQ`
* `ISZERO ISZERO OR* PseudoJUMPI` -> `OR* PseudoJUMPI`
* `EQ ISZERO OR* PseudoJUMPI` -> `SUB OR* PseudoJUMPI`
* `SUB ISZERO OR* PseudoJUMPI` -> `EQ OR* PseudoJUMPI`
…alue when BB is unreachable-terminated

Add positive and negative tests to show why in some
cases it is preferable to inline when the BB is
unreachable-terminated. The function callee_inline is
taken from a real-world example where inlining this
function eliminates one SLOAD after optimizations (we have
the same load in the callers and in the callee).

Signed-off-by: Vladimir Radosavljevic <[email protected]>
…le-terminated

For EVM, returning from the contract uses
different instructions (e.g. return, revert), which
are followed by unreachable. In the Inliner heuristic,
if a BB is unreachable-terminated, the threshold is set to 0
and these call sites are unlikely to be inlined. Instead,
use a small threshold and continue to evaluate the cost model,
since in some cases we benefit from inlining.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
…#124193)

This resolves the `-Wignored-qualifiers` warning introduced by the new
warning added in llvm/llvm-project#121419. First
caught in the buildbot `ppc64le-lld-multistage-test`:

https://lab.llvm.org/buildbot/#/builders/168/builds/7756

---------

Co-authored-by: Henry Jiang <[email protected]>
This enables optimization for EVM and implements
isLegalAddressingMode, allowing r + imm addressing
mode for all address spaces except CALLDATALOAD.
Benchmarks indicate that enabling it for CALLDATALOAD
is not beneficial. This change primarily allows
address mode calculations (r + imm) to be sunk
to their uses, which can reduce spills and
reloads in some cases.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
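
An illustrative sketch of the legality check (the address-space enumerators and the exact conditions are assumptions; the `AddrMode` fields are the generic TargetLowering ones):

```cpp
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// Hypothetical EVM address-space numbering, for the sketch only.
enum EVMAddrSpace : unsigned { AS_STACK, AS_HEAP, AS_CALL_DATA, AS_STORAGE };

bool isLegalEVMAddrMode(const TargetLowering::AddrMode &AM, unsigned AS) {
  // No global bases and no scaled index registers on EVM.
  if (AM.BaseGV || AM.Scale)
    return false;
  // A plain base register is always fine; r + imm is allowed everywhere
  // except call data, where benchmarks showed no benefit.
  if (AM.BaseOffs != 0)
    return AM.HasBaseReg && AS != AS_CALL_DATA;
  return true;
}
```
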
From the description, staticcall can't change storage,
so update the code to ensure no storage modifications
occur during staticcall.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
This will help optimizations that use alias analysis
to search past it and optimize further. Benchmark
numbers showed that this is not profitable to do for
the heap address space.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
@akiramenai force-pushed the dborisenkov-eravmless branch from d623479 to 7a9f81a on October 17, 2025 21:55