[clang-tidy] Adds do-while support to performance-inefficient-string-concatenation#186607
Merged
@llvm/pr-subscribers-clang-tools-extra
Author: Berkay Sahin (berkaysahiin)
Changes
Closes #186362
Full diff: https://github.com/llvm/llvm-project/pull/186607.diff
4 Files Affected:
diff --git a/clang-tools-extra/clang-tidy/performance/InefficientStringConcatenationCheck.cpp b/clang-tools-extra/clang-tidy/performance/InefficientStringConcatenationCheck.cpp
index 92e3220fdb817..1067fca289a2c 100644
--- a/clang-tools-extra/clang-tidy/performance/InefficientStringConcatenationCheck.cpp
+++ b/clang-tools-extra/clang-tidy/performance/InefficientStringConcatenationCheck.cpp
@@ -53,11 +53,11 @@ void InefficientStringConcatenationCheck::registerMatchers(
Finder->addMatcher(cxxOperatorCallExpr(anyOf(AssignOperator, PlusOperator)),
this);
} else {
- Finder->addMatcher(
- cxxOperatorCallExpr(anyOf(AssignOperator, PlusOperator),
- hasAncestor(stmt(anyOf(cxxForRangeStmt(),
- whileStmt(), forStmt())))),
- this);
+ Finder->addMatcher(cxxOperatorCallExpr(anyOf(AssignOperator, PlusOperator),
+ hasAncestor(stmt(anyOf(
+ cxxForRangeStmt(), whileStmt(),
+ forStmt(), doStmt())))),
+ this);
}
}
diff --git a/clang-tools-extra/docs/ReleaseNotes.rst b/clang-tools-extra/docs/ReleaseNotes.rst
index 4b207609d598d..a3714736aa988 100644
--- a/clang-tools-extra/docs/ReleaseNotes.rst
+++ b/clang-tools-extra/docs/ReleaseNotes.rst
@@ -315,6 +315,10 @@ Changes in existing checks
- Fixes false negatives when using ``std::set`` from ``libstdc++``.
+- Improved :doc:`performance-inefficient-string-concatenation
+ <clang-tidy/checks/performance/performance-inefficient-string-concatenation>` check by
+ adding support for detecting inefficient string concatenation in ``do-while`` loops.
+
- Improved :doc:`performance-inefficient-vector-operation
<clang-tidy/checks/performance/inefficient-vector-operation>` check by
correctly handling vector-like classes when ``push_back``/``emplace_back`` are
diff --git a/clang-tools-extra/docs/clang-tidy/checks/performance/inefficient-string-concatenation.rst b/clang-tools-extra/docs/clang-tidy/checks/performance/inefficient-string-concatenation.rst
index 92b6b4e0370d6..1dacf91389154 100644
--- a/clang-tools-extra/docs/clang-tidy/checks/performance/inefficient-string-concatenation.rst
+++ b/clang-tools-extra/docs/clang-tidy/checks/performance/inefficient-string-concatenation.rst
@@ -55,5 +55,5 @@ Options
.. option:: StrictMode
- When `false`, the check will only check the string usage in ``while``, ``for``
+ When `false`, the check will only check the string usage in ``while``, ``do-while``, ``for``
and ``for-range`` statements. Default is `false`.
diff --git a/clang-tools-extra/test/clang-tidy/checkers/performance/inefficient-string-concatenation.cpp b/clang-tools-extra/test/clang-tidy/checkers/performance/inefficient-string-concatenation.cpp
index 72080ed39e59b..adc37e4c4bedf 100644
--- a/clang-tools-extra/test/clang-tidy/checkers/performance/inefficient-string-concatenation.cpp
+++ b/clang-tools-extra/test/clang-tidy/checkers/performance/inefficient-string-concatenation.cpp
@@ -32,5 +32,11 @@ int main() {
f(mystr2 + mystr1);
mystr1 = g(mystr1);
}
+
+ do {
+ mystr1 = mystr1 + mystr2;
+ // CHECK-MESSAGES: :[[@LINE-1]]:5: warning: string concatenation results in allocation of unnecessary temporary strings; consider using 'operator+=' or 'string::append()' instead
+ } while (0);
+
return 0;
}
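For readers unfamiliar with the check, the sketch below shows the kind of code the new `doStmt()` matcher catches and the rewrite the warning text suggests. The function names are made up for illustration; only the `mystr1 = mystr1 + mystr2` pattern and the `operator+=` / `append()` advice come from the test and diagnostic above.

```cpp
#include <string>

// Pattern the check now reports inside a do-while loop: `Acc + Piece`
// builds a temporary std::string on every iteration, which is then
// assigned back to Acc.
std::string joinWithTemporaries(const std::string &Piece, int Count) {
  std::string Acc;
  int I = 0;
  do {
    Acc = Acc + Piece; // warning: string concatenation results in ...
    ++I;
  } while (I < Count);
  return Acc;
}

// Rewrite suggested by the diagnostic: append in place so Acc reuses and
// grows its own buffer instead of allocating a temporary each time.
std::string joinInPlace(const std::string &Piece, int Count) {
  std::string Acc;
  int I = 0;
  do {
    Acc += Piece; // equivalently: Acc.append(Piece);
    ++I;
  } while (I < Count);
  return Acc;
}
```

A `do`-`while` body always runs at least once, so `joinInPlace("x", 0)` still returns `"x"`.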
localspook approved these changes on Mar 14, 2026
Review comment on clang-tools-extra/docs/clang-tidy/checks/performance/inefficient-string-concatenation.rst (outdated, resolved)
…nt-string-concatenation.rst Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>
vbvictor approved these changes on Mar 14, 2026
zeyi2 approved these changes on Mar 15, 2026
Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>
Sukumarsawant added a commit to Sukumarsawant/llvm-project that referenced this pull request on Mar 17, 2026
Added tests and resolved conflicts
added a smoke test
[clang-format] Fix incorrect trailing comment and escaped newlines when AlignArrayOfStructures is enabled (#180305)
This change fixes how the spaces are modified during alignment.
Previously it was inconsistent whether the `StartOfTokenColumn` and
`PreviousEndOfTokenColumn` members of `WhitespaceManager::Change`s were
also updated when their `Spaces` member was changed to align tokens.
A new function has been added that properly maintains the relationship
between these members, and all places that directly modified `Spaces`
have been replaced with calls to this new function.
Fixes https://github.com/llvm/llvm-project/issues/138151. Fixes
https://github.com/llvm/llvm-project/issues/85937. Fixes
https://github.com/llvm/llvm-project/issues/53442. Tests have been added
to ensure they stay fixed.
Attribution Note - I have been authorized to contribute this change on
behalf of my company: ArenaNet LLC
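A simplified model of the invariant the new function is meant to preserve, assuming that for tokens on one line `StartOfTokenColumn == PreviousEndOfTokenColumn + Spaces`. The `Change` struct and the `setSpaces` and `isConsistent` helpers here are hypothetical stand-ins, not the actual `WhitespaceManager` code: changing one token's `Spaces` shifts that token and every later token on the line, so both column members have to move together.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for WhitespaceManager::Change,
// keeping only the three members named in the commit message.
struct Change {
  int Spaces;                   // spaces inserted before the token
  int StartOfTokenColumn;       // column where the token starts
  int PreviousEndOfTokenColumn; // column where the previous token ends
};

// Invariant assumed for tokens on a single line.
bool isConsistent(const std::vector<Change> &Line) {
  for (const Change &C : Line)
    if (C.StartOfTokenColumn != C.PreviousEndOfTokenColumn + C.Spaces)
      return false;
  return true;
}

// Hypothetical helper: set Spaces of Line[I] and shift every dependent
// column so the invariant still holds afterwards.
void setSpaces(std::vector<Change> &Line, std::size_t I, int NewSpaces) {
  const int Delta = NewSpaces - Line[I].Spaces;
  Line[I].Spaces = NewSpaces;
  // Token I and all tokens after it on the line move right by Delta.
  for (std::size_t J = I; J < Line.size(); ++J) {
    Line[J].StartOfTokenColumn += Delta;
    if (J + 1 < Line.size())
      Line[J + 1].PreviousEndOfTokenColumn += Delta;
  }
}
```

Updating only `Spaces` (the inconsistency the commit describes) would leave `isConsistent` returning false for the modified token.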
libclc: Disable contract in trig reductions (#186432)
libclc: Remove attempt at subnormal flush from trig functions (#186424)
[clang-format] Ignore imports in comments for Java import sorting (#177326)
Java source files can contain apparent import statements inside block
comments (e.g., showing a code example). These can get mixed up with
real import statements when run through clang-format.
This patch tracks block comments (/* ... */) so that we skip lines that
are inside them.
Fixes #176771
---------
Co-authored-by: Natalia Kokoromyti <knatalia@yost-cm-01-imme.stanford.edu>
Co-authored-by: owenca <owenpiano@gmail.com>
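A minimal sketch of the idea of tracking block comments while scanning for imports. It is written in C++ over a vector of lines and is not the clang-format implementation; it assumes `/*` and `*/` never appear inside string literals or `//` comments, which a real tokenizer would have to handle.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collect lines that look like Java imports, skipping any line that
// begins inside a /* ... */ block comment.
std::vector<std::string> collectImports(const std::vector<std::string> &Lines) {
  std::vector<std::string> Imports;
  bool InBlockComment = false;
  for (const std::string &Line : Lines) {
    const bool StartedInComment = InBlockComment;
    // Update the comment state from the delimiters on this line.
    std::size_t Pos = 0;
    while (Pos < Line.size()) {
      if (!InBlockComment && Line.compare(Pos, 2, "/*") == 0) {
        InBlockComment = true;
        Pos += 2;
      } else if (InBlockComment && Line.compare(Pos, 2, "*/") == 0) {
        InBlockComment = false;
        Pos += 2;
      } else {
        ++Pos;
      }
    }
    if (StartedInComment)
      continue; // e.g. " * import fake.B;" inside a comment example
    const std::size_t First = Line.find_first_not_of(" \t");
    if (First != std::string::npos && Line.compare(First, 7, "import ") == 0)
      Imports.push_back(Line.substr(First));
  }
  return Imports;
}
```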
[lldb/test] Fix MTC dylib path for newer Darwin embedded devices (NFC)
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
[clang-tidy] Fix virtual inheritance FP in misc-multiple-inheritance (#186103)
Avoid double-counting concrete bases introduced through virtual
inheritance in `misc-multiple-inheritance`.
AI-Usage: Gemini 3 was used for pre-commit review.
Closes https://github.com/llvm/llvm-project/issues/186059
[SPIRV][NFC] Drop uses of BranchInst (#186514)
Also simplify the code to use successors().
[lldb][NativePDB] Require `target-windows` for MSVC test (#186578)
Fixes the failure on the lldb-remote-linux-win buildbot
(https://github.com/llvm/llvm-project/pull/186124#issuecomment-4060098881).
The test runs MSVC to produce an executable that only runs on Windows.
[lldb] Fix heap.py crashes on recent Darwin embedded targets
Two fixes for the ptr_refs/cstr_refs/find_variable heap commands:
1. Move the `task` variable declaration into the common expression
preamble. Previously it was only declared inside the `search_heap`
code path, causing compilation errors when using `--ignore-heap`
with stack or segment scanning.
2. On recent iOS, some shared cache __DATA_CONST pages are remapped to
non-accessible at runtime, even though the Mach-O section metadata
still marks them as readable. The segment scan would crash with
EXC_BAD_ACCESS when reading these pages. Fix by querying actual
VM region permissions via SBProcess.GetMemoryRegionInfo() and
splitting sections at region boundaries to only scan readable
portions.
rdar://172543652
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
[Transforms][NFC] Drop uses of BranchInst in headers (#186580)
Replace BranchInst with CondBrInst/UncondBrInst/Instruction in headers
and handle the related fall out.
The removed code in simplifyUncondBranch was made dead in
0895b836d74ed333468ddece2102140494eb33b6, where FoldBranchToCommonDest
was changed to only handle conditional branches.
[Transforms/Utils][NFC] Drop uses of BranchInst (#186586)
[x86][GlobalISel] Select MOV32ri64 for unsigned 32-bit i64 constants (#185182)
x86 GlobalISel currently selects `MOV64ri32` for signed 32-bit `i64`
constants and falls back to `MOV64ri` otherwise.
That misses the unsigned 32-bit case, where `MOV32ri64` is a better
match.
FastISel already handles this case by using `MOV32ri64` for
zero-extended 32-bit values.
Update `X86InstructionSelector::selectConstant()` to select `MOV32ri64`
for `i64` constants that fit in `uint32_t`, while keeping `MOV64ri32`
for signed 32-bit values and `MOV64ri` for larger constants.
This reduces the encoding size for these constants and fixes the
`0xffffffff` boundary case to use the correct zero-extending move.
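The selection rule can be written as a small pure function. This is an illustrative sketch that returns opcode names as strings, not the `X86InstructionSelector` code. It assumes that a value fitting both ranges (such as 5) takes `MOV32ri64`, since a 32-bit register write zero-extends and avoids the REX.W prefix; the commit message does not spell out that precedence.

```cpp
#include <cstdint>
#include <string>

// Illustrative opcode choice for materializing an i64 constant.
std::string selectI64ConstantOpcode(std::int64_t Val) {
  const std::uint64_t U = static_cast<std::uint64_t>(Val);
  if (U <= UINT32_MAX)                      // fits in uint32_t: zero-extend
    return "MOV32ri64";
  if (Val >= INT32_MIN && Val <= INT32_MAX) // negative imm32: sign-extend
    return "MOV64ri32";
  return "MOV64ri";                         // needs a full 64-bit immediate
}
```

Under this ordering `0xffffffff` lands on the zero-extending move, while `-1` still uses the sign-extended 32-bit immediate.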
[X86] apply mulx optimization for two-wide mul instruction (mull, mulq) (#185127)
References: https://github.com/llvm/llvm-project/pull/184462
In the discussion for the linked PR, which removes unnecessary register
to register moves when one operand is in %rdx for mulx, the point was
brought up that this pattern also happens for mull and mulq.
The IR below:
```llvm
declare i32 @foo32()
declare i64 @foo64()
define i32 @mul32_no_implicit_copy(i32 %a0) {
%a1 = call i32 @foo32()
%a2 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %a0, i32 %a1)
%a3 = extractvalue { i32, i1 } %a2, 0
ret i32 %a3
}
define i64 @mul64_no_implicit_copy(i64 %a0) {
%a1 = call i64 @foo64()
%a2 = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a0, i64 %a1)
%a3 = extractvalue { i64, i1 } %a2, 0
ret i64 %a3
}
```
Generates this code on current HEAD:
```asm
mul32_no_implicit_copy: # @mul32_no_implicit_copy
push rbx
mov ebx, edi
call foo32@PLT
mov ecx, eax
mov eax, ebx
mul ecx
pop rbx
ret
mul64_no_implicit_copy: # @mul64_no_implicit_copy
push rbx
mov rbx, rdi
call foo64@PLT
mov rcx, rax
mov rax, rbx
mul rcx
pop rbx
ret
```
Where the register shuffling before the mul is the same pattern as for
mulx in the previous PR.
With this branch it generates this code now:
```asm
mul32_no_implicit_copy:
pushq %rbx
movl %edi, %ebx
callq foo32@PLT
mull %ebx
popq %rbx
retq
mul64_no_implicit_copy:
pushq %rbx
movq %rdi, %rbx
callq foo64@PLT
mulq %rbx
popq %rbx
retq
```
[StructurizeCFG] Fix incorrect zero-cost hoisting in nested control flow (#183792)
hoistZeroCostElseBlockPhiValues() hoists zero-cost instructions from
else blocks to their common dominator with the then block. When the
merge point has additional predecessors beyond the simple if-else
pattern, the hoisted instruction ends up in a dominator that feeds
a Flow phi on every edge, including edges where the else block was
never taken. simplifyHoistedPhis() then replaces poison entries in
those Flow phis with the hoisted value, causing it to leak into
unrelated paths.
This manifests as miscompilation in sorting kernels compiled with
code coverage: the PGO counter blocks create deeply nested CFGs
where the hoisted shufflevector (used for swapping sort keys)
reaches the no-swap path, corrupting sort results.
Fix by requiring a simple if-else CFG shape before hoisting: ThenBB
must branch directly to ElseSucc and ElseSucc must have exactly 2
predecessors. This matches the structure that simplifyHoistedPhis
assumes.
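A toy version of the new shape check, using a hypothetical `Block` type with explicit successor and predecessor lists rather than LLVM's `BasicBlock`. It assumes "branch directly" means ThenBB has ElseSucc as its only successor.

```cpp
#include <vector>

// Hypothetical toy CFG node; not LLVM's BasicBlock.
struct Block {
  std::vector<Block *> Succs;
  std::vector<Block *> Preds;
};

void addEdge(Block &From, Block &To) {
  From.Succs.push_back(&To);
  To.Preds.push_back(&From);
}

// Simple if-else shape: ThenBB jumps straight to the merge block, and the
// merge block is reached only from the then path and the else path. Any
// extra predecessor means the merge point is shared, so hoisting is refused.
bool isSimpleIfElseShape(const Block &ThenBB, const Block &ElseSucc) {
  const bool ThenGoesToMerge =
      ThenBB.Succs.size() == 1 && ThenBB.Succs.front() == &ElseSucc;
  return ThenGoesToMerge && ElseSucc.Preds.size() == 2;
}
```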
[RISCV] Add more extensions to spacemit-x100 (#186351)
[RISCV][NFC] Move extension test for spacemit-x60 to a separate file (#186357)
[CIR] Add Commutative/Idempotent traits to binary ops (#185163)
Add missing MLIR traits to CIR binary operations:
- AndOp, OrOp: Commutative, Idempotent
- AddOp, MulOp, XorOp, MaxOp: Commutative
Add these ops to the CIRCanonicalize pass op list so trait-based
folding is exercised by applyOpPatternsGreedily.
[clang-tidy] Fix false positive in `readability-else-after-return` on `return` jumped over by `goto` (#186370)
Given this code:
```cpp
if (...) {
goto skip_over_return;
return;
skip_over_return:
foo();
} else {
...
}
```
...the check suggests removing the `else`, which is not a valid
transformation. This is because it looks at *all* the substatements of
the then-branch for interrupting statements. This PR changes it to only
look at the *final* substatement.
Technically, this introduces a false negative on code like this:
```cpp
if (...) {
return;
dead_code();
} else { // <-- Could in theory remove this 'else'
...
}
```
But, that code is objectively bad, so I don't think we're losing
anything.
This change has the side effect of making the check a bit more general;
it now recognizes attributed interrupting statements (e.g.
`[[clang::musttail]] return f();`).
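The difference between the old and new behavior can be modeled on a flat list of statement kinds. `StmtKind` and the set of interrupting kinds below are assumptions for illustration (the real check works on Clang AST nodes); the model reproduces both the fixed false positive and the accepted false negative.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical statement kinds for a then-branch body.
enum class StmtKind { Return, Break, Continue, Throw, Goto, Label, Call };

// Assumed set of statements that interrupt control flow.
bool isInterrupting(StmtKind K) {
  return K == StmtKind::Return || K == StmtKind::Break ||
         K == StmtKind::Continue || K == StmtKind::Throw;
}

// Old behavior: any interrupting substatement counts.
bool oldThenBranchInterrupts(const std::vector<StmtKind> &Body) {
  return std::any_of(Body.begin(), Body.end(), isInterrupting);
}

// New behavior: only the final substatement counts.
bool newThenBranchInterrupts(const std::vector<StmtKind> &Body) {
  return !Body.empty() && isInterrupting(Body.back());
}
```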
[Transforms/Scalar][NFC] Drop uses of BranchInst (#186592)
I ended up relaxing some of the checks that LoopInterchange made; the
assumptions that certain instructions were branches did not seem to be
used at all.
[LV] Move predication, early exit & region handling to VPlan0 (NFCI) (#185305)
Move handleEarlyExits, predication and region creation to operate
directly on VPlan0. This means they only have to run once, reducing
compile time a bit; the relative order remains unchanged.
Introducing the regions at this point in particular unlocks performing
more transforms once, on the initial VPlan, instead of running them for
each VF.
Whether a scalar epilogue is required is still determined by legacy cost
model, so we need to still account for that in the VF specific VPlan
logic.
PR: https://github.com/llvm/llvm-project/pull/185305
[IPO][InstCombine][Vectorize][NFCI] Drop uses of BranchInst (#186596)
Refactor remaining parts of Transforms apart from Scalar and Utils.
[IR][NFC] Remove BranchInst successor functions (#186604)
Efficient access is now handled by UncondBrInst/CondBrInst, and
Instruction functions handle the more generic cases. Since most uses of
BranchInst are gone, these functions are now largely unused.
Preliminary work for making the CondBrInst operand order consistent.
[WebAssembly][NFC] Rename and test FastISel selectBr (#186577)
selectBr only handles conditional branches and also wasn't tested.
Clarify the name and add a test that enforces that there's no fallback.
[X86] Reject 'p' constraint without 'a' modifier in inline asm (#185799)
The 'p' constraint produces an address operand that should only be
printed with the 'a' modifier (e.g., %a0). Without it, GCC and Clang
produce different and arguably incorrect output
https://github.com/llvm/llvm-project/issues/185343#issuecomment-4029670370
Reject the combination to catch misuse early.
[llvm-mc] Default output assembly variant to AssemblerDialect (#186317)
Previously, llvm-mc always defaulted to output assembly variant 0
regardless of the target's AssemblerDialect. This was inconsistent:
llvm-mc -x86-asm-syntax=intel changed the input parser to Intel syntax
but output stayed AT&T, unlike clang's -masm=intel which affects both.
When --output-asm-variant is not explicitly specified, fall back to
MAI->getAssemblerDialect() instead of hardcoding variant 0. This
makes the output match the target's configured dialect:
- X86: -x86-asm-syntax=intel now produces Intel output
- AArch64: Apple triples default to Apple syntax output
- SystemZ: z/OS triples default to HLASM syntax output
Tests that relied on a specific output variant now use explicit
--output-asm-variant=0.
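The fallback amounts to "explicit option, else target default". A sketch with `std::optional` standing in for "was `--output-asm-variant` given"; the function name is hypothetical, and the real tool obtains the default from `MAI->getAssemblerDialect()`.

```cpp
#include <optional>

// Explicit --output-asm-variant wins; otherwise use the target's
// configured assembler dialect.
unsigned chooseOutputAsmVariant(std::optional<unsigned> ExplicitVariant,
                                unsigned TargetAssemblerDialect) {
  return ExplicitVariant.value_or(TargetAssemblerDialect);
}
```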
[lldb] Rename Status variables to avoid confusion (NFC) (#186486)
Rename Status variables that are named `error` to `status` to avoid
confusion with llvm::Error as the latter becomes more and more
prevalent.
[lldb] Skip tests that are incompatible with MTE (#186043)
Skip tests that are incompatible with MTE.
Depends on:
- https://github.com/llvm/llvm-project/pull/185780
[IR] Add Instruction::successors() (#186606)
Nowadays all terminators store all successor operands consecutively, so
we can expose the range of successors through a unified interface.
Rename succ_op_iterator to succ_iterator for consistency, also with
Machine IR.
Preliminary work for replacing the succ_iterator in CFG.h with an
iterator that iterates directly over the uses.
[msan][NFCI] Replace unnecessary shadow cast with assertion (#186498)
Fabian Wolff pointed out that #176031 made the output of CreateIntCast()
unused in handleBitwiseAnd().
Upon closer inspection, the CreateIntCast()s are unnecessary, because the
arguments to handleBitwiseAnd() (and visitOr()) are integers or vectors of
integers, for which the shadow types are the same as the original types.
This patch removes the unnecessary `if` and shadow cast, and adds
assertions.
[CIR] Add cir.min op and refactor cir.max lowering (#185276)
Add cir.min operation for integer minimum computation. Refactor cir.max
lowering into a shared lowerMinMaxOp template reused by both ops.
[IR] Make BranchInst operand order consistent (#186609)
Ensure that successors are always reported in the same order in which
they are stored in the operand list.
Improved ISD::SRL handling in isKnownToBeAPowerOfTwo (#182562)
Fixes #181651
Added a DemandedElts argument to the isConstOrConstSplat and
isKnownToBeAPowerOfTwo calls, and `OrZero || isKnownNeverZero(Val, Depth)`
is now checked before calling isKnownToBeAPowerOfTwo. Also added unit tests.
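The reasoning behind the SRL case: a logical right shift of a power of two is either a smaller power of two or zero (when the only set bit is shifted out). The sketch checks that fact exhaustively for 8-bit values and encodes the resulting rule in a hypothetical helper; the helper's name and signature are illustrative, not the `SelectionDAG` API.

```cpp
#include <cstdint>

bool isPowerOfTwo(std::uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Exhaustive check over 8-bit powers of two and in-range shift amounts:
// (X >> S) is always a power of two or zero.
bool srlOfPow2IsPow2OrZero() {
  for (unsigned Bit = 0; Bit < 8; ++Bit) {
    const std::uint32_t X = 1u << Bit;
    for (unsigned S = 0; S < 8; ++S) {
      const std::uint32_t R = X >> S;
      if (R != 0 && !isPowerOfTwo(R))
        return false;
    }
  }
  return true;
}

// Hypothetical helper: (srl X, S) is a known power of two when X is one,
// provided a zero result is acceptable (OrZero) or the shift result is
// separately known to be nonzero.
bool srlKnownPow2(bool XIsPow2, bool OrZero, bool ResultKnownNeverZero) {
  return XIsPow2 && (OrZero || ResultKnownNeverZero);
}
```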
[X86] lowerV4F32Shuffle - prefer INSERTPS over SHUFPS when zeroing upper/lower v2f32 (#186612)
Followup to #186468 - use INSERTPS over SHUFPS if the implicit zeroing doesn't cross the 64-bit halves
[LLVM] Change IRBuilder::CreateAggregateRet to accept an ArrayRef (#186605)
Change `IRBuilder::CreateAggregateRet()` to accept an `ArrayRef` instead
of a pointer and size, and extend IRBuilder unit test to exercise it.
[PhaseOrdering][X86] Add average round tests based off #128424 (#186615)
[CIR] Remove cir.unary(plus, ...) and emit nothing for unary plus (#185278)
Traditional codegen never emits any operation for unary plus — it just
visits the subexpression as a pure identity at the codegen level. Align
CIRGen with this behavior by removing Plus from UnaryOpKind entirely
and having VisitUnaryPlus directly visit the subexpression with the
appropriate promotion/demotion handling.
[VPlan] Add hasPredecessors and hasSuccessors to VPBlockBase (NFC).
Add/move helpers to VPBlockBase, and use in a few more places.
Split off from https://github.com/llvm/llvm-project/pull/156262 as
suggested.
[clang-format] Fix a crash on fuzzer-generated invalid C++ code (#186566)
Fixes #185421
[VPlan] Consolidate VPRegionBlock constructors (NFC).
Unify VPRegionBlock constructors into a single one, in preparation for
https://github.com/llvm/llvm-project/pull/156262. Split off as
suggested.
[X86] isSplatValueForTargetNode - test source value for vector uniform shift ops (#186619)
For old SSE style vector shifts, we just need to check the shifted value is a splat as the shift amount is uniform
Avoids an unnecessary variable shuffle in i512 ashr expansion
[IR] Implement successors as Use iterators (#186616)
This is possible since now all successor operands are stored
consecutively.
There is just one out-of-line function call instead of one call to
getSuccessor() per operand.
[VPlan] Remove special handling for canonical increment (NFC).
The canonical IV increment should be proven as uniform-across-VF-and-UF
by the existing logic. Remove explicit handling, in preparation for
https://github.com/llvm/llvm-project/pull/156262. Split off as
suggested.
[VPlan] Create zero resume value for CanIV directly (NFC).
The start value of the canonical IV is always 0. Assert and generate
zero VPValue manually in preparation for
https://github.com/llvm/llvm-project/pull/156262. Split off as
suggested.
[Docs] typo settting -> setting (#178665)
[libc++][Android] Update Compiler for Android CI (#186531)
Upgrade Android compiler from r563880 to r584948b because libc++ does
not support LLVM 20 anymore
[clang][Driver][Darwin] Optionally use xcselect to find macOS SDK (#119670)
This is a scaled down version of https://reviews.llvm.org/D136315.
The intent is largely the same as before[^1], but I've scaled down the
scope to try to avoid the issues that the previous patch caused:
- the changes are now opt-in based on enabling `CLANG_USE_XCSELECT`
- this only works when targeting macOS on a macOS host (this is the only
case supported by `libxcselect`[^2])
- calling `libxcselect` is done only when the target is `*-apple-macos*`
to avoid breaking many tests
Another reason to leave this as opt-in for now is that there are some
bugs in libxcselect that need fixing before it is safe to use by default
for all users. This has been reported to Apple as FB16081077.
[^1]: See also https://reviews.llvm.org/D109460 and #45225.
[^2]: https://developer.apple.com/documentation/xcselect?language=objc
[clang-tidy] Add redundant qualified alias check (#180404)
Introduce `readability-redundant-qualified-alias` to flag identity type
aliases that repeat a qualified name and suggest using-declarations when
safe. The check is conservative: it skips macros, elaborated keywords,
dependent types, and templates. `OnlyNamespaceScope` controls whether
local/class scopes are included (default `false`).
Depends on: #183940 #183941
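An example of the pattern, assuming the check behaves as described: an identity alias that repeats a qualified name, and the using-declaration it would suggest. The namespaces and type are made up; the static assertion shows that both spellings denote the same type.

```cpp
#include <type_traits>

namespace lib {
struct Widget {
  int Id;
};
} // namespace lib

namespace before {
// Identity alias that repeats the qualified name.
using Widget = lib::Widget;
} // namespace before

namespace after {
// Suggested form: a using-declaration for the same name.
using lib::Widget;
} // namespace after

static_assert(std::is_same<before::Widget, after::Widget>::value,
              "alias and using-declaration name the same type");
```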
[CIR] Split CIR_UnaryOp into individual operations (#185280)
Split the monolithic cir.unary operation (which dispatched on a
UnaryOpKind enum) into four separate operations: cir.inc, cir.dec,
cir.minus, and cir.not.
Changes:
- Add CIR_UnaryOpInterface with getInput()/getResult() methods
- Add CIR_UnaryOp and CIR_UnaryOpWithOverflowFlag base classes
- Define IncOp, DecOp, MinusOp, NotOp with per-op folds
- Add Involution trait to NotOp for not(not(x)) -> x folding
- Replace createUnaryOp() with createInc/Dec/Minus/Not builders
- Split LLVM lowering into four separate patterns
- Split LoweringPrepare complex-type handling per unary op
- Update CIRCanonicalize and CIRSimplify for new op types
- Update all codegen files to use bool params instead of UnaryOpKind
- Remove CIR_UnaryOpKind enum and old CIR_UnaryOp definition
Assembly format change:
cir.unary(inc, %x) nsw : !s32i, !s32i -> cir.inc nsw %x : !s32i
cir.unary(not, %x) : !u32i, !u32i -> cir.not %x : !u32i
[AggressiveInstCombine] Recognize table based log2 and replace with ctlz+sub. (#185160)
Recognize table based log2 implementations like
```
unsigned log2(unsigned v) {
static const unsigned char table[] = {
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
return table[(unsigned)(v * 0x07C4ACDDU) >> 27];
}
```
and replaces them with 31 - llvm.ctlz(v).
The same is done for i64 log2. Other sizes can be supported with correct
multiply constant and table values, but I have not found examples yet.
This code is based on the existing tryToRecognizeTableBasedCttz. Like
that function, we support any combination of multiply constant and
table values that produce the correct result.
It handles the same pattern as #177110, but does not match the outer
subtract from that patch. It is assumed that InstCombine or other
optimizations can combine (sub 31 (sub 31, cttz V)) later.
I have limited this to targets that have a fast ctlz. The backend does
not yet have a table based lowering for ctlz so this reduces the chance
of regressions.
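The equivalence the pass relies on can be checked directly. The sketch below reuses the table and multiply constant from the message and compares the result against `31 - ctlz(v)`, using a portable loop in place of `llvm.ctlz`. For `v == 0` the two differ (the table yields 0, while `31 - ctlz(0)` wraps around), and the message does not say how the pass treats that input.

```cpp
#include <cstdint>

// Table-based floor(log2(v)) from the commit message (valid for v != 0).
unsigned tableLog2(std::uint32_t v) {
  static const unsigned char table[32] = {
      0, 9,  1,  10, 13, 21, 2,  29, 11, 14, 16, 18, 22, 25, 3, 30,
      8, 12, 20, 28, 15, 17, 24, 7,  19, 27, 23, 6,  26, 5,  4, 31};
  // Smear the highest set bit downward: v becomes 2^(k+1) - 1.
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  // The multiply maps each of the 32 smeared values to a distinct top-5-bit
  // index into the table.
  return table[static_cast<std::uint32_t>(v * 0x07C4ACDDu) >> 27];
}

// Portable count-leading-zeros; returns 32 for v == 0.
unsigned clz32(std::uint32_t v) {
  unsigned n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && !(v & mask); mask >>= 1)
    ++n;
  return n;
}

// The replacement form: 31 - ctlz(v).
unsigned ctlzLog2(std::uint32_t v) { return 31u - clz32(v); }
```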
[MLIR][Python] Refine the behavior of Python-defined dialect reloading (#186128)
This includes several changes:
- `Dialect.load(reload=False)` will fail if the dialect was already
loaded in a different context, to prevent the program from aborting later.
- `Dialect.load(reload=True)` implies `replace=True` in
dialect/operation registering.
- `PyGlobals::registerDialectImpl` now has a parameter `replace`.
- `register_dialect` and `register_operation` are no longer exposed in
`mlir.dialects.ext`.
This should solve the registering problem found in writing transform
test cases by @rolfmorel.
[libc++][test] Use loop with compare_exchange_weak calls (#185953)
On AIX, this test sometimes fails with error `Assertion failed: y == true`.
The test assumes `compare_exchange_weak` should succeed on a single call;
however, according to the standard:
> A weak compare-and-exchange operation may fail spuriously. That is,
> even when the contents of memory referred to by expected and ptr are
> equal, it may return false and store back to expected the same memory
> contents that were originally there.
> This spurious failure enables implementation of compare-and-exchange on
> a broader class of machines, e.g., load-locked store-conditional
> machines. A consequence of spurious failure is that nearly all uses of
> weak compare-and-exchange will be in a loop.
[atomics.ref.ops]/27
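The usual loop shape for a weak compare-exchange that should only report failure on a genuine mismatch is sketched below. It uses `std::atomic<int>` rather than `std::atomic_ref` so that it needs only C++11; the contract quoted above is the same for both. The helper name is hypothetical, and this is not the libc++ test code.

```cpp
#include <atomic>

// Retry compare_exchange_weak only on spurious failures. A spurious failure
// writes back the value `Expected` already held, so an unchanged `Expected`
// means "try again"; a changed one is a genuine mismatch.
bool compareExchangeRetryingSpurious(std::atomic<int> &Obj, int &Expected,
                                     int Desired) {
  const int Original = Expected;
  while (!Obj.compare_exchange_weak(Expected, Desired)) {
    if (Expected != Original)
      return false; // the stored value really differs
  }
  return true;
}
```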
[orc-rt] Rename "ResourceManager" to "Service". NFCI. (#186639)
The name "Service" better reflects the general purpose of this class: It
provides *something* (often resource management) to the Session, is
owned by the Session, and receives notifications from the Session when
the controller detaches / is detached, and when the Session is shut
down.
An example of a non-resource-managing Service (to be added in an
upcoming patch) is a detach / shutdown notification service: Clients can
add this service to register arbitrary callbacks to be run on detach /
shutdown. The advantage of this over the current Session detach /
shutdown callback system is that clients can control both the order of
the callbacks, and their order relative to notification of other
services.
[orc-rt] Return ref from Session::addService, add createService. (#186640)
Session::addService now returns a reference to the added Service. This
allows clients to hold a reference for further direct interaction with
the Service object.
This commit also introduces a new Session::createService convenience
method that creates the service and returns a reference to it.
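A simplified model of the Session/Service relationship described in these two commits. Apart from the names `Session`, `Service`, `addService`, and `createService`, everything here is hypothetical, and notifying services in registration order is an assumption of the sketch; the real orc-rt interfaces also handle detach and asynchronous completion.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base: receives notifications from the owning Session.
class Service {
public:
  virtual ~Service() = default;
  virtual void onShutdown() {}
};

class Session {
public:
  // Takes ownership and returns a reference for direct interaction.
  template <typename ServiceT>
  ServiceT &addService(std::unique_ptr<ServiceT> S) {
    ServiceT &Ref = *S;
    Services.push_back(std::move(S));
    return Ref;
  }

  // Convenience: construct the service and return a reference to it.
  template <typename ServiceT, typename... ArgTs>
  ServiceT &createService(ArgTs &&...Args) {
    return addService(std::make_unique<ServiceT>(std::forward<ArgTs>(Args)...));
  }

  void shutdown() {
    for (auto &S : Services)
      S->onShutdown();
  }

private:
  std::vector<std::unique_ptr<Service>> Services;
};

// A non-resource-managing service: runs registered callbacks on shutdown.
class ShutdownNotifier : public Service {
public:
  void addCallback(std::function<void()> F) { Callbacks.push_back(std::move(F)); }
  void onShutdown() override {
    for (auto &F : Callbacks)
      F();
  }

private:
  std::vector<std::function<void()>> Callbacks;
};
```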
[mlir] Fix op comparisons in extensible dialects (#186637)
The extensible dialect system defined `compareProperties` to return false
because it doesn't use properties. However, it should have returned
`true`, as empty properties are trivially always equal to
themselves. Doing otherwise means that no operations in extensible
dialects that aren't the exact same operation will ever compare equal
for the purposes of operations like CSE.
[clang-format] Upgrade ShortFunctionStyle to a struct (#134337)
The current clang-format configuration
option AllowShortFunctionsOnASingleLine uses a single enum
(ShortFunctionStyle) to control when short function definitions can be
merged onto a single line. This enum provides predefined combinations of
conditions
(e.g., None, Empty only, Inline only, Inline including Empty, All).
This approach has limitations:
1. **Lack of Granularity:** Users cannot specify arbitrary combinations
of conditions. For example, a user might want to allow merging
for both empty functions and short top-level functions, but not for
short functions defined within classes. This is not possible with the
current enum options except by choosing All, which might merge more than
desired.
2. **Inflexibility:** Adding new conditions for merging (e.g.,
distinguishing between member functions and constructors, handling
lambdas specifically) would require adding many new combined enum
values, leading to a combinatorial explosion and making the
configuration complex.
3. **Implicit Behavior:** Some options imply others
(e.g., Inline implies Empty), which might not always be intuitive or
desired.
The goal is to replace this single-choice enum with a more flexible
mechanism allowing users to specify a set of conditions that must be met
for a short function to be merged onto a single line.
---------
Co-authored-by: owenca <owenpiano@gmail.com>
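To make the limitation concrete, the sketch maps each legacy choice to a set of independent flags. The enumerators follow the existing option values (`None`, `Empty`, `InlineOnly`, `Inline`, `All`); the flag names in `ShortFunctionFlags` are illustrative and may not match the fields the patch introduces. The combination from point 1 (empty and top-level, but not in-class) is not produced by any legacy value.

```cpp
enum class LegacyShortFunctionStyle { None, Empty, InlineOnly, Inline, All };

// Illustrative flag set; not necessarily the patch's field names.
struct ShortFunctionFlags {
  bool Empty = false;  // merge functions with an empty body
  bool Inline = false; // merge short functions defined inside a class
  bool Other = false;  // merge other short functions (e.g. top-level)
};

bool sameFlags(const ShortFunctionFlags &A, const ShortFunctionFlags &B) {
  return A.Empty == B.Empty && A.Inline == B.Inline && A.Other == B.Other;
}

ShortFunctionFlags fromLegacy(LegacyShortFunctionStyle Style) {
  ShortFunctionFlags F;
  switch (Style) {
  case LegacyShortFunctionStyle::None:
    break;
  case LegacyShortFunctionStyle::Empty:
    F.Empty = true;
    break;
  case LegacyShortFunctionStyle::InlineOnly:
    F.Inline = true;
    break;
  case LegacyShortFunctionStyle::Inline: // Inline implies Empty
    F.Empty = F.Inline = true;
    break;
  case LegacyShortFunctionStyle::All:
    F.Empty = F.Inline = F.Other = true;
    break;
  }
  return F;
}
```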
[clang][bytecode] Remove unused members from `EvalEmitter` (#186601)
Remove the DenseMap handling lambda parameter mappings from
`EvalEmitter`; it was always unused. Use `if constexpr`
to keep things compiling.
[CMake] Disable PCH reuse for plugins in non-PIC builds (#186643)
Plugins are always PIC and therefore cannot reuse non-PIC PCH.
[Analysis][NFC] Move BranchProbabilityInfo constr to cpp (#186648)
The implementation details of the analysis are irrelevant for users,
therefore move these to the .cpp file.
[clang-format] Add option AllowShortRecordOnASingleLine (#154580)
This patch supersedes PR #151970 by adding the option
``AllowShortRecordOnASingleLine`` that allows the following formatting:
```c++
struct foo {};
struct bar { int i; };
struct baz
{
int i;
int j;
int k;
};
```
---------
Co-authored-by: owenca <owenpiano@gmail.com>
[clang][ssaf][NFC] Prefix ssaf-{linker,format} dirs with 'clang-' (#186610)
Addresses:
https://github.com/llvm/llvm-project/pull/185631#issuecomment-4054586633
[X86] Add missing VPSRAQ broadcast-from-mem patterns for non-VLX targets (#186654)
[clang-tidy] Adds do-while support to performance-inefficient-string-concatenation (#186607)
Closes #186362
---------
Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>
Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>
[X86] known-never-zero.ll - add vector test coverage for #186335 (#186660)
Support float8_e3m4 and float8_e4m3 in np_to_memref (#186453)
This patch adds support for `float8_e3m4` and `float8_e4m3` in
`np_to_memref.py` by adding the appropriate ctypes structures.
[Transforms/Utils][NFC] Replace SmallPtrSet with vector (#186664)
Typically most blocks in a function are reachable, so use a vector
indexed by block number instead of a SmallPtrSet.
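A minimal sketch of the idea, assuming a hypothetical CFG in which blocks are already identified by dense numbers `0..N-1` and `Succs[B]` lists the successors of block `B` (this is not the code touched by the patch):

```cpp
#include <cassert>
#include <vector>

// Hypothetical CFG encoding: Succs[B] lists the successor numbers of block B.
using CFG = std::vector<std::vector<unsigned>>;

// Marks every block reachable from Entry. A vector<bool> indexed by block
// number replaces a pointer set: membership tests are direct indexing, and
// since most blocks are usually reachable, the dense vector wastes little.
std::vector<bool> computeReachable(const CFG &Succs, unsigned Entry) {
  assert(Entry < Succs.size() && "entry must be a valid block number");
  std::vector<bool> Reachable(Succs.size(), false);
  std::vector<unsigned> Worklist{Entry};
  Reachable[Entry] = true;
  while (!Worklist.empty()) {
    unsigned B = Worklist.back();
    Worklist.pop_back();
    for (unsigned S : Succs[B]) {
      if (!Reachable[S]) {
        Reachable[S] = true;
        Worklist.push_back(S);
      }
    }
  }
  return Reachable;
}
```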
[SimplifyCFG][NFC] Renumber blocks when changing func (#186666)
Keep numbering dense when changing the function. SimplifyCFG is a good
candidate, because it is likely to remove blocks and preserves few
analyses.
[CFG][InstCombine][NFC] Use block numbers when finding backedges (#186668)
The functions traverse all basic blocks, so replace their SmallPtrSets
with a single vector indexed by block number.
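As a hypothetical sketch (not the actual CFG.cpp implementation), a backedge search can keep its visited and on-stack state in vectors indexed by block number; an edge is reported as a backedge when its target is still on the DFS stack:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CFG encoding: Succs[B] lists the successor numbers of block B.
using SuccessorLists = std::vector<std::vector<unsigned>>;

// Returns every edge (From, To) whose target is still on the DFS stack when
// the edge is examined. Visited/InStack are indexed by block number, so no
// pointer sets or hashing are needed.
std::vector<std::pair<unsigned, unsigned>>
findBackedges(const SuccessorLists &Succs, unsigned Entry) {
  assert(Entry < Succs.size() && "entry must be a valid block number");
  std::vector<std::pair<unsigned, unsigned>> Backedges;
  std::vector<bool> Visited(Succs.size(), false);
  std::vector<bool> InStack(Succs.size(), false);
  // Each stack entry is a block plus the index of its next successor to visit.
  std::vector<std::pair<unsigned, std::size_t>> Stack;
  Visited[Entry] = true;
  InStack[Entry] = true;
  Stack.push_back({Entry, 0});
  while (!Stack.empty()) {
    unsigned B = Stack.back().first;
    std::size_t NextIdx = Stack.back().second;
    if (NextIdx == Succs[B].size()) {
      // All successors handled: B leaves the DFS stack.
      InStack[B] = false;
      Stack.pop_back();
      continue;
    }
    ++Stack.back().second;
    unsigned S = Succs[B][NextIdx];
    if (InStack[S]) {
      Backedges.push_back({B, S});
    } else if (!Visited[S]) {
      Visited[S] = true;
      InStack[S] = true;
      Stack.push_back({S, 0});
    }
  }
  return Backedges;
}
```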
[CodeGenPrepare][NFC] Get BPI/BFI from pass/analysis manager (#186651)
BranchProbabilityInfo will compute its own dominator tree and
post-dominator tree if none is specified; avoid this by using the
analysis manager/pass manager to get the analysis, which will reuse the
previously computed DomTree.
[X86] combineConcatVectorOps - concat(vtruncs(x),vtruncs(y)) -> packss(shuffle(x,y),shuffle(x,y)) (#186678)
Although at worst this isn't a reduction in instruction count, the shuffle/packss sequence is much easier for further folds and shuffle combining.
Revert "[CI] Try lowering max parallel link jobs on Windows (#185255)"
This reverts commit af22b50fac2311ff3f859e4e8bdec552c7aa8d5a.
This seems to have had no noticeable effect on the frequency of failures
so likely was not the issue.
Revert "Support float8_e3m4 and float8_e4m3 in np_to_memref (#186453)" (#186677)
This reverts commit 57427f84fe5fdda71aef4be257ed28d7b4f55d05.
For some reason mlir-nvidia CI is failing to import `float8_e3m4` from
`ml_dtypes`. See
https://lab.llvm.org/buildbot/#/builders/138/builds/27095.
[X86] combineConcatVectorOps - concat(vtruncus(smax(x,0)),vtruncus(smax(y,0))) -> packus(shuffle(x,y),shuffle(x,y)) (#186681)
Followup to vtruncs/packss handling
Update GitHub Artifact Actions (major) (#184052)
This PR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| [actions/download-artifact](https://redirect.github.com/actions/download-artifact) | action | major | `v7.0.0` → `v8.0.1` |
| [actions/upload-artifact](https://redirect.github.com/actions/upload-artifact) | action | major | `v6.0.0` → `v7.0.0` |
| [actions/upload-artifact](https://redirect.github.com/actions/upload-artifact) | action | major | `6.0.0` → `7.0.0` |
[BPF] Use ".L" local prefix label for basic blocks (#95103)
Previously, PrivateLabelPrefix was default-initialized to "L", so basic
block labels were added to the symbol table. This seems like an
oversight, so use ".L" for all private labels.
[clang-tidy][NFC] Use universal type_traits mock (#186652)
[Utils] Format git-llvm-push
Use single quotes for string arguments inside f-strings; otherwise the
version of black that we use fails to parse the file. Also reformat the
file, given that formatting hasn't been working for a while (wholesale or
incrementally) due to the above issue.
[clang][doc] Improve error handling for `LibTooling` example code avoiding core dump (#98129)
Resolves #97983
[Clang][Docs] Clarify [[unlikely]] example in compound statement (#186590)
The first code example in the "confusing standard behavior" section
had a comment claiming `[[unlikely]]` makes the branch unlikely,
contradicting a later example showing the same placement being ignored.
Rewords the comment to clarify this is the C++ Standard's
recommendation that Clang does not follow, since the attribute is not on
the substatement.
Continues the work from #126372.
Fixes #126362.
[libc][Github] Bump clang in libc container to v23 (#186697)
Back to HEAD now that apt.llvm.org is working again for ToT.
[gn] port 629edaf67844c01db37 (CLANG_USE_XCSELECT)
[gn] port f002fc0ee8734283
[IR] Don't allow successors() over block without terminators (#186646)
There's no point constructing a dominator tree or similar on
known-broken IR. Generally, functions should be able to assume that IR
is valid (i.e., passes the verifier). Users of this "feature" were:
- Verifier, fixed by verifying existence of terminators first.
- FuzzMutate, worked around by temporarily inserting terminators.
- OpenMP to run analyses while building the IR, worked around by
temporarily inserting terminators.
- Polly to work with an empty dominator tree, fixed by temporarily
adding an unreachable inst.
- MergeBlockIntoPredecessor, inadvertently, fixed by adding terminator
before updating MemorySSA.
- Some sloppily written unit tests.
[IR] Add initial support for the byte type (#178666)
Following the [byte type RFC](https://discourse.llvm.org/t/rfc-add-a-new-byte-type-to-llvm-ir/89522)
and the discussions within the [LLVM IR Formal Specification WG](https://discourse.llvm.org/t/rfc-forming-a-working-group-on-formal-specification-for-llvm/89056), this PR introduces initial support for the byte type in LLVM. This PR:
- Adds the byte type to LLVM's type system
- Extends the `bitcast` instruction to accept byte operands
- Adds parsing tests for all new functionality
- Fixes failing regression tests (IR2Vec and IRNormalizer)
---------
Co-authored-by: George Mitenkov <georgemitenk0v@gmail.com>
[orc-rt] Don't return Error in Service::OnComplete. (#186708)
The Session can't do anything useful with these errors, it can only
report them. It's cleaner if the Service objects just report the error
directly.
[clang-tidy][NFC] Use universal memory mock for smart ptrs (#186649)
[orc-rt] Fix unittests after 53a1e056f38. (#186711)
Updates unittests to reflect Service interface changes.
Revert "[IR] Add initial support for the byte type" (#186713)
Reverts llvm/llvm-project#178666 to unblock CI.
`CodeGen/X86/byte-constants.ll` is at fault.
Will look into it and hopefully fix it by tomorrow.
[NFC] Delete `MCPseudoProbeDecoder`'s move constructor (#186698)
`MCPseudoProbeDecoder` cannot be copied/moved due to its address
dependence on the DummyInlineRoot member address. Explicitly delete the move constructor.
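To illustrate the hazard with a hypothetical class (not `MCPseudoProbeDecoder` itself): when child objects store the address of a member of their owner, a memberwise copy or move would leave those pointers aimed at the source object, so the special members are deleted:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <vector>

struct Node {
  Node *Parent = nullptr;
  std::vector<Node *> Children;
};

// Hypothetical owner whose top-level nodes point back at DummyRoot, a member
// of the owner itself. A memberwise copy or move would copy those Parent
// pointers unchanged, so they would still refer to the source's DummyRoot.
class TreeOwner {
public:
  TreeOwner() = default;
  TreeOwner(const TreeOwner &) = delete;
  TreeOwner(TreeOwner &&) = delete;
  TreeOwner &operator=(const TreeOwner &) = delete;
  TreeOwner &operator=(TreeOwner &&) = delete;

  Node &addTopLevel() {
    Storage.push_back(std::make_unique<Node>());
    Node &N = *Storage.back();
    N.Parent = &DummyRoot;
    DummyRoot.Children.push_back(&N);
    return N;
  }
  const Node &root() const { return DummyRoot; }

private:
  Node DummyRoot;
  std::vector<std::unique_ptr<Node>> Storage;
};
```

Declaring the copy constructor as deleted already suppresses the implicit move constructor; deleting the move constructor explicitly just makes the intent visible.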
[RISCV] Add `sifive-x160` and `sifive-x180` processor definitions (#186264)
This PR adds new processor definitions for two SiFive cores:
- X160
(https://www.sifive.com/document-file/sifive-intelligence-x160-gen2-product-brief):
A RV32 core with Zve32f
- X180
(https://www.sifive.com/document-file/sifive-intelligence-x180-gen2-product-brief):
A RVV-capable RV64 core
Both of them have VLEN=128.
Scheduling model support will be added in follow-up patches.
[orc-rt] Add a simple iterator_range class. (#186720)
This will be used to simplify operations on iterator ranges in the ORC
runtime.
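A minimal sketch of what such a class usually looks like, modeled loosely on `llvm::iterator_range`; the `sketch` namespace and the `dropFirst` helper are hypothetical, and the ORC runtime's actual interface may differ:

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

namespace sketch {

// Stores a begin/end pair so a range can be returned from functions and used
// directly in a range-based for loop.
template <typename IterT> class iterator_range {
public:
  iterator_range(IterT B, IterT E) : Begin(std::move(B)), End(std::move(E)) {}
  IterT begin() const { return Begin; }
  IterT end() const { return End; }
  bool empty() const { return Begin == End; }

private:
  IterT Begin, End;
};

template <typename IterT>
iterator_range<IterT> make_range(IterT B, IterT E) {
  return iterator_range<IterT>(std::move(B), std::move(E));
}

// Hypothetical example helper: all elements of V except the first.
inline iterator_range<std::vector<int>::const_iterator>
dropFirst(const std::vector<int> &V) {
  assert(!V.empty() && "need at least one element to drop");
  return make_range(std::next(V.begin()), V.end());
}

} // namespace sketch
```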
[LoongArch] Remove unreachable Value check in fixupLeb128 (#186297)
Value is guaranteed to be zero after the loop:
    for (I = 0; Value; ++I, Value >>= 7)
Therefore the subsequent `if (Value)` condition is always false.
Remove the unreachable code. Reported by PVS-Studio.
Fixed: #170122
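A small standalone check of the reasoning, using a hypothetical helper built around the same loop header: the loop can only exit once `Value` is zero, so for a nonzero input the iteration count equals the number of 7-bit groups (the number of ULEB128 bytes), and nothing is left in `Value` afterwards:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper with the same loop header as quoted above. For a
// nonzero input it returns the number of 7-bit groups (ULEB128 bytes); for
// zero the loop never runs, so it returns 0.
unsigned countShiftIterations(std::uint64_t Value) {
  unsigned I;
  for (I = 0; Value; ++I, Value >>= 7)
    ;
  // The loop can only terminate with Value == 0, so an `if (Value)` here
  // would be dead code.
  assert(Value == 0);
  return I;
}
```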
[lld][ELF] Fix crash when relaxation pass encounters synthetic sections
In LoongArch and RISC-V, the relaxation pass iterates over input sections
within executable output sections. When a linker script places a synthetic
section (e.g., .got) into such an output section, the linker would crash
because synthetic sections do not have the relaxAux field initialized.
The relaxAux data structure is only allocated for non-synthetic sections
in initSymbolAnchors. This patch adds the necessary null checks in the
relaxation loops (relaxOnce and finalizeRelax) to skip sections that
do not require relaxation.
A null check is also added to elf::initSymbolAnchors to ensure the
subsequent sorting of anchors is safe.
Fixes: #184757
Reviewers: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/184758
[clang] Skip dllexport of inherited constructors with unsatisfied constraints (#186497)
When a class is marked `__declspec(dllexport)`, Clang eagerly creates
inherited constructors via `findInheritingConstructor` and propagates
the dllexport attribute to all members. This bypasses overload
resolution, which would normally filter out constructors whose requires
clause is not satisfied. As a result, Clang attempted to instantiate
constructor bodies that should never be available, causing spurious
compilation errors.
Add constraint satisfaction checks in `checkClassLevelDLLAttribute` to
match MSVC behavior:
1. Before eagerly creating inherited constructors, verify that the base
constructor's `requires` clause is satisfied. Skip creation otherwise.
2. Before applying dllexport to non-inherited methods of class template
specializations, verify constraint satisfaction. This handles the case
where `dllexport` propagates to a base template specialization whose own
members have unsatisfied constraints.
Inherited constructors skip the second check since their constraints
were already verified at creation time.
Fixes #185924
Followup to https://github.com/llvm/llvm-project/pull/182706
Assisted by: Cursor // Claude Opus 4.6
[orc-rt] Add LockedAccess utility. (#186737)
LockedAccess provides pointer-like access to a value while holding a
lock. All accessors are rvalue-ref-qualified, restricting usage to
temporaries to prevent accidental lock lifetime extension. A with_ref
method is provided for multi-statement critical sections.
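A hypothetical sketch of this design (`LockedAccessSketch` and `appendAndCount` are made-up names, not the orc-rt code), showing how `&&`-qualified accessors restrict use to temporaries, so the lock is released at the end of the full-expression:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Holds a lock on M for as long as the object lives and exposes the guarded
// value only through &&-qualified accessors, i.e. only on temporaries.
template <typename T, typename Mutex = std::mutex> class LockedAccessSketch {
public:
  LockedAccessSketch(T &V, Mutex &M) : Value(V), Lock(M) {}

  T *operator->() && { return &Value; }
  T &operator*() && { return Value; }

  // For multi-statement critical sections: run F with the lock held.
  template <typename Fn> decltype(auto) with_ref(Fn &&F) && {
    return std::forward<Fn>(F)(Value);
  }

private:
  T &Value;
  std::unique_lock<Mutex> Lock;
};

// Hypothetical usage: each statement takes and releases the lock separately.
inline std::size_t appendAndCount(std::vector<int> &V, std::mutex &M, int X) {
  LockedAccessSketch<std::vector<int>>(V, M)->push_back(X); // unlocked here
  return LockedAccessSketch<std::vector<int>>(V, M).with_ref(
      [](std::vector<int> &Vec) { return Vec.size(); });
}
```

With this shape, binding the object to a named variable `L` and writing `L->push_back(1)` does not compile, which is what keeps the lock from being held longer than intended; `with_ref` covers the case where several operations must happen under one lock.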
[CIR] Add Pure trait to IsFPClassOp (#186625)
IsFPClassOp is a pure classification check on a floating-point value
with no memory effects.
[clangd] Report reference to UsingType's target decl at the correct location (#186310)
Fixes https://github.com/clangd/clangd/issues/2617
[SelectionDAG] Add CTTZ_ELTS[_ZERO_POISON] nodes. NFCI (#185600)
Currently llvm.experimental.cttz.elts are directly lowered from the
intrinsic.
If the type isn't legal then the target tells SelectionDAGBuilder to
expand it into a reduction, but this means we can't split the operation.
E.g. it's possible to split a cttz.elts nxv32i1 into two nxv16i1,
instead of expanding it into a nxv32i64 reduction.
vp.cttz.elts can be split because it has a dedicated SelectionDAG node.
This adds CTTZ_ELTS and CTTZ_ELTS[_ZERO_POISON] nodes and just enough
legalization to get tests passing. A follow up patch will add splitting
and move the expansion into LegalizeDAG.
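To see why splitting is possible, here is a scalar model (hypothetical, not SelectionDAG code) assuming the LangRef semantics with `is_zero_poison` false: the result is the index of the first set lane, or the lane count if no lane is set. The count for a wide mask can then be assembled from its two halves:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of cttz.elts on lanes [Begin, End) of Mask, assuming
// is_zero_poison = false: index of the first set lane relative to Begin,
// or the number of lanes if none is set.
std::size_t cttzElts(const std::vector<bool> &Mask, std::size_t Begin,
                     std::size_t End) {
  for (std::size_t I = Begin; I < End; ++I)
    if (Mask[I])
      return I - Begin;
  return End - Begin;
}

// Split form: evaluate each half independently. A set lane in the low half
// decides the result; otherwise the low half contributes its full width.
std::size_t cttzEltsSplit(const std::vector<bool> &Mask) {
  assert(Mask.size() % 2 == 0 && "model splits into equal halves");
  std::size_t Half = Mask.size() / 2;
  std::size_t Lo = cttzElts(Mask, 0, Half);
  std::size_t Hi = cttzElts(Mask, Half, Mask.size());
  return Lo != Half ? Lo : Half + Hi;
}
```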
[mlir][linalg] Use inferConvolutionDims for generic convolution downscaling (#180586)
The goal of this PR is to implement a generic, structure-aware
convolution downscaling transformation that works for any
convolution-like operation regardless of its specific layout or naming,
rather than relying on pattern-matching against specific named
operations.
Each pattern we currently have hardcodes dimension indices
specific to its layout (e.g., NHWC vs NCHW).
This approach:
1. Requires maintaining many similar patterns.
2. Is brittle when new layouts are introduced.
3. Cannot handle batchless versions of the conv variants.
This PR thus creates a single downscaleSizeOneWindowedConvolution
function that uses `inferConvolutionDims` to semantically understand the
convolution structure (batch dims, output image dims, filter loop dims,
etc.) rather than hardcoding indices.
It works with any layout - NHWC, NCHW, or any other - because it reasons
about the meaning of dimensions, not their positions.
If the input to the downscaling pattern is a named op, the output will
be a named op; otherwise, the input and output are both generic ops.
For this reason, we now remove the second RUN line, as the infra tests
both named and generic ops.
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
[clang-tidy] Fix an edge case in readability-implicit-bool-conversion (#186234)
Fix a false positive (FP) for condition expressions wrapped by `ExprWithCleanups`.
Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>
Co-authored-by: Zeyi Xu <zeyi2@nekoarch.cc>
[X86][APX] Combine MOVABS+JMP to JMPABS when in no-PIC large code model (#186402)
[CodeGen] Call getMCPU once instead of commonly twice (NFC) (#186581)
[ARM] Try to lower sign bit SELECT_CC to shift (#186349)
Lower a `x < 0 ? 1 : 0` style SELECT_CC to `x>>(bw-1)`. This will become
more important with an upcoming change, but also appears to be somewhat
useful by itself.
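A quick scalar check of the identity for `i32`, assuming the shift in question is a logical shift right (an arithmetic shift would produce `-1` rather than `1` for negative inputs); the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Select form: 1 if the sign bit of X is set, else 0.
std::uint32_t signBitViaSelect(std::int32_t X) { return X < 0 ? 1u : 0u; }

// Shift form: a logical shift right by bw - 1 (31 for i32) moves the sign bit
// into bit 0 and clears everything else. Converting to uint32_t first makes
// the C++ shift logical; an arithmetic shift would give -1 for negative X.
std::uint32_t signBitViaShift(std::int32_t X) {
  return static_cast<std::uint32_t>(X) >> 31;
}
```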
[C++20] [Modules] Don't add discardable variables to module initializers (#186752)
Close https://github.com/llvm/llvm-project/issues/170099
The root cause of the problem is, we shouldn't add the inline variable
(which is discardable in linker's point of view) to the module's
initializers.
I verified against GCC's generated code to keep the behavior consistent.
This is also a small optimization, by the way.
[LV] Add more tests for blend masks. NFC (#186751)
To be used in #184838
[LangRef] Fix typo in signatures for rounding intrinsics (#186709)
Fixes #186536
[lldb-dap] Mark return value as readonly (#186329)
Marked return value as readonly to give VS Code a hint that this
variable doesn't support `setVariable` request.
[orc-rt] Add Controller Interface (CI) symbol table to Session. (#186747)
The Controller Interface is the extended set of symbols (mostly wrapper
functions) that the controller can call prior to loading any JIT'd code.
It is expected that it will be used to inspect the process and create /
configure services to enable JITing.
[AArch64] Add extra test coverage to legalize-shuffle-1x.ll. NFC
[AMDGPU] Initialize more fields in the SIInsertWaitcnts constructor. NFC. (#186394)
ST, TII, TRI and MRI can all be initialized in the constructor and hence
be references instead of pointers.
[AVR] Optimize expansion of pseudo instruction SPWRITE for no SPH devices (#152905)
fixes https://github.com/llvm/llvm-project/issues/148560
[AMDGPU] Simplify state clearing in SIInsertWaitcnts. NFC. (#186399)
There is no need to clear state at the start or end of the run method,
because a fresh instance of SIInsertWaitcnts is constructed for each run
on a MachineFunction.
[flang][NFC] Converted five tests from old lowering to new lowering (part 31) (#186299)
Tests converted from test/Lower/Intrinsics: iall.f90, iand.f90,
iany.f90, ibclr.f90, ibits.f90
[libc++] Avoid including <cmath> in <format> (#186332)
This reduces the time to parse `<format>` a bit.
[X86] Blocklist instructions that are unsafe for masked-load folding. (#178888)
This PR blocklists instructions that are unsafe for masked-load folding.
Folding with the same mask is only safe if every active destination
element reads only from source elements that are also active under the
same mask. These instructions perform element rearrangement or
broadcasting, which may cause active destination elements to read from
masked-off source elements.
VPERMILPD and VPERMILPS are safe only in the rrk form, the rik form
needs to be blocklisted. In the rrk form, the masked source operand is a
control mask, while in the rik form the masked source operand is the
data/value. This is also why VPSHUFB is safe to fold, while other
shuffles such as VSHUFPS are not.
Examples:
```
EVEX.128.66.0F.WIG 67 /r VPACKUSWB xmm1{k1}{z}, xmm2, xmm3/m128
A: 00010203 7F000001 80000002 DEADBEEF
E : 00000000 00000001 00000002 00000003
D: 11111111 22222222 33333333 44444444
k = 0x0400
Masked_e = 00000000 00000000 00000000 00000000 (vmovdqu8{k}{z} Masked_e E)
res1 = 00000000 00000000 00010000 00000000 (VPACKUSWB D{k}{z}, A, E)
res2 = 00000000 00000000 00000000 00000000 (VPACKUSWB D{k}{z}, A, Masked_e)
EVEX.128.66.0F38.W0 C4 /r VPCONFLICTD xmm1 {k1}{z}, xmm2/m128/m32bcst
A: DAA66D2B FFFFFFFC FFFFFFFC D9A0643C
E : 7DDF743F 00000000 5FD99E73 4ED634C9
D: 2629AB38 9E37782F 67BB800F AD66764A
k = 0x0002
Masked_e = (vmovdqu32 {k}{z} Masked_e E)
res1 = 00000000 00000000 00000000 00000000 (VPCONFLICTD D{k}{z}, E)
res2 = 00000000 00000001 00000000 00000000 (VPCONFLICTD D{k}{z}, Masked_e)
EVEX.128.66.0F38.W1 8D /r VPERMW xmm1 {k1}{z}, xmm2, xmm3/m128
A: 00010203 7F000001 80000002 DEADBEEF
E : 00000000 00000001 00000002 00000003
D: 11111111 22222222 33333333 44444444
k = 0x0010
Masked_e = 00000000 00000000 00000002 00000000 (vmovdqu16 {k}{z} Masked_e E)
res1 = 00000000 00000000 00000001 00000000 (vpermw D{k}{z}, A, E)
res2 = 00000000 00000000 00000000 00000000 (vpermw D{k}{z}, A, Masked_e)
EVEX.128.66.0F38.W0 78 /r VPBROADCASTB xmm1{k1}{z}, xmm2/m8
E : 7F4A7C15 6E490933 5D4C9659 4C433CE3
D: F63F9D36 97F6E2B2 9432E8E6 FAEE7A3E
k = 0x0002
Masked_e = 00007C00 00000000 00000000 00000000 (vmovdqu8{k}{z} Masked_e E)
res = 00001500 00000000 00000000 00000000 (vpbroadcastb D{k}{z}, E)
res = 00000000 00000000 00000000 00000000 (vpbroadcastb D{k}{z}, Masked_e)
```
Baseline: https://github.com/llvm/llvm-project/pull/178411
[flang][OpenMP] Implement nest depth calculation in LoopSequence (#186477)
Calculate two depths, a semantic one and a perfect one. The former is
the depth of a loop nest taking into account any loop- or
sequence-transforming OpenMP constructs. The latter is the maximum level
to which the semantic nest is a perfect nest.
Issue: https://github.com/llvm/llvm-project/issues/185287
Reinstate PR185298 after a fix has been merged in PR186416. Includes a
testcase that triggered failures before.
[clang][bytecode] Remove FunctionPointer class (#186757)
It's been mostly living inside `Pointer` for a long time now, so remove
the leftovers.
[SPIR-V] Address comments on SPV_INTEL_masked_gather_scatter extension implementation (#186336)
Address comments left after merge of #185418
[libc] Fix build failures in fuzzing tests (#185017)
The tests:
- __support/freelist_heap_fuzz.cpp
- fuzzing/string/strlen_fuzz.cpp
had build failures for different reasons. This patch fixes these
failures.
freelist_heap_fuzz.cpp had this error:
```
llvm-project/libc/fuzzing/__support/freelist_heap_fuzz.cpp:150:26: error: use of undeclared identifier 'Block'; did you mean '__llvm_libc_23_0_0_git::Block'?
150 | size_t alignment = Block::MIN_ALIGN;
| ^~~~~
| __llvm_libc_23_0_0_git::Block
```
The issue stems from the fact that Block was not available in scope. It
needs to be referenced via LIBC_NAMESPACE.
strlen_fuzz.cpp had this error:
```
In file included from Workspace/llvm-project/libc/fuzzing/string/strlen_fuzz.cpp:14:
In file included from /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/cstdint:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13/bits/c++config.h:679:
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13/bits/os_defines.h:44:5: error: function-like macro '__GLIBC_PREREQ' is not defined
44 | #if __GLIBC_PREREQ(2,15) && defined(_GNU_SOURCE)
```
This issue is more cryptic to me, but I managed to fix it by changing
the includes from cstdint and cstring to stdint.h and string.h.
[LifetimeSafety] Extract Sema helper implementation to separate header (#186492)
Improves code organization by separating lifetime safety Sema-specific
functionality into its own header file.
[clang][AArch64] Update label in test (nfc) (#186759)
[clang-tidy] Fix performance-use-std-move when moving a forward decl (#186704)
This fixes running clang-tidy on top-of-tree with that check on.
[clang][bytecode][NFC] Pre-commit a test case (#186773)
Make sure we get the `expand()` during `computeOffsetForComparison()`
right.
[Analysis][NFC] Use block numbers for BranchProbabilityInfo (#186658)
Instead of a hash map mapping pairs of blocks and successor index to the
probability, store the probabilities as flat array and start indices
into this array in a per-block information vector.
Also drop value handles: no stored pointers => no stale pointers. If a
block is removed, the block number is not reused unless the function is
renumbered, and BPI doesn't support renumbering.
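A hypothetical sketch of this layout (the names are made up, and `double` stands in for `BranchProbability`): the edges of block `B` occupy a contiguous slice of one flat array, located through a per-block start index:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat layout: the successor-edge probabilities of block B
// occupy Probs[Start[B], Start[B + 1]). Start begins with a single 0 and
// gains one entry per block, so Start.size() == number of blocks + 1.
struct FlatEdgeProbs {
  std::vector<std::size_t> Start{0};
  std::vector<double> Probs;

  // Blocks must be added in block-number order (0, 1, 2, ...).
  void addBlock(const std::vector<double> &SuccProbs) {
    Probs.insert(Probs.end(), SuccProbs.begin(), SuccProbs.end());
    Start.push_back(Probs.size());
  }

  std::size_t numSuccessors(unsigned Block) const {
    assert(Block + 1 < Start.size() && "unknown block number");
    return Start[Block + 1] - Start[Block];
  }

  // Lookup is two array accesses: no hashing of (block, index) pairs.
  double get(unsigned Block, unsigned SuccIdx) const {
    assert(SuccIdx < numSuccessors(Block) && "successor index out of range");
    return Probs[Start[Block] + SuccIdx];
  }
};
```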
[WebAssembly] Lower wide vector shifts by constant to extmul pairs (#184007)
Wide vector multiplications by power-of-2 constants were
canonicalized to v8i32 shl nodes. Generic legalizers then split these
into separate 128-bit extend and shift operations, bypassing
WebAssembly's native extended multiplication patterns.
Before:
mul v8i32:t1, <4096, ...>
=> shl v8i32:t1, <12, ...>
=> split into independent 128-bit extend + shift sequences
WebAssembly SIMD has no native wide vector shifts, but it does
support 128-bit extended multiplications. Lowering these nodes
directly to extmul_low/extmul_high pairs keeps them in native 128-bit
form and improves DAG matching.
After:
mul v8i32:t1, <4096, ...>
=> concat_vectors (extmul_low t1, c), (extmul_high t1, c)
This preserves the original vector width while utilizing the native
128-bit SIMD pipeline.
Fixed: https://github.com/llvm/llvm-project/issues/179143
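A scalar model of the equivalence, assuming the `v8i32` operand is a sign-extended `v8i16` and that the power-of-2 constant fits in an `i16` lane (the function names are hypothetical; the real lowering operates on DAG nodes, not arrays):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using I16x8 = std::array<std::int16_t, 8>;
using I32x4 = std::array<std::int32_t, 4>;
using I32x8 = std::array<std::int32_t, 8>;

// Model of i32x4.extmul_{low,high}_i16x8_s: sign-extend the selected four
// i16 lanes of A and B to i32 and multiply lane-wise.
I32x4 extmul(const I16x8 &A, const I16x8 &B, bool High) {
  I32x4 R{};
  std::size_t Off = High ? 4 : 0;
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = std::int32_t{A[Off + I]} * std::int32_t{B[Off + I]};
  return R;
}

// After-lowering shape: concat_vectors(extmul_low(T, C), extmul_high(T, C)).
I32x8 mulByPow2ViaExtmul(const I16x8 &T, unsigned Shift) {
  assert(Shift < 15 && "constant must fit in an i16 lane");
  I16x8 C;
  C.fill(static_cast<std::int16_t>(1 << Shift));
  I32x4 Lo = extmul(T, C, false);
  I32x4 Hi = extmul(T, C, true);
  I32x8 R{};
  for (std::size_t I = 0; I < 4; ++I) {
    R[I] = Lo[I];
    R[I + 4] = Hi[I];
  }
  return R;
}

// Reference: widen all eight lanes, then multiply by 2^Shift.
I32x8 mulByPow2Reference(const I16x8 &T, unsigned Shift) {
  I32x8 R{};
  for (std::size_t I = 0; I < 8; ++I)
    R[I] = std::int32_t{T[I]} * (std::int32_t{1} << Shift);
  return R;
}
```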
[LSR] Remove unnecessary WidestFixupType (NFC) (#185013)
The purpose of WidestFixupType is to prevent FindUseWithSimilarFormula
from matching a formula with different widest fixup type, but this never
happens:
* FindUseWithSimilarFormula is only called by
NarrowSearchSpaceByCollapsingUnrolledCode
* That function only considers Address and ICmpZero kinds, as they're
the only ones that allow a nonzero BaseOffset
* In an Address use all fixups have pointer type
* FindUseWithSimilarFormula already excludes ICmpZero uses
[AMDGPU] Make WaitcntBrackets::Limits a reference. NFC. (#186782)
Reland [VPlan] Extend interleave-group-narrowing to WidenCast (#186454)
The patch was initially landed as bd5f9384, but then reverted due to an
underlying issue in narrowInterleaveGroups, described in #185860. The
issue has since been fixed. The reland is simply a conflict-resolved
version of the original patch, which includes an additional test update.
WidenCast is very similar to Widen recipes.
Fixes #128062.
[IR] Drop BasicBlockEdge::isSingleEdge (#186767)
This was only called on CondBr instructions, where it is always faster
to access the successors directly than to use successors().
Multi-edges don't dominate anything, so this rare case is often already
handled by dominates().
There is also a very small (hardly measurable) performance
improvement here (it did show up in profiles at 0.03% or so).
[C2y] Update the C Status Page from the recent meetings (#186487)
The Feb and Mar 2026 virtual meetings are now concluded, these are the
adopted papers which could potentially impact the compiler.
[libclc] Add generic clc_mem_fence instruction (#185889)
Summary:
This can be made generic, which works as expected on NVPTX and SPIR-V.
We do not replace this for AMDGPU because the dedicated built-in has an
extra argument that controls whether or not local memory or global
memory will be invalidated. It would be correct to use this generic
operation there, but we'd lose that minor optimization so we likely
should not regress.
[NFC][analyzer] Eliminate NodeBuilder::getContext() (#186201)
This is a step towards the removal of the type `NodeBuilderContext`.
The few remaining locations that used `NodeBuilder::getContext()` were
changed to use the methods `getCurrBlock()` and `getNumVisitedCurrent()`
of `ExprEngine`.
The new code is equivalent to the old one because the `NodeBuilder`s
were constructed with `ExprEngine::currBldrCtx` as their context, which
is currently the "backend" behind `getCurrBlock()` and
`getNumVisitedCurrent()` -- but these methods will remain valid after
the removal of `NodeBuilderContext` and `currBldrCtx`.
[libc][Github] Bump libc-fullbuild-tests.yml to clang 23 (#186699)
Do this now that it is available in the container.
[LifetimeSafety] Add user documentation (#183058)
[LLVM][CodeGen][SVE] insert_subvector(undef, splat(C), 0) -> splat(C). (#186090)
When converting a fixed-length constant splat to a scalable vector, we can
instead regenerate the splat using the target type.
[ADT] Add `Repeated<T>` for memory-efficient repeated-value ranges (#186721)
Introduce a lightweight range representing N copies of the same value
without materializing a dynamic array. The range owns this value.
I plan to use it with MLIR APIs that often end up requiring N copies of
the same thing. Currently, we use `SmallVector<T>(N, Val)` for these,
which is wasteful.
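A rough sketch of such a range (the name `RepeatedSketch` is hypothetical and its interface is not the one added to ADT): it stores the value once, and its iterators only track a position:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>

// Hypothetical range presenting one owned value Count times; nothing is
// materialized beyond the single stored copy.
template <typename T> class RepeatedSketch {
public:
  class iterator {
  public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = const T *;
    using reference = const T &;

    iterator() = default;
    iterator(const T *V, std::size_t I) : Value(V), Index(I) {}
    reference operator*() const { return *Value; }
    pointer operator->() const { return Value; }
    iterator &operator++() {
      ++Index;
      return *this;
    }
    iterator operator++(int) {
      iterator Old = *this;
      ++Index;
      return Old;
    }
    // Iterators of the same range differ only in their position.
    bool operator==(const iterator &O) const { return Index == O.Index; }
    bool operator!=(const iterator &O) const { return Index != O.Index; }

  private:
    const T *Value = nullptr;
    std::size_t Index = 0;
  };

  RepeatedSketch(T V, std::size_t N) : Value(std::move(V)), Count(N) {}
  iterator begin() const { return iterator(&Value, 0); }
  iterator end() const { return iterator(&Value, Count); }
  std::size_t size() const { return Count; }

private:
  T Value;
  std::size_t Count;
};
```

Compared with `SmallVector<T>(N, Val)`, memory use stays constant in `N`; the trade-off is that the elements are read-only views of the one stored value.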
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
[NFC][analyzer] Refactor ExprEngine::processCallExit (#186182)
This commit converts `ExprEngine::processCallExit` to the new paradigm
introduced in 1c424bfb03d6dd4b994a0d549e1f3e23852f1e16, where the current
`LocationContext` and `Block` are populated near the beginning of the
`dispatchWorkItem` call (= elementary analysis step) and remain
available during the whole step.
Unfortunately the first half of the `CallExit` procedure (`removeDead`)
happens within the callee context, while the second half (`PostCall` and
similar callbacks) happen in the caller context -- so I need to change
the current `LocationContext` and `Block` at the middle of this big
method.
This means that I need to discard my invariant that
`setCurrLocationContextAndBlock` is only called once per each
`dispatchWorkItem`; but I think this exceptional case (first half in
callee, second half in caller) is still clear enough.
In addition to this main goal, I perform many small changes to clarify
and modernize the code of this old method.
[IR][NFC] Hot-cold splitting in PatternMatch (#186777)
ConstantAggregates are rare, therefore split that check into a separate
function so that the fast path can be inlined.
Likewise for vectors, which occur much less frequently than scalar
values.
[AArch64] Add partial reduce patterns for new sve dot variants (#184649)
This patch enables generation of the new dot instructions added in the
2025 Arm extension from partial reduce nodes.
Update docker/login-action action to v4 (#186719)
This PR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| [docker/login-action](https://redirect.github.com/docker/login-action) | action | major | `v3.6.0` → `v4.0.0` |
AMDGPU: Don't limit VGPR usage based on occupancy in dVGPR mode (#185981)
The maximum VGPR usage of a shader is limited based on the target
occupancy, ensuring that the targeted number of waves actually fits onto
a CU/WGP. However, in dynamic VGPR mode, we should not do that, because
VGPRs are allocated dynamically at runtime, and there are no static
constraints based on occupancy.
Fix that in this patch.
Also fix up the getMinNumVGPRs helper to behave consistently by always
returning zero in dVGPR mode.
This also fixes a problem where AMDGPUAsmPrinter bumps the VGPR usage to
at least the result of getMinNumVGPRs, per my understanding in order to
avoid an occupancy that is higher than the occupancy target. That was
causing incorrect (too high) VGPR usages in dVGPR mode with medium-sized
workgroups (say 768).
[VPlan] Assert CanIV is the first header phi, drop begin (NFC).
Split off as suggested in https://github.com/llvm/llvm-project/pull/156262/.
[DWARFVerifier] Fix infinite loop in verifyDebugInfoCallSite (#186413)
When attempting to find the callsite for a DwarfDie to see if it was
valid or not, there was a while loop that incorrectly attempted to walk
up the DIE parent hierarchy. It set `Curr` to the parent, but on each
iteration `Curr` was then set to the same original parent
(`Die.getParent()`) instead of `Curr.getParent()`. This caused an
infinite loop during validation of some kernel binaries by
llvm-dwarfdump where DW_TAG_call_site was nested inside a
DW_TAG_lexical_block (or any non-subprogram, non-inlined_subroutine
tag).
Fix by changing Die.getParent() to Curr.getParent() so the loop
correctly walks up the DIE tree.
Add a new test that validates this scenario. Without this change, that
test hangs rather than succeeding.
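A simplified, hypothetical model of the walk (the `Die` and `Tag` types here are stand-ins, not the `DWARFDie` API) showing the corrected loop, which advances from `Curr` on each step:

```cpp
#include <cassert>

// Hypothetical stand-ins for DWARF tags and DIEs.
enum class Tag { CompileUnit, Subprogram, InlinedSubroutine, LexicalBlock, CallSite };

struct Die {
  Tag T;
  const Die *Parent;
  const Die *getParent() const { return Parent; }
};

// Walks up from a call-site DIE to the enclosing subprogram or inlined
// subroutine. Restarting every step from the original DIE's parent would
// re-read the same parent forever whenever that parent is, for example, a
// lexical block; advancing from Curr guarantees progress up the tree.
const Die *findEnclosingFunction(const Die &D) {
  const Die *Curr = D.getParent();
  while (Curr && Curr->T != Tag::Subprogram &&
         Curr->T != Tag::InlinedSubroutine)
    Curr = Curr->getParent(); // advance from Curr, not from D
  return Curr;
}
```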
[IR][NFC] Inline CmpInst::isSigned/isUnsigned (#186791)
These are small helper functions that are called somewhat regularly and
compile to three instructions each, so inlining them is beneficial, even
though the overall improvement is very minor.
[Utils] Modernize type annotations in git-llvm-push
Import annotations from __future__ so we can start using more modern
annotations now rather than once we move to Python 3.10 while still
preserving Python 3.8 compatibility. Also fix a couple typing issues
while here.
Reviewers: ilovepi, petrhosek
Pull Request: https://github.com/llvm/llvm-project/pull/186690
[CodeGen] Fix C++ global dtor for non-zero program AS targets (#186484)
In codegen for C++ global destructors, we pass a pointer to the
destructor to be called at program exit as the first arg to the
`__cxa_atexit` function.
If the target's default program AS and default AS are not equal, we need
to emit an addrspacecast from the program AS to the generic AS (which is
used as the argument type for the first arg of `__cxa_atexit`) in the
function call.
---------
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
[NFC][LLVM] Fix indentation issue in AArch64ExpandPseudo::expandMI (#186375)
[lldb][NativePDB] Compile `vbases.test` without default libraries (#186510)
`--target=x86_64-windows-msvc`. This will cause the final executable to
be linked to `libcmt.lib`. That doesn't work on ARM, so this PR changes
the command line to link without the default libraries. They're not
needed if we disable `/GS` (buffer security check) like in other tests.
We use `%clang_cl` over `%build` to be able to compile with DWARF as
well.
[lit] Stop holding subprocess objects open in TimeoutHelper (#186712)
Tweak TestRunner's TimeoutHelper storage to hold only PIDs rather
than the whole process object. Holding the object causes many pipes to
stay open, when all we need is the pid.
Addresses #185941
[SPIR-V] Fix llvm.spv.gep return type for vector-indexed GEPs (#185931)
The `int_spv_gep` intrinsic was defined with `llvm_anyptr_ty` which
forced it to return a scalar pointer. Change the return type to
`llvm_any_ty` to allow the intrinsic to match the actual result type of
the original GEP, whether scalar or vector.
[Flang][OpenMP] Provide option to use heap allocation for private adjustable arrays (#186795)
The size of adjustable Fortran arrays is not known at compilation time.
Using limited GPU stack memory may cause hard-to-debug errors. On the
other hand, switching to heap memory allocation may lead to missed
optimization opportunities and significantly increased kernel execution
time.
Adding the option `-mmlir --enable-gpu-heap-alloc` allows the user to
generate valid code for adjustable Fortran arrays. The flag is off by
default, so there is no efficiency penalty for code that does not use
adjustable arrays.
[libc] Fix llvm-gpu-loader passing uninitialized device memory (#186804)
Summary:
The return value was not zeroed; this was accidentally dropped when we
did the port, and because it is zero "almost always" I didn't notice.
Hopefully this makes the test suite no longer flaky.
[mlir][linalg][elementwise] Fold broadcast into new elementwise (#167626)
Fold broadcast into new elementwise Op which has affine-map attached.
Merging on behalf of @someoneinjd
[DomTree] Assert non-null block for pre-dom tree (#186790)
In a pre-dominator tree, blocks should never be null.
[mlir][llvmir][OpenMP] Translate affinity clause in task construct to llvmir (#182223)
Translate affinity entries to LLVMIR by passing affinity information to
createTask (__kmpc_omp_reg_task_with_affinity is created inside
PostOutlineCB).
3/3 in stack for implementing affinity clause with iterator modifier
1/3 #182218
2/3 #182222
3/3 #182223
[lldb][Module] Remove feedback_stream parameter from LoadScriptingResources (#186787)
I'm in the process of making `LoadScriptingResources` interactively ask
a user whether to load a script. I'd like to turn the existing warning
into the prompt. The simplest way to achieve this is to not print into a
`feedback_stream` parameter, and instead create a prompt right there.
This patch removes the `feedback_stream` parameter and emits a
`ReportWarning` instead. If we get around to adding the prompt instead
of the warning, those changes will be simpler to review. But even if we
don't end up replacing the warning with a prompt, moving away from
output parameters and towards more structured error reporting is a
nice-to-have (e.g., the `warning` prefix is now colored, IDEs have more
flexibility on how to present the warning, etc.).
For a command-line user nothing should change with this patch (apart
from `warning:` being highlighted).
[PowerPC] Use lxvp/stxvp for mcpu=future v256i1 types (#184447)
For `-mcpu=future`, add patterns to use paired vector instructions
(lxvp/lxvpx/stxvp/stxvpx)
for v256i1 operations instead of splitting into two separate vector
operations.
Assisted by AI.
[VPlan] Simplify&clarify skipping VPValues in calculateRegisterUse (NFC)
Split off as suggested in https://github.com/llvm/llvm-project/pull/156262/.
This refactors the code to clarify comments and code, in preparation for #156262.
[OpenMP][AMDGPU] Enable omptest build (#161649)
This enables building the omptest library across the AMD buildbots that
rely on this CMake cache.
[flang][NFC] Converted five tests from old lowering to new lowering (part 32) (#186730)
Tests converted from test/Lower/Intrinsics: ibset.f90, ichar.f90,
ieee_class.f90, ieee_copy_sign.f90, ieee_is_finite.f90
[SLP]Fix legality checks for bswap-based transformations
Fix the checks for the non-power-of-2 base bswaps by checking the
power-of-2 of the source type, not the target scalar type. Plus, add
cost estimation for zext, if the source type does not match the scalar type.
Fixes https://github.com/llvm/llvm-project/pull/184018#issuecomment-4053477562
[VPlan] Check isa<VPRecipeValue> directly, remove unused variable (NFC).
[MLIR][Presburger] Add support for Smith normal form (#185328)
FPL already has support for computing Hermite normal form for integer
matrices. Here we add support for computing Smith normal form.
This is preparation for Barvinok's algorithm. Given a polyhedron
$P = \{ x \mid Ax + b = 0, Cx + d \leq 0 \}$, we must find a particular solution
$x_0$ of $Ax + b = 0$ in order to project lower-dimensional polyhedra
into full-dimensional ones. This requires the Smith normal form of the
integer matrix $A$.
The implementation here follows the algorithm in
[wikipedia](https://en.wikipedia.org/wiki/Smith_normal_form#Algorithm).
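For reference, here is a minimal C++ sketch of the diagonal part of the algorithm, assuming small `int64_t` entries that never overflow. `smithDiagonal` is a hypothetical helper, not the FPL API, and it omits the unimodular transforms $U$, $V$ with $UAV = D$ that the particular-solution step needs. With those transforms, substituting $x = Vy$ turns $Ax + b = 0$ into $Dy = -Ub$, which has an integer solution exactly when each nonzero $d_i$ divides $(-Ub)_i$ and the remaining entries of $-Ub$ are zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int64_t>>;

// Returns the nonzero diagonal entries d_1 | d_2 | ... | d_r of the Smith
// normal form of `m`, using only unimodular row and column operations.
std::vector<int64_t> smithDiagonal(Matrix m) {
  const std::size_t rows = m.size();
  const std::size_t cols = rows ? m[0].size() : 0;
  for (const auto &row : m)
    assert(row.size() == cols && "matrix must be rectangular");
  std::vector<int64_t> diag;
  for (std::size_t t = 0; t < rows && t < cols; ++t) {
    bool changed = true;
    while (changed) {
      changed = false;
      // Pivot: the nonzero entry of smallest magnitude in m[t..][t..].
      std::size_t pr = rows, pc = cols;
      for (std::size_t i = t; i < rows; ++i)
        for (std::size_t j = t; j < cols; ++j)
          if (m[i][j] != 0 &&
              (pr == rows || std::llabs(m[i][j]) < std::llabs(m[pr][pc]))) {
            pr = i;
            pc = j;
          }
      if (pr == rows)
        return diag; // the remaining submatrix is all zeros
      std::swap(m[t], m[pr]);
      for (auto &row : m)
        std::swap(row[t], row[pc]);
      // Clear column t below the pivot; a nonzero remainder forces a re-pivot.
      for (std::size_t i = t + 1; i < rows; ++i) {
        const int64_t q = m[i][t] / m[t][t];
        for (std::size_t j = t; j < cols; ++j)
          m[i][j] -= q * m[t][j];
        if (m[i][t] != 0)
          changed = true;
      }
      // Clear row t to the right of the pivot in the same way.
      for (std::size_t j = t + 1; j < cols; ++j) {
        const int64_t q = m[t][j] / m[t][t];
        for (std::size_t i = t; i < rows; ++i)
          m[i][j] -= q * m[i][t];
        if (m[t][j] != 0)
          changed = true;
      }
      if (changed)
        continue;
      // The pivot must divide every remaining entry. If one is not divisible,
      // add its row to row t and repeat; the next pivot is strictly smaller.
      for (std::size_t i = t + 1; i < rows && !changed; ++i)
        for (std::size_t j = t + 1; j < cols && !changed; ++j)
          if (m[i][j] % m[t][t] != 0) {
            for (std::size_t k = t; k < cols; ++k)
              m[t][k] += m[i][k];
            changed = true;
          }
    }
    diag.push_back(std::llabs(m[t][t]));
  }
  return diag;
}
```

For example, the $2 \times 2$ matrix $\mathrm{diag}(2, 3)$ has Smith normal form $\mathrm{diag}(1, 6)$, because the invariant factors must form a divisibility chain.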
AMDGPU/GlobalISel: RegBankLegalize rules for s_barrier/wave_barrier (#186512)
[X86] Move getMaskNode to avoid unnecessary forward declarations. (#186815)
I've also improved the assertions on the source / bool mask types to
catch bad use cases.
Cleanup pre-work to allow the i512 codegen to eventually use getMaskNode
instead of manual bool mask creations
Revert "[SLP]Fix legality checks for bswap-based transformations"
This reverts commit 2d4daea3b66469420fc164e76c15558b34e44c75 to fix
a buildbot failure: https://lab.llvm.org/buildbot/#/builders/164/builds/19737
[RISCV] Fold waddau/wsubau to waddu/wsubu when possible (#186635)
If the wide input is zero extended and only one narrow input is
used, we can fold to waddu/wsubu.
[WebAssembly] Support acquire-release atomics in CodeGen (#184900)
Set the correct memory ordering for relaxed atomics after ISel. This
allows SelectionDAG to keep the simple generic selection for target-independent
AtomicLoad nodes while keeping the ordering immediate correct in the MIR.
Notably, the MachineMemOperand still has the original memory ordering
and MIR passes would use that rather than the ordering immediate to make
their code motion decisions (if we had any for Wasm, which we don't).
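As a concrete reference point, here is the standard C++ message-passing pattern whose atomic load and store carry acquire and release orderings. Whether this exact code reaches the new WebAssembly lowering depends on compiling for Wasm with atomics enabled, which is an assumption not checked against the patch's tests. On any target the program itself is well-defined; `publishAndRead` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// A release store publishes `payload`; an acquire load that observes the
// flag is guaranteed to also observe the earlier plain write.
int publishAndRead() {
  int payload = 0;
  std::atomic<bool> ready{false};
  std::thread producer([&] {
    payload = 42;                                 // plain (non-atomic) write
    ready.store(true, std::memory_order_release); // release store
  });
  while (!ready.load(std::memory_order_acquire)) { // acquire load
    // spin until the producer publishes
  }
  const int seen = payload; // ordered after the write of 42
  producer.join();
  return seen;
}
```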
Revert "[DomTree] Assert non-null block for pre-dom tree" (#186831)
Reverts llvm/llvm-project#186790
Breaks buildbots, there are more SLPVectorizer problems.
https://lab.llvm.org/buildbot/#/builders/52/builds/15810
[CIR][AArch64] Lower BF16 vduph lane builtins (#185852)
Part of #185382.
Lower `__builtin_neon_vduph_lane_bf16` and
`__builtin_neon_vduph_laneq_bf16` in ClangIR to `cir.vec.extract`,
and add dedicated AArch64 Neon BF16 tests.
This is my first LLVM PR, so I'd really appreciate any suggestions on
the implementation, test structure, or general LLVM contribution style.
[flang][parser] Add a feature flag for multiple program units on one line. (#186533)
This PR adds a feature flag `MultipleProgramUnitsOnSameLine` that by
default allows program units to be terminated by semicolons, and then
allows the next program unit to follow on the same line.
It also adds some test programs that demonstrate this use of program units
and show the portability warning with "-pedantic".
[X86] Add test showing failure to fold compress(splat(x),splat(x),mask) -> splat(x) (#186823)
Noticed while working on i512 shift expansion - if we end up with repeated splat args, we fail to remove the compress node
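The missing fold follows directly from compress semantics. Below is a scalar C++ model, assuming VPCOMPRESS-style merge semantics with operands ordered (source, pass-through, mask): active source lanes are packed into the low result lanes in order, and the remaining lanes come from the pass-through operand. `compress` is a hypothetical helper, not an LLVM API. When the source and the pass-through are the same splat, every output lane is `x` whatever the mask.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a masked compress: lanes of `src` whose mask bit is set are
// packed into the low result lanes; untouched lanes keep `passthru` values.
template <typename T, std::size_t N>
std::array<T, N> compress(const std::array<T, N> &src,
                          const std::array<T, N> &passthru, uint32_t mask) {
  static_assert(N <= 32, "mask is modelled as a 32-bit integer");
  std::array<T, N> out = passthru;
  std::size_t next = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (mask & (1u << i))
      out[next++] = src[i];
  return out;
}
```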
[libc][math] Refactored atanpif16 to header only (#184316)
Fixes #178105
Reapply "[clang][ssaf] Add --ssaf-extract-summaries= and --ssaf-tu-summary-file= options" (#186463)
This reverts commit 3548ec95178c00a2895a65b435945ce318396c8e and adapts
the code to the new ScalableStaticAnalysisFramework/ directory layout.
Re-adds:
- `TUSummaryExtractorFrontendAction` and its integration into `ExecuteCompilerInvocation`
- `--ssaf-extract-summaries=` and `--ssaf-tu-summary-file=` CLI options
- SSAFForceLinker / SSAFBuiltinForceLinker headers and anchor symbols
- Diagnostics under -Wscalable-static-analysis-framework
- Lit tests for the CLI and unit tests for the frontend action
- Changes the Formats to be lowercase - and match their spellings in the file paths.
[libc][math] Refactor bf16fma to Header Only (#182572)
Fixes #181625
[MIR][NFC] Test verbalising INLINEASM extra-info flags. (#186796)
Exposes the bug when printing `inteldialect`.
[libc][math] Refactor log_bf16 to Header (#186618)
AMDGPU/GlobalISel: RegBankLegalize rules for ds_read_tr* (#186006)
AMDGPU/GlobalISel: RegBankLegalize rules for ctlz/cttz_zero_undef (#186546)
[X86] known-pow2.ll - add min/max vector test coverage for #182369 (#186841)
AMDGPU/GlobalISel: RegBankLegalize rules for s_wait intrinsics (#186254)
[InstCombine] Support disjoint or in add-sub reassociation fold (#186827)
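The identity that lets a disjoint `or` take part in add/sub reassociation can be checked directly: when the operands share no set bits, no carries occur, so `a | b == a + b`. A minimal C++ sketch follows; `orThenSub` is a hypothetical helper, and the exact pattern InstCombine matches is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// With disjoint operands (LLVM's `or disjoint`), a | b equals a + b, so a
// following `- b` cancels just as it would after an `add`.
uint32_t orThenSub(uint32_t a, uint32_t b) {
  assert((a & b) == 0 && "precondition: operands must be disjoint");
  return (a | b) - b; // equals a because (a | b) == a + b here
}
```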
[lldb] Include stdio.h in synthetic subscript test (#186847)
The [lldb-aarch64-windows](https://lab.llvm.org/buildbot/#/builders/141)
buildbot failed with:
```
lld-link: error: undefined symbol: printf
>>> referenced by main.o:(main)
```
I'm assuming that's because of the use of `__builtin_printf`. In other
tests, we use `printf` from `stdio.h` and these build fine, so I added
an include and used `printf`.
[AMDGPU][GlobalIsel] Add register bank legalization rules for amdgcn_wqm amdgcn_softwqm amdgcn_strict_wqm (#186214)
This patch adds register bank legalization rules for amdgcn_wqm
amdgcn_softwqm amdgcn_strict_wqm in the AMDGPU GlobalISel pipeline.
[flang] Reorder messages wrt line number before diff(actual, expect) (#186812)
When messages are attached together, the source locations to which they
refer are not necessarily monotonically increasing. For example
```
error: foo.f90:10: There is a problem here # line 10
because: foo.f90:12: This thing is invalid # line 12 (attached)
error: foo.f90:11: There is another problem here # line 11
```
There is no way to represent that in the source file via ERROR
annotations, so before running unified_diff, "canonicalize" the list of
messages into an order that corresponds to the line numbers.
---------
Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
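The canonicalization step amounts to a stable sort by line number. The real change lives in flang's test tooling, whose language and data structures this sketch does not reproduce; `Message` and `canonicalize` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one diagnostic parsed from compiler output.
struct Message {
  int line;
  std::string text;
};

// Order messages by source line so they can be diffed against expectations
// collected from ERROR annotations, which appear in file order. stable_sort
// keeps the original relative order of messages that share a line.
void canonicalize(std::vector<Message> &msgs) {
  std::stable_sort(msgs.begin(), msgs.end(),
                   [](const Message &a, const Message &b) {
                     return a.line < b.line;
                   });
}
```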
[ForceFunctionAttrs] Fix handling of conflicts for more attributes (#186304)
Fixes #185277
ForceFunctionAttrs currently only checks the `alwaysinline`/`noinline`
conflict when forcing function attributes. This is incomplete, because
LLVM verifier rules define additional incompatible function attribute
combinations.
Extend hasConflictingFnAttr() to reject more conflicting function
attributes, including combinations involving `optnone`, `minsize`,
`optsize`, and `optdebug`.
Also add required companion attributes when forcing function attr…
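A minimal C++ sketch of this kind of check, modelling attributes as strings. The conflict table is illustrative and not exhaustive; it reflects the IR verifier's rules as understood here (`noinline` vs. `alwaysinline`; `optnone` vs. `optsize`, `minsize`, and `optdebug`). Treating `noinline` as the companion that `optnone` requires is an assumption about what the truncated sentence refers to. `hasConflictingAttr` and `forceAttr` are hypothetical names, not the pass's actual `hasConflictingFnAttr()`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Illustrative (not exhaustive) pairs of mutually incompatible attributes.
const std::vector<std::pair<std::string, std::string>> kConflicts = {
    {"alwaysinline", "noinline"},
    {"optnone", "optsize"},
    {"optnone", "minsize"},
    {"optnone", "optdebug"},
};

// True if adding `attr` to `attrs` would create a known conflict.
bool hasConflictingAttr(const std::set<std::string> &attrs,
                        const std::string &attr) {
  for (const auto &p : kConflicts) {
    if (attr == p.first && attrs.count(p.second))
      return true;
    if (attr == p.second && attrs.count(p.first))
      return true;
  }
  return false;
}

// Forces `attr` unless it conflicts. `optnone` also pulls in its (assumed)
// companion `noinline`, which must not conflict either.
bool forceAttr(std::set<std::string> &attrs, const std::string &attr) {
  if (hasConflictingAttr(attrs, attr))
    return false;
  if (attr == "optnone") {
    if (hasConflictingAttr(attrs, "noinline"))
      return false;
    attrs.insert("noinline");
  }
  attrs.insert(attr);
  return true;
}
```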
Closes #186362