Commit 94090ce

Added tests and resolved conflicts
Added tests and resolved conflicts added a smoke test [clang-format] Fix incorrect trailing comment and escaped newlines when AlignArrayOfStructures is enabled (#180305) This change fixes how the spaces are modified during alignment. Previously it was inconsistent whether the `StartOfTokenColumn` and `PreviousEndOfTokenColumn` members of `WhitespaceManager::Change`s were also updated when their `Spaces` member was changed to align tokens. A new function has been added that properly maintains the relationship between these members, and all places that directly modified `Spaces` have been replaced with calls to this new function. Fixes https://github.com/llvm/llvm-project/issues/138151. Fixes https://github.com/llvm/llvm-project/issues/85937. Fixes https://github.com/llvm/llvm-project/issues/53442. Tests have been added to ensure they stay fixed. Attribution Note - I have been authorized to contribute this change on behalf of my company: ArenaNet LLC libclc: Disable contract in trig reductions (#186432) libclc: Remove attempt at subnormal flush from trig functions (#186424) [clang-format] Ignore imports in comments for Java import sorting (#177326) Java source files can contain apparent import statements inside block comments (e.g., showing a code example). These can get mixed up with real import statements when run through clang-format. This patch tracks block comments (/* ... */) so that we skip lines that are inside them. Fixes #176771 --------- Co-authored-by: Natalia Kokoromyti <knatalia@yost-cm-01-imme.stanford.edu> Co-authored-by: owenca <owenpiano@gmail.com> [lldb/test] Fix MTC dylib path for newer Darwin embedded devices (NFC) Signed-off-by: Med Ismail Bennani <ismail@bennani.ma> [clang-tidy] Fix virtual inheritance FP in misc-multiple-inheritance (#186103) Avoid double-counting concrete bases introduced through virtual inheritance in `misc-multiple-inheritance`. As of AI-Usage: Gemini 3 is used for pre-commit reviewing. 
Closes https://github.com/llvm/llvm-project/issues/186059 [SPIRV][NFC] Drop uses of BranchInst (#186514) Also simplify the code to use successors(). [lldb][NativePDB] Require `target-windows` for MSVC test (#186578) Fixes the failure on the lldb-remote-linux-win buildbot (https://github.com/llvm/llvm-project/pull/186124#issuecomment-4060098881). The test runs MSVC to produce an executable that only runs on Windows. [lldb] Fix heap.py crashes on recent Darwin embedded targets Two fixes for the ptr_refs/cstr_refs/find_variable heap commands: 1. Move the `task` variable declaration into the common expression preamble. Previously it was only declared inside the `search_heap` code path, causing compilation errors when using `--ignore-heap` with stack or segment scanning. 2. On recent iOS, some shared cache __DATA_CONST pages are remapped to non-accessible at runtime, even though the Mach-O section metadata still marks them as readable. The segment scan would crash with EXC_BAD_ACCESS when reading these pages. Fix by querying actual VM region permissions via SBProcess.GetMemoryRegionInfo() and splitting sections at region boundaries to only scan readable portions. rdar://172543652 Signed-off-by: Med Ismail Bennani <ismail@bennani.ma> [Transforms][NFC] Drop uses of BranchInst in headers (#186580) Replace BranchInst with CondBrInst/UncondBrInst/Instruction in headers and handle the related fall out. The removed code in simplifyUncondBranch was made dead in 0895b836d74ed333468ddece2102140494eb33b6, where FoldBranchToCommonDest was changed to only handle conditional branches. [Transforms/Utils][NFC] Drop uses of BranchInst (#186586) [x86][GlobalISel] Select MOV32ri64 for unsigned 32-bit i64 constants (#185182) x86 GlobalISel currently selects `MOV64ri32` for signed 32-bit `i64` constants and falls back to `MOV64ri` otherwise. That misses the unsigned 32-bit case, where `MOV32ri64` is a better match. 
FastISel already handles this case by using `MOV32ri64` for zero-extended 32-bit values. Update `X86InstructionSelector::selectConstant()` to select `MOV32ri64` for `i64` constants that fit in `uint32_t`, while keeping `MOV64ri32` for signed 32-bit values and `MOV64ri` for larger constants. This reduces the encoding size for these constants and fixes the `0xffffffff` boundary case to use the correct zero-extending move. [X86] apply mulx optimization for two-wide mul instruction (mull, mulq) (#185127) References: https://github.com/llvm/llvm-project/pull/184462 In the discussion for the linked PR, which removes unnecessary register to register moves when one operand is in %rdx for mulx, the point was brought up that this pattern also happens for mull and mulq. The IR below: ```llvm declare i32 @foo32() declare i64 @foo64() define i32 @mul32_no_implicit_copy(i32 %a0) { %a1 = call i32 @foo32() %a2 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %a0, i32 %a1) %a3 = extractvalue { i32, i1 } %a2, 0 ret i32 %a3 } define i64 @mul64_no_implicit_copy(i64 %a0) { %a1 = call i64 @foo64() %a2 = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a0, i64 %a1) %a3 = extractvalue { i64, i1 } %a2, 0 ret i64 %a3 } ``` Generates this code on current HEAD: ```asm mul32_no_implicit_copy: # @mul32_no_implicit_copy push rbx mov ebx, edi call foo32@PLT mov ecx, eax mov eax, ebx mul ecx pop rbx ret mul64_no_implicit_copy: # @mul64_no_implicit_copy push rbx mov rbx, rdi call foo64@PLT mov rcx, rax mov rax, rbx mul rcx pop rbx ret ``` Where the register shuffling before the mul is the same pattern as for mulx in the previous PR. 
With this branch it generates this code now: ```asm mul32_no_implicit_copy: pushq %rbx movl %edi, %ebx callq foo32@PLT mull %ebx popq %rbx retq mul64_no_implicit_copy: pushq %rbx movq %rdi, %rbx callq foo64@PLT mulq %rbx popq %rbx retq ``` [StructurizeCFG] Fix incorrect zero-cost hoisting in nested control flow (#183792) hoistZeroCostElseBlockPhiValues() hoists zero-cost instructions from else blocks to their common dominator with the then block. When the merge point has additional predecessors beyond the simple if-else pattern, the hoisted instruction ends up in a dominator that feeds a Flow phi on every edge, including edges where the else block was never taken. simplifyHoistedPhis() then replaces poison entries in those Flow phis with the hoisted value, causing it to leak into unrelated paths. This manifests as miscompilation in sorting kernels compiled with code coverage: the PGO counter blocks create deeply nested CFGs where the hoisted shufflevector (used for swapping sort keys) reaches the no-swap path, corrupting sort results. Fix by requiring a simple if-else CFG shape before hoisting: ThenBB must branch directly to ElseSucc and ElseSucc must have exactly 2 predecessors. This matches the structure that simplifyHoistedPhis assumes. [RISCV] Add more extensions to spacemit-x100 (#186351) [RISCV][NFC] Move extension test for spacemit-x60 to a separate file (#186357) [CIR] Add Commutative/Idempotent traits to binary ops (#185163) Add missing MLIR traits to CIR binary operations: - AndOp, OrOp: Commutative, Idempotent - AddOp, MulOp, XorOp, MaxOp: Commutative Add these ops to the CIRCanonicalize pass op list so trait-based folding is exercised by applyOpPatternsGreedily. [clang-tidy] Fix false positive in `readability-else-after-return` on `return` jumped over by `goto` (#186370) Given this code: ```cpp if (...) { goto skip_over_return; return; skip_over_return: foo(); } else { ... 
} ``` ...the check suggests removing the `else`, which is not a valid transformation. This is because it looks at *all* the substatements of the then-branch for interrupting statements. This PR changes it to only look at the *final* substatement. Technically, this introduces a false negative on code like this: ```cpp if (...) { return; dead_code(); } else { // <-- Could in theory remove this 'else' ... } ``` But, that code is objectively bad, so I don't think we're losing anything. This change has the side effect of making the check a bit more general; it now recognizes attributed interrupting statements (e.g. `[[clang::musttail]] return f();`). [Transforms/Scalar][NFC] Drop uses of BranchInst (#186592) I ended up relaxing some of the checks that LoopInterchange made, the assumptions that certain instructions were branches seemed to not be used at all. [LV] Move predication, early exit & region handling to VPlan0 (NFCI) (#185305) Move handleEarlyExits, predication and region creation to operate directly on VPlan0. This means they only have to run once, reducing compile time a bit; the relative order remains unchanged. Introducing the regions at this point in particular unlocks performing more transforms once, on the initial VPlan, instead of running them for each VF. Whether a scalar epilogue is required is still determined by legacy cost model, so we need to still account for that in the VF specific VPlan logic. PR: https://github.com/llvm/llvm-project/pull/185305 [IPO][InstCombine][Vectorize][NFCI] Drop uses of BranchInst (#186596) Refactor remaining parts of Transforms apart from Scalar and Utils. [IR][NFC] Remove BranchInst successor functions (#186604) The efficient access is now handled by UncondBrInst/CondBrInst, Instruction functions handle the more generic cases. These functions are now largely unused now that most uses of BranchInst are gone. Preliminary work for making the CondBrInst operand order consistent. 
[WebAssembly][NFC] Rename and test FastISel selectBr (#186577) selectBr only handles conditional branches and also wasn't tested. Clarify the name and add test that enforces that there's no fallback. [X86] Reject 'p' constraint without 'a' modifier in inline asm (#185799) The 'p' constraint produces an address operand that should only be printed with the 'a' modifier (e.g., %a0). Without it, GCC and Clang produce different and arguably incorrect output https://github.com/llvm/llvm-project/issues/185343#issuecomment-4029670370 Reject the combination to catch misuse early. [llvm-mc] Default output assembly variant to AssemblerDialect (#186317) Previously, llvm-mc always defaulted to output assembly variant 0 regardless of the target's AssemblerDialect. This was inconsistent: llvm-mc -x86-asm-syntax=intel changed the input parser to Intel syntax but output stayed AT&T, unlike clang's -masm=intel which affects both. When --output-asm-variant is not explicitly specified, fall back to MAI->getAssemblerDialect() instead of hardcoding variant 0. This makes the output match the target's configured dialect: - X86: -x86-asm-syntax=intel now produces Intel output - AArch64: Apple triples default to Apple syntax output - SystemZ: z/OS triples default to HLASM syntax output Tests that relied on a specific output variant now use explicit --output-asm-variant=0. [lldb] Rename Status variables to avoid confusion (NFC) (#186486) Rename Status variables that are named `error` to `status` to avoid confusion with llvm::Error as the latter becomes more and more prevalent. [lldb] Skip tests that are incompatible with MTE (#186043) Skip tests that are incompatible with MTE. Depends on: - https://github.com/llvm/llvm-project/pull/185780 [IR] Add Instruction::successors() (#186606) Nowadays all terminators store all successor operands consecutively, so we can expose the range of successors through a unified interface. 
Rename succ_op_iterator to succ_iterator for consistency, also with Machine IR. Preliminary work for replacing the succ_iterator in CFG.h with an iterator that iterates directly over the uses.

[msan][NFCI] Replace unnecessary shadow cast with assertion (#186498) Fabian Wolff pointed out that #176031 made the output of CreateIntCast() unused in handleBitwiseAnd(). Upon closer inspection, the CreateIntCast()s are unnecessary, because the arguments to handleBitwiseAnd() (and visitOr()) are integers or vectors of integers, for which the shadow types are the same as the original types. This patch removes the unnecessary if and shadow cast, and adds assertions.

[CIR] Add cir.min op and refactor cir.max lowering (#185276) Add cir.min operation for integer minimum computation. Refactor cir.max lowering into a shared lowerMinMaxOp template reused by both ops.

[IR] Make BranchInst operand order consistent (#186609) Ensure that successors are always reported in the same order in which they are stored in the operand list.

Improved ISD::SRL handling in isKnownToBeAPowerOfTwo (#182562) Fixes #181651 Added a DemandedElts argument to the isConstOrConstSplat and isKnownToBeAPowerOfTwo calls; OrZero || isKnownNeverZero(Val, Depth) is now checked before isKnownToBeAPowerOfTwo. Also added unit tests.

[X86] lowerV4F32Shuffle - prefer INSERTPS over SHUFPS when zeroing upper/lower v2f32 (#186612) Followup to #186468 - use INSERTPS over SHUFPS if the implicit zeroing doesn't cross the 64-bit halves

[LLVM] Change IRBuilder::CreateAggregateRet to accept an ArrayRef (#186605) Change `IRBuilder::CreateAggregateRet()` to accept an `ArrayRef` instead of a pointer and size, and extend IRBuilder unit test to exercise it.

[PhaseOrdering][X86] Add average round tests based off #128424 (#186615)

[CIR] Remove cir.unary(plus, ...)
and emit nothing for unary plus (#185278) Traditional codegen never emits any operation for unary plus — it just visits the subexpression as a pure identity at the codegen level. Align CIRGen with this behavior by removing Plus from UnaryOpKind entirely and having VisitUnaryPlus directly visit the subexpression with the appropriate promotion/demotion handling. [VPlan] Add hasPredecessors and hasSuccessors to VPBlockBase (NFC). Add/move helpers to VPBlockBase, and use in a few more places. Split off from https://github.com/llvm/llvm-project/pull/156262 as suggested. [clang-format] Fix a crash on fuzzer-generated invalid C++ code (#186566) Fixes #185421 [VPlan] Consolidate VPRegionBlock constructors (NFC). Unify VPRegionBlock constructors into a single one, in preparation for https://github.com/llvm/llvm-project/pull/156262. Split off as suggested. [X86] isSplatValueForTargetNode - test source value for vector uniform shift ops (#186619) For old SSE style vector shifts, we just need to check the shifted value is a splat as the shift amount is uniform Avoids an unnecessary variable shuffle in i512 ashr expansion [IR] Implement successors as Use iterators (#186616) This is possible since now all successor operands are stored consecutively. There is just one out-of-line function call instead of one call to getSuccessor() per operand. [VPlan] Remove special handling for canonical increment (NFC). The canonical IV increment should be proven as uniform-across-VF-and-UF by the existing logic. Remove explicit handling, in preparation for https://github.com/llvm/llvm-project/pull/156262. Split off as suggested. [VPlan] Create zero resume value for CanIV directly (NFC). The start value of the canonical IV is always 0. Assert and generate zero VPValue manually in preparation for https://github.com/llvm/llvm-project/pull/156262. Split off as suggested. 
[Docs] typo settting -> setting (#178665) [libc++][Android] Update Compiler for Android CI (#186531) Upgrade Android compiler from r563880 to r584948b because libc++ does not support LLVM 20 anymore [clang][Driver][Darwin] Optionally use xcselect to find macOS SDK (#119670) This is a scaled down version of https://reviews.llvm.org/D136315. The intent is largely the same as before[^1], but I've scaled down the scope to try to avoid the issues that the previous patch caused: - the changes are now opt-in based on enabling `CLANG_USE_XCSELECT` - this only works when targeting macOS on a macOS host (this is the only case supported by `libxcselect`[^2]) - calling `libxcselect` is done only when the target is `*-apple-macos*` to avoid breaking many tests Another reason to leave this as opt-in for now is that there are some bugs in libxcselect that need fixing before it is safe to use by default for all users. This has been reported to Apple as FB16081077. [^1]: See also https://reviews.llvm.org/D109460 and #45225. [^2]: https://developer.apple.com/documentation/xcselect?language=objc [clang-tidy] Add redundant qualified alias check (#180404) Introduce `readability-redundant-qualified-alias` to flag identity type aliases that repeat a qualified name and suggest using-declarations when safe. The check is conservative: it skips macros, elaborated keywords, dependent types, and templates. `OnlyNamespaceScope` controls whether local/class scopes are included (default `false`). Depends on: #183940 #183941 [CIR] Split CIR_UnaryOp into individual operations (#185280) Split the monolithic cir.unary operation (which dispatched on a UnaryOpKind enum) into four separate operations: cir.inc, cir.dec, cir.minus, and cir.not. 
Changes: - Add CIR_UnaryOpInterface with getInput()/getResult() methods - Add CIR_UnaryOp and CIR_UnaryOpWithOverflowFlag base classes - Define IncOp, DecOp, MinusOp, NotOp with per-op folds - Add Involution trait to NotOp for not(not(x)) -> x folding - Replace createUnaryOp() with createInc/Dec/Minus/Not builders - Split LLVM lowering into four separate patterns - Split LoweringPrepare complex-type handling per unary op - Update CIRCanonicalize and CIRSimplify for new op types - Update all codegen files to use bool params instead of UnaryOpKind - Remove CIR_UnaryOpKind enum and old CIR_UnaryOp definition Assembly format change: cir.unary(inc, %x) nsw : !s32i, !s32i -> cir.inc nsw %x : !s32i cir.unary(not, %x) : !u32i, !u32i -> cir.not %x : !u32i [AggressiveInstCombine] Recognize table based log2 and replace with ctlz+sub. (#185160) Recognize table based log2 implementations like ``` unsigned log2(unsigned v) { static const unsigned char table[] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 }; v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; return table[(unsigned)(v * 0x07C4ACDDU) >> 27]; } ``` and replaces with 31 - llvm.ctlz(v). Similar for i64 log2. Other sizes can be supported with correct multiply constant and table values, but I have not found examples yet. This code is based on the existing tryToRecognizeTableBasedCttz. Like that function, we support any combination of multiply constant and table values that produce the correct result. It handles the same pattern as #177110, but does not match the outer subtract from that patch. It is assumed that InstCombine or other optimizations can combine (sub 31 (sub 31, cttz V)) later. I have limited this to targets that have a fast ctlz. The backend does not yet have a table based lowering for ctlz so this reduces the chance of regressions. 
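The equivalence claimed by the log2 commit above can be checked outside the compiler. The following is a minimal sketch, not the pass itself: `tableLog2` copies the table and multiply constant quoted in the commit message, while `countLeadingZeros32`, `ctlzLog2` and `formsAgreeOnSamples` are hypothetical helper names introduced here, with a portable loop standing in for `llvm.ctlz`. Only nonzero inputs are compared.

```cpp
#include <cassert>
#include <cstdint>

// Table-based floor(log2(v)) for nonzero v, following the snippet quoted in
// the commit message. For v == 0 this form returns table[0] == 0.
inline unsigned tableLog2(std::uint32_t v) {
  static const unsigned char table[32] = {
      0, 9,  1,  10, 13, 21, 2,  29, 11, 14, 16, 18, 22, 25, 3, 30,
      8, 12, 20, 28, 15, 17, 24, 7,  19, 27, 23, 6,  26, 5,  4, 31};
  // Smear the highest set bit into every lower position.
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  // The top five bits of the wrapped 32-bit product select the table entry.
  return table[static_cast<std::uint32_t>(v * 0x07C4ACDDu) >> 27];
}

// Portable stand-in for a 32-bit count-leading-zeros (hypothetical helper,
// not an LLVM API). Returns 32 for v == 0.
inline unsigned countLeadingZeros32(std::uint32_t v) {
  unsigned n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// The replacement form named in the commit message: 31 - ctlz(v).
inline unsigned ctlzLog2(std::uint32_t v) {
  assert(v != 0 && "the two forms are only compared for nonzero inputs");
  return 31 - countLeadingZeros32(v);
}

// Compares both forms on every power of two and on the value with all
// lower bits set as well.
inline bool formsAgreeOnSamples() {
  for (unsigned shift = 0; shift < 32; ++shift) {
    std::uint32_t pow2 = std::uint32_t(1) << shift;
    std::uint32_t allOnesBelow = pow2 | (pow2 - 1);
    if (tableLog2(pow2) != ctlzLog2(pow2))
      return false;
    if (tableLog2(allOnesBelow) != ctlzLog2(allOnesBelow))
      return false;
  }
  return true;
}
```

Calling `formsAgreeOnSamples()` exercises one value per result bucket at each end of the bucket; it is a spot check, not a proof over all 2^32 inputs.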
[MLIR][Python] Refine the behavior of Python-defined dialect reloading (#186128) This includes several changes:
- `Dialect.load(reload=False)` will fail if the dialect was already loaded in a different context. This prevents the program from aborting later.
- `Dialect.load(reload=True)` implies `replace=True` in dialect/operation registering.
- `PyGlobals::registerDialectImpl` now has a parameter `replace`.
- `register_dialect` and `register_operation` are no longer exposed in `mlir.dialects.ext`.

This should solve the registering problem found in writing transform test cases by @rolfmorel.

[libc++][test] Use loop with compare_exchange_weak calls (#185953) On AIX, this test sometimes fails with error `Assertion failed: y == true`. The test assumes `compare_exchange_weak` should succeed on a single call; however, according to the standard:

> A weak compare-and-exchange operation may fail spuriously. That is, even when the contents of memory referred to by expected and ptr are equal, it may return false and store back to expected the same memory contents that were originally there. This spurious failure enables implementation of compare-and-exchange on a broader class of machines, e.g., load-locked store-conditional machines. A consequence of spurious failure is that nearly all uses of weak compare-and-exchange will be in a loop.

[atomics.ref.ops]/27

[orc-rt] Rename "ResourceManager" to "Service". NFCI. (#186639) The name "Service" better reflects the general purpose of this class: It provides *something* (often resource management) to the Session, is owned by the Session, and receives notifications from the Session when the controller detaches / is detached, and when the Session is shut down. An example of a non-resource-managing Service (to be added in an upcoming patch) is a detach / shutdown notification service: Clients can add this service to register arbitrary callbacks to be run on detach / shutdown.
The advantage of this over the current Session detach / shutdown callback system is that clients can control both the order of the callbacks, and their order relative to notification of other services. [orc-rt] Return ref from Session::addService, add createService. (#186640) Session::addService now returns a reference to the added Service. This allows clients to hold a reference for further direct interaction with the Service object. This commit also introduces a new Session::createService convenience method that creates the service and returns a reference to it. [mlir] Fix op comparisons in extensible dialects (#186637) The extensible dialect system defined `compareProperties` to false because it doesn't use properties. However, this should have been `true`, as the empty properties are trivially always equal to themselves. Doing otherwise means that no operations in extensible dialects that aren't the exact same operation will ever compare equal for the purposes of operations like CSE. [clang-format] Upgrade ShortFunctionStyle to a struct (#134337) The current clang-format configuration option AllowShortFunctionsOnASingleLine uses a single enum (ShortFunctionStyle) to control when short function definitions can be merged onto a single line. This enum provides predefined combinations of conditions (e.g., None, Empty only, Inline only, Inline including Empty, All). This approach has limitations: 1. **Lack of Granularity:** Users cannot specify arbitrary combinations of conditions. For example, a user might want to allow merging for both empty functions and short top-level functions, but not for short functions defined within classes. This is not possible with the current enum options except by choosing All, which might merge more than desired. 2. 
**Inflexibility:** Adding new conditions for merging (e.g., distinguishing between member functions and constructors, handling lambdas specifically) would require adding many new combined enum values, leading to a combinatorial explosion and making the configuration complex.
3. **Implicit Behavior:** Some options imply others (e.g., Inline implies Empty), which might not always be intuitive or desired.

The goal is to replace this single-choice enum with a more flexible mechanism allowing users to specify a set of conditions that must be met for a short function to be merged onto a single line.

---------

Co-authored-by: owenca <owenpiano@gmail.com>

[clang][bytecode] Remove unused members from `EvalEmitter` (#186601) Remove the DenseMap handling lambda parameter mappings from `EvalEmitter`. This was always unused. Remove it and use `if constexpr` to keep things compiling.

[CMake] Disable PCH reuse for plugins in non-PIC builds (#186643) Plugins are always PIC and therefore cannot reuse non-PIC PCH.

[Analysis][NFC] Move BranchProbabilityInfo constr to cpp (#186648) The implementation details of the analysis are irrelevant for users, therefore move these to the .cpp file.
[clang-format] Add option AllowShortRecordOnASingleLine (#154580) This patch supersedes PR #151970 by adding the option ``AllowShortRecordOnASingleLine`` that allows the following formatting:

```c++
struct foo {};
struct bar { int i; };
struct baz { int i; int j; int k; };
```

---------

Co-authored-by: owenca <owenpiano@gmail.com>

[clang][ssaf][NFC] Prefix ssaf-{linker,format} dirs with 'clang-' (#186610) Addresses: https://github.com/llvm/llvm-project/pull/185631#issuecomment-4054586633

[X86] Add missing VPSRAQ broadcast-from-mem patterns for non-VLX targets (#186654)

[clang-tidy] Adds do-while support to performance-inefficient-string-concatenation (#186607) Closes #186362

---------

Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com> Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>

[X86] known-never-zero.ll - add vector test coverage for #186335 (#186660)

Support float8_e3m4 and float8_e4m3 in np_to_memref (#186453) This patch adds support for `float8_e3m4` and `float8_e4m3` in `np_to_memref.py` by adding the appropriate ctypes structures.

[Transforms/Utils][NFC] Replace SmallPtrSet with vector (#186664) Typically most blocks in a function are reachable, so use a vector indexed by block number instead of a SmallPtrSet.

[SimplifyCFG][NFC] Renumber blocks when changing func (#186666) Keep numbering dense when changing the function. SimplifyCFG is a good candidate, because it is likely to remove blocks and preserves few analyses.

[CFG][InstCombine][NFC] Use block numbers when finding backedges (#186668) The functions traverse all basic blocks, so instead of SmallPtrSets use a single vector indexed by block number.

[CodeGenPrepare][NFC] Get BPI/BFI from pass/analysis manager (#186651) BranchProbabilityInfo will compute its own dominator tree and post-dominator tree if none is specified; avoid this by using the analysis manager/pass manager to get the analysis, which will reuse the previously computed DomTree.
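The block-number entries above (#186664, #186668) replace pointer sets with a vector indexed by block number. A minimal sketch of that idea follows, assuming a toy CFG in which blocks are identified by dense indices. `ToyCFG`, `reachableFrom`, `exampleCFG` and `countReachable` are hypothetical names for illustration and are not LLVM types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CFG: succs[i] lists the successors of block i.
struct ToyCFG {
  std::vector<std::vector<unsigned>> succs;
};

// Marks every block reachable from `entry`. The visited set is a vector
// indexed by block number rather than a set of block pointers.
inline std::vector<bool> reachableFrom(const ToyCFG &cfg, unsigned entry) {
  std::vector<bool> visited(cfg.succs.size(), false);
  std::vector<unsigned> worklist;
  visited[entry] = true;
  worklist.push_back(entry);
  while (!worklist.empty()) {
    unsigned block = worklist.back();
    worklist.pop_back();
    for (unsigned succ : cfg.succs[block]) {
      if (!visited[succ]) {
        visited[succ] = true;
        worklist.push_back(succ);
      }
    }
  }
  return visited;
}

// Example graph: 0 -> 1 -> 2 -> 1 (a loop); block 3 -> 0 has no predecessor.
inline ToyCFG exampleCFG() {
  ToyCFG cfg;
  cfg.succs.resize(4);
  cfg.succs[0].push_back(1);
  cfg.succs[1].push_back(2);
  cfg.succs[2].push_back(1);
  cfg.succs[3].push_back(0);
  return cfg;
}

// Counts the blocks marked reachable from `entry`.
inline std::size_t countReachable(const ToyCFG &cfg, unsigned entry) {
  std::vector<bool> visited = reachableFrom(cfg, entry);
  std::size_t n = 0;
  for (bool v : visited)
    n += v ? 1 : 0;
  return n;
}
```

The trade-off sketched here is the one the commit messages name: when most blocks are visited anyway, one allocation sized by the block count is cheap, and it relies on the numbering staying dense.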
[X86] combineConcatVectorOps - concat(vtruncs(x),vtruncs(y)) -> packss(shuffle(x,y),shuffle(x,y)) (#186678) Although at worst this isn't a reduction in instruction count, the shuffle/packss sequence is much easier for further folds / shuffle combining Revert "[CI] Try lowering max parallel link jobs on Windows (#185255)" This reverts commit af22b50fac2311ff3f859e4e8bdec552c7aa8d5a. This seems to have had no noticeable effect on the frequency of failures so likely was not the issue. Revert "Support float8_e3m4 and float8_e4m3 in np_to_memref (#186453)" (#186677) This reverts commit 57427f84fe5fdda71aef4be257ed28d7b4f55d05. For some reason mlir-nvidia CI is failing to import `float8_e3m4` from `ml_dtypes`. See https://lab.llvm.org/buildbot/#/builders/138/builds/27095. [X86] combineConcatVectorOps - concat(vtruncus(smax(x,0)),vtruncus(smax(y,0))) -> packus(shuffle(x,y),shuffle(x,y)) (#186681) Followup to vtruncs/packss handling Update GitHub Artifact Actions (major) (#184052) This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [actions/download-artifact](https://redirect.github.com/actions/download-artifact) | action | major | `v7.0.0` → `v8.0.1` | | [actions/upload-artifact](https://redirect.github.com/actions/upload-artifact) | action | major | `v6.0.0` → `v7.0.0` | | [actions/upload-artifact](https://redirect.github.com/actions/upload-artifact) | action | major | `6.0.0` → `7.0.0` | [BPF] Use ".L" local prefix label for basic blocks (#95103) Previously, PrivateLabelPrefix was default-initialized to "L", so basic block labels were added to the symbol table. This seems like an oversight, so use ".L" for all private labels. [clang-tidy][NFC] Use universal type_traits mock (#186652) [Utils] Format git-llvm-push Use single quotes for string arguments inside f-strings or otherwise the version of black that we use fails to parse. 
Also reformat the file, given that this hasn't been working for a while (wholesale or incrementally) due to the above issue.

[clang][doc] Improve error handling for `LibTooling` example code avoiding core dump (#98129) Resolves #97983

[Clang][Docs] Clarify [[unlikely]] example in compound statement (#186590) The first code example in the "confusing standard behavior" section had a comment claiming `[[unlikely]]` makes the branch unlikely, contradicting a later example showing the same placement being ignored. Rewords the comment to clarify this is the C++ Standard's recommendation that Clang does not follow, since the attribute is not on the substatement. Continues the work from #126372. Fixes #126362.

[libc][Github] Bump clang in libc container to v23 (#186697) Back to HEAD now that apt.llvm.org is working again for ToT.

[gn] port 629edaf67844c01db37 (CLANG_USE_XCSELECT)

[gn] port f002fc0ee8734283

[IR] Don't allow successors() over block without terminators (#186646) There's no point constructing a dominator tree or similar on known-broken IR. Generally, functions should be able to assume that IR is valid (i.e., passes the verifier). Users of this "feature" were:
- Verifier, fixed by verifying existence of terminators first.
- FuzzMutate, worked around by temporarily inserting terminators.
- OpenMP to run analyses while building the IR, worked around by temporarily inserting terminators.
- Polly to work with an empty dominator tree, fixed by temporarily adding an unreachable inst.
- MergeBlockIntoPredecessor, inadvertently, fixed by adding terminator before updating MemorySSA.
- Some sloppily written unit tests.
[IR] Add initial support for the byte type (#178666) Following the [byte type RFC](https://discourse.llvm.org/t/rfc-add-a-new-byte-type-to-llvm-ir/89522) and the discussions within the [LLVM IR Formal Specification WG](https://discourse.llvm.org/t/rfc-forming-a-working-group-on-formal-specification-for-llvm/89056), this PR introduces initial support for the byte type in LLVM. This PR:
- Adds the byte type to LLVM's type system
- Extends the `bitcast` instruction to accept byte operands
- Adds parsing tests for all new functionality
- Fixes failing regression tests (IR2Vec and IRNormalizer)

---------

Co-authored-by: George Mitenkov <georgemitenk0v@gmail.com>

[orc-rt] Don't return Error in Service::OnComplete. (#186708) The Session can't do anything useful with these errors; it can only report them. It's cleaner if the Service objects just report the error directly.

[clang-tidy][NFC] Use universal memory mock for smart ptrs (#186649)

[orc-rt] Fix unittests after 53a1e056f38. (#186711) Updates unittests to reflect Service interface changes.

Revert "[IR] Add initial support for the byte type" (#186713) Reverts llvm/llvm-project#178666 to unblock CI. `CodeGen/X86/byte-constants.ll` is at fault. Will look into it and hopefully fix it by tomorrow.

[NFC] Delete `MCPseudoProbeDecoder`'s move constructor (#186698) `MCPseudoProbeDecoder` cannot be copied/moved due to its address dependence on the DummyInlineRoot member address. Explicitly delete the move constructor.

[RISCV] Add `sifive-x160` and `sifive-x180` processor definitions (#186264) This PR adds new processor definitions for two SiFive cores:
- X160 (https://www.sifive.com/document-file/sifive-intelligence-x160-gen2-product-brief): An RV32 core with Zve32f
- X180 (https://www.sifive.com/document-file/sifive-intelligence-x180-gen2-product-brief): An RVV-capable RV64 core

Both of them have VLEN=128. Scheduling model support will be added in follow-up patches.

[orc-rt] Add a simple iterator_range class.
(#186720) This will be used to simplify operations on iterator ranges in the ORC runtime. [LoongArch] Remove unreachable Value check in fixupLeb128 (#186297) Value is guaranteed to be zero after the loop: for (I = 0; Value; ++I, Value >>= 7) Therefore the subsequent `if (Value)` condition is always false. Remove the unreachable code. Reported by PVS-Studio. Fixed: #170122 [lld][ELF] Fix crash when relaxation pass encounters synthetic sections In LoongArch and RISC-V, the relaxation pass iterates over input sections within executable output sections. When a linker script places a synthetic section (e.g., .got) into such an output section, the linker would crash because synthetic sections do not have the relaxAux field initialized. The relaxAux data structure is only allocated for non-synthetic sections in initSymbolAnchors. This patch adds the necessary null checks in the relaxation loops (relaxOnce and finalizeRelax) to skip sections that do not require relaxation. A null check is also added to elf::initSymbolAnchors to ensure the subsequent sorting of anchors is safe. Fixes: #184757 Reviewers: MaskRay Pull Request: https://github.com/llvm/llvm-project/pull/184758 [clang] Skip dllexport of inherited constructors with unsatisfied constraints (#186497) When a class is marked `__declspec(dllexport)`, Clang eagerly creates inherited constructors via `findInheritingConstructor` and propagates the dllexport attribute to all members. This bypasses overload resolution, which would normally filter out constructors whose requires clause is not satisfied. As a result, Clang attempted to instantiate constructor bodies that should never be available, causing spurious compilation errors. Add constraint satisfaction checks in `checkClassLevelDLLAttribute` to match MSVC behavior: 1. Before eagerly creating inherited constructors, verify that the base constructor's `requires` clause is satisfied. Skip creation otherwise. 2. 
Before applying dllexport to non-inherited methods of class template specializations, verify constraint satisfaction. This handles the case where `dllexport` propagates to a base template specialization whose own members have unsatisfied constraints. Inherited constructors skip the second check since their constraints were already verified at creation time. Fixes #185924 Followup to https://github.com/llvm/llvm-project/pull/182706 Assisted by: Cursor // Claude Opus 4.6 [orc-rt] Add LockedAccess utility. (#186737) LockedAccess provides pointer-like access to a value while holding a lock. All accessors are rvalue-ref-qualified, restricting usage to temporaries to prevent accidental lock lifetime extension. A with_ref method is provided for multi-statement critical sections. [CIR] Add Pure trait to IsFPClassOp (#186625) IsFPClassOp is a pure classification check on a floating-point value with no memory effects. [clangd] Report reference to UsingType's target decl at the correct location (#186310) Fixes https://github.com/clangd/clangd/issues/2617 [SelectionDAG] Add CTTZ_ELTS[_ZERO_POISON] nodes. NFCI (#185600) Currently llvm.experimental.cttz.elts are directly lowered from the intrinsic. If the type isn't legal then the target tells SelectionDAGBuilder to expand it into a reduction, but this means we can't split the operation. E.g. it's possible to split a cttz.elts nxv32i1 into two nxv16i1, instead of expanding it into a nxv32i64 reduction. vp.cttz.elts can be split because it has a dedicated SelectionDAG node. This adds CTTZ_ELTS and CTTZ_ELTS[_ZERO_POISON] nodes and just enough legalization to get tests passing. A follow up patch will add splitting and move the expansion into LegalizeDAG. 
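The splitting idea mentioned for cttz.elts can be sketched in Python. This is only an illustration, assuming the documented semantics of `llvm.experimental.cttz.elts` with `is_zero_poison` false (the number of zero elements before the first set element, or the element count if none is set). The helper names `cttz_elts` and `cttz_elts_split` are hypothetical and do not reflect how the follow-up patch implements legalization.

```python
from typing import Sequence


def cttz_elts(mask: Sequence[bool]) -> int:
    """Count zero elements before the first set element.

    Returns len(mask) when no element is set (is_zero_poison = false).
    """
    for index, bit in enumerate(mask):
        if bit:
            return index
    return len(mask)


def cttz_elts_split(mask: Sequence[bool]) -> int:
    """Compute the same result from two half-width operations.

    Hypothetical model of splitting e.g. a 32-element mask into two
    16-element halves: if the low half contains a set element, its count
    is the answer; otherwise add the low half's length to the high half's
    count.
    """
    half = len(mask) // 2
    lo, hi = mask[:half], mask[half:]
    lo_count = cttz_elts(lo)
    if lo_count < len(lo):
        return lo_count
    return len(lo) + cttz_elts(hi)
```

Both functions agree on every input, which is the property a split legalization has to preserve.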
[mlir][linalg] Use inferConvolutionDims for generic convolution downscaling (#180586) The goal of this PR is to implement a generic, structure-aware convolution downscaling transformation that works for any convolution-like operation regardless of its specific layout or naming, rather than relying on pattern-matching against specific named operations. Each pattern we currently have has hardcoded dimension indices specific to its layout (e.g., NHWC vs NCHW). This approach: 1. Requires maintaining many similar patterns. 2. Is brittle when new layouts are introduced. 3. Cannot handle batchless versions of the conv variants. This PR thus creates a single downscaleSizeOneWindowedConvolution function that uses `inferConvolutionDims` to semantically understand the convolution structure (batch dims, output image dims, filter loop dims, etc.) rather than hardcoding indices. It works with any layout - NHWC, NCHW, or any other - because it reasons about the meaning of dimensions, not their positions. If the input to the downscaling pattern is a named op -> the output will be a named op. Else it'd be a generic op input/output. And for this reason we now remove the second RUN line as the infra tests both named as well as generic ops. Signed-off-by: Abhishek Varma <abhvarma@amd.com> [clang-tidy] Fix an edge case in readability-implicit-bool-conversion (#186234) Fix a FP for condition expressions wrapped by `ExprWithCleanups`. Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com> Co-authored-by: Zeyi Xu <zeyi2@nekoarch.cc> [X86][APX] Combine MOVABS+JMP to JMPABS when in no-PIC large code model (#186402) [CodeGen] Call getMCPU once instead of commonly twice (NFC) (#186581) [ARM] Try to lower sign bit SELECT_CC to shift (#186349) Lower a `x < 0 ? 1 : 0` style SELECT_CC to `x>>(bw-1)`. This will become more important with an upcoming change, but also appears to be somewhat useful by itself. 
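The equivalence behind the ARM sign-bit lowering can be checked with a small Python model. This is a sketch under stated assumptions: `x` is treated as a `bw`-bit two's-complement value and the shift is a logical (unsigned) right shift. The function names are hypothetical and are not taken from the patch.

```python
def sign_bit_via_select(x: int) -> int:
    """Reference form: the `x < 0 ? 1 : 0` select."""
    return 1 if x < 0 else 0


def sign_bit_via_shift(x: int, bw: int = 32) -> int:
    """Shift form: `x >> (bw - 1)` on the bw-bit pattern of x.

    Python integers are unbounded, so x is first masked to bw bits to
    model a fixed-width register; the right shift is then logical and
    moves the sign bit down to bit 0.
    """
    mask = (1 << bw) - 1
    return (x & mask) >> (bw - 1)
```

For any value representable in `bw` bits the two functions return the same result, which is why the select can be replaced by a single shift.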
[C++20] [Modules] Don't add discardable variables to module initializers (#186752) Close https://github.com/llvm/llvm-project/issues/170099 The root cause of the problem is, we shouldn't add the inline variable (which is discardable in linker's point of view) to the module's initializers. I verified with GCC's generated code to make the behavior consistent. This is also a small optimization by the way. [LV] Add more tests for blend masks. NFC (#186751) To be used in #184838 [LangRef] Fix typo in signatures for rounding intrinsics (#186709) Fixes #186536 [lldb-dap] Mark return value as readonly (#186329) Marked return value as readonly to give VS Code a hint that this variable doesn't support `setVariable` request. [orc-rt] Add Controller Interface (CI) symbol table to Session. (#186747) The Controller Interface is the extended set of symbols (mostly wrapper functions) that the controller can call prior to loading any JIT'd code. It is expected that it will be used to inspect the process and create / configure services to enable JITing. [AArch64] Add extra test coverage to legalize-shuffle-1x.ll. NFC [AMDGPU] Initialize more fields in the SIInsertWaitcnts constructor. NFC. (#186394) ST, TII, TRI and MRI can all be initialized in the constructor and hence be references instead of pointers. [AVR] Optimize expansion of pseudo instruction SPWRITE for no SPH devices (#152905) fixes https://github.com/llvm/llvm-project/issues/148560 [AMDGPU] Simplify state clearing in SIInsertWaitcnts. NFC. (#186399) There is no need to clear state at the start or end of the run method, because a fresh instance of SIInsertWaitcnts is constructed for each run on a MachineFunction. [flang][NFC] Converted five tests from old lowering to new lowering (part 31) (#186299) Tests converted from test/Lower/Intrinsics: iall.f90, iand.f90, iany.f90, ibclr.f90, ibits.f90 [libc++] Avoid including <cmath> in <format> (#186332) This reduces the time to parse `<format>` a bit. 
[X86] Blocklist instructions that are unsafe for masked-load folding. (#178888) This PR blocklist instructions that are unsafe for masked-load folding. Folding with the same mask is only safe if every active destination element reads only from source elements that are also active under the same mask. These instructions perform element rearrangement or broadcasting, which may cause active destination elements to read from masked-off source elements. VPERMILPD and VPERMILPS are safe only in the rrk form, the rik form needs to be blocklisted. In the rrk form, the masked source operand is a control mask, while in the rik form the masked source operand is the data/value. This is also why VPSHUFB is safe to fold, while other shuffles such as VSHUFPS are not. Examples: ``` EVEX.128.66.0F.WIG 67 /r VPACKUSWB xmm1{k1}{z}, xmm2, xmm3/m128 A: 00010203 7F000001 80000002 DEADBEEF E : 00000000 00000001 00000002 00000003 D: 11111111 22222222 33333333 44444444 k = 0x0400 Masked_e = 00000000 00000000 00000000 00000000 (vmovdqu8{k}{z} Masked_e E) res1 = 00000000 00000000 00010000 00000000 (VPACKUSWB D{k}{z}, A, E) res2 = 00000000 00000000 00000000 00000000 (VPACKUSWB D{k}{z}, A, Masked_e) EVEX.128.66.0F38.W0 C4 /r VPCONFLICTD xmm1 {k1}{z}, xmm2/m128/m32bcst A: DAA66D2B FFFFFFFC FFFFFFFC D9A0643C E : 7DDF743F 00000000 5FD99E73 4ED634C9 D: 2629AB38 9E37782F 67BB800F AD66764A k = 0x0002 Masked_e = (vmovdqu32 {k}{z} Masked_e E) res1 = 00000000 00000000 00000000 00000000 (VPCONFLICTD D{k}{z}, E) res2 = 00000000 00000001 00000000 00000000 (VPCONFLICTD D{k}{z}, Masked_e) EVEX.128.66.0F38.W1 8D /r VPERMW xmm1 {k1}{z}, xmm2, xmm3/m128 A: 00010203 7F000001 80000002 DEADBEEF E : 00000000 00000001 00000002 00000003 D: 11111111 22222222 33333333 44444444 k = 0x0010 Masked_e = 00000000 00000000 00000002 00000000 (vmovdqu16 {k}{z} Masked_e E) res1 = 00000000 00000000 00000001 00000000 (vpermw D{k}{z}, A, E) res2 = 00000000 00000000 00000000 00000000 (vpermw D{k}{z}, A, Masked_e) 
EVEX.128.66.0F38.W0 78 /r VPBROADCASTB xmm1{k1}{z}, xmm2/m8 E : 7F4A7C15 6E490933 5D4C9659 4C433CE3 D: F63F9D36 97F6E2B2 9432E8E6 FAEE7A3E k = 0x0002 Masked_e = 00007C00 00000000 00000000 00000000 (vmovdqu8{k}{z} Masked_e E) res = 00001500 00000000 00000000 00000000 (vpbroadcastb D{k}{z}, E) res = 00000000 00000000 00000000 00000000 (vpbroadcastb D{k}{z}, Masked_e) ``` Baseline: https://github.com/llvm/llvm-project/pull/178411 [flang][OpenMP] Implement nest depth calculation in LoopSequence (#186477) Calculate two depths, a semantic one and a perfect one. The former is the depth of a loop nest taking into account any loop- or sequence-transforming OpenMP constructs. The latter is the maximum level to which the semantic nest is a perfect nest. Issue: https://github.com/llvm/llvm-project/issues/185287 Reinstate PR185298 after a fix has been merged in PR186416. Includes a testcase that triggered failures before. [clang][bytecode] Remove FunctionPointer class (#186757) It's been mostly living inside `Pointer` for a long time now, so remove the leftovers. [SPIR-V] Address comments on SPV_INTEL_masked_gather_scatter extension implementation (#186336) Address comments left after merge of #185418 [libc] Fix build failures in fuzzing tests (#185017) The tests: - __support/freelist_heap_fuzz.cpp - fuzzing/string/strlen_fuzz.cpp had build failures for different reasons. This patch fixes these failures. freelist_heap_fuzz.cpp had this error: ``` llvm-project/libc/fuzzing/__support/freelist_heap_fuzz.cpp:150:26: error: use of undeclared identifier 'Block'; did you mean '__llvm_libc_23_0_0_git::Block'? 150 | size_t alignment = Block::MIN_ALIGN; | ^~~~~ | __llvm_libc_23_0_0_git::Block ``` The issue stems from the fact that Block was not available in scope. It needs to be referenced via LIBC_NAMESPACE. 
strlen_fuzz.cpp had this error: ``` In file included from Workspace/llvm-project/libc/fuzzing/string/strlen_fuzz.cpp:14: In file included from /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/cstdint:38: In file included from /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13/bits/c++config.h:679: /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13/bits/os_defines.h:44:5: error: function-like macro '__GLIBC_PREREQ' is not defined 44 | #if __GLIBC_PREREQ(2,15) && defined(_GNU_SOURCE) ``` This issue is more cryptic to me, but I managed to fix it by changing the includes from cstdint and cstring to stdint.h and string.h. [LifetimeSafety] Extract Sema helper implementation to separate header (#186492) Improves code organization by separating lifetime safety Sema-specific functionality into its own header file. [clang][AArch64] Update label in test (nfc) (#186759) [clang-tidy] Fix performance-use-std-move when moving a forward decl (#186704) This fixes running clang-tidy on top-of-tree with that check on. [clang][bytecode][NFC] Pre-commit a test case (#186773) Make sure we get the `expand()` during `computeOffsetForComparison()` right. [Analysis][NFC] Use block numbers for BranchProbabilityInfo (#186658) Instead of a hash map mapping pairs of blocks and successor index to the probability, store the probabilities as flat array and start indices into this array in a per-block information vector. Also drop value handles: no stored pointers => no stale pointers. If a block is removed, the block number is not reused unless the function is renumbered, and BPI doesn't support renumbering. [WebAssembly] Lower wide vector shifts by constant to extmul pairs (#184007) Wide vector multiplications by power-of-2 constants were canonicalized to v8i32 shl nodes. Generic legalizers then split these into separate 128-bit extend and shift operations, bypassing WebAssembly's native extended multiplication patterns. 
Before: mul v8i32:t1, <4096, ...> => shl v8i32:t1, <12, ...> => split into independent 128-bit extend + shift sequences WebAssembly SIMD has no native wide vector shifts, but it does support 128-bit extended multiplications. Lowering these nodes directly to extmul_low/extmul_high pairs keeps them in native 128-bit form and improves DAG matching. After: mul v8i32:t1, <4096, ...> => concat_vectors (extmul_low t1, c), (extmul_high t1, c) This preserves the original vector width while utilizing the native 128-bit SIMD pipeline. Fixed: https://github.com/llvm/llvm-project/issues/179143 [LSR] Remove unnecessary WidestFixupType (NFC) (#185013) The purpose of WidestFixupType is to prevent FindUseWithSimilarFormula from matching a formula with different widest fixup type, but this never happens: * FindUseWithSimilarFormula is only called by NarrowSearchSpaceByCollapsingUnrolledCode * That function only considers Address and ICmpZero kinds, as they're the only ones that allow a nonzero BaseOffset * In an Address use all fixups have pointer type * FindUseWithSimilarFormula already excludes ICmpZero uses [AMDGPU] Make WaitcntBrackets::Limits a reference. NFC. (#186782) Reland [VPlan] Extend interleave-group-narrowing to WidenCast (#186454) The patch was initially landed as bd5f9384, but then reverted due to an underlying issue in narrowInterleaveGroups, described in #185860. The issue has since been fixed. The reland is simply a conflict-resolved version of the original patch, which includes an additional test update. WidenCast is very similar to Widen recipes. Fixes #128062. [IR] Drop BasicBlockEdge::isSingleEdge (#186767) This was only called on CondBr instructions, where it is always faster to access the successors directly than to use successors(). Multi-edges don't dominate anything, so this rare case is often already handled by dominates(). There is also a very small (hardly measurable) performance improvement here (it did show up in profiles at 0.03% or so). 
[C2y] Update the C Status Page from the recent meetings (#186487) The Feb and Mar 2026 virtual meetings are now concluded, these are the adopted papers which could potentially impact the compiler. [libclc] Add generic clc_mem_fence instruction (#185889) Summary: This can be made generic, which works as expected on NVPTX and SPIR-V. We do not replace this for AMDGPU because the dedicated built-in has an extra argument that controls whether or not local memory or global memory will be invalidated. It would be correct to use this generic operation there, but we'd lose that minor optimization so we likely should not regress. [NFC][analyzer] Eliminate NodeBuilder::getContext() (#186201) This is a step towards the removal of the type `NodeBuilderContext`. The few remaining locations that used `NodeBuilder::getContext()` were changed to use the methods `getCurrBlock()` and `getNumVisitedCurrent()` of `ExprEngine`. The new code is equivalent to the old one because the `NodeBuilder`s were constructed with `ExprEngine::currBldrCtx` as their context, which is currently the "backend" behind `getCurrBlock()` and `getNumVisitedCurrent()` -- but these methods will remain valid after the removal of `NodeBuilderContext` and `currBldrCtx`. [libc][Github] Bump libc-fullbuild-tests.yml to clang 23 (#186699) Do this now that it is available in the container. [LifetimeSafety] Add user documentation (#183058) [LLVM][CodeGen][SVE] insert_subvector(undef, splat(C), 0) -> splat(C). (#186090) When converting a fixed-length constant splats to scalable vector we can instead regenerate the splat using the target type. [ADT] Add `Repeated<T>` for memory-efficient repeated-value ranges (#186721) Introduce a lightweight range representing N copies of the same value without materializing a dynamic array. The range owns this value. I plan to use it with MLIR APIs that often end up requiring N copies of the same thing. Currently, we use `SmallVector<T>(N, Val)` for these, which is wasteful. 
--------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> [NFC][analyzer] Refactor ExprEngine::processCallExit (#186182) This commit converts `ExprEngine::processCallExit` to the new paradigm introduced in 1c424bfb03d6dd4b994a0d549e1f3e23852f1e16 where the current `LocationContext` and `Block` is populated near the beginning of the `dispatchWorkItem` call (= elementary analysis step) and remains available during the whole step. Unfortunately the first half of the `CallExit` procedure (`removeDead`) happens within the callee context, while the second half (`PostCall` and similar callbacks) happen in the caller context -- so I need to change the current `LocationContext` and `Block` at the middle of this big method. This means that I need to discard my invariant that `setCurrLocationContextAndBlock` is only called once per each `dispatchWorkItem`; but I think this exceptional case (first half in callee, second half in caller) is still clear enough. In addition to this main goal, I perform many small changes to clarify and modernize the code of this old method. [IR][NFC] Hot-cold splitting in PatternMatch (#186777) ConstantAggregates are rare, therefore split that check into a separate function so that the fast path can be inlined. Likewise for vectors, which occur much less frequently than scalar values. [AArch64] Add partial reduce patterns for new sve dot variants (#184649) This patch enables generation of new dot instruction added in 2025 arm extension from partial reduce nodes. Update docker/login-action action to v4 (#186719) This PR contains the following updates: | Package | Type | Update | Change | |---|---|---|---| | [docker/login-action](https://redirect.github.com/docker/login-action) | action | major | `v3.6.0` → `v4.0.0` | AMDGPU: Don't limit VGPR usage based on occupancy in dVGPR mode (#185981) The maximum VGPR usage of a shader is limited based on the target occupancy, ensuring that the targeted number of waves actually fit onto a CU/WGP. 
However, in dynamic VGPR mode, we should not do that, because VGPRs are allocated dynamically at runtime, and there are no static constraints based on occupancy. Fix that in this patch. Also fixup the getMinNumVGPRs helper to behave consistently by always returning zero in dVGPR mode. This also fixes a problem where AMDGPUAsmPrinter bumps the VGPR usage to at least the result of getMinNumVGPRs, per my understanding in order to avoid an occupancy that is higher than the occupancy target. That was causing incorrect (too high) VGPR usages in dVGPR mode with medium-sized workgroups (say 768). [VPlan] Assert CanIV is the first header phi, drop begin (NFC). Split off as suggested in https://github.com/llvm/llvm-project/pull/156262/. [DWARFVerifier] Fix infinite loop in verifyDebugInfoCallSite (#186413) When attempting to find the callsite for a DwarfDie to see if it was valid or not, there was a while loop that incorrectly attempted to walk up the Die parent hierarchy. It set `curr` to parent, but then `curr` was set to same original parent instead of curr.getParent(). This caused infinite recursion on validation of some kernel binaries by llvm-dwarfdump where DW_TAG_call_site was nested inside a DW_TAG_lexical_block (or any non-subprogram, non-inlined_subroutine tag). Fix by changing Die.getParent() to Curr.getParent() so the loop correctly walks up the DIE tree. Add a new test that validates this scenario. Without this change, that test hangs rather than succeeding. [IR][NFC] Inline CmpInst::isSigned/isUnsigned (#186791) These are small helper functions that are called somewhat often, so inlining is beneficial. A very minor improvement. Nonetheless, these two functions are called somewhat regularly and compile to three instructions each, so it is always beneficial to inline them. 
[Utils] Modernize type annotations in git-llvm-push Import annotations from __future__ so we can start using more modern annotations now rather than once we move to Python 3.10 while still preserving Python 3.8 compatibility. Also fix a couple typing issues while here. Reviewers: ilovepi, petrhosek Pull Request: https://github.com/llvm/llvm-project/pull/186690 [CodeGen] Fix C++ global dtor for non-zero program AS targets (#186484) In codegen for C++ global destructors, we pass a pointer to the destructor to be called at program exit as the first arg to the `__cxa_atexit` function. If the target's default program AS and default AS are not equal, we need to emit an addrspacecast from the program AS to the generic AS (which is used as the argument type for the first arg of `__cxa_atexit`) in the function call. --------- Signed-off-by: Nick Sarnie <nick.sarnie@intel.com> [NFC][LLVM] Fix indentation issue in AArch64ExpandPseudo::expandMI (#186375) [lldb][NativePDB] Compile `vbases.test` without default libraries (#186510) `--target=x86_64-windows-msvc`. This will cause the final executable to be linked to `libcmt.lib`. That doesn't work on ARM, so this PR changes the command line to link without the default libraries. They're not needed if we disable `/GS` (buffer security check) like in other tests. We use `%clang_cl` over `%build` to be able to compile with DWARF as well. [lit] Stop holding subprocess objects open in TimeoutHelper (#186712) Tweak TestRunner's TimeoutHelper storage to hold only PIDs rather than the whole process object. Holding the object causes many pipes to stay open, when all we need is the pid. Addresses #185941 [SPIR-V] Fix llvm.spv.gep return type for vector-indexed GEPs (#185931) The `int_spv_gep` intrinsic was defined with `llvm_anyptr_ty` which forced it to return a scalar pointer. 
Change the return type to `llvm_any_ty` to allow the intrinsic to match the actual result type of the original GEP, whether scalar or vector [Flang][OpenMP] Provide option to use heap allocation for private adjustable arrays (#186795) The size of adjustable Fortran arrays is not known at compilation time. Using limited GPU stack memory may cause hard-to-debug errors. On the other hand, switching to heap memory allocation may lead to missed optimization opportunities and significantly increased kernel execution time. Adding the option `-mmlir --enable-gpu-heap-alloc` allows the user to generate valid code for adjustable Fortran arrays. The flag is off by default, so there is no efficiency penalty for code that does not use adjustable arrays. [libc] Fix llvm-gpu-loader passing uninitialized device memory (#186804) Summary: The return value was not zeroed, this was accidentally dropped when we did the port and it's zero "almost always" so I didn't notice. Hopefully this makes the test suite no longer flaky. [mlir][linalg][elementwise] Fold broadcast into new elementwise (#167626) Fold broadcast into new elementwise Op which has affine-map attached. Merging on behalf of @someoneinjd [DomTree] Assert non-null block for pre-dom tree (#186790) In a pre-dominator tree, blocks should never be null. [mlir][llvmir][OpenMP] Translate affinity clause in task construct to llvmir (#182223) Translate affinity entries to LLVMIR by passing affinity information to createTask (__kmpc_omp_reg_task_with_affinity is created inside PostOutlineCB). 3/3 in stack for implementing affinity clause with iterator modifier 1/3 #182218 2/3 #182222 3/3 #182223 [lldb][Module] Remove feedback_stream parameter from LoadScriptingResources (#186787) I'm in the process of making `LoadScriptingResources` interactively ask a user whether to load a script. I'd like to turn the existing warning into the prompt. 
The simplest way to achieve this is to not print into a `feedback_stream` parameter, and instead create a prompt right there. This patch removes the `feedback_stream` parameter and emits a `ReportWarning` instead. If we get around to adding the prompt instead of the warning, those changes will be simpler to review. But even if we don't end up replacing the warning with a prompt, moving away from output parameters and towards more structured error reporting is a nice-to-have (e.g., the `warning` prefix is now colored, IDEs have more flexibility on how to present the warning, etc.). For a command-line user nothing should change with this patch (apart from `warning:` being highlighted). [PowerPC] Use lxvp/stxvp for mcpu=future v256i1 types (#184447) For `-mcpu=future`, add patterns to use paired vector instructions (lxvp/lxvpx/stxvp/stxvpx) for v256i1 operations instead of splitting into two separate vector operations. Assisted by AI. [VPlan] Simplify&clarify skipping VPValues in calculateRegisterUse (NFC) Split off as suggested in https://github.com/llvm/llvm-project/pull/156262/. This refactors the code to clarify comments and code, in preparation for #156262. [OpenMP][AMDGPU] Enable omptest build (#161649) This enables building the omptest library across the AMD buildbots that rely on this CMake cache. [flang][NFC] Converted five tests from old lowering to new lowering (part 32) (#186730) Tests converted from test/Lower/Intrinsics: ibset.f90, ichar.f90, ieee_class.f90, ieee_copy_sign.f90, ieee_is_finite.f90 [SLP]Fix legality checks for bswap-based transformations Fix the checks for the non-power-of-2 base bswaps by checking the power-of-2 of the source type, not the target scalar type. Plus, add cost estimation for zext, if the source type does not match the scalar type. Fixes https://github.com/llvm/llvm-project/pull/184018#issuecomment-4053477562 [VPlan] Check isa<VPRecipeValue> directly, remove unused variable (NFC). 
[MLIR][Presburger] Add support for Smith normal form (#185328) FPL already has support for computing Hermite normal form for integer matrices. Here we add support for computing Smith normal form. This is a preparation for Barvinok's algorithm. Given a polyhedron $P = \{ x | Ax + b = 0, Cx + d \leq 0 \}$, we must find a particular solution $x_0$ of $Ax + b = 0$ in order to project lower-dimensional polyhedra into full-dimensional ones. This requires the Smith normal form of the integer matrix $A$. The implementation here follows the algorithm in [Wikipedia](https://en.wikipedia.org/wiki/Smith_normal_form#Algorithm). AMDGPU/GlobalISel: RegBankLegalize rules for s_barrier/wave_barrier (#186512) [X86] Move getMaskNode to avoid unnecessary forward declarations. (#186815) I've also improved the assertions on the source / bool mask types to catch bad use cases. Cleanup pre-work to allow the i512 codegen to eventually use getMaskNode instead of manual bool mask creations Revert "[SLP]Fix legality checks for bswap-based transformations" This reverts commit 2d4daea3b66469420fc164e76c15558b34e44c75 to fix a buildbot https://lab.llvm.org/buildbot/#/builders/164/builds/19737 [RISCV] Fold waddau/wsubau to waddu/wsubu when possible (#186635) If the wide input is zero extended and only one narrow input is used, we can fold to waddu/wsubu. [WebAssembly] Support acquire-release atomics in CodeGen (#184900) Set the correct memory ordering for relaxed atomics after ISel. 
This allows SelectionDAG to keep the simple generic selection for target-independent AtomicLoad nodes, but keeps the ordering immediate correct in the MIR. Notably, the MachineMemOperand still has the original memory ordering and MIR passes would use that rather than the ordering immediate to make their code motion decisions (if we had any for Wasm, which we don't). Revert "[DomTree] Assert non-null block for pre-dom tree" (#186831) Reverts llvm/llvm-project#186790 Breaks buildbots, there are more SLPVectorizer problems. https://lab.llvm.org/buildbot/#/builders/52/builds/15810 [CIR][AArch64] Lower BF16 vduph lane builtins (#185852) Part of #185382. Lower `__builtin_neon_vduph_lane_bf16` and `__builtin_neon_vduph_laneq_bf16` in ClangIR to `cir.vec.extract`, and add dedicated AArch64 Neon BF16 tests. This is my first LLVM PR, so I'd really appreciate any suggestions on the implementation, test structure, or general LLVM contribution style. [flang][parser] Add a feature flag for multiple program units on one line. (#186533) This PR adds a feature flag `MultipleProgramUnitsOnSameLine` that by default allows program units to be terminated by semicolons, and then allows the next program unit to follow on the same line. It also adds some test programs to demonstrate using program units and showing the portability warning with "-pedantic". [X86] Add test showing failure to fold compress(splat(x),splat(x),mask) -> splat(x) (#186823) Noticed while working on i512 shift expansion - if we end up with repeated splat args, we fail to remove the compress node [libc][math] Refactored atanpif16 to header only (#184316) Fixes #178105 Reapply "[clang][ssaf] Add --ssaf-extract-summaries= and --ssaf-tu-summary-file= options" (#186463) This reverts commit 3548ec95178c00a2895a65b435945ce318396c8e and adapts the code to the new ScalableStaticAnalysisFramework/ directory layout. 
Re-adds: - `TUSummaryExtractorFrontendAction` and its integration into `ExecuteCompilerInvocation` - `--ssaf-extract-summaries=` and `--ssaf-tu-summary-file=` CLI options - SSAFForceLinker / SSAFBuiltinForceLinker headers and anchor symbols - Diagnostics under -Wscalable-static-analysis-framework - Lit tests for the CLI and unit tests for the frontend action - Changes the Formats to be lowercase - and match their spellings in the file paths. [libc][math] Refactor bf16fma to Header Only (#182572) Fixes #181625 [MIR][NFC] Test verbalising INLINEASM extra-info flags. (#186796) Exposes the bug printing inteldialect. [libc][math] Refactor log_bf16 to Header (#186618) AMDGPU/GlobalISel: RegBankLegalize rules for ds_read_tr* (#186006) AMDGPU/GlobalISel: RegBankLegalize rules for ctlz/cttz_zero_undef (#186546) [X86] known-pow2.ll - add min/max vector test coverage for #182369 (#186841) AMDGPU/GlobalISel: RegBankLegalize rules for s_wait intrinsics (#186254) [InstCombine] Support disjoint or in add-sub reassociation fold (#186827) [lldb] Include stdio.h in synthetic subscript test (#186847) The [lldb-aarch64-windows](https://lab.llvm.org/buildbot/#/builders/141) buildbot failed with: ``` lld-link: error: undefined symbol: printf >>> referenced by main.o:(main) ``` I'm assuming that's because of the use of `__builtin_printf`. In other tests, we use `printf` from `stdio.h` and these build fine, so I added an include and used `printf`. [AMDGPU][GlobalIsel] Add register bank legalization rules for amdgcn_wqm amdgcn_softwqm amdgcn_strict_wqm (#186214) This patch adds register bank legalization rules for amdgcn_wqm amdgcn_softwqm amdgcn_strict_wqm in the AMDGPU GlobalISel pipeline. [flang] Reorder messages wrt line number before diff(actual, expect) (#186812) When messages are attached together, the source locations to which they refer are not necessarily monotonically increasing. 
For example ``` error: foo.f90:10: There is a problem here # line 10 because: foo.f90:12: This thing is invalid # line 12 (attached) error: foo.f90:11: There is another problem here # line 11 ``` There is no way to represent that in the source file via ERROR annotations, so before running unified_diff "canonicalize" the list of messages into an order that corresponds to the line numbers. --------- Co-authored-by: Michael Kruse <llvm-project@meinersbur.de> [ForceFunctionAttrs] Fix handling of conflicts for more attributes (#186304) Fixes #185277 ForceFunctionAttrs currently only checks the `alwaysinline`/`noinline` conflict when forcing function attributes. This is incomplete, because LLVM verifier rules define additional incompatible function attribute combinations. Extend hasConflictingFnAttr() to reject more conflicting function attributes, including combinations involving `optnone`, `minsize`, `optsize`, and `optdebug`. Also add required companion attributes when forcing function attr…
1 parent 0897d46 commit 94090ce

File tree

1,325 files changed

+45267
-16389
lines changed

.ci/compute_projects.py

Lines changed: 11 additions & 1 deletion
@@ -81,6 +81,7 @@
     "clang": {"compiler-rt"},
     "clang-tools-extra": {"libc"},
     "libc": {"libc"},
+    "libclc": {"libclc"},
     "compiler-rt": {"compiler-rt"},
     "flang": {"flang-rt"},
     "flang-rt": {"flang-rt"},
@@ -146,6 +147,7 @@
     "flang": "check-flang",
     "flang-rt": "check-flang-rt",
     "libc": "check-libc",
+    "libclc": "check-libclc",
     "lld": "check-lld",
     "lldb": "check-lldb",
     "mlir": "check-mlir",
@@ -154,7 +156,15 @@
     "lit": "check-lit",
 }

-RUNTIMES = {"libcxx", "libcxxabi", "libunwind", "compiler-rt", "libc", "flang-rt"}
+RUNTIMES = {
+    "libcxx",
+    "libcxxabi",
+    "libunwind",
+    "compiler-rt",
+    "libc",
+    "flang-rt",
+    "libclc",
+}

 # Meta projects are projects that need explicit handling but do not reside
 # in their own top level folder. To add a meta project, the start of the path

.ci/compute_projects_test.py

Lines changed: 14 additions & 4 deletions

@@ -259,6 +259,16 @@ def test_include_libc_in_runtimes(self):
         self.assertEqual(env_variables["runtimes_check_targets"], "check-libc")
         self.assertEqual(env_variables["runtimes_check_targets_needs_reconfig"], "")
 
+    def test_include_libclc_in_runtimes(self):
+        env_variables = compute_projects.get_env_variables(
+            ["libclc/CMakeLists.txt"], "Linux"
+        )
+        self.assertEqual(env_variables["projects_to_build"], "clang;llvm")
+        self.assertEqual(env_variables["project_check_targets"], "")
+        self.assertEqual(env_variables["runtimes_to_build"], "libclc")
+        self.assertEqual(env_variables["runtimes_check_targets"], "check-libclc")
+        self.assertEqual(env_variables["runtimes_check_targets_needs_reconfig"], "")
+
     def test_exclude_docs(self):
         env_variables = compute_projects.get_env_variables(
             ["llvm/docs/CIBestPractices.rst"], "Linux"
@@ -297,7 +307,7 @@ def test_ci(self):
         )
         self.assertEqual(
             env_variables["runtimes_check_targets"],
-            "check-compiler-rt check-flang-rt check-libc",
+            "check-compiler-rt check-flang-rt check-libc check-libclc",
         )
         self.assertEqual(
             env_variables["runtimes_check_targets_needs_reconfig"],
@@ -322,7 +332,7 @@ def test_windows_ci(self):
         )
         self.assertEqual(
             env_variables["runtimes_check_targets"],
-            "check-compiler-rt",
+            "check-compiler-rt check-libclc",
         )
         self.assertEqual(
             env_variables["runtimes_check_targets_needs_reconfig"],
@@ -371,7 +381,7 @@ def test_premerge_workflow(self):
         )
         self.assertEqual(
             env_variables["runtimes_check_targets"],
-            "check-compiler-rt check-flang-rt check-libc",
+            "check-compiler-rt check-flang-rt check-libc check-libclc",
         )
         self.assertEqual(
             env_variables["runtimes_check_targets_needs_reconfig"],
@@ -406,7 +416,7 @@ def test_third_party_benchmark(self):
         )
         self.assertEqual(
             env_variables["runtimes_check_targets"],
-            "check-compiler-rt check-flang-rt check-libc check-libclc",
+            "check-compiler-rt check-flang-rt check-libc check-libclc",
         )
         self.assertEqual(
             env_variables["runtimes_check_targets_needs_reconfig"],
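What `test_include_libclc_in_runtimes` asserts about the runtimes variables can be modelled with a small sketch. It is a simplified, hypothetical stand-in and not the logic of `compute_projects.get_env_variables`: the helper name `runtimes_env_for` and the reduced tables are assumptions, the tables copy only entries visible in the diff, and the real script's handling of dependent projects and platforms is ignored.

```python
# Hypothetical, simplified model; the real logic lives in .ci/compute_projects.py.
RUNTIMES = {
    "libcxx",
    "libcxxabi",
    "libunwind",
    "compiler-rt",
    "libc",
    "flang-rt",
    "libclc",
}

# Subset of the check-target table shown in the diff (assumed sufficient here).
RUNTIME_CHECK_TARGETS = {
    "flang-rt": "check-flang-rt",
    "libc": "check-libc",
    "libclc": "check-libclc",
}


def runtimes_env_for(changed_paths):
    """Map changed file paths to the runtimes they touch and their check targets."""
    top_level_dirs = {path.split("/", 1)[0] for path in changed_paths}
    touched = sorted(top_level_dirs & RUNTIMES)
    targets = [
        RUNTIME_CHECK_TARGETS[name]
        for name in touched
        if name in RUNTIME_CHECK_TARGETS
    ]
    return {
        "runtimes_to_build": ";".join(touched),
        "runtimes_check_targets": " ".join(targets),
    }


libclc_env = runtimes_env_for(["libclc/CMakeLists.txt"])
```

In this model a change under `libclc/` selects the `libclc` runtime and the `check-libclc` target, which is the behaviour the new `RUNTIMES` and check-target entries are meant to enable.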

.ci/monolithic-linux.sh

Lines changed: 1 addition & 0 deletions

@@ -56,6 +56,7 @@ cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
       -D CMAKE_CXX_COMPILER_LAUNCHER=sccache \
       -D CMAKE_DISABLE_PRECOMPILE_HEADERS=ON \
       -D LIBCXX_CXX_ABI=libcxxabi \
+      -D LIBCLC_TARGETS_TO_BUILD="amdgcn-amd-amdhsa-llvm" \
       -D MLIR_ENABLE_BINDINGS_PYTHON=ON \
       -D LLDB_ENABLE_PYTHON=ON \
       -D LLDB_ENFORCE_STRICT_TEST_REQUIREMENTS=ON \

.ci/monolithic-windows.sh

Lines changed: 2 additions & 2 deletions

@@ -45,11 +45,11 @@ cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
       -D CMAKE_CXX_COMPILER_LAUNCHER=sccache \
       -D CMAKE_DISABLE_PRECOMPILE_HEADERS=ON \
       -D MLIR_ENABLE_BINDINGS_PYTHON=ON \
+      -D LIBCLC_TARGETS_TO_BUILD="amdgcn-amd-amdhsa-llvm" \
       -D CMAKE_EXE_LINKER_FLAGS="/MANIFEST:NO" \
       -D CMAKE_MODULE_LINKER_FLAGS="/MANIFEST:NO" \
       -D CMAKE_SHARED_LINKER_FLAGS="/MANIFEST:NO" \
-      -D LLVM_ENABLE_RUNTIMES="${runtimes}" \
-      -D LLVM_PARALLEL_LINK_JOBS=16
+      -D LLVM_ENABLE_RUNTIMES="${runtimes}"
 
 start-group "ninja"

.github/actions/build-container/action.yml

Lines changed: 1 addition & 1 deletion

@@ -78,7 +78,7 @@ runs:
         echo "container-full-name=$container_name" >> $GITHUB_OUTPUT
 
     - name: Create container artifact
-      uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
+      uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
       with:
         name: ${{ inputs.container-name }}-${{ runner.arch }}
         path: "*.tar"

.github/actions/push-container/action.yml

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ runs:
   using: "composite"
   steps:
     - name: Download container
-      uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
+      uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
 
     - name: Push Container
       env:

.github/workflows/build-ci-container-windows.yml

Lines changed: 2 additions & 2 deletions

@@ -44,7 +44,7 @@ jobs:
         run: |
           docker save ${{ steps.vars.outputs.container-name-tag }} > ${{ steps.vars.outputs.container-filename }}
       - name: Upload container image
-        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
+        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
         with:
           name: container
           path: ${{ steps.vars.outputs.container-filename }}
@@ -61,7 +61,7 @@ jobs:
       GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
     steps:
       - name: Download container
-        uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
         with:
           name: container
       - name: Push Container

.github/workflows/ci-post-commit-analyzer.yml

Lines changed: 1 addition & 1 deletion

@@ -87,7 +87,7 @@ jobs:
           scan-build --generate-index-only build/analyzer-results
 
       - name: Upload Results
-        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
+        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
         if: always()
         with:
           name: analyzer-results

.github/workflows/commit-access-review.yml

Lines changed: 1 addition & 1 deletion

@@ -39,7 +39,7 @@ jobs:
           python3 .github/workflows/commit-access-review.py $GITHUB_TOKEN
 
       - name: Upload Triage List
-        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
+        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
         with:
           name: triagers
           path: triagers.log
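The workflow changes above all follow one convention: each `uses:` reference is pinned to a full 40-character commit SHA, with the release tag kept in a trailing comment. A minimal sketch of parsing such a line follows; the helper name `parse_pinned_action` and the regular expression are illustrative assumptions and are not part of the repository's tooling.

```python
import re

# Hypothetical pattern for "uses: <owner>/<repo>@<40-hex sha> # vX.Y.Z".
PINNED = re.compile(
    r"uses:\s*(?P<action>[\w./-]+)@(?P<sha>[0-9a-f]{40})"
    r"\s*#\s*(?P<version>v\d+\.\d+\.\d+)"
)


def parse_pinned_action(line):
    """Return (action, sha, version) for a SHA-pinned `uses:` line, else None."""
    match = PINNED.search(line)
    if match is None:
        return None
    return match.group("action"), match.group("sha"), match.group("version")


pinned = parse_pinned_action(
    "uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0"
)
```

A reference that uses only a tag, such as `uses: actions/checkout@v4`, does not match the pattern and yields `None`, which makes unpinned actions easy to flag in a review script.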

.github/workflows/containers/libc/Dockerfile

Lines changed: 3 additions & 0 deletions

@@ -19,8 +19,11 @@ RUN apt-get update && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*
 
+# TODO(boomanaiden154): Remove the LLVM 21 installation once we are no longer
+# using it in the libc fullbuild tests workflow.
 RUN wget https://apt.llvm.org/llvm.sh && \
     chmod +x llvm.sh && \
+    sudo ./llvm.sh 23 && \
     sudo ./llvm.sh 21 && \
     rm llvm.sh
