[AutoBump] Merge with 46f9cddf (Jan 29) (29) #567

jorickert · 2025-05-28T13:17:18Z

No description provided.

This patch implements support for the UNROLL directive to control how many times a loop should be unrolled. It must be placed immediately before a `DO LOOP` and applies only to the loop that follows. N is an integer specifying the unrolling factor. This is done by adding an attribute to the branch into the loop in LLVM to indicate that the loop should be unrolled. The code pushed to support the directive `VECTOR ALWAYS` has been modified to take account of the fact that several directives can be used before a `DO LOOP`.
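To make the meaning of an unrolling factor concrete, here is a minimal Python model of a reduction loop unrolled by a factor of 4. It is only an illustration of the transformation the directive requests; the function names are hypothetical, and this is not what flang or LLVM emits.

```python
def sum_rolled(values):
    """Reference loop: one element per trip."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same reduction with the body replicated 4 times per trip."""
    total = 0
    n = len(values)
    main_end = n - (n % 4)  # last index covered by full groups of 4
    i = 0
    while i < main_end:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop handles the 0..3 leftover elements.
    while i < n:
        total += values[i]
        i += 1
    return total
```

Both functions compute the same result; the unrolled form simply runs fewer loop trips, each doing more work.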

This commit restricts the use of scalar types in vector math builtins, particularly the `__builtin_elementwise_*` builtins. Previously, small scalar integer types would be promoted to `int`, as per the usual conversions. This would silently do the wrong thing for certain operations, such as `add_sat`, `popcount`, `bitreverse`, and others. Similarly, since unsigned integer types were promoted to `int`, something like `add_sat(unsigned char, unsigned char)` would perform a *signed* operation. With this patch, promotable scalar integer types are not promoted to int, and are kept intact. If any of the types differ in the binary and ternary builtins, an error is issued. Similarly, an error is issued if builtins are supplied integer types of different signs. Mixing enums of different types in binary/ternary builtins now consistently raises an error in all language modes. This brings the behaviour surrounding scalar types more in line with that of vector types. No change is made to vector types, which are both not promoted and whose element types must match. Fixes llvm#84047. RFC: https://discourse.llvm.org/t/rfc-change-behaviour-of-elementwise-builtins-on-scalar-integer-types/83725
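The effect of promotion on `add_sat` and `popcount` can be modelled in a few lines of Python. This is a sketch of the arithmetic only, assuming an 8-bit `unsigned char`/`signed char` and a 32-bit `int`; the helper names are hypothetical and do not correspond to any Clang API.

```python
U8_RANGE = (0, 255)                    # assumed range of unsigned char
I32_RANGE = (-2**31, 2**31 - 1)        # assumed range of int


def add_sat(a, b, lo, hi):
    """Saturating add: clamp the exact sum into [lo, hi]."""
    return max(lo, min(hi, a + b))


def add_sat_promoted(a, b):
    # Model of the old behaviour: operands promoted to int first,
    # so saturation happens at the signed 32-bit limits.
    return add_sat(a, b, *I32_RANGE)


def add_sat_u8(a, b):
    # Model of the new behaviour: the operation stays in unsigned char.
    return add_sat(a, b, *U8_RANGE)


def popcount(value, width):
    """Count set bits of value viewed as a two's-complement integer of `width` bits."""
    return bin(value & ((1 << width) - 1)).count("1")
```

In this model `add_sat_promoted(200, 100)` yields 300 while `add_sat_u8(200, 100)` saturates at 255, and `popcount(-1, 32)` counts 32 bits where `popcount(-1, 8)` counts 8, which is the kind of silent difference the commit message describes.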

This resolves the same issue addressed in llvm#124286, but for invoke operations. The issue arose from duplicated logic for both imports. This PR also refactors the common import code for call and invoke instructions to mitigate issues in the future.

As decided on https://discourse.llvm.org/t/rfc-lets-document-and-enforce-a-minimum-python-version-for-lldb/82731. LLDB 20 recommended `>= 3.8` but did not remove support for anything earlier. Now we are in what will become LLDB 21, so I'm removing that support and making `>= 3.8` required. See https://docs.python.org/3/c-api/apiabiversion.html#c.PY_VERSION_HEX for the format of PY_VERSION_HEX.
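For readers unfamiliar with the `PY_VERSION_HEX` layout referenced above, the following Python sketch rebuilds the value from its fields (major, minor, micro, release level, serial). The helper `version_hex` and the constant `MIN_PY_38` are hypothetical names used for illustration; the actual check in LLDB may be written differently.

```python
import sys

# Release-level nibble as documented for PY_VERSION_HEX.
RELEASE_LEVELS = {"alpha": 0xA, "beta": 0xB, "candidate": 0xC, "final": 0xF}


def version_hex(major, minor, micro=0, level="final", serial=0):
    """Pack version fields the same way PY_VERSION_HEX / sys.hexversion do."""
    return (
        (major << 24)
        | (minor << 16)
        | (micro << 8)
        | (RELEASE_LEVELS[level] << 4)
        | serial
    )


# Hypothetical lower bound: anything at or above 3.8.0 (any release level).
MIN_PY_38 = (3 << 24) | (8 << 16)   # 0x03080000


def is_supported(hexversion=sys.hexversion):
    return hexversion >= MIN_PY_38
```

For example, Python 3.4.1a2 packs to `0x030401A2`, and the running interpreter's `sys.hexversion` equals `version_hex(*sys.version_info)`.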

This patch moves up the checks that verify whether it is legal to replace the atomic load/store with memcpy. Currently these checks are done after we decide to convert the load/store to memcpy/memmove, which makes the logic a bit confusing. This patch is a prelude to llvm#50892.

…vm#124789) These were based off instruction count, not throughput - we can probably improve these further, but these throughput numbers match the worse expanded shuffles we see in the vector-shuffle-128-v* codegen tests.

LoopInterchange has been converting `DVEntry::LE` and `DVEntry::GE` in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' is lost, which leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead. Resolves llvm#123920
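Why '*' is the safe choice can be seen by treating each direction as the set of basic relations it permits. The sketch below is a toy model under that assumption; the dictionaries and `is_conservative` are hypothetical and only mirror the mapping described in the commit message, not the LoopInterchange source.

```python
LT, EQ, GT = "<", "=", ">"

# Relations permitted by each dependence direction entry.
DV_ENTRY = {
    "LT": {LT}, "EQ": {EQ}, "GT": {GT},
    "LE": {LT, EQ}, "GE": {GT, EQ}, "ALL": {LT, EQ, GT},
}

# Relations covered by each single-character direction.
CHAR_ALLOWS = {"<": {LT}, "=": {EQ}, ">": {GT}, "*": {LT, EQ, GT}}

# Mapping before the fix (LE -> '<', GE -> '>') and after it (both -> '*').
OLD_MAPPING = {"LT": "<", "EQ": "=", "GT": ">", "LE": "<", "GE": ">", "ALL": "*"}
NEW_MAPPING = dict(OLD_MAPPING, LE="*", GE="*")


def is_conservative(entry, char):
    """A character is safe only if it covers every relation the entry allows."""
    return DV_ENTRY[entry] <= CHAR_ALLOWS[char]
```

Under the old mapping `LE` becomes '<', which drops the '=' case; under the new mapping every entry is covered by its character.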

Thread-local code generation requires constant pools because most of the relocations needed for it operate on data, so it cannot be used with -mexecute-only (or -mpure-code, which is aliased in the driver). Without this we hit an assertion in the backend when trying to generate a constant pool.

…124775) When using PAuthLR, the PAUTH_PROLOGUE expands into a sequence of instructions which takes the address of one of those instructions, and uses that address to compute the return address signature. If this is duplicated, there will be two different addresses used in calculating the signature, so the epilogue will only be correct for (at most) one of them. This change also restricts code generation when using v8.3-A return address signing, without PAuthLR. This isn't strictly needed, as duplicating the prologue there would be valid. We could fix this by having two copies of PAUTH_PROLOGUE, with and without isNotDuplicable, but I don't think it's worth adding the extra complexity to a security feature for that.

…#123640) Add new runPass helpers to run a VPlan transformation. This makes it easier to add additional checks/functionality for each transform run. In this patch, an option is added to run the verifier after each VPlan transform. Follow-ups will use the same helper to also support printing VPlans after each transform. Note that the verifier at the moment requires there to be a canonical IV and vector loop region, so the final lowering transforms aren't run via runPass yet. PR: llvm#123640

Adds AVX512 bf16 dot-product operation and defines lowering to LLVM intrinsics. AVX512 intrinsic operation definition is extended with an optional extension field that allows specifying necessary LLVM mnemonic suffix e.g., `"bf16"` for `x86_avx512bf16_` intrinsics.

…egister shuffles Patch adds usage of processShuffleMasks in TTI for RISCV. This function is already used for X86 shuffles estimations and in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE functions and in RISCV codegen. Patch allows better cost estimation for sparse masks and unifies cost/codegen between different targets/passes Reviewers: preames Reviewed By: preames Pull Request: llvm#118103

…4770) The TableGen definition `TypedStrAttr` is an attribute constraint that is meant to restrict the type of a `StringAttr` to the type given as parameter. However, the definition did not previously restrict the type; any `StringAttr` was accepted. This PR makes the definition actually enforce the type. To test the constraint, the PR also changes the test op that was previously used to test this constraint such that the enforced type is `AnyInteger` instead of `AnyType`. The latter allowed any type, so not enforcing that constraint had no observable effect. The PR then adds a test case with a wrong type and ensures that diagnostics are produced. Signed-off-by: Ingo Müller <[email protected]>

…#124799) After we fall back from GlobalISel to SDAG, the verifier gets called, which calls getReservedRegs which uses SIMachineFunctionInfo::usesAGPRs which caches the result of UsesAGPRs. Because we have just fallen-back the function is empty and it incorrectly gets cached to false. This patch makes sure we don't try to run the verifier whilst the function is empty.
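The caching hazard described here can be reproduced with a toy model. The class and function below are hypothetical stand-ins, not the AMDGPU backend code: a cached query that is first evaluated on an empty function keeps returning the stale answer once instructions are added.

```python
class FunctionInfo:
    """Toy stand-in for per-function info that caches an expensive query."""

    def __init__(self):
        self.instructions = []      # empty right after the fallback resets the function
        self._uses_agprs = None     # None means "not computed yet"

    def uses_agprs(self):
        # Compute once, then reuse the cached answer.
        if self._uses_agprs is None:
            self._uses_agprs = any("agpr" in inst for inst in self.instructions)
        return self._uses_agprs


def compile_after_fallback(verify_while_empty):
    info = FunctionInfo()
    if verify_while_empty:
        # A verifier-like query on the still-empty function caches False.
        info.uses_agprs()
    # Hypothetical instruction text, added when the fallback selector runs.
    info.instructions = ["v_accvgpr_write agpr0, v1"]
    return info.uses_agprs()
```

In this model, skipping the query while the function is empty (`verify_while_empty=False`) yields the correct answer, which mirrors the approach the patch describes.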

…nches. (llvm#124028) - **[NFC] Use GCNPat instead of Pat.** - **[AMDGPU] Always emit SI_KILL_I1_PSEUDO for uniform floating point branches.** --------- Co-authored-by: Konstantina Mitropoulou <[email protected]>

A new op that allows for representing arbitrary contractions on operands of arbitrary rank, with arbitrary transposes and arbitrary broadcasts specified through its indexing_maps attribute. Supports the expected lowerings to linalg.generic and to vector.contract. Corresponding RFC is here: https://discourse.llvm.org/t/mlir-rfc-introduce-linalg-contract/83589

…02944) This patch implements an LLVM IR pass, named kernel-info, that reports various statistics for codes compiled for GPUs. The ultimate goal of these statistics is to help identify bad code patterns and ways to mitigate them. The pass operates at the LLVM IR level so that it can, in theory, support any LLVM-based compiler for programming languages supporting GPUs. It has been tested so far with LLVM IR generated by Clang for OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs. By default, the pass runs at the end of LTO, and options like ``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang` command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include summary statistics (e.g., total size of static allocas) and individual occurrences (e.g., source location of each alloca). Examples of its output appear in tests in `llvm/test/Analysis/KernelInfo`.

…ve (llvm#122900) A trait property can be one of several alternatives (name, expression, etc.), and each property in a list was parsed as if it could be any of these alternatives independently of the other properties. This made the parsing vulnerable to certain ambiguities in the trait grammar (provided in the OpenMP spec). At the same time, the OpenMP spec gives the expected types of properties for almost every trait: all properties listed for a given trait are usually of the same type, e.g. names, clauses, etc. Incorporate these restrictions into the parser, and additionally use property extensions as the fallback if the parsing of the expected property type fails. This is intended to allow the parser to succeed, and instead let the semantic-checking code emit a more user-friendly message.

We can take advantage of the fact that we subsequently only clone cold allocation contexts, since not cold behavior is the default, and significantly reduce the amount of metadata (and later ThinLTO summary and MemProfContextDisambiguation graph nodes) by pruning unnecessary not cold contexts when building metadata from the trie. Specifically, we only need to keep notcold contexts that overlap the longest with cold allocations, to know how deeply to clone those contexts to expose the cold allocation behavior. For a large target this reduced ThinLTO bitcode object sizes by about 35%. It reduced the ThinLTO indexing time by about half and the peak ThinLTO indexing memory by about 20%.

@mshockwave

Added an extension point after vectorizer passes in the PassBuilder. Additionally, added extension points before and after vectorizer passes in `buildLTODefaultPipeline`. Credit goes to @mshockwave for guiding me through my first LLVM contribution (and my first open source contribution in general!) :)

- Implemented `registerVectorizerEndEPCallback`
- Implemented `invokeVectorizerEndEPCallbacks`
- Added `VectorizerEndEPCallbacks` SmallVector
- Added a command line option `passes-ep-vectorizer-end` to `NewPMDriver.cpp`
- `buildModuleOptimizationPipeline` now calls `invokeVectorizerEndEPCallbacks`
- `buildO0DefaultPipeline` now calls `invokeVectorizerEndEPCallbacks`
- `buildLTODefaultPipeline` now calls BOTH `invokeVectorizerStartEPCallbacks` and `invokeVectorizerEndEPCallbacks`
- Added LIT tests to `new-pm-defaults.ll`, `new-pm-lto-defaults.ll`, `new-pm-O0-ep-callbacks.ll`, and `pass-pipeline-parsing.ll`
- Renamed `CHECK-EP-Peephole` to `CHECK-EP-PEEPHOLE` in `new-pm-lto-defaults.ll` for consistency.

This code is intended for developers that wish to implement and run custom passes after the vectorizer passes in the PassBuilder pipeline. For example, in llvm#91796, a pass was created that changed the induction variables of vectorized code. This is right after the vectorization passes.

llvm#111894) This flag has been deprecated since Clang 19, having been the default since then. It has remained because its negation was still useful to work around backwards compatibility breaking changes from P0522. However, in Clang 20 we have landed various changes which implemented P3310R2 and beyond, which solve almost all of the expected issues, the known remaining few being a bit obscure. So this change removes the flag completely and all of its implementation and support code. Hopefully any remaining users can just stop using the flag. If there are still important issues remaining, this removal will also shake the tree and help us know.

…lvm#124867) An hlfir.elemental with a shape `(0, HUGE)` still runs `HUGE` iterations when expanded into a loop nest. HLFIR transformational operations inlined as hlfir.elemental may therefore execute more slowly than the Fortran runtime implementation. This patch adds an option for the BufferizeHLFIR pass to reset all upper bounds in the elemental loop nests to zero if the result is an empty array. A separate patch will enable this option in the driver after I do more performance testing. The option is off by default for now.
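The cost of an empty result can be illustrated by counting loop trips in a small Python model. It assumes the loop nest iterates the last extent in the outermost loop (so the zero extent is innermost); `count_loop_trips` and `clamp_if_empty` are hypothetical helpers, not part of the BufferizeHLFIR pass.

```python
def count_loop_trips(extents):
    """Return (loop-header trips, body executions) for a nest whose
    outermost loop runs over the last extent."""
    trips = 0
    body = 0

    def nest(dim):
        nonlocal trips, body
        if dim < 0:
            body += 1           # innermost body reached
            return
        for _ in range(extents[dim]):
            trips += 1
            nest(dim - 1)

    nest(len(extents) - 1)
    return trips, body


def clamp_if_empty(extents):
    """If any extent is zero the result is empty, so zero every upper bound."""
    if 0 in extents:
        return tuple(0 for _ in extents)
    return tuple(extents)
```

With extents `(0, 1000)` the unclamped nest still takes 1000 outer trips without ever reaching the body, while the clamped nest does no iterations at all.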

…12241) This fixes instantiation of the definition for friend function templates when the declaration found and the one containing the definition have different template contexts. In these cases, the function declaration corresponding to the definition is not available; it may not even be instantiated at all. So this patch adds a bit which tracks which function template declaration was instantiated from the member template. It's used to find which primary template serves as a context for the purpose of obtaining the template arguments needed to instantiate the definition. Fixes llvm#55509 Relanding patch, with no changes, after it was reverted due to the revert of a commit this patch depended on.

This refactors the standard stream implementation in multiple ways:

- The streams are now `stream_data` structs, which contain all the data required for a stream
- The Windows mangling is generated via a macro instead of having magic strings for the different streams. (i.e. it's now only partially magic)

This is an implementation of P1061 Structured Bindings Introduce a Pack without the ability to use packs outside of templates. There are a couple of ways the AST could have been sliced, so let me know what you think. The only part of this change that I am unsure of is the serialization/deserialization stuff. I followed the implementation of other Exprs, but I do not really know how it is tested. Thank you for your time considering this. --------- Co-authored-by: Yanzuo Liu <[email protected]>

…VE (llvm#121817) Parse METADIRECTIVE as a standalone executable directive at the moment. This will allow testing the parser code. There is no lowering, not even clause conversion yet. There is also no verification of the allowed values for trait sets, trait properties.

…it (llvm#124965) This patch addresses post-commit review comments from llvm#124859. The extra compile definition is not necessary and goes against the effort to separate the runtimes from the flang compiler itself. The function declaration for `CUFInit` can be accessed anyway since the headers are always present. The insertion of the call is based only on the language feature options from the folding context. A program compiled with CUDA enabled but no cufruntime would just fail at link time as expected.

brad0 and others added 30 commits January 29, 2025 03:21

[lldb] Remove PATH workaround for Android (llvm#124682)

9326633

[clang][bytecode] Fix dummy handling for p2280r4 (llvm#124396)

51c7338

This makes some other problems show up like the fact that we didn't suppress diagnostics during __builtin_constant_p evaluation.

[flang] Implement CHDIR intrinsic (llvm#124280)

5a34e6f

This intrinsic is a gnu extension (https://gcc.gnu.org/onlinedocs/gfortran/CHDIR.html) and is used in FLEUR (https://github.com/JuDFTteam/FLEUR).

[LLD][COFF] Write both native and EC export symbols to the import lib…

e902cf2

…rary on ARM64X (llvm#124833)

[LoopVectorize][NFC] Regenerate some early exit test CHECK lines (llv…

776ef9d

…m#124900)

[lldb][NFC] Format part of ScriptInterpreterPython.cpp

db567ea

Was flagged in llvm#124735 but done separately so it didn't get in the way of that.

[MCJIT][test] Move remaining MCJIT interpreter tests to Interpreter/ …

e80d934

…subdirectory (llvm#124744) I left these alone in llvm#124463 but I think it makes sense to clean these up as well (which Philip also noted in llvm#124464).

[MLIR][Linalg] Fixes for Winograd decomposition and for tiling (llvm#…

f20b8e3

…123675) The PR addresses issues with the filters of 1 x r and of r x 1 and with the tiling. --------- Signed-off-by: Dmitriy Smirnov <[email protected]>

[flang] Implement IERRNO intrinsic (llvm#124281)

ecc71de

Add the implementation of the IERRNO intrinsic to get the last system error number, as given by the C errno variable. This intrinsic is also used in RAMSES (https://github.com/ramses-organisation/ramses/).
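As a rough analogy for what "the last system error number, as given by the C errno variable" means, the Python sketch below surfaces the errno value left by a failing system call. The helper `chdir_status` is hypothetical and is not the flang runtime API; it only shows the kind of value such an intrinsic would report.

```python
import errno
import os


def chdir_status(path):
    """Return 0 on success, otherwise the errno value of the failed chdir."""
    try:
        os.chdir(path)
        return 0
    except OSError as exc:
        # exc.errno carries the C errno set by the failing system call.
        return exc.errno
```

Calling `chdir_status` on a directory that does not exist returns `errno.ENOENT`, the same number a C program would read from `errno` after the failed call.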

[libclc] Move (add|sub)_sat to CLC; optimize (llvm#124903)

12cdf43

Using the `__builtin_elementwise_(add|sub)_sat` functions allows us to directly optimize to the desired intrinsic, and avoid scalarization for vector types.

[X86] vector-idiv-sdiv-512.ll - regenerate VPTERNLOG comments

9534d27

[AMDGPU][NFC] Simplify t16/fake16 TableGen definitions. (llvm#122693)

983562d

Infer mnemonics from the names of the records.

[OpenMP] Allow OMP6.0 features. (llvm#122108)

978e083

Add support for the `-fopenmp-version=60` command line argument. It is needed for llvm#119891 (`#pragma omp stripe`) which will be the first OpenMP 6.0 directive implemented. Add regression tests for Clang in `-fopenmp-version=60` mode.

[LLVM][AMDGPU] Add Intrinsic and Builtin for ds_bpermute_fi_b32 (llvm…

3a29dfe

…#124616)

[mlir][x86vector] Restrict BF16 dot test to x86 (llvm#124916)

008e162

Requires x86 target for the lit test to ensure required instructions are available.

[Offload] Enable mlir and flang in bot build (llvm#124915)

d412fe5

This enables more projects in the CMake cache to add them to the buildbot coverage in the AMDGPU buildbots.

kmitropoulou and others added 30 commits January 29, 2025 09:00

[bazel] Port 25ae1a2

d444558

[PhaseOrdering][X86] Add additional hadd/hsub test coverage

88e0014

Add v16i16 coverage and "reverse order hadd/hsub" tests

[clang-tidy] Refactor: remove typos in 'AllowedTypes' option in vario…

15412d7

…us checks (llvm#122957)

[PowerPC] Use SelectionDAG::makeEquivalentMemoryOrdering(). NFC (llvm…

7fff252

…#124889)

[X86] Use new Flags argument to storeRegToStackSlot to simplify code.…

27e01d1

… NFC (llvm#124658) Use the Flags argument to add FrameSetup directly instead of walking backwards to add the flag after the call.

Fix typo "tranpose" (llvm#124929)

aa29521

Fix MSVC signed/unsigned mismatch warning. NFC.

5dae05f

[gn build] Port 18f8106

bda1976

[KernelInfo] Fix layering violation, Analysis cannot depend on Passes

953354c

[KernelInfo] Remove unused include.

57f1731

[InstCombine][VectorCombine][NFC] Move a test from InstCombine to (ll…

1822462

…vm#124948) VectorCombine Since the transformation which is the subject of the `fold-binop-of-reductions.ll` test will be in VectorCombine, move the test there.

[flang] Allow non-index length parameter on exprs fed into hlfir.get_…

b870875

…length. (llvm#124827) The length might be any integer, so hlfir.get_length lowering should explicitly cast it to `index`.

[ExtractAPI] merge anon declarators even if they're array types (llvm…

a368402

…#120801)

[libc] Update include directory for libcMPCWrapper target when LIBC_M…

bcf306e

…PC_INSTALL_PATH is set. (llvm#124810)

[Hexagon] Add support for decoding PLT symbols (llvm#123425)

61ea63b

Describes PLT entries for hexagon.

[DA] enable update_analyze_test_checks.py (llvm#123435)

46f9cdd

Modify the DA pretty printer to match the output of other analysis passes. This enables update_analyze_test_checks.py to also work on DA tests. Auto generate all the Dependence Analysis tests.

[AutoBump] Merge with 46f9cdd (Jan 29)

05e7f0e

20 participants
