[AutoBump] Merge with 46f9cddf (Jan 29) (2) by jorickert · Pull Request #884 · Xilinx/llvm-aie

jorickert · 2026-04-01T06:29:31Z

No description provided.

…24877)

The import of LLVM IR should use isDSOLocal instead of hasLocalLinkage to set the dso_local attribute. Without this change, function definitions that mostly have external linkage would be missing the dso_local attribute during translation. --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>

`at` has an assert that the key exists. Since we are assuming the key exists, use `at` instead of `lookup`.

…Map access" This reverts commit 3ce97e4. Pushed to main prematurely.

This case is intended to check the callee argument, not the call-site. Fixes an issue introduced in #123181.

This patch simplifies the implementation of `__construct_at_end` in `vector<bool>`, which currently contains duplicate initialization logic across its two overloads.

…opriate size select based on results (#124604) This PR aims to fix a mapping error when trying to map nullary elements of a record type (the primary example at the moment is allocatable/pointer types in Fortran). This should be legal to map, just not to write to without pointing to anything within the target region. A common Fortran OpenMP idiom where this is useful can be found in the added Fortran offload example. The runtime error arises when we try to map the pointer member using a prescribed constant size that we receive from the lowered type, which results in mapping data that does not exist when nothing has been allocated. The fix is to emit a runtime check of whether the data has been allocated: if it has not, we select a size of 0; if it has, we emit the usual type size.
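The size selection described above can be sketched in plain C++. This is only an illustration of the runtime check, not the code the patch emits: `selectMapSize` and `demoSelectMapSize` are hypothetical names, and a null pointer stands in for "no data allocated".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: choose how many bytes to map for a pointer or
// allocatable member. A null pointer models "nothing allocated".
std::size_t selectMapSize(const void *Data, std::size_t TypeSize) {
  // Unallocated data is mapped with size 0; otherwise use the type size.
  return Data != nullptr ? TypeSize : 0;
}

// Small usage example exercising both branches of the check.
void demoSelectMapSize() {
  int Allocated = 42;
  assert(selectMapSize(&Allocated, sizeof(int)) == sizeof(int));
  assert(selectMapSize(nullptr, sizeof(int)) == 0);
}
```

The point of the sketch is that the size becomes a value computed at run time rather than a constant taken from the lowered type.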

…nches. (#124028) - **[NFC] Use GCNPat instead of Pat.** - **[AMDGPU] Always emit SI_KILL_I1_PSEUDO for uniform floating point branches.** --------- Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>

Add v16i16 coverage and "reverse order hadd/hsub" tests

A new op that allows for representing arbitrary contractions on operands of arbitrary rank, with arbitrary transposes and arbitrary broadcasts specified through its indexing_maps attribute. Supports the expected lowerings to linalg.generic and to vector.contract. Corresponding RFC is here: https://discourse.llvm.org/t/mlir-rfc-introduce-linalg-contract/83589

…us checks (#122957)

This patch implements an LLVM IR pass, named kernel-info, that reports various statistics for codes compiled for GPUs. The ultimate goal of these statistics is to help identify bad code patterns and ways to mitigate them. The pass operates at the LLVM IR level so that it can, in theory, support any LLVM-based compiler for programming languages supporting GPUs. It has been tested so far with LLVM IR generated by Clang for OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs. By default, the pass runs at the end of LTO, and options like `-Rpass=kernel-info` enable its remarks. Example `opt` and `clang` command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include summary statistics (e.g., total size of static allocas) and individual occurrences (e.g., source location of each alloca). Examples of its output appear in tests in `llvm/test/Analysis/KernelInfo`.

…889)

… NFC (#124658) Use the Flags argument to add FrameSetup directly instead of walking backwards to add the flag after the call.

…ve (#122900) A trait property can be one of several alternatives (name, expression, etc.), and each property in a list was parsed as if it could be any of these alternatives independently of the other properties. This made the parsing vulnerable to certain ambiguities in the trait grammar (provided in the OpenMP spec). At the same time, the OpenMP spec gives the expected types of properties for almost every trait: all properties listed for a given trait are usually of the same type, e.g. names, clauses, etc. Incorporate these restrictions into the parser, and additionally use property extensions as the fallback if parsing the expected property type fails. This is intended to allow the parser to succeed and let the semantic-checking code emit a more user-friendly message instead.

We can take advantage of the fact that we subsequently only clone cold allocation contexts, since not cold behavior is the default, and significantly reduce the amount of metadata (and later ThinLTO summary and MemProfContextDisambiguation graph nodes) by pruning unnecessary not cold contexts when building metadata from the trie. Specifically, we only need to keep notcold contexts that overlap the longest with cold allocations, to know how deeply to clone those contexts to expose the cold allocation behavior. For a large target this reduced ThinLTO bitcode object sizes by about 35%. It reduced the ThinLTO indexing time by about half and the peak ThinLTO indexing memory by about 20%.

…24948) VectorCombine Since the transformation which is the subject of the `fold-binop-of-reductions.ll` test will be in VectorCombine, move the test there.

@mshockwave

Added an extension point after vectorizer passes in the PassBuilder. Additionally, added extension points before and after vectorizer passes in `buildLTODefaultPipeline`. Credit goes to @mshockwave for guiding me through my first LLVM contribution (and my first open source contribution in general!) :)

- Implemented `registerVectorizerEndEPCallback`
- Implemented `invokeVectorizerEndEPCallbacks`
- Added `VectorizerEndEPCallbacks` SmallVector
- Added a command line option `passes-ep-vectorizer-end` to `NewPMDriver.cpp`
- `buildModuleOptimizationPipeline` now calls `invokeVectorizerEndEPCallbacks`
- `buildO0DefaultPipeline` now calls `invokeVectorizerEndEPCallbacks`
- `buildLTODefaultPipeline` now calls BOTH `invokeVectorizerStartEPCallbacks` and `invokeVectorizerEndEPCallbacks`
- Added LIT tests to `new-pm-defaults.ll`, `new-pm-lto-defaults.ll`, `new-pm-O0-ep-callbacks.ll`, and `pass-pipeline-parsing.ll`
- Renamed `CHECK-EP-Peephole` to `CHECK-EP-PEEPHOLE` in `new-pm-lto-defaults.ll` for consistency.

This code is intended for developers that wish to implement and run custom passes after the vectorizer passes in the PassBuilder pipeline. For example, in #91796, a pass was created that changed the induction variables of vectorized code. This is right after the vectorization passes.
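The register/invoke pattern behind this extension point can be sketched without LLVM. In the sketch below only the names `registerVectorizerEndEPCallback`, `invokeVectorizerEndEPCallbacks` and `VectorizerEndEPCallbacks` come from the commit message; `MiniPassBuilder`, `Pipeline`, `buildPipeline`, the pass-name strings and `demoVectorizerEndEP` are hypothetical stand-ins, and a pipeline is modeled as a list of pass names rather than a real pass manager.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a pipeline is just an ordered list of pass names.
using Pipeline = std::vector<std::string>;

// Hypothetical miniature of a pipeline builder with one extension point.
class MiniPassBuilder {
  std::vector<std::function<void(Pipeline &)>> VectorizerEndEPCallbacks;

public:
  // Store a callback to run once the vectorizer passes have been added.
  void registerVectorizerEndEPCallback(std::function<void(Pipeline &)> C) {
    VectorizerEndEPCallbacks.push_back(std::move(C));
  }

  // Run every registered callback, in registration order.
  void invokeVectorizerEndEPCallbacks(Pipeline &P) const {
    for (const auto &C : VectorizerEndEPCallbacks)
      C(P);
  }

  // Build a toy pipeline; the extension point sits right after the
  // (placeholder) vectorizer passes.
  Pipeline buildPipeline() const {
    Pipeline P{"vectorizer-a", "vectorizer-b"};
    invokeVectorizerEndEPCallbacks(P);
    P.push_back("cleanup");
    return P;
  }
};

// Usage: a custom pass registered at the extension point lands between
// the vectorizer passes and the later cleanup pass.
Pipeline demoVectorizerEndEP() {
  MiniPassBuilder PB;
  PB.registerVectorizerEndEPCallback(
      [](Pipeline &P) { P.push_back("my-custom-pass"); });
  Pipeline P = PB.buildPipeline();
  assert(P.size() == 4 && P[2] == "my-custom-pass");
  return P;
}
```

Keeping the callbacks in a vector means several clients can hook the same point, and their passes are added in registration order.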

…. (#111894) This flag has been deprecated since Clang 19, having been the default since then. It has remained because its negation was still useful to work around backwards compatibility breaking changes from P0522. However, in Clang 20 we have landed various changes which implemented P3310R2 and beyond, which solve almost all of the expected issues, the known remaining few being a bit obscure. So this change removes the flag completely and all of its implementation and support code. Hopefully any remaining users can just stop using the flag. If there are still important issues remaining, this removal will also shake the tree and help us know.

…length. (#124827) The length might be any integer, so hlfir.get_length lowering should explicitly cast it to `index`.

…124867) An hlfir.elemental with a shape `(0, HUGE)` still runs `HUGE` iterations when expanded into a loop nest. HLFIR transformational operations inlined as hlfir.elemental may therefore execute more slowly than the Fortran runtime implementation. This patch adds an option for the BufferizeHLFIR pass to reset all upper bounds in the elemental loop nests to zero if the result is an empty array. A separate patch will enable this option in the driver after I do more performance testing. The option is off by default for now.
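The effect of resetting the bounds can be illustrated with a small C++ model of a two-deep loop nest. This is a sketch of the idea only, not the BufferizeHLFIR implementation: `clampEmptyNest`, `outerTrips` and `demoEmptyNest` are hypothetical names, and extents are plain integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: if any extent is zero (the result is an empty
// array), reset every upper bound to zero; otherwise keep the extents.
std::vector<std::int64_t> clampEmptyNest(std::vector<std::int64_t> Extents) {
  bool IsEmpty = std::any_of(Extents.begin(), Extents.end(),
                             [](std::int64_t E) { return E <= 0; });
  if (IsEmpty)
    std::fill(Extents.begin(), Extents.end(), 0);
  return Extents;
}

// Counts how often the outer loop body is entered in a two-deep nest
// whose inner loop runs over the first dimension.
std::int64_t outerTrips(std::int64_t Inner, std::int64_t Outer) {
  std::int64_t Trips = 0;
  for (std::int64_t J = 0; J < Outer; ++J) {
    ++Trips;
    for (std::int64_t I = 0; I < Inner; ++I) {
      // The element computation would run here.
    }
  }
  return Trips;
}

// Usage: shape (0, 1000) still spins the outer loop unless clamped.
void demoEmptyNest() {
  assert(outerTrips(0, 1000) == 1000);
  std::vector<std::int64_t> B = clampEmptyNest({0, 1000});
  assert(outerTrips(B[0], B[1]) == 0);
}
```

Non-empty shapes pass through `clampEmptyNest` unchanged, so the clamp only affects nests that would do no useful work.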

…801)

…PC_INSTALL_PATH is set. (#124810)

This fixes instantiation of the definition for friend function templates when the declaration found and the one containing the definition have different template contexts. In these cases, the function declaration corresponding to the definition is not available; it may not even be instantiated at all. So this patch adds a bit which tracks which function template declaration was instantiated from the member template. It's used to find which primary template serves as a context for the purpose of obtaining the template arguments needed to instantiate the definition. Fixes #55509 Relanding patch, with no changes, after it was reverted due to the revert of a commit this patch depended on.

This refactors the standard stream implementation in multiple ways:

- The streams are now `stream_data` structs, which contain all the data required for a stream
- The Windows mangling is generated via a macro instead of having magic strings for the different streams (i.e. it's now only partially magic)

This is an implementation of P1061 Structured Bindings Introduce a Pack without the ability to use packs outside of templates. There are a couple of ways the AST could have been sliced, so let me know what you think. The only part of this change that I am unsure of is the serialization/deserialization stuff. I followed the implementation of other Exprs, but I do not really know how it is tested. Thank you for your time considering this. --------- Co-authored-by: Yanzuo Liu <zwuis@outlook.com>

…VE (#121817) Parse METADIRECTIVE as a standalone executable directive at the moment. This will allow testing the parser code. There is no lowering, not even clause conversion yet. There is also no verification of the allowed values for trait sets, trait properties.

…it (#124965) This patch addresses post-commit review comments from #124859. The extra compile definition is not necessary and goes against the effort to separate the runtimes from the flang compiler itself. The function declaration for `CUFInit` can be accessed anyway since the headers are always present. The insertion of the call is based only on the language feature options from the folding context. A program compiled with cuda enabled but no cufruntime would just fail at link time as expected.

Describes PLT entries for hexagon.

Modify the DA pretty printer to match the output of other analysis passes. This enables update_analyze_test_checks.py to also work on DA tests. Auto generate all the Dependence Analysis tests.

clementval and others added 30 commits January 29, 2025 08:04

[flang][cuda] Propagate the data attribute on the converted calls (#1…

382d359

…24877)

[mlir][Conversion] Fix typos in MemRef descriptor comments (#124923)

6900768

[ReachingDefAnalysis][NFC] Use at instead of lookup for DenseMap access

3ce97e4

`at` has an assert that the key exists. Since we are assuming the key exists, use `at` instead of `lookup`.
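The difference between the two accessors can be sketched as follows. To keep the example buildable without LLVM it uses `std::unordered_map`, and `lookupOrDefault`, `atChecked` and `demoAccessors` are hypothetical stand-ins that only mirror the behavior described above (default value on a miss versus an assert that the key exists).

```cpp
#include <cassert>
#include <unordered_map>

// lookup-style access: a missing key silently yields a
// default-constructed value.
template <typename K, typename V>
V lookupOrDefault(const std::unordered_map<K, V> &Map, const K &Key) {
  auto It = Map.find(Key);
  return It == Map.end() ? V() : It->second;
}

// at-style access: a missing key is treated as a programming error
// and trips the assert.
template <typename K, typename V>
const V &atChecked(const std::unordered_map<K, V> &Map, const K &Key) {
  auto It = Map.find(Key);
  assert(It != Map.end() && "key must exist");
  return It->second;
}

// Usage: both agree on present keys; only the lookup-style accessor
// tolerates a missing one.
void demoAccessors() {
  std::unordered_map<int, int> Defs{{1, 10}};
  assert(lookupOrDefault(Defs, 1) == 10);
  assert(atChecked(Defs, 1) == 10);
  assert(lookupOrDefault(Defs, 2) == 0);
}
```

When the surrounding code already assumes the key is present, the at-style accessor turns a silent default into an immediate failure in assert-enabled builds.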

Revert "[ReachingDefAnalysis][NFC] Use at instead of lookup for Dense…

35defdf

…Map access" This reverts commit 3ce97e4. Pushed to main prematurely.

[Attributor] Check correct IRPosition in AANoCapture::isImpliedByIR()

8a43d0e

This case is intended to check the callee argument, not the call-site. Fixes an issue introduced in #123181.

[libc++] Simplify vector<bool>::__construct_at_end (#119632)

67752f6

This patch simplifies the implementation of `__construct_at_end` in `vector<bool>`, which currently contains duplicate initialization logic across its two overloads.

[bazel] Port 25ae1a2

d444558

[PhaseOrdering][X86] Add additional hadd/hsub test coverage

88e0014

Add v16i16 coverage and "reverse order hadd/hsub" tests

[clang-tidy] Refactor: remove typos in 'AllowedTypes' option in vario…

15412d7

…us checks (#122957)

[PowerPC] Use SelectionDAG::makeEquivalentMemoryOrdering(). NFC (#124…

7fff252

…889)

[X86] Use new Flags argument to storeRegToStackSlot to simplify code.…

27e01d1

… NFC (#124658) Use the Flags argument to add FrameSetup directly instead of walking backwards to add the flag after the call.

Fix typo "tranpose" (#124929)

aa29521

Fix MSVC signed/unsigned mismatch warning. NFC.

5dae05f

[gn build] Port 18f8106

bda1976

[KernelInfo] Fix layering violation, Analysis cannot depend on Passes

953354c

[KernelInfo] Remove unused include.

57f1731

[InstCombine][VectorCombine][NFC] Move a test from InstCombine to (#1…

1822462

…24948) VectorCombine Since the transformation which is the subject of the `fold-binop-of-reductions.ll` test will be in VectorCombine, move the test there.

[flang] Allow non-index length parameter on exprs fed into hlfir.get_…

b870875

…length. (#124827) The length might be any integer, so hlfir.get_length lowering should explicitly cast it to `index`.

[ExtractAPI] merge anon declarators even if they're array types (#120…

a368402

…801)

[libc] Update include directory for libcMPCWrapper target when LIBC_M…

bcf306e

…PC_INSTALL_PATH is set. (#124810)

mizvekov and others added 8 commits January 29, 2025 17:23

[Hexagon] Add support for decoding PLT symbols (#123425)

61ea63b

Describes PLT entries for hexagon.

[DA] enable update_analyze_test_checks.py (#123435)

46f9cdd

Modify the DA pretty printer to match the output of other analysis passes. This enables update_analyze_test_checks.py to also work on DA tests. Auto generate all the Dependence Analysis tests.

[AutoBump] Merge with 46f9cdd (Jan 29)

7833768

jorickert requested review from F-Stuckmann, SagarMaheshwari99, abhinay-anubola, abnikant, andcarminati, ehsan-toosi, katerynamuts, khallouh, konstantinschwarz, ljfitz, martien-de-jong, mludevid, niwinanto, roberteg16, stephenneuendorffer and ttjost as code owners April 1, 2026 06:29

jorickert enabled auto-merge April 1, 2026 06:29

konstantinschwarz approved these changes Apr 1, 2026

ehsan-toosi approved these changes Apr 2, 2026

jorickert disabled auto-merge April 2, 2026 07:24

konstantinschwarz merged commit 7a949d3 into aie-public Apr 2, 2026
15 of 16 checks passed

konstantinschwarz deleted the bump_to_46f9cddf branch April 2, 2026 16:36

konstantinschwarz merged 38 commits into aie-public from bump_to_46f9cddf

20 participants