[AutoBump] Merge with 46f9cddf (Jan 29) (2)#884
Merged
konstantinschwarz merged 38 commits into aie-public on Apr 2, 2026
The import of LLVM IR should use isDSOLocal instead of hasLocalLinkage to set the dso_local attribute. Without this change, function definitions that mostly have external linkage would be missing the dso_local attribute during translation. --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
`at` has an assert that the key exists. Since we are assuming the key exists, use `at` instead of `lookup`.
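The difference between the two accessors can be sketched as follows. This is a minimal illustration, not the LLVM container: `SketchMap` is a hypothetical wrapper around `std::unordered_map`, and its `lookup` and `at` only mirror the behaviour the commit message describes (default value on a miss versus an assert that the key exists).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a map offering both accessors; not LLVM's DenseMap.
template <typename K, typename V>
class SketchMap {
  std::unordered_map<K, V> Storage;

public:
  void insert(const K &Key, const V &Val) { Storage[Key] = Val; }

  // Returns a default-constructed value when the key is absent, which can
  // silently hide a broken "the key must exist" assumption.
  V lookup(const K &Key) const {
    auto It = Storage.find(Key);
    return It == Storage.end() ? V() : It->second;
  }

  // Asserts that the key exists, so a violated assumption fails loudly in
  // assert-enabled builds.
  const V &at(const K &Key) const {
    auto It = Storage.find(Key);
    assert(It != Storage.end() && "key must exist");
    return It->second;
  }
};
```

When the caller already assumes the key is present, `at` documents and checks that assumption, whereas `lookup` would return a default value that looks like valid data.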
…Map access" This reverts commit 3ce97e4. Pushed to main prematurely.
This case is intended to check the callee argument, not the call-site. Fixes an issue introduced in #123181.
This patch simplifies the implementation of `__construct_at_end` in `vector<bool>`, which currently contains duplicate initialization logic across its two overloads.
…opriate size select based on results (#124604) This PR aims to fix a mapping error when trying to map nullary elements of a record type (the primary example is allocatable/pointer types in Fortran at the moment). This should be legal to map, just not to write to without pointing to anything within the target region. A common Fortran OpenMP idiom/example where this is useful can be found in the added Fortran offload example. The runtime error arises when we try to map the pointer member using a prescribed constant size that we receive from the lowered type, resulting in mapping of data that will be non-existent when there is no allocated data. The fix in this case is to emit a runtime check to see whether the data has been allocated: if it hasn't, we select a size of 0; if it has, we emit the usual type size.
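The size selection described above can be sketched in plain C++. This is only an illustration of the runtime check, not the code the PR emits: `MemberDescriptor` and `selectMapSize` are hypothetical names, and a null base address stands in for "not allocated".

```cpp
#include <cstddef>

// Hypothetical descriptor for an allocatable/pointer record member.
struct MemberDescriptor {
  void *BaseAddr;       // null when nothing is allocated or associated
  std::size_t TypeSize; // constant size taken from the lowered type
};

// Select the number of bytes to map: 0 when the member has no data,
// otherwise the usual type size.
std::size_t selectMapSize(const MemberDescriptor &D) {
  return D.BaseAddr == nullptr ? 0 : D.TypeSize;
}
```

Mapping zero bytes for an unallocated member keeps the map legal while avoiding a transfer of data that does not exist.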
…nches. (#124028) - **[NFC] Use GCNPat instead of Pat.** - **[AMDGPU] Always emit SI_KILL_I1_PSEUDO for uniform floating point branches.** --------- Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>
Add v16i16 coverage and "reverse order hadd/hsub" tests
A new op that allows for representing arbitrary contractions on operands of arbitrary rank, with arbitrary transposes and arbitrary broadcasts specified through its indexing_maps attribute. Supports the expected lowerings to linalg.generic and to vector.contract. Corresponding RFC is here: https://discourse.llvm.org/t/mlir-rfc-introduce-linalg-contract/83589
…us checks (#122957)
This patch implements an LLVM IR pass, named kernel-info, that reports various statistics for codes compiled for GPUs. The ultimate goal of these statistics is to help identify bad code patterns and ways to mitigate them. The pass operates at the LLVM IR level so that it can, in theory, support any LLVM-based compiler for programming languages supporting GPUs. It has been tested so far with LLVM IR generated by Clang for OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs. By default, the pass runs at the end of LTO, and options like ``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang` command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include summary statistics (e.g., total size of static allocas) and individual occurrences (e.g., source location of each alloca). Examples of its output appear in tests in `llvm/test/Analysis/KernelInfo`.
… NFC (#124658) Use the Flags argument to add FrameSetup directly instead of walking backwards to add the flag after the call.
…ve (#122900) A trait property can be one of several alternatives (name, expression, etc.), and each property in a list was parsed as if it could be any of these alternatives independently of the other properties. This made the parsing vulnerable to certain ambiguities in the trait grammar (provided in the OpenMP spec). At the same time, the OpenMP spec gives the expected types of properties for almost every trait: all properties listed for a given trait are usually of the same type, e.g. names, clauses, etc. Incorporate these restrictions into the parser, and additionally use property extensions as the fallback if the parsing of the expected property type fails. This is intended to allow the parser to succeed, and instead let the semantic-checking code emit a more user-friendly message.
We can take advantage of the fact that we subsequently only clone cold allocation contexts, since not cold behavior is the default, and significantly reduce the amount of metadata (and later ThinLTO summary and MemProfContextDisambiguation graph nodes) by pruning unnecessary not cold contexts when building metadata from the trie. Specifically, we only need to keep notcold contexts that overlap the longest with cold allocations, to know how deeply to clone those contexts to expose the cold allocation behavior. For a large target this reduced ThinLTO bitcode object sizes by about 35%. It reduced the ThinLTO indexing time by about half and the peak ThinLTO indexing memory by about 20%.
…24948) VectorCombine Since the transformation which is the subject of the `fold-binop-of-reductions.ll` test will be in VectorCombine, move the test there.
Added an extension point after vectorizer passes in the PassBuilder. Additionally, added extension points before and after vectorizer passes in `buildLTODefaultPipeline`. Credit goes to @mshockwave for guiding me through my first LLVM contribution (and my first open source contribution in general!) :) - Implemented `registerVectorizerEndEPCallback` - Implemented `invokeVectorizerEndEPCallbacks` - Added `VectorizerEndEPCallbacks` SmallVector - Added a command line option `passes-ep-vectorizer-end` to `NewPMDriver.cpp` - `buildModuleOptimizationPipeline` now calls `invokeVectorizerEndEPCallbacks` - `buildO0DefaultPipeline` now calls `invokeVectorizerEndEPCallbacks` - `buildLTODefaultPipeline` now calls BOTH `invokeVectorizerStartEPCallbacks` and `invokeVectorizerEndEPCallbacks` - Added LIT tests to `new-pm-defaults.ll`, `new-pm-lto-defaults.ll`, `new-pm-O0-ep-callbacks.ll`, and `pass-pipeline-parsing.ll` - Renamed `CHECK-EP-Peephole` to `CHECK-EP-PEEPHOLE` in `new-pm-lto-defaults.ll` for consistency. This code is intended for developers that wish to implement and run custom passes after the vectorizer passes in the PassBuilder pipeline. For example, in #91796, a pass was created that changed the induction variables of vectorized code. This is right after the vectorization passes.
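The extension-point mechanism can be sketched with a small mock. Everything here is a hypothetical stand-in rather than the LLVM API: `SketchPipelineBuilder`, `PassList`, and `EPCallback` are invented for illustration, the real callbacks have a different signature, and the pass names are placeholder strings. Only the method name `registerVectorizerEndEPCallback` is taken from the description above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not the LLVM classes.
using PassList = std::vector<std::string>;
using EPCallback = std::function<void(PassList &)>;

class SketchPipelineBuilder {
  std::vector<EPCallback> VectorizerEndEPCallbacks;

public:
  // Register a callback to run at the end of the vectorizer passes.
  void registerVectorizerEndEPCallback(EPCallback C) {
    assert(C && "callback must be callable");
    VectorizerEndEPCallbacks.push_back(std::move(C));
  }

  // Build a pipeline; the callbacks fire right after the vectorizers
  // and before any later passes are appended.
  PassList buildPipeline() const {
    PassList Passes = {"loop-vectorize", "slp-vectorizer"};
    for (const EPCallback &C : VectorizerEndEPCallbacks)
      C(Passes);
    Passes.push_back("instcombine");
    return Passes;
  }
};
```

A client registers a lambda that appends its own pass, and that pass then lands between the vectorizers and whatever the pipeline adds afterwards.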
…. (#111894) This flag has been deprecated since Clang 19, having been the default since then. It has remained because its negation was still useful to work around backwards compatibility breaking changes from P0522. However, in Clang 20 we have landed various changes which implemented P3310R2 and beyond, which solve almost all of the expected issues, the known remaining few being a bit obscure. So this change removes the flag completely and all of its implementation and support code. Hopefully any remaining users can just stop using the flag. If there are still important issues remaining, this removal will also shake the tree and help us know.
…length. (#124827) The length might be any integer, so hlfir.get_length lowering should explicitly cast it to `index`.
…124867) An hlfir.elemental with a shape `(0, HUGE)` still runs `HUGE` iterations when expanded into a loop nest. HLFIR transformational operations inlined as hlfir.elemental may execute more slowly than the Fortran runtime implementation. This patch adds an option for the BufferizeHLFIR pass to reset all upper bounds in the elemental loop nests to zero if the result is an empty array. A separate patch will enable this option in the driver after I do more performance testing. The option is off by default for now.
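The cost of an empty result can be illustrated by counting loop iterations. This is a sketch under assumptions, not the pass itself: `countLoopIterations` is a hypothetical helper, and it assumes the innermost loop walks the first extent, so that a `(0, HUGE)` shape leaves the outer loop running over `HUGE`.

```cpp
#include <cstddef>
#include <vector>

// Counts how many loop iterations (summed over all nest levels) run for a
// loop nest whose innermost loop walks Extents[0]. When ClampIfEmpty is set
// and any extent is zero, every upper bound is reset to zero first.
std::size_t countLoopIterations(std::vector<std::size_t> Extents,
                                bool ClampIfEmpty) {
  if (ClampIfEmpty) {
    bool IsEmpty = false;
    for (std::size_t E : Extents)
      if (E == 0)
        IsEmpty = true;
    if (IsEmpty)
      for (std::size_t &E : Extents)
        E = 0;
  }
  // Walk from the outermost level inwards; each level runs once per
  // iteration of all enclosing levels.
  std::size_t Total = 0, Product = 1;
  for (auto It = Extents.rbegin(); It != Extents.rend(); ++It) {
    Product *= *It;
    Total += Product;
  }
  return Total;
}
```

With extents `{0, 1000}` the unclamped nest still spins the outer loop 1000 times, while the clamped nest does no iterations at all.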
…PC_INSTALL_PATH is set. (#124810)
This fixes instantiation of the definition for friend function templates when the declaration found and the one containing the definition have different template contexts. In these cases, the function declaration corresponding to the definition is not available; it may not even be instantiated at all. So this patch adds a bit which tracks which function template declaration was instantiated from the member template. It's used to find which primary template serves as a context for the purpose of obtaining the template arguments needed to instantiate the definition. Fixes #55509 Relanding patch, with no changes, after it was reverted due to the revert of a commit this patch depended on.
This refactors the standard stream implementation in multiple ways: - The streams are now `stream_data` structs, which contain all the data required for a stream - The windows mangling is generated via a macro instead of having magic strings for the different streams. (i.e. it's now only partially magic)
This is an implementation of P1061 Structured Bindings Introduce a Pack without the ability to use packs outside of templates. There are a couple of ways the AST could have been sliced, so let me know what you think. The only part of this change that I am unsure of is the serialization/deserialization stuff. I followed the implementation of other Exprs, but I do not really know how it is tested. Thank you for your time considering this. --------- Co-authored-by: Yanzuo Liu <zwuis@outlook.com>
…VE (#121817) Parse METADIRECTIVE as a standalone executable directive at the moment. This will allow testing the parser code. There is no lowering, not even clause conversion yet. There is also no verification of the allowed values for trait sets, trait properties.
…it (#124965) This patch addresses post-commit review comments from #124859. The extra compile definition is not necessary and goes against the effort to separate the runtimes from the flang compiler itself. The function declaration for `CUFInit` can be accessed anyway since the headers are always present. The insertion of the call is only based on the language feature options from the folding context. A program compiled with cuda enabled but no cufruntime would just fail at link time as expected.
Describes PLT entries for hexagon.
Modify the DA pretty printer to match the output of other analysis passes. This enables update_analyze_test_checks.py to also work on DA tests. Auto generate all the Dependence Analysis tests.
konstantinschwarz approved these changes on Apr 1, 2026
ehsan-toosi approved these changes on Apr 2, 2026
No description provided.