Skip to content

[AutoBump] Merge with 46f9cddf (Jan 29) (2)#884

Merged
konstantinschwarz merged 38 commits intoaie-publicfrom
bump_to_46f9cddf
Apr 2, 2026
Merged

[AutoBump] Merge with 46f9cddf (Jan 29) (2)#884
konstantinschwarz merged 38 commits intoaie-publicfrom
bump_to_46f9cddf

Conversation

@jorickert
Copy link
Copy Markdown
Contributor

No description provided.

clementval and others added 30 commits January 29, 2025 08:04
The import of LLVM IR should use is isDSOLocal instead of
hasLocalLinkage to set the dso_local attribute.
Without this change, function definitions that mostly have external
linkage would be missing dso_local attribute during translation.

---------

Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
`at` has an assert that the key exists. Since we are assuming the key exists,
use `at` instead of `lookup`.
…Map access"

This reverts commit 3ce97e4. Pushed to main
prematurley.
This case is intended to check the callee argument, not the call-site.

Fixes an issue introduced in #123181.
This patch simplifies the implementation of `__construct_at_end` in
`vector<bool>`, which currently contains duplicate initialization logic
across its two overloads.
…opriate size select based on results (#124604)

This PR aims to fix a mapping error when trying to map nullary elements
of a record type (primary example is allocatables/pointer types in
Fortran at the moment). This should be legal to map, just not write to
without pointing to anything within the target region. A common Fortran
OpenMP idiom/example where this is useful can be found in the added
Fortran offload example.

The runtime error arises when we try to map the pointer member utilising
a prescribed constant size that we receive from the lowered type,
resulting in mapping of data that will be non-existent when there is no
allocated data. The fix in this case is to emit a runtime check to see
if the data has been allocated, if it hasn't been we select a size of 0,
if it has we emit the usual type size.
…nches. (#124028)

- **[NFC] Use GCNPat instead of Pat.**
- **[AMDGPU] Always emit SI_KILL_I1_PSEUDO for uniform floating point
branches.**

---------

Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>
Add v16i16 coverage and "reverse order hadd/hsub" tests
A new op that allows for representing arbitrary contractions on operands
of arbitrary rank, with arbitrary transposes and arbitrary broadcasts
specified through its indexing_maps attribute.

Supports the expected lowerings to linalg.generic and to
vector.contract.

Corresponding RFC is here:
https://discourse.llvm.org/t/mlir-rfc-introduce-linalg-contract/83589
This patch implements an LLVM IR pass, named kernel-info, that reports
various statistics for codes compiled for GPUs. The ultimate goal of
these statistics to help identify bad code patterns and ways to mitigate
them. The pass operates at the LLVM IR level so that it can, in theory,
support any LLVM-based compiler for programming languages supporting
GPUs. It has been tested so far with LLVM IR generated by Clang for
OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs.

By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang`
command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include
summary statistics (e.g., total size of static allocas) and individual
occurrences (e.g., source location of each alloca). Examples of its
output appear in tests in `llvm/test/Analysis/KernelInfo`.
… NFC (#124658)

Use the Flags argument to add FrameSetup directly instead of walking
backwards to add the flag after the call.
…ve (#122900)

A trait poperty can be one of serveral alternatives (name, expression,
etc.), and each property in a list was parsed as if it could be any of
these alternatives independently from other properties. This made the
parsing vulnerable to certain ambiguities in the trait grammar (provided
in the OpenMP spec).
At the same time the OpenMP spec gives the expected types of properties
for almost every trait: all properties listed for a given trait are
usually of the same type, e.g. names, clauses, etc.

Incorporate these restrictions into the parser, and additionally use
property extensions as the fallback if the parsing of the expected
property type failed. This is intended to allow the parser to succeed,
and instead let the semantic-checking code emit a more user-friendly
message.
We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unnecessary not
cold contexts when building metadata from the trie.

Specifically, we only need to keep notcold contexts that overlap the
longest with cold allocations, to know how deeply to clone those
contexts to expose the cold allocation behavior.

For a large target this reduced ThinLTO bitcode object sizes by about
35%. It reduced the ThinLTO indexing time by about half and the peak
ThinLTO indexing memory by about 20%.
…24948)

VectorCombine

Since the transformation which is the subject of the
'fold-binop-of-reductions.ll` test will be in VectorCombine move the
test there.
Added an extension point after vectorizer passes in the PassBuilder.
Additionally, added extension points before and after vectorizer passes
in `buildLTODefaultPipeline`. Credit goes to @mshockwave for guiding me
through my first LLVM contribution (and my first open source
contribution in general!) :)
- Implemented `registerVectorizerEndEPCallback`
- Implemented `invokeVectorizerEndEPCallbacks`
- Added `VectorizerEndEPCallbacks` SmallVector
- Added a command line option `passes-ep-vectorizer-end` to
`NewPMDriver.cpp`
- `buildModuleOptimizationPipeline` now calls
`invokeVectorizerEndEPCallbacks`
- `buildO0DefaultPipeline` now calls `invokeVectorizerEndEPCallbacks`
- `buildLTODefaultPipeline` now calls BOTH
`invokeVectorizerStartEPCallbacks` and `invokeVectorizerEndEPCallbacks`
- Added LIT tests to `new-pm-defaults.ll`, `new-pm-lto-defaults.ll`,
`new-pm-O0-ep-callbacks.ll`, and `pass-pipeline-parsing.ll`
- Renamed `CHECK-EP-Peephole` to `CHECK-EP-PEEPHOLE` in
`new-pm-lto-defaults.ll` for consistency.

This code is intended for developers that wish to implement and run
custom passes after the vectorizer passes in the PassBuilder pipeline.
For example, in #91796, a pass was created that changed the induction
variables of vectorized code. This is right after the vectorization
passes.
…. (#111894)

This flag has been deprecated since Clang 19, having been the default
since then.

It has remained because its negation was still useful to work around
backwards compatibility breaking changes from P0522.

However, in Clang 20 we have landed various changes which implemented
P3310R2 and beyond, which solve almost all of the expected issues, the
known remaining few being a bit obscure.

So this change removes the flag completely and all of its implementation
and support code.

Hopefully any remaining users can just stop using the flag. If there are
still important issues remaining, this removal will also shake the tree
and help us know.
…length. (#124827)

The length might be any integer, so hlfir.get_length lowering
should explicitly cast it to `index`.
…124867)

An hlfir.elemental with a shape `(0, HUGE)` still runs `HUGE`
number of iterations when expanded into a loop nest.
HLFIR transformational operations inlined as hlfir.elemental
may execute slower comparing to Fortran runtime implementation.
This patch adds an option for BufferizeHLFIR pass to reset all
upper bounds in the elemental loop nests to zero, if the result
is an empty array.

A separate patch will enable this option in the driver after I do
more performance testing. The option is off by default now.
mizvekov and others added 8 commits January 29, 2025 17:23
This fixes instantiation of definition for friend function templates,
when the declaration found and the one containing the definition have
different template contexts.

In these cases, the the function declaration corresponding to the
definition is not available; it may not even be instantiated at all.

So this patch adds a bit which tracks which function template
declaration was instantiated from the member template. It's used to find
which primary template serves as a context for the purpose of
obtainining the template arguments needed to instantiate the definition.

Fixes #55509

Relanding patch, with no changes, after it was reverted due to revert of
commit this patch depended on.
This refactors the standard stream implementation in multiple ways:
- The streams are now `stream_data` structs, which contain all the data
required for a stream
- The windows mangling is generated via a macro instead of having magic
strings for the different streams. (i.e. it's now only partially magic)
This is an implementation of P1061 Structure Bindings Introduce a Pack
without the ability to use packs outside of templates. There is a couple
of ways the AST could have been sliced so let me know what you think.
The only part of this change that I am unsure of is the
serialization/deserialization stuff. I followed the implementation of
other Exprs, but I do not really know how it is tested. Thank you for
your time considering this.

---------

Co-authored-by: Yanzuo Liu <zwuis@outlook.com>
…VE (#121817)

Parse METADIRECTIVE as a standalone executable directive at the moment.
This will allow testing the parser code.

There is no lowering, not even clause conversion yet. There is also no
verification of the allowed values for trait sets, trait properties.
…it (#124965)

This patch addresses post commit review comments from #124859. 

The extra compile definition is not necessary and goes against the
effort to separate the runtimes from the flang compiler itself. The
function declaration for `CUFInit` can be accessed anyway since the
header are always present. The insertion of the call is only based on
the language feature options from the folding context.

A program compiled with cuda enabled but no cufruntime would just fail
at link time as expected.
Modify the DA pretty printer to match the output of other analysis
passes. This enables update_analyze_test_checks.py to also work on DA
tests. Auto generate all the Dependence Analysis tests.
@jorickert jorickert disabled auto-merge April 2, 2026 07:24
@konstantinschwarz konstantinschwarz merged commit 7a949d3 into aie-public Apr 2, 2026
15 of 16 checks passed
@konstantinschwarz konstantinschwarz deleted the bump_to_46f9cddf branch April 2, 2026 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.