Isoard.upstream sync #574

Merged
konstantinschwarz merged 1293 commits into aie-public from isoard.upstream-sync
Jul 25, 2025

Conversation

@isoard-amd
Collaborator

No description provided.

kparzysz and others added 30 commits July 23, 2024 09:16
… find inplace subvectors.

The EXTRACT_SUBVECTOR nodes don't have to be the same type, they just need to be at the correct bit offsets when concatenated back together.

This reapplies d43ec97 (after being reverted 68cb903) now that 65e86a8 has landed to address a downstream issue.
This reverts commit adea9f9.
d3fb41d was reverted in 73d7897.
Doing an add reduction on a vector of i1 elements is the same as
counting the number of set elements so such a reduction can be lowered
to a cntp instruction. This saves a number of instructions over
performing a UADDV. This patch only handles straightforward cases (i.e.
when vectors are not split).
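The equivalence this commit relies on can be illustrated in plain C++. This is only a sketch: the 16-lane width, the packed-mask representation, and the function names `addReduceI1` and `countSetLanes` are assumptions for illustration and are not taken from the patch.

```cpp
#include <cstdint>

// Add reduction over 16 one-bit lanes packed into a mask:
// zero-extend each lane to an integer and sum the results.
unsigned addReduceI1(std::uint16_t mask) {
  unsigned sum = 0;
  for (int lane = 0; lane < 16; ++lane)
    sum += (mask >> lane) & 1u;
  return sum;
}

// Population count of the same mask (Kernighan's method):
// each iteration clears the lowest set bit.
unsigned countSetLanes(std::uint16_t mask) {
  unsigned count = 0;
  for (unsigned m = mask; m != 0; m &= m - 1)
    ++count;
  return count;
}
```

Both functions return the same value for every mask, which is why a single count instruction can stand in for the widening add reduction.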
This commit adds a check that disables `wasm-opt` for the
`wasm32-wasip2` target because `wasm-opt` doesn't support components at
this time. This also fixes a minor issue from #95208 where if `wasm-opt`
was disabled then the linker wouldn't run at all.
* Add support for --git flag to bump version for a git suffix
* Update location of the new file where the version is stored
…#100103)

Fixes #100075

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
This supersedes llvm/llvm-project#87818 and
fixes llvm/llvm-project#52767

When calculating arm64 thunks, we make a few assumptions that may not
hold when considering code sections outside of `__text`:

1. That a section needs thunks only if its size is larger than the
branch range.
2. That any calls into `__stubs` are necessarily forward jumps (that is,
the section with the jump is ordered before `__stubs`)

Sections like this exist in the wild, most prominently the
`__lcxx_overrides` section introduced in
llvm/llvm-project#69498

This change:
- Ensures that if one section in `__TEXT` gets thunks, all of them do.
- Makes all code sections in `__TEXT` contiguous (and guaranteed to be
placed before `__stubs`)
The 3-dimensional `std::hypot(x,y,z)` was sub-optimally implemented.
This led to possible over-/underflows in (intermediate) results which
can be circumvented by this proposed change.

The idea is to scale the arguments (see linked issue for full
discussion).

Tests have been added for problematic over- and underflows.

Closes #92782
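A simplified sketch of the scaling idea follows. It is not the libc++ implementation: the helper names `hypot3Naive` and `hypot3Scaled` are hypothetical, scaling by the largest magnitude is just one common way to do it, and NaN handling is not treated carefully here.

```cpp
#include <algorithm>
#include <cmath>

// Naive form: squaring can overflow or underflow even when the
// true result is representable.
double hypot3Naive(double x, double y, double z) {
  return std::sqrt(x * x + y * y + z * z);
}

// Scaled form: divide by the largest magnitude so that every
// squared term lies in [0, 1], then scale the result back.
double hypot3Scaled(double x, double y, double z) {
  const double ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
  const double scale = std::max({ax, ay, az});
  if (scale == 0.0 || std::isinf(scale))
    return scale; // all-zero input, or the largest magnitude is infinite
  const double sx = ax / scale, sy = ay / scale, sz = az / scale;
  return scale * std::sqrt(sx * sx + sy * sy + sz * sz);
}
```

With arguments around `1e200` the naive form overflows to infinity, and around `1e-200` it underflows to zero, while the scaled form returns a finite, non-zero result in both cases.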
Summary:
The linker wrapper's job is to extract embedded device code from fat
binaries and create linked images that can then be embedded and
executed. In order to support LTO, we originally reinvented all of the
LTO handling that `ld.lld` normally does. Primarily, this was because
`nvlink` didn't support this at all, and we have special hacks required
for offloading languages interacting with archive libraries.

Now that I have written llvm/llvm-project#96561, we
should be able to pass all the inputs to the device linker
transparently. This has the advantage of allowing the `clang` Driver to
do its own handling. Primarily, this will be used to implicitly pass
libraries to the device link job to make it more consistent with other
toolchains.

The JIT support is a notable departure; however, there is an option
called `--lto-emit-llvm` that performs the exact function where we want
the final link job to output LLVM-IR that we can then embed instead.

This patch does not fully delete the LTO handling, primarily because I
think the SPIR-V people might want it. To see only the relevant patches,
ignore the first commit of the nvlink-wrapper.

Depends on llvm/llvm-project#96561.
…st (#100074)

This helps to ensure we revisit the last extract_element uses of a node
so that it can be optimized away in cases such as extract(insert(scalartovec(x), 1), 0).
Add the lld flags `--irpgo-profile-sort=<profile>` and
`--compression-sort={function,data,both}` to order functions to improve
startup time, and functions or data to improve compressed size,
respectively.

We use Balanced Partitioning to determine the best section order using
traces from IRPGO profiles (see
https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
for details) to improve startup time and using hashes of section
contents to improve compressed size.

In our recent LLVM talk (https://www.youtube.com/watch?v=yd4pbSTjwuA),
we showed that this can reduce page faults during startup by 40% on a
large iOS app and we can reduce compressed size by 0.8-3%.

More details can be found in https://dl.acm.org/doi/10.1145/3660635

---------

Co-authored-by: Vincent Lee <thevinster@users.noreply.github.com>
This implements the __builtin_cpu_init and __builtin_cpu_supports
builtin routines based on the compiler runtime changes in
llvm/llvm-project#85790.

This is inspired by llvm/llvm-project#85786.
Major changes are a) a restriction in scope to only the builtins (which
have a much narrower user interface), and b) the avoidance of false
generality. This change deliberately only handles group 0 extensions
(which happen to be all defined ones today), and avoids the tblgen
changes from that review.

I don't have an environment in which I can actually test this, but @BeMg
has been kind enough to report that this appears to work as expected.

Before this can make it into a release, we need a change such as
llvm/llvm-project#99958. The gcc docs claim that
cpu_support can be called by "normal" code without calling the cpu_init
routine because the init routine will have been called by a high
priority constructor. Our current compiler-rt mechanism does not do
this.
…FC] (#98936)

The code previously deferred deleting the vsetvli to avoid invalidating
iterators, but eagerly deleted any ADDIs feeding the AVL register
operand. This was safe because the iterator was known to point to a
non-ADDI instruction (the vsetvli which was the previous user.) This
change switches to using an early_inc_range so that we can eagerly
delete the vsetvlis, but have to track ADDIs for later deletion.

This is purely stylistic, but IMO makes the code easier to follow. It
will also simplify a future change to support recursive deletion of
trivially dead instructions (i.e. LUI/ADDI pairs.)
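The iteration pattern described here can be sketched with a standard container. This is an illustration only: `Inst`, its `dead` flag, and `removeDead` are hypothetical stand-ins, and a `std::list` with a manually advanced iterator is used in place of LLVM's `early_inc_range` helper. Deferring the ADDI deletions is presumably needed because a dead ADDI could be the very instruction the already-advanced iterator points to.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

struct Inst {          // hypothetical stand-in for a machine instruction
  std::string opcode;
  bool dead;
};

// Erase dead "vsetvli" entries eagerly while iterating, and record dead
// "addi" entries so they can be erased after the loop.
// Returns the number of instructions removed.
std::size_t removeDead(std::list<Inst> &block) {
  std::vector<std::list<Inst>::iterator> deferred;
  std::size_t removed = 0;
  for (auto it = block.begin(); it != block.end();) {
    auto cur = it++;             // early increment: advance before touching *cur
    if (!cur->dead)
      continue;
    if (cur->opcode == "vsetvli") {
      block.erase(cur);          // safe: 'it' already points past 'cur'
      ++removed;
    } else if (cur->opcode == "addi") {
      deferred.push_back(cur);   // tracked now, erased after the loop
    }
  }
  for (auto addi : deferred) {
    block.erase(addi);
    ++removed;
  }
  return removed;
}
```

Erasing from a `std::list` invalidates only the erased iterator, so both the eager and the deferred erasures are safe in this sketch.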
The return type of both is signed. Thus, we have to use sext.

Follow up to llvm/llvm-project#99820
Summary:
Some recent patches made these stop failing so the XFAIL now makes the
bots go red.

Fixes llvm/llvm-project#98903
… NFC (#100053)

By making the LHS and RHS const pointers, we can use the const signature
of matchSelectPattern.
…9927)

In debug mode there is a wrapper (the kernel) around the function in
which we generate the kernel code. We worked around this before to get
the correct kernel name, but now we really distinguish both to attach
the launch bounds to the kernel, not the inner function.
`vector<bool>`'s shrink_to_fit implementation is using the
"swap-to-free-container-resources-trick" which only shrinks when the
input vector is empty. Since the request to shrink_to_fit is
non-binding, this is a valid implementation. It is not a high-quality
implementation. Since `vector<bool>` is not a very popular container the
implementation has not been changed and only a test to validate the
non-growing property has been added.

This was discovered while investigating #95161.
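The non-growing property can be checked with a few lines of standard C++. This is a sketch, not the test added by the commit; the helper name `shrinkDoesNotGrow` and the amount of spare capacity reserved are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Returns true if shrink_to_fit left the capacity no larger than before
// and kept the number of elements unchanged.
bool shrinkDoesNotGrow(std::size_t n) {
  std::vector<bool> v(n, true);
  v.reserve(n + 1024);                    // force some spare capacity
  const std::size_t before = v.capacity();
  v.shrink_to_fit();                      // non-binding request
  return v.capacity() <= before && v.size() == n;
}
```

Because the request is non-binding, the check only asserts that capacity does not increase; it does not require that any memory is actually released.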
Unify the implementations of WaitForSetEvents and WaitForEventsToReset.
The former deals with the possibility of a race between the timeout and
the predicate while the latter does not. The functions were also
inconsistent in when they would recompute the mask. This patch unifies
the two implementations and makes them behave exactly the same modulo the
predicate.

rdar://130562344
…nal narrowing. (#100071)

If vncvt doesn't produce the destination type directly, use vnclip to do
additional narrowing with saturation.
This ensures that shrink_to_fit does not increase the allocated size.

Partly addresses #95161
Since `master` is deprecated as of OpenMP spec 5.2, a warning is added.
Using `masked` is the recommended alternative as per the spec.
Some problems with the current build on macOS:
- no libatomic.
- death tests do not work yet.
…lities (#99934)

## Issue
Attempting to run the lldb API tests against a remote-android target
fails with the error `NameError: name 'urlparse' is not defined`.

## Root Cause
It looks like the Python import of `urlparse` was removed by mistake in
22ea97d. This import is only used when
running the lldb API tests against a remote-android target so it went
unnoticed.

## Fix
This change simply puts back the missing import. It is a one-line
change.

fixes #99931

## Validation
Tested on Fedora 39 with an attached Android device:

`cd llvm-project`
`cmake -S llvm -B build -G Ninja -DLLVM_ENABLE_PROJECTS='clang;lldb'
-DCMAKE_BUILD_TYPE=Release -DLLDB_ENABLE_PYTHON=On`
`ninja -C build`
`./build/bin/lldb-dotest --arch aarch64 --out-of-tree-debugserver
--platform-name=remote-android
--platform-working-dir=/data/local/tmp/ds2
--platform-url=connect://localhost:5432 --compiler
~/Android/Sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/bin/clang`
This patch enables the target-independent lowering of llvm.lround via
GlobalISel. For SelectionDAG, the intrinsic is custom lowered for
AMDGPU. In order to support vector floating point input for llvm.lround,
this patch extends the target-independent APIs and provides support for
scalarizing. pr98950 is needed to let the verifier allow vector floating
point types.
cmc-rep and others added 7 commits July 24, 2024 12:04
… with condition (#98966)

The load-splitting code in RegBank selection is only relevant to those
listed address-spaces because there are cases in those address-spaces in
which we are not sure how far to split during legalization.

---------

Signed-off-by: gangc <gangc@amd.com>
Summary:
We can enable the sscanf function on the GPU now. This required adding
the configs to the scanf list so that the GPU build didn't do float
conversions.
…#98968)

Dead calls to these intrinsics were not being deleted at the IR level as
they were not marked `IntrWillReturn`, though they were being deleted
when building the SDAG. This fixes that and adds a test to confirm they
are deleted during `opt`
…ion.ll. NFC

The FOLDING prefix was ambiguous on one of the test cases. It would be
nice if the update script reported this.
This patch enables the target-independent lowering of llvm.lrint via
GlobalISel.
For SelectionDAG, the intrinsic is custom lowered for AMDGPU.
…verflow (#99579)

We have a mechanism to allow folding expressions that aren't ICEs as an
extension; use it more consistently.

This ends up causing bad effects on diagnostics in a few cases, but
that's not specific to shifts; it's a general issue with the way those
uses handle overflow diagnostics.
@isoard-amd force-pushed the isoard.upstream-sync branch from d25138a to 4949f9c, July 25, 2025 22:36
@isoard-amd force-pushed the isoard.upstream-sync branch from 4949f9c to e7c7ef4, July 25, 2025 23:21
@konstantinschwarz merged commit cfffcfe into aie-public, Jul 25, 2025
16 of 17 checks passed
@konstantinschwarz deleted the isoard.upstream-sync branch, July 25, 2025 23:25