Isoard.upstream sync by isoard-amd · Pull Request #574 · Xilinx/llvm-aie

isoard-amd · 2025-07-25T20:26:42Z

No description provided.

… find inplace subvectors. The EXTRACT_SUBVECTOR nodes don't have to be the same type, they just need to be at the correct bit offsets when concatenated back together. This reapplies d43ec97 (after being reverted 68cb903) now that 65e86a8 has landed to address a downstream issue.

This reverts commit adea9f9. d3fb41d was reverted in 73d7897.

Doing an add reduction on a vector of i1 elements is the same as counting the number of set elements so such a reduction can be lowered to a cntp instruction. This saves a number of instructions over performing a UADDV. This patch only handles straightforward cases (i.e. when vectors are not split).
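
The equivalence this message relies on can be checked with a small model. This is only a sketch of the arithmetic, not of the AArch64 lowering: the function names are hypothetical, and zero-extending each i1 lane to a wider result type before adding is an assumption made for illustration.

```python
def add_reduce_i1(mask, result_bits=32):
    """Add-reduce a vector of i1 lanes, zero-extending each lane first (assumed)."""
    total = 0
    for lane in mask:
        # Each lane contributes 0 or 1; wrap to the result width like fixed-width hardware.
        total = (total + (1 if lane else 0)) % (1 << result_bits)
    return total


def count_set_lanes(mask):
    """Count the active lanes directly, which is what a predicate count computes."""
    return sum(1 for lane in mask if lane)


mask = [True, False, True, True, False, False, True, False]
assert add_reduce_i1(mask) == count_set_lanes(mask) == 4
```

Both functions return the same value for any mask whose length fits in the result width, which is why a single count instruction can stand in for the widen-and-add sequence.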

This commit adds a check that disables `wasm-opt` for the `wasm32-wasip2` target because `wasm-opt` doesn't support components at this time. This also fixes a minor issue from #95208 where if `wasm-opt` was disabled then the linker wouldn't run at all.

* Add support for --git flag to bump version for a git suffix
* Update location of the new file where the version is stored

…#100103) Fixes #100075 --------- Co-authored-by: Jay Foad <jay.foad@amd.com>

This supersedes llvm/llvm-project#87818 and fixes llvm/llvm-project#52767.

When calculating arm64 thunks, we make a few assumptions that may not hold when considering code sections outside of `__text`:
1. That a section needs thunks only if its size is larger than the branch range.
2. That any calls into `__stubs` are necessarily forward jumps (that is, the section with the jump is ordered before `__stubs`).

Sections like this exist in the wild, most prominently the `__lcxx_overrides` section introduced in llvm/llvm-project#69498.

This change:
- Ensures that if one section in `__TEXT` gets thunks, all of them do.
- Makes all code sections in `__TEXT` contiguous (and guaranteed to be placed before `__stubs`).

The 3-dimensional `std::hypot(x,y,z)` was sub-optimally implemented. This led to possible over-/underflows in (intermediate) results, which can be circumvented by this proposed change. The idea is to scale the arguments (see linked issue for full discussion). Tests have been added for problematic over- and underflows. Closes #92782
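
A minimal sketch of why scaling helps, written in Python rather than C++. It assumes the simplest scaling scheme (dividing by the largest magnitude), which is not necessarily the scheme the libc++ patch uses, and it does not treat NaN inputs specially. The function names are hypothetical.

```python
import math


def hypot3_naive(x, y, z):
    """Direct formula: the squares can overflow to inf or underflow to 0."""
    return math.sqrt(x * x + y * y + z * z)


def hypot3_scaled(x, y, z):
    """Scale by the largest magnitude so every squared term is at most 1."""
    m = max(abs(x), abs(y), abs(z))
    if m == 0.0 or math.isinf(m):
        return m
    x, y, z = x / m, y / m, z / m
    return m * math.sqrt(x * x + y * y + z * z)


# Intermediate overflow: the naive form yields inf, the scaled form stays finite.
assert math.isinf(hypot3_naive(1e200, 1e200, 1e200))
assert math.isclose(hypot3_scaled(1e200, 1e200, 1e200), math.sqrt(3) * 1e200)

# Intermediate underflow: the naive form collapses to 0.
assert hypot3_naive(1e-200, 1e-200, 1e-200) == 0.0
assert math.isclose(hypot3_scaled(1e-200, 1e-200, 1e-200), math.sqrt(3) * 1e-200)
```

Dividing by the maximum costs three extra divisions and can add rounding error; scaling by a power of two instead would avoid the rounding cost, at the price of more code.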

Summary: The linker wrapper's job is to extract embedded device code from fat binaries and create linked images that can then be embedded and executed. In order to support LTO, we originally reinvented all of the LTO handling that `ld.lld` normally does. Primarily, this was because `nvlink` didn't support this at all, and we have special hacks required for offloading languages interacting with archive libraries. Now since I wrote llvm/llvm-project#96561 we should be able to pass all the inputs to the device linker transparently. This has the advantage of allowing the `clang` Driver to do its own handling. Primarily, this will be used to implicitly pass libraries to the device link job to make it more consistent with other toolchains. The JIT support is a notable departure, however there is an option called `--lto-emit-llvm` that performs the exact function where we want the final link job to output LLVM-IR that we can then embed instead. This patch does not fully delete the LTO handling, primarily because I think the SPIR-V people might want it. To see only the relevant patches, ignore the first commit of the nvlink-wrapper. Depends on llvm/llvm-project#96561.

…st (#100074) This helps to ensure we revisit the last extract_element uses of a node so that it can be optimized away in cases such as extract(insert(scalartovec(x), 1), 0).

Add the lld flags `--irpgo-profile-sort=<profile>` and `--compression-sort={function,data,both}` to order functions to improve startup time, and functions or data to improve compressed size, respectively. We use Balanced Partitioning to determine the best section order using traces from IRPGO profiles (see https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068 for details) to improve startup time and using hashes of section contents to improve compressed size. In our recent LLVM talk (https://www.youtube.com/watch?v=yd4pbSTjwuA), we showed that this can reduce page faults during startup by 40% on a large iOS app and we can reduce compressed size by 0.8-3%. More details can be found in https://dl.acm.org/doi/10.1145/3660635 --------- Co-authored-by: Vincent Lee <thevinster@users.noreply.github.com>

@BeMg

This implements the __builtin_cpu_init and __builtin_cpu_supports builtin routines based on the compiler runtime changes in llvm/llvm-project#85790. This is inspired by llvm/llvm-project#85786. Major changes are a) a restriction in scope to only the builtins (which have a much narrower user interface), and the avoidance of false generality. This change deliberately only handles group 0 extensions (which happen to be all defined ones today), and avoids the tblgen changes from that review. I don't have an environment in which I can actually test this, but @BeMg has been kind enough to report that this appears to work as expected. Before this can make it into a release, we need a change such as llvm/llvm-project#99958. The gcc docs claim that cpu_support can be called by "normal" code without calling the cpu_init routine because the init routine will have been called by a high priority constructor. Our current compiler-rt mechanism does not do this.

…FC] (#98936) The code previously deferred deleting the vsetvli to avoid invalidating iterators, but eagerly deleted any ADDIs feeding the AVL register operand. This was safe because the iterator was known to point to a non-ADDI instruction (the vsetvli which was the previous user.) This change switches to using an early_inc_range so that we can eagerly delete the vsetvlis, but have to track ADDIs for later deletion. This is purely stylistic, but IMO makes the code easier to follow. It will also simplify a future change to support recursive deletion of trivially dead instructions (i.e. LUI/ADDI pairs.)
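
The iteration pattern described here can be illustrated outside LLVM. The sketch below is a Python model, not the RISC-V pass: `Instr`, `Block`, `early_inc`, and the `avl_def` link are all hypothetical names. It shows an early-increment walk in which the current node may be erased immediately, while the ADDIs are only recorded and erased after the walk.

```python
class Instr:
    def __init__(self, name, avl_def=None):
        self.name = name
        self.avl_def = avl_def  # hypothetical link to the ADDI feeding the AVL operand
        self.prev = None
        self.next = None


class Block:
    """Minimal doubly linked instruction list."""

    def __init__(self, instrs):
        self.head = None
        tail = None
        for node in instrs:
            node.prev = tail
            if tail is None:
                self.head = node
            else:
                tail.next = node
            tail = node

    def erase(self, node):
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None

    def names(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.name)
            node = node.next
        return out


def early_inc(block):
    """Step past each node before yielding it, so the caller may erase it."""
    node = block.head
    while node is not None:
        successor = node.next
        yield node
        node = successor


def remove_dead_vsetvlis(block, dead):
    pending_addis = []
    for instr in early_inc(block):
        if instr in dead:
            if instr.avl_def is not None:
                # Erasing this now could hit the saved successor; defer it.
                pending_addis.append(instr.avl_def)
            block.erase(instr)  # safe: the walk already holds the successor
    for addi in pending_addis:
        block.erase(addi)


addi = Instr("addi")
dead_vsetvli = Instr("vsetvli.dead", avl_def=addi)
block = Block([addi, dead_vsetvli, Instr("vsetvli.live"), Instr("vadd")])
remove_dead_vsetvlis(block, {dead_vsetvli})
assert block.names() == ["vsetvli.live", "vadd"]
```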

The return type of both is signed. Thus, we have to use sext. Follow up to llvm/llvm-project#99820
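
A small bit-level model shows why the choice of extension matters for a three-way compare that returns -1, 0, or 1. This is a sketch, not the GlobalISel code: the helper names are hypothetical, and the 2-bit result width is an assumption chosen to keep the example small.

```python
def three_way_compare(a, b, bits=2):
    """Return -1, 0, or 1 encoded in two's complement at the given width."""
    result = (a > b) - (a < b)
    return result & ((1 << bits) - 1)


def zext(value, from_bits):
    """Zero-extend: the upper bits of the wider value are filled with 0."""
    return value & ((1 << from_bits) - 1)


def sext(value, from_bits, to_bits):
    """Sign-extend: the upper bits are filled with copies of the sign bit."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return ((value ^ sign) - sign) & ((1 << to_bits) - 1)


def as_signed(value, bits):
    """Interpret a bit pattern of the given width as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


narrow = three_way_compare(1, 2)              # -1 encoded as 0b11
assert narrow == 0b11
assert as_signed(zext(narrow, 2), 8) == 3     # zero-extension loses the sign
assert as_signed(sext(narrow, 2, 8), 8) == -1 # sign-extension preserves -1
```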

…rWriter (#99599) Close llvm/llvm-project#99479 See llvm/llvm-project#99479 for details

Summary: Some recent patches made these stop failing so the XFAIL now makes the bots go red. Fixes llvm/llvm-project#98903

… NFC (#100053) By making the LHS and RHS const pointers, we can use the const signature of matchSelectPattern.

…9927) In debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function.

`vector<bool>`'s shrink_to_fit implementation is using the "swap-to-free-container-resources-trick" which only shrinks when the input vector is empty. Since the request to shrink_to_fit is non-binding, this is a valid implementation. It is not a high-quality implementation. Since `vector<bool>` is not a very popular container the implementation has not been changed and only a test to validate the non-growing property has been added. This was discovered while investigating #95161.

Unify the implementations of WaitForSetEvents and WaitForEventsToReset. The former deals with the possibility of a race between the timeout and the predicate while the latter does not. The functions were also inconsistent in when they would recompute the mask. This patch unifies the two implementations and makes them behave exactly the same modulo the predicate. rdar://130562344

…nal narrowing. (#100071) If vncvt doesn't produce the destination type directly, use vnclip to do additional narrowing with saturation.
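
The effect of narrowing with saturation can be modeled with plain integer arithmetic. This sketch covers only the signed case and models the values, not the RISC-V instructions. The function names and the 16-bit intermediate width are assumptions, and returning 0 for NaN follows the usual saturating-conversion convention rather than anything stated in this message.

```python
import math


def clip_signed(value, bits):
    """Clamp an integer into the signed range of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def wrap_signed(value, bits):
    """Plain truncation: keep the low bits and reinterpret them as signed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def fp_to_sint_sat(x, bits):
    """Saturating float-to-signed-int conversion, truncating toward zero."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if math.isnan(x):
        return 0
    if x >= hi:
        return hi
    if x <= lo:
        return lo
    return math.trunc(x)


def narrow_in_two_steps(x, mid_bits=16, dst_bits=8):
    """Convert to an intermediate width first, then clip to the destination."""
    return clip_signed(fp_to_sint_sat(x, mid_bits), dst_bits)


# Clipping in the second step matches a direct saturating conversion.
assert narrow_in_two_steps(300.7) == fp_to_sint_sat(300.7, 8) == 127
# Truncating in the second step would wrap instead of saturating.
assert wrap_signed(fp_to_sint_sat(300.7, 16), 8) == 44
```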

This ensures that `shrink_to_fit` does not increase the allocated size. Partly addresses #95161

Since `master` is deprecated as of OpenMP spec 5.2, a warning is added. Using `masked` is the recommended alternative as per the spec.

Some problems with the current build on macOS:
- no libatomic.
- death tests do not work yet.

…lities (#99934)

## Issue
Attempting to run the lldb API tests against a remote-android target fails with the error `NameError: name 'urlparse' is not defined`.

## Root Cause
It looks like the Python import of `urlparse` was removed by mistake in 22ea97d. This import is only used when running the lldb API tests against a remote-android target, so it went unnoticed.

## Fix
This change simply puts back the missing import. It is a one-line change. Fixes #99931

## Validation
Tested on Fedora 39 with an attached Android device:
`cd llvm-project`
`cmake -S llvm -B build -G Ninja -DLLVM_ENABLE_PROJECTS='clang;lldb' -DCMAKE_BUILD_TYPE=Release -DLLDB_ENABLE_PYTHON=On`
`ninja -C build`
`./build/bin/lldb-dotest --arch aarch64 --out-of-tree-debugserver --platform-name=remote-android --platform-working-dir=/data/local/tmp/ds2 --platform-url=connect://localhost:5432 --compiler ~/Android/Sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/bin/clang`
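
For context, this is roughly what `urlparse` provides for a platform URL like the one in the validation command. The import line below is the standard Python 3 location of the function; whether the lldb test suite imports it in exactly this form is an assumption, and `split_platform_url` is a hypothetical helper.

```python
from urllib.parse import urlparse  # without this import, calling urlparse raises NameError


def split_platform_url(url):
    """Break a platform URL into its scheme, host name, and port."""
    parsed = urlparse(url)
    return parsed.scheme, parsed.hostname, parsed.port


assert split_platform_url("connect://localhost:5432") == ("connect", "localhost", 5432)
```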

This patch enables the target-independent lowering of llvm.lround via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU. In order to support vector floating point input for llvm.lround, this patch extends the target-independent APIs and provides support for scalarizing. pr98950 is needed to let the verifier allow vector floating point types.
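
As a reference for the semantics being lowered: `llvm.lround` follows C's `lround`, rounding to the nearest integer with ties away from zero. The sketch below models that in Python, with a per-lane loop standing in for scalarization. The function names are hypothetical and this is not the AMDGPU lowering itself.

```python
import math


def lround(x):
    """Round to nearest integer, ties away from zero (the C lround rule)."""
    whole = math.trunc(x)
    # x - whole is exact for finite doubles, so the tie test is reliable.
    if abs(x - whole) >= 0.5:
        whole += 1 if x > 0 else -1
    return whole


def lround_vector(values):
    """Scalarized form: apply the scalar operation to each lane in turn."""
    return [lround(v) for v in values]


assert lround_vector([2.5, -2.5, 2.4, -0.5]) == [3, -3, 2, -1]
# Python's built-in round uses ties-to-even, so it differs on exact halves.
assert round(2.5) == 2
```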

… with condition (#98966) The load-splitting code in RegBank selection is only relevant to those listed address-spaces because there are cases in those address-spaces in which we are not sure how far to split during legalization --------- Signed-off-by: gangc <gangc@amd.com>

Summary: We can enable the sscanf function on the GPU now. This required adding the configs to the scanf list so that the GPU build didn't do float conversions.

…#98968) Dead calls to these intrinsics were not being deleted at the IR level as they were not marked `IntrWillReturn`, though they were being deleted when building the SDAG. This fixes that and adds a test to confirm they are deleted during `opt`

…ion.ll. NFC The FOLDING prefix was ambiguous on one of the test cases. It would be nice if the update script reported this.

This patch enables the target-independent lowering of llvm.lrint via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU.

…verflow (#99579) We have a mechanism to allow folding expressions that aren't ICEs as an extension; use it more consistently. This ends up causing bad effects on diagnostics in a few cases, but that's not specific to shifts; it's a general issue with the way those uses handle overflow diagnostics.

…lizerHelper class

kparzysz and others added 30 commits July 23, 2024 09:16

[Frontend][OpenMP] Add deduction guide for ConstructCompositionT

cd2214b

Revert "[gn build] Port d3fb41d (llvm-cgdata)"

6476a1d

This reverts commit adea9f9. d3fb41d was reverted in 73d7897.

[Utils] Updates to bump-version.py (#100089)

20fe252

* Add support for --git flag to bump version for a git suffix
* Update location of the new file where the version is stored

AMDGPU: Fix assembler asserting on expressions in vop3 instructions (…

81e2a57

…#100103) Fixes #100075 --------- Co-authored-by: Jay Foad <jay.foad@amd.com>

[DAG] Add users of operand of simplified extract_vector_elt to workli…

b42fe67

…st (#100074) This helps to ensure we revisit the last extract_element uses of a node so that it can be optimized away in cases such as extract(insert(scalartovec(x), 1), 0).

[gn build] Port e3b30bc

0cf92b1

[GlobalIsel] Fix tests for G_SCMP and G_UCMP (#100133)

a138d75

The return type of both is signed. Thus, we have to use sext. Follow up to llvm/llvm-project#99820

[clang] Split ObjectFilePCHContainerReader from ObjectFilePCHContaine…

d64eccf

…rWriter (#99599) Close llvm/llvm-project#99479 See llvm/llvm-project#99479 for details

[Offload] Re-enable tests that are now passing

4854e25

Summary: Some recent patches made these stop failing so the XFAIL now makes the bots go red. Fixes llvm/llvm-project#98903

[SelectionDAGBuilder] Avoid const_cast on call to matchSelectPattern.…

b8d2b77

… NFC (#100053) By making the LHS and RHS const pointers, we can use the const signature of matchSelectPattern.

[RISCV] Use MVT::changeVectorElementType. NFC

df4fa47

[RISCV] Use vnclip(u) to handle fp_to_(s/u)int_sat that needs additio…

ef1367f

…nal narrowing. (#100071) If vncvt doesn't produce the destination type directly, use vnclip to do additional narrowing with saturation.

[libc++][string] Fixes shrink_to_fit. (#97961)

d0ca9f2

This ensures that `shrink_to_fit` does not increase the allocated size. Partly addresses #95161

Adding warning for Master as it is deprecated in 5.2 (#98955)

a51d263

Since `master` is deprecated as of OpenMP spec 5.2, a warning is added. Using `masked` is the recommended alternative as per the spec.

[libc] Fix math tests for macos arm64. (#100060)

1e58c9d

Some problems with the current build on macOS:
- no libatomic.
- death tests do not work yet.

[UnifyLoopExits] Never generate phis of only undef values (#99924)

f227dc9

cmc-rep and others added 7 commits July 24, 2024 12:04

[libc] Enable 'sscanf' on the GPU #100211

2e3ee31

Summary: We can enable the sscanf function on the GPU now. This required adding the configs to the scanf list so that the GPU build didn't do float conversions.

[RISCV] Add missing CHECK prefix to fixed-vectors-vfw-web-simplificat…

deb40a2

…ion.ll. NFC The FOLDING prefix was ambiguous on one of the test cases. It would be nice if the update script reported this.

[AMDGPU] Implement llvm.lrint intrinsic lowering (#98931)

0ee32c4

This patch enables the target-independent lowering of llvm.lrint via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU.

[llvm][MachineLICM] Fix a comment typo. NFC

b1f263e

isoard-amd requested review from F-Stuckmann, SagarMaheshwari99, abhinay-anubola, abnikant, andcarminati, katerynamuts, khallouh, konstantinschwarz, martien-de-jong, niwinanto, philippjh and stephenneuendorffer as code owners July 25, 2025 20:26

Alexandre Isoard added 2 commits July 25, 2025 16:14

Merge commit 'b1f263e4c246' into aie-public

554999c

fixup! [Disassembler] Allow decoding zero-bit operands

2770703

isoard-amd force-pushed the isoard.upstream-sync branch from d25138a to 4949f9c Compare July 25, 2025 22:36

Alexandre Isoard added 5 commits July 25, 2025 17:16

fixup! [NFC][AIE] Move all custom legalization actions to new AIELega…

ffa0689

…lizerHelper class

fixup! [AIE2] Multi-stage register allocation

e763277

fixup! [AIE] Add architecture-specific files

26794ca

fixup! [AIE] Add architecture-specific files

ce247de

fixup! [AIE] Add architecture-specific files

e7c7ef4

isoard-amd force-pushed the isoard.upstream-sync branch from 4949f9c to e7c7ef4 Compare July 25, 2025 23:21

konstantinschwarz merged commit cfffcfe into aie-public Jul 25, 2025
16 of 17 checks passed

konstantinschwarz deleted the isoard.upstream-sync branch July 25, 2025 23:25

konstantinschwarz merged 1293 commits into aie-public from isoard.upstream-sync

20 participants
