Merged
Doing an add reduction on a vector of i1 elements is the same as counting the number of set elements so such a reduction can be lowered to a cntp instruction. This saves a number of instructions over performing a UADDV. This patch only handles straightforward cases (i.e. when vectors are not split).
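The identity behind this lowering can be sketched in a few lines. This is only a model of the arithmetic, assuming the i1 elements are zero-extended before being added; the function names are hypothetical and nothing here reflects the actual AArch64 code generation.

```python
from typing import Sequence


def add_reduce_zext(mask: Sequence[bool]) -> int:
    """Zero-extend every 1-bit element to an integer, then add them all up."""
    total = 0
    for bit in mask:
        total += 1 if bit else 0
    return total


def popcount_packed(mask: Sequence[bool]) -> int:
    """Pack the mask into one integer and count its set bits (a predicate count)."""
    packed = 0
    for index, bit in enumerate(mask):
        if bit:
            packed |= 1 << index
    return bin(packed).count("1")


mask = [True, False, True, True, False, False, True, False]
# Both routes give the same number of active elements.
print(add_reduce_zext(mask), popcount_packed(mask))
```

The second function stands in for a single counting instruction, which is why the lowering can save work over a general add reduction.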
This commit adds a check that disables `wasm-opt` for the `wasm32-wasip2` target because `wasm-opt` doesn't support components at this time. This also fixes a minor issue from #95208 where if `wasm-opt` was disabled then the linker wouldn't run at all.
* Add support for --git flag to bump version for a git suffix
* Update location of the new file where the version is stored
…#100103) Fixes #100075 --------- Co-authored-by: Jay Foad <jay.foad@amd.com>
This supersedes llvm/llvm-project#87818 and fixes llvm/llvm-project#52767
When calculating arm64 thunks, we make a few assumptions that may not hold when considering code sections outside of `__text`:
1. That a section needs thunks only if its size is larger than the branch range.
2. That any calls into `__stubs` are necessarily forward jumps (that is, the section with the jump is ordered before `__stubs`)
Sections like this exist in the wild, most prominently the `__lcxx_overrides` section introduced in llvm/llvm-project#69498
This change:
- Ensures that if one section in `__TEXT` gets thunks, all of them do.
- Makes all code sections in `__TEXT` contiguous (and guaranteed to be placed before `__stubs`)
The 3-dimensional `std::hypot(x,y,z)` was sub-optimally implemented. This led to possible over-/underflows in (intermediate) results, which can be circumvented by this proposed change. The idea is to scale the arguments (see linked issue for full discussion). Tests have been added for problematic over- and underflows. Closes #92782
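The scaling idea can be illustrated with a minimal sketch. It is not the libc++ implementation: the function names are hypothetical, the scale factor is simply the largest magnitude among the arguments (an assumption, since the patch may pick its factor differently), and NaN handling is left out.

```python
import math


def hypot3_naive(x: float, y: float, z: float) -> float:
    """Direct formula; the squares can overflow to inf or underflow to 0."""
    return math.sqrt(x * x + y * y + z * z)


def hypot3_scaled(x: float, y: float, z: float) -> float:
    """Divide by the largest magnitude first so every squared term is at most 1."""
    m = max(abs(x), abs(y), abs(z))
    if m == 0.0 or math.isinf(m):
        # All zeros, or at least one infinite argument: the answer is m itself.
        return m
    xs, ys, zs = x / m, y / m, z / m
    return m * math.sqrt(xs * xs + ys * ys + zs * zs)


big, tiny = 1e200, 1e-200
print(hypot3_naive(big, big, big), hypot3_scaled(big, big, big))
print(hypot3_naive(tiny, tiny, tiny), hypot3_scaled(tiny, tiny, tiny))
```

With IEEE double precision the naive version loses the result entirely for both inputs, while the scaled version stays within range because the intermediate sum is between 1 and 3.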
Summary: The linker wrapper's job is to extract embedded device code from fat binaries and create linked images that can then be embedded and executed. In order to support LTO, we originally reinvented all of the LTO handling that `ld.lld` normally does. Primarily, this was because `nvlink` didn't support this at all, and we have special hacks required for offloading languages interacting with archive libraries. Now since I wrote llvm/llvm-project#96561 we should be able to pass all the inputs to the device linker transparently. This has the advantage of allowing the `clang` Driver to do its own handling. Primarily, this will be used to implicitly pass libraries to the device link job to make it more consistent with other toolchains. The JIT support is a notable departure, however there is an option called `--lto-emit-llvm` that performs the exact function where we want the final link job to output LLVM-IR that we can then embed instead. This patch does not fully delete the LTO handling, primarily because I think the SPIR-V people might want it. To see only the relevant patches, ignore the first commit of the nvlink-wrapper. Depends on llvm/llvm-project#96561.
…st (#100074) This helps to ensure we revisit the last extract_element uses of a node so that it can be optimized away in cases such as extract(insert(scalartovec(x), 1), 0).
Add the lld flags `--irpgo-profile-sort=<profile>` and
`--compression-sort={function,data,both}` to order functions to improve
startup time, and functions or data to improve compressed size,
respectively.
We use Balanced Partitioning to determine the best section order using
traces from IRPGO profiles (see
https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
for details) to improve startup time and using hashes of section
contents to improve compressed size.
In our recent LLVM talk (https://www.youtube.com/watch?v=yd4pbSTjwuA),
we showed that this can reduce page faults during startup by 40% on a
large iOS app and we can reduce compressed size by 0.8-3%.
More details can be found in https://dl.acm.org/doi/10.1145/3660635
---------
Co-authored-by: Vincent Lee <thevinster@users.noreply.github.com>
This implements the __builtin_cpu_init and __builtin_cpu_supports builtin routines based on the compiler runtime changes in llvm/llvm-project#85790. This is inspired by llvm/llvm-project#85786. Major changes are a) a restriction in scope to only the builtins (which have a much narrower user interface), and b) the avoidance of false generality. This change deliberately only handles group 0 extensions (which happen to be all defined ones today), and avoids the tblgen changes from that review. I don't have an environment in which I can actually test this, but @BeMg has been kind enough to report that this appears to work as expected. Before this can make it into a release, we need a change such as llvm/llvm-project#99958. The gcc docs claim that cpu_support can be called by "normal" code without calling the cpu_init routine because the init routine will have been called by a high priority constructor. Our current compiler-rt mechanism does not do this.
…FC] (#98936) The code previously deferred deleting the vsetvli to avoid invalidating iterators, but eagerly deleted any ADDIs feeding the AVL register operand. This was safe because the iterator was known to point to a non-ADDI instruction (the vsetvli which was the previous user.) This change switches to using an early_inc_range so that we can eagerly delete the vsetvlis, but have to track ADDIs for later deletion. This is purely stylistic, but IMO makes the code easier to follow. It will also simplify a future change to support recursive deletion of trivially dead instructions (i.e. LUI/ADDI pairs.)
The return type of both is signed. Thus, we have to use sext. Follow up to llvm/llvm-project#99820
…rWriter (#99599) Close llvm/llvm-project#99479 See llvm/llvm-project#99479 for details
Summary: Some recent patches made these stop failing so the XFAIL now makes the bots go red. Fixes llvm/llvm-project#98903
… NFC (#100053) By making the LHS and RHS const pointers, we can use the const signature of matchSelectPattern.
…9927) In debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function.
`vector<bool>`'s shrink_to_fit implementation is using the "swap-to-free-container-resources-trick" which only shrinks when the input vector is empty. Since the request to shrink_to_fit is non-binding, this is a valid implementation. It is not a high-quality implementation. Since `vector<bool>` is not a very popular container the implementation has not been changed and only a test to validate the non-growing property has been added. This was discovered while investigating #95161.
Unify the implementations of WaitForSetEvents and WaitForEventsToReset. The former deals with the possibility of a race between the timeout and the predicate while the latter does not. The functions were also inconsistent in when they would recompute the mask. This patch unifies the two implementations and makes them behave exactly the same modulo the predicate. rdar://130562344
…nal narrowing. (#100071) If vncvt doesn't produce the destination type directly, use vnclip to do additional narrowing with saturation.
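As a rough illustration of what "narrowing with saturation" means, the sketch below clamps a value to the range of the narrower type instead of letting it wrap. It models only the clamping; the shift and rounding behaviour of vnclip is not modelled, and the function names and the `signed` parameter are hypothetical.

```python
def saturating_narrow(value: int, bits: int, signed: bool = True) -> int:
    """Clamp value to the representable range of a bits-wide integer."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, value))


def wrapping_narrow(value: int, bits: int, signed: bool = True) -> int:
    """Plain truncation: keep the low bits and reinterpret them."""
    low = value & ((1 << bits) - 1)
    if signed and low >= 1 << (bits - 1):
        low -= 1 << bits
    return low


# An out-of-range value saturates in one case and wraps in the other.
print(saturating_narrow(300, 8), wrapping_narrow(300, 8))
```

Saturation keeps an out-of-range value at the nearest representable bound, which is the property an extra narrowing step has to preserve.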
This ensures that shrink_to_fit does not increase the allocated size. Partly addresses #95161
Since `master` is deprecated as of OpenMP spec 5.2, a warning is added. Using `masked` is the recommended alternative as per the spec.
Some problems with the current build on macOS:
- no libatomic.
- death tests do not work yet.
…lities (#99934)
## Issue
Attempting to run the lldb API tests against a remote-android target fails with the error `NameError: name 'urlparse' is not defined`.
## Root Cause
It looks like the Python import of `urlparse` was removed by mistake in 22ea97d. This import is only used when running the lldb API tests against a remote-android target, so it went unnoticed.
## Fix
This change simply puts back the missing import. It is a one-line change.
fixes #99931
## Validation
Tested on Fedora 39 with an attached Android device:
`cd llvm-project`
`cmake -S llvm -B build -G Ninja -DLLVM_ENABLE_PROJECTS='clang;lldb' -DCMAKE_BUILD_TYPE=Release -DLLDB_ENABLE_PYTHON=On`
`ninja -C build`
`./build/bin/lldb-dotest --arch aarch64 --out-of-tree-debugserver --platform-name=remote-android --platform-working-dir=/data/local/tmp/ds2 --platform-url=connect://localhost:5432 --compiler ~/Android/Sdk/ndk/21.4.7075529/toolchains/llvm/prebuilt/linux-x86_64/bin/clang`
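For readers unfamiliar with the failure mode, here is a small sketch of why the missing import only surfaces on the remote path: the name is looked up when the function runs, not when the module loads. It assumes the Python 3 spelling `from urllib.parse import urlparse`; the exact line restored by the patch may differ, and the helper name is hypothetical.

```python
# Assumption: Python 3 location of urlparse. Without this line, the call below
# raises NameError, but only when split_platform_url is actually executed.
from urllib.parse import urlparse


def split_platform_url(url: str):
    """Break a platform URL into scheme, host, and port."""
    parsed = urlparse(url)
    return parsed.scheme, parsed.hostname, parsed.port


# The URL is the one passed as --platform-url in the validation command.
print(split_platform_url("connect://localhost:5432"))
```

Because local test runs never reach code like this, a removed import can go unnoticed until a remote target exercises it.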
This patch enables the target-independent lowering of llvm.lround via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU. In order to support vector floating-point input for llvm.lround, this patch extends the target-independent APIs and provides support for scalarizing. pr98950 is needed to let the verifier allow vector floating-point types.
… with condition (#98966) The load-splitting code in RegBank selection is only relevant to those listed address-spaces because there are cases in those address-spaces in which we are not sure how far to split during legalization --------- Signed-off-by: gangc <gangc@amd.com>
Summary: We can enable the sscanf function on the GPU now. This required adding the configs to the scanf list so that the GPU build didn't do float conversions.
…#98968) Dead calls to these intrinsics were not being deleted at the IR level as they were not marked `IntrWillReturn`, though they were being deleted when building the SDAG. This fixes that and adds a test to confirm they are deleted during `opt`
…ion.ll. NFC The FOLDING prefix was ambiguous on one of the test cases. It would be nice if the update script reported this.
This patch enables the target-independent lowering of llvm.lrint via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU.
…verflow (#99579) We have a mechanism to allow folding expressions that aren't ICEs as an extension; use it more consistently. This ends up causing bad effects on diagnostics in a few cases, but that's not specific to shifts; it's a general issue with the way those uses handle overflow diagnostics.
d25138a to 4949f9c
added 5 commits July 25, 2025 17:16
…lizerHelper class
4949f9c to e7c7ef4
No description provided.