Skip to content

Conversation

@pawosm-arm
Copy link
Contributor

Original PR under review: llvm/llvm-project#140070

Downstream issue: #533

Il-Capitano and others added 30 commits August 11, 2025 12:16
…(#152502)

Only BTI and PAC properties were added previously.

Fixes llvm/llvm-project#152427.

(cherry picked from commit c088b5f)
…146424)

If a copy exists between creation of a crbit and a spill, machine-cp
may delete the copy since it seems unaware of the relation between a cr
and crbit. A fix was previously made for the generic ppc64 lowering. It
should be applied to the pwr9 and pwr10 variants too.

Likewise, relax and extend the pwr8 test to verify pwr9 and pwr10
codegen too.

This fixes #143989.

(cherry picked from commit 5f86456)
This reverts llvm/llvm-project#132976.

The PR incorrectly claimed that this makes inlining more liberal,
referencing the string comparison in TargetTransformInfoImpl.h.

However, the implementation that actually applies is the one in
BasicTTIImpl.h, which performs a feature subset comparison. As such,
this regressed inlining, most concerningly of functions without +vector
into functions with +vector.

Revert the change to restore the previous behavior.

(cherry picked from commit 18e4f77)
… (#152269)

This fixes llvm/llvm-project#152097

This commit fixes two instances of a (somewhat) recently enabled
assertion. One with a test, the other I can't reproduce (might be dead
code) but certainly looks like an instance of the same problem.

The PR that introduced the regression:
llvm/llvm-project#117558

With this patch, the AVR backend is usable again for TinyGo.

(cherry picked from commit aeeb9b5)
…<map>` (#152624)

The `__get_value()` member function was removed in LLVM 21, but the
calls in `<map>` weren't removed. This patch completes the removal and
adds regression test cases.

Fixes #152543.

(cherry picked from commit 7ae1424)
Critical edges with an IndirectBr terminator cannot be split.
Add a check it to prevent assertion failures.

Fixes: #150229
(cherry picked from commit a750fcb)
… targets (#149235)

I'm unsure if there is an official source for which targets use/support
which emulations, but for the baremetal GNU Arm/AArch64 toolchains or
binutils builds I've tried to use, GNU ld either did not support the
Linux emulations (resulting in errors unless overriding the emulation)
or the Linux emulations were supported but GCC passed the non-Linux
emulations by default.

These emulations all seem to be accepted by lld as well, so try to align
with what it seems GCC is doing and prefer the non-Linux emulations for
baremetal Arm/AArch64 targets.

(cherry picked from commit ee63c1f)
…Arm/AArch64 targets (#149235)

I'm unsure if there is an official source for which targets use/support
which emulations, but for the baremetal GNU Arm/AArch64 toolchains or
binutils builds I've tried to use, GNU ld either did not support the
Linux emulations (resulting in errors unless overriding the emulation)
or the Linux emulations were supported but GCC passed the non-Linux
emulations by default.

These emulations all seem to be accepted by lld as well, so try to align
with what it seems GCC is doing and prefer the non-Linux emulations for
baremetal Arm/AArch64 targets.

(cherry picked from commit ee63c1f)
)

Downstream issue: #482

With this new A320 in-order core, we follow adding the
FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520
(#132246), which reaps the same code generation benefits of preferring
fixed over scalable when the cost is equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, #1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, #1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul #2
    str     z1, [x12, #1, mul vl]
...
```

When compiling with, we get:
```
...
  	ldp	    q0, q1, [x12, #-16]
	ldp	    q2, q3, [x11, #-16]
	subs	x13, x13, #8
	fadd	v0.4s, v2.4s, v0.4s
	fadd	v1.4s, v3.4s, v1.4s
	add	    x11, x11, #32
	add	    x12, x12, #32
	stp	    q0, q1, [x10, #-16]
	add	    x10, x10, #32

...
```

This patch also moves FeatureUseFixedOverScalableIfEqualCost for A510
and A520 from the CPU features to the tune features.
…g (#129744)

Similar to ab976a1, but for llvm.log.

(cherry picked from commit 19ada02)
…#151028)

This has been reported in llvm/llvm-project#116709 (comment).

Fixes #151328

(cherry picked from commit f5f5824)
On Linux, mmap doesn't always zero-fill slack bytes ([man page]),
despite being required to do so by POSIX. If the final page of a file is
in the page cache and the bytes past the end of the file get overwritten
by some process, those bytes then remain non-zero until the page falls
out of the cache or another process overwrites them.

Stop trusting that mmap behaves properly and instead check
whether the buffer was indeed properly terminated. If not, fall back to
using `read` to read the file contents.

This fixes an obscure clang crash bug that can occur if another program
(such as an editor) mmap's a source file and writes past the end of the
mmap'd region shortly before clang or clangd attempts to parse the file.

 [man page]: https://man7.org/linux/man-pages/man2/mmap.2.html#BUGS

(cherry picked from commit 85cd3d9)
Span-dependent instructions on RISC-V interact in a complex manner with
linker relaxation. The span-dependent assembler algorithm implemented in
LLVM has to start with the smallest version of an instruction and then
only make it larger, so we compress instructions before emitting them to
the streamer.

When the instruction is streamed, the information that the instruction
(or rather, the fixup on the instruction) is linker relaxable must be
accurate, even though the assembler relaxation process may transform a
not-linker-relaxable instruction/fixup into one that that is linker
relaxable, for instance `c.jal` becoming `qc.e.jal`, or `bne` getting
turned into `beq; jal` (the `jal` is linker relaxable).

In order for this to work, the following things have to happen:
- Any instruction/fixup which might be relaxed to a linker-relaxable
instruction/fixup, gets marked as `RelaxCandidate = true` in
RISCVMCCodeEmitter.
- In RISCVAsmBackend, when emitting the `R_RISCV_RELAX` relocation, we
have to check that the relocation/fixup kind is one that may need a
relax relocation, as well as that it is marked as linker relaxable (the
latter will not be set if relaxation is disabled).
- Linker Relaxable instructions streamed to a Relaxable fragment need to
mark the fragment and its section as linker relaxable.

I also added more debug output for Sections/Fixups which are marked
Linker Relaxable.

This results in more relocations, when these PC-relative fixups cross an
instruction with a fixup that is resolved as not linker-relaxable but
caused the fragment to be marked linker relaxable at streaming time
(i.e. `c.j`).

(cherry picked from commit 9e8f7ac)

Fixes: #150071
…on that reads the register. (#153644)

If we're moving the second copy before another instruction that reads
the copied register, we need to clear the kill flag on the combined
move.

Fixes #153598.

(cherry picked from commit defbbf0)
Currently, `__attribute__((target("lasx")))` does not automatically
enable LSX support, causing Clang to fail with `-mno-lsx`. Since
LASX depends on LSX, enabling LASX should implicitly enable LSX to
avoid clang error.

Fixes #149512.

Depends on #153541

(cherry picked from commit a1b6e7f)
780054d added support for
`ISD::AssertNoFPClass`.

This ISD node can be used with the `ppc_fp128` type, which is really
just two `f64s` and requires expanding when used with
`ISD::AssertNoFPClass`. Without the support for expanding the result, we
get an assertion because the legalizer does not know how to expand the
results of `ppc_fp128` with `ISD::AssertNoFPClass`.
```
ExpandFloatResult #0: t7: ppcf128 = AssertNoFPClass t5, TargetConstant:i32<3>

LLVM ERROR: Do not know how to expand the result of this operator!
```
Thus, this patch aims to add support for the expand so we no longer
assert.

This fixes #151375.

(cherry picked from commit 63cc2e3)
…ite loop (#153069)

Fixes #153012

As we tolerate unfoldable constant expressions in `scalarizeOpOrCmp`, we
may fold
```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) #0 {
entry:
  %158 = insertelement <2 x i64> <i64 5, i64 ptrtoint (ptr @Val to i64)>, i64 %idx, i32 0
  %159 = or disjoint <2 x i64> splat (i64 2), %158
  store <2 x i64> %159, ptr %ptr2
  ret void
}
```

to

```llvm
define void @bug(ptr %ptr1, ptr %ptr2, i64 %idx) {
entry:
  %.scalar = or disjoint i64 2, %idx
  %0 = or <2 x i64> splat (i64 2), <i64 5, i64 ptrtoint (ptr @Val to i64)>
  %1 = insertelement <2 x i64> %0, i64 %.scalar, i64 0
  store <2 x i64> %1, ptr %ptr2, align 16
  ret void
}
```
And it would be folded back in `foldInsExtBinop`, resulting in an
infinite loop.

This patch forces scalarization iff InstSimplify can fold the constant
expression.

(cherry picked from commit 3a4a60d)
…#153924)

Fixes #153891

(cherry picked from commit a21d17f)
…able vec arg (#129744)

Similar to ab976a1, but for llvm.log.

(cherry picked from commit 19ada02)
…o eagerly (#151028)

This has been reported in llvm/llvm-project#116709 (comment).

Fixes #151328

(cherry picked from commit f5f5824)
On Linux, mmap doesn't always zero-fill slack bytes ([man page]),
despite being required to do so by POSIX. If the final page of a file is
in the page cache and the bytes past the end of the file get overwritten
by some process, those bytes then remain non-zero until the page falls
out of the cache or another process overwrites them.

Stop trusting that mmap behaves properly and instead check
whether the buffer was indeed properly terminated. If not, fall back to
using `read` to read the file contents.

This fixes an obscure clang crash bug that can occur if another program
(such as an editor) mmap's a source file and writes past the end of the
mmap'd region shortly before clang or clangd attempts to parse the file.

 [man page]: https://man7.org/linux/man-pages/man2/mmap.2.html#BUGS

(cherry picked from commit 85cd3d9)
… (#152602)

Span-dependent instructions on RISC-V interact in a complex manner with
linker relaxation. The span-dependent assembler algorithm implemented in
LLVM has to start with the smallest version of an instruction and then
only make it larger, so we compress instructions before emitting them to
the streamer.

When the instruction is streamed, the information that the instruction
(or rather, the fixup on the instruction) is linker relaxable must be
accurate, even though the assembler relaxation process may transform a
not-linker-relaxable instruction/fixup into one that that is linker
relaxable, for instance `c.jal` becoming `qc.e.jal`, or `bne` getting
turned into `beq; jal` (the `jal` is linker relaxable).

In order for this to work, the following things have to happen:
- Any instruction/fixup which might be relaxed to a linker-relaxable
instruction/fixup, gets marked as `RelaxCandidate = true` in
RISCVMCCodeEmitter.
- In RISCVAsmBackend, when emitting the `R_RISCV_RELAX` relocation, we
have to check that the relocation/fixup kind is one that may need a
relax relocation, as well as that it is marked as linker relaxable (the
latter will not be set if relaxation is disabled).
- Linker Relaxable instructions streamed to a Relaxable fragment need to
mark the fragment and its section as linker relaxable.

I also added more debug output for Sections/Fixups which are marked
Linker Relaxable.

This results in more relocations, when these PC-relative fixups cross an
instruction with a fixup that is resolved as not linker-relaxable but
caused the fragment to be marked linker relaxable at streaming time
(i.e. `c.j`).

(cherry picked from commit 9e8f7ac)

Fixes: #150071
…n instruction that reads the register. (#153644)

If we're moving the second copy before another instruction that reads
the copied register, we need to clear the kill flag on the combined
move.

Fixes #153598.

(cherry picked from commit defbbf0)
andreisfr and others added 6 commits September 11, 2025 15:31
Impelement lowering of the SETONE/SETOGT/SETOGE/SETUGT/SETUGE operations.
This fixes f32 "copysign" and "ueq" tests.
Pass along `-Os` and `-Oz` as multilib flags under ARM and AArch64 so
that they can be used as criteria for multilib selection.

This is a cherry-pick of 579a807

Reason for downstream patch: this commit was merged in upstream LLVM
after the branching for 21 had been made. The patch is not a bug fix and
is very specific to Arm's interests, therefore I don't see this commit
being accepted for a backport.

Downstream issue:#528
- All the M-profile variants have their build type changed to
`minsizerelease`. Exceptions apply to Armv8.1-M Main. See below.
- For Armv8.1-M Main, the following changes apply:
- All the PACBTI variants have their build type changed to
`minsizerelease`.
  - Ditto for all the soft nofp nomve variants.
- For the `unaligned` variants that don't fall under the previous
categories, a new version is created so that two versions now coexist:
one with `release` as the build type, and another with `minsizerelease`.
- All variants whose build type is `minsizerelease` are renamed to
include the "size" segment in their names. This includes not only
M-profile but also Armv4t and Armv5te.
- Update tests and add ones that were missing, for example for Armv4t
and Armv5te.

This is a cherry-pick of de3d304
squashed with a3488f4
…nux-gnu"

This normalizes the "aarch64-amazon-linux" to "aarch64-amazon-linux-gnu",
this allows various parts of the LLVM codebase to recognize a GNU
environment is present on Amazon Linux. This allows RuntimeLibcalls to
mark the sincos* functions as available on Amazon Linux. This allows
these functions to be vectorized on Amazon Linux.
@pawosm-arm
Copy link
Contributor Author

wrong branch! Github allows such mistake far too easy to make!

@pawosm-arm pawosm-arm closed this Sep 15, 2025
@github-actions github-actions bot added the downstream-change Downstream change to LLVM tree label Sep 15, 2025
@github-actions
Copy link

This pull review modifies files outside of the arm-software directory, so please ensure it follows the Downstream Patch Policy.
An automated check will test if the tagging requirements have been met. Please wait for approving reviews from both Arm Toolchain for Embedded and Arm Toolchain for Linux teams before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

downstream-change Downstream change to LLVM tree

Projects

None yet

Development

Successfully merging this pull request may close these issues.