Skip to content

Conversation

@owenca
Copy link
Contributor

@owenca owenca commented Apr 5, 2025

Backport d71ee7d

samvangysegem and others added 30 commits February 7, 2025 18:27
MatchTableRecord stores a 64-bit RawValue. However, this field is only
needed by a small part of the code (jump table generation).

Create a separate RecordAndValue structure that is used in just the
necessary places.

Based on massif, this reduces memory usage on RISCVGenGlobalISel.inc by
about 100MB (to 2.15GB).

(cherry picked from commit e2301d6)
)

The ReferenceLocs are not enabled by default (they are used by the
tablegen lsp server), and as such always empty, but still allocate
inline storage for the SmallVector. Disabling it saves about 200MB on
RISCVGenGlobalISel.inc.

(The equivalent field in Record already disables inline storage.)

(cherry picked from commit c640f97)
)

This gets rid of some extra IO from driver startup, and possiblity of
emitting warnings twice.

(cherry picked from commit df22bbe)
…libexecinfo (llvm#125998)

Also link with libexecinfo on FreeBSD, NetBSD, OpenBSD and DragonFly
for the backtrace functions.

(cherry picked from commit d1de75a)
…5716)

The legacy and vplan cost models did not agree because
VPWidenCallRecipe::computeCost only calculates the cost of the
call instruction, whereas
LoopVectorizationCostModel::setVectorizedCallDecision in some
cases adds on the cost of a synthesised mask argument. However,
this mask is always 'splat(i1 true)' which should be hoisted out
of the loop during codegen. In order to synchronise the two cost
models I have two options:

1) Also add the cost of the splat to the vplan model, or
2) Remove the cost of the splat from the legacy model.

I chose 2) because I feel this more closely represents what the
final code will look like. There is an argument that we should
take account of such broadcast costs in the preheader when
deciding if it's profitable to vectorise a loop, however there
isn't currently a mechanism to do this. We currently only take
account of the runtime checks when assessing profitability and
what the minimum trip count should be. However, I don't believe
this work needs doing as part of this PR.

(cherry picked from commit 1930524)
Fixes issue caused by 1930524

Unused variable UsesMask in LoopVectorize.cpp

(cherry picked from commit 3872e55)
This is an external tool, so I don't think there is an expectation that
it has to be in the LLVM tools bindir. It may also be in the default
system bindir (which is not necessarily the same).

(cherry picked from commit 26ecddb)
…vm#126308)

Prior workflow runs were not being cancelled when the pull request was
closed, and I think this was why. Also, there is no advantage to having
the definitions at the job level.

(cherry picked from commit 6e59888)
Per to discussions in llvm#125324, most participants are opposed to this
optimization. So remove the combination to address the concerns.

Fixes llvm#125324

(cherry picked from commit 8c222c1)
…llvm#125714)

We can emit diagnostics while parsing warning-suppression-mapping, make
sure command line flags take affect when emitting those.

(cherry picked from commit ecb016a)
…5982)

Summary:
Pretty dumb mistake of me, forgot that this is run per-device and
per-plugin, which fell through the cracks with my testing because I have
two GPUs that use different plugins.

(cherry picked from commit 7a87794)
LLVM has started to emit AArch64 build attributes sections called
.ARM.attributes. LLD does not yet have support for these so they are
accumulating in the ELF output. As the first part of that support
discard all the .ARM.attributes sections. This can be built upon by the
full implementation in LLD.

The build attributes specification only defines build attributes for
relocatable objects. The intention for LLD is that files of type ET_EXEC
and ET_SHARED will not have a build attributes in the output. A
relocatable link with -r will need a merged build attributes, but until
the merge is implemented it is better to discard.

(cherry picked from commit ba476d0)
These features FEAT_FAMINMAX, FEAT_LUT and FEAT_FP8 depends on
FEAT_NEON.

Update dependency from FEAT_FP8DOT4 and FEAT_FP8DOT2. Now depends
indirectly on FEAT_NEON through FEAT_FP8
…9.3. (llvm#125261)

As added in llvm#124274, CPUs in this range can suffer from performance
issues with ldapur. As the gain from ldar->ldapr is expected to be
greater than the minor gain from ldapr->ldapur, this opts to avoid the
instruction under the default -mcpu=generic when the -march is less that
armv8.8 / armv9.3.

I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be
clearer what it means.

(cherry picked from commit 6424abc)
…lvm#125894)

Another follow up fix to
llvm#123910 to fix a build failure
that sometimes happens in shared library builds:
https://lab.llvm.org/buildbot/#/builders/50/builds/9724

In file included from
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Transforms/TestInlining.cpp:16:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/mlir/test/lib/Transforms/../Dialect/Test/TestOps.h:148:10:
fatal error: 'TestOps.h.inc' file not found
  148 | #include "TestOps.h.inc"
      |          ^~~~~~~~~~~~~~~
1 error generated.

(cherry picked from commit ebd23f2)
…lvm#124970)

The __is_trivially_relocatable builtin has semantics that do not
correspond to any current or future notion of trivial relocation.
Furthermore, it currently leads to incorrect optimizations for some
types on supported compilers:
- Clang on Windows where types with non-trivial destructors get
  incorrectly optimized
- AppleClang where types with non-trivial move constructors get
  incorrectly optimized

Until there is an agreed upon and bugfree implementation of what it
means to be trivially relocatable, it is safer to simply use trivially
copyable instead. This doesn't leave a lot of types behind and is
definitely correct.

(cherry picked from commit accfbd4)
If a variant part has a 128-bit discriminator, then
DwarfUnit::constructTypeDIE will assert.  This patch fixes the problem
by allowing any size of integer to be used here.  This is largely
accomplished by moving part of DwarfUnit::addConstantValue to a new
method.

Fixes llvm#119655

(cherry picked from commit 3c28076)
… chains until the end of `finishPendingActions`. (llvm#121245)

The call to `hasBody` inside `finishPendingActions` that bumps the `PendingIncompleteDeclChains`
size from `0` to `1`, and also sets the `LazyVal->LastGeneration` to `6` which matches
the `LazyVal->ExternalSource->getGeneration()` value of `6`. Later, the iterations over `redecls()`
(which calls `getNextRedeclaration`) is expected to trigger the reload, but it **does not** since
the generation numbers match.

The proposed solution is to perform the marking of incomplete decl chains at the end of `finishPendingActions`.
This way, **all** of the incomplete decls are marked incomplete as a post-condition of `finishPendingActions`.
It's also safe to delay this operation since any operation being done within `finishPendingActions` has
`NumCurrentElementsDeserializing == 1`, which means that any calls to `CompleteDeclChain` would simply
add to the `PendingIncompleteDeclChains` without doing anything anyway.

(cherry picked from commit a9e249f)
Rename MachOCompactUnwindInfoSectionName to MachOCompactUnwindSectionName.

Background:

There are two related sections used for compact-unwind info processing:
__LD,__compact_unwind -- the input table stored in relocatable object formats,
and __TEXT,__unwind_info -- the compressed table produced by the linker and
consumed by libunwind. To keep the distinction clear we'll use *CompactUnwind*
for names that refer to the __LD,__compact_unwind input tables and *UnwindInfo*
for names that refer to the __TEXT,__unwind_info output tables. Dropping 'Info'
from the variable above clarifies which section it refers to.

(cherry picked from commit a1ff2d1)
We already had a -jit-mode=orc-lazy regression test for this, but it should
work equally well in non-lazy mode.

(cherry picked from commit b46211b)
…est.

b46211b, which introduced a new copy of the minimal-throw-catch.ll test,
failed to update the run line.

(cherry picked from commit c0f7ebe)
…ame.

a1ff2d1 should have disambiguated MachOCompactUnwindInfoSectionName to
MachOUnwindInfoSectionName, given how it's used in MachOPlatform and its
value.

A MachOCompactUnwindSectionName variable with an appropriate value will be
added in an upcoming patch to re-enable compact-unwind support in JITLink.

(cherry picked from commit b84ac58)
Unwind-info records only have one keep-alive edge to their target function, but
eh-frame records may have multiple edges (to the CIE, function, personality, and
lsda). We need to identify the target-function edge differently for each section
type.

(cherry picked from commit 52b5e36)
Some eh-frame records are CIEs, which don't point to functions. We need to skip
these records. This patch reuses EHFrameCFIBlockInspector to identify function
targets, rather than a custom loop. Any performance impact will be minimal, and
essentially irrelevant once compact-unwind support re-lands (since at that
point we'll discard most eh-frame records).

For unwind-info sections: don't assume one block per record: the unwind-info
section packs all records into a single block.

(cherry picked from commit 9de581b)
…fixes.

Re-enables compact-unwind support in JITLink, which was reverted in b04847b
due to buildbot failures.

The underlying cause for the failures on the buildbots was the lack of
compact-unwind registration support on older Darwin OSes. Since the
CompactUnwindManager pass now removes eh-frames by default we were left with
unwind-info that could not be registered. On x86-64, where eh-frame info is
produced by default the solution is to fall back to using eh-frames. On arm64
we simply can't support exceptions on older OSes.

This patch updates the EHFrameRegistrationPlugin to remove the compact-unwind
section (__LD,__compact_unwind) when installed, forcing use of eh-frames when
the EHFrameRegistrationPlugin is used. In LLJIT, the EHFrameRegistrationPlugin
continues to be used for all non-Darwin platform, and will be added on Darwin
platforms when the a CompactUnwindRegistrationPlugin instance can't be created
(e.g. due to missing support for compact-unwind info registration).

The lit.cfg.py script is updated to check whether the host OSes default unwind
info supports JIT registration, allowing tests to be disabled for older Darwin
OSes on arm64.

(cherry picked from commit eae6d6d)
SecondLevelPageOffset should be incremented by SecondLevelPageSize bytes, not
one byte.

Failure to calculate the offset correctly leads to corrupted unwind-info (and
consequently broken exceptions / unwinding) when more than one second level
page is needed. Since JITLink's unwind support only produces
UNWIND_SECOND_LEVEL_REGULAR-style pages this would trigger for any file
containing more than 511 functions with unwind info. The included test-case
contains 1022 functions (sufficient for both the current format and any
future implementation that supports UNWIND_SECOND_LEVEL_COMPRESSED pages).

Thanks to @edoardo on discord for spotting this bug!

(cherry picked from commit 88f55d1)
lei137 and others added 19 commits March 27, 2025 16:34
Enables conversion between f16 and f128.
Expanding on pre-Power9 targets and using HW instructions on Power9.

Fixes llvm#92866
Commandeer of:  llvm#97677

---------

Co-authored-by: esmeyi <[email protected]>
(cherry picked from commit ade22fc)
Set the default compilation target to V68 if no Hexagon processor is
specified at the command-line.
Add the elf header changes for v81/v83/v85 architectures.

(cherry picked from commit 759ef58)
Set the default processor version to v68 when the user does not specify
one in the command line. This includes changes in the LLVM backed and
linker (lld). Since lld normally sets the version based on inputs, this
change will only affect cases when there are no inputs.

Fixes llvm#127558

(cherry picked from commit c0b2c10)
…3291)

Reverts llvm#108880 .

The patch has no regression test, no description of why the fix is
necessary, and the code is modifying MC datastructures in a way that's
forbidden in the AsmPrinter.

Fixes llvm#132055.

(cherry picked from commit cd6e959)
In 664f345, a fix was introduced,
attempting to restore LLVM_DIR and Clang_DIR after doing
find_package(Clang).

However, 6775285 added a return if the
clangTidy target wasn't found. If this is hit, we don't restore LLVM_DIR
and Clang_DIR, which causes strange effects if CMake is rerun a second
time.

Move the code for restoring LLVM_DIR and Clang_DIR to directly after the
find_package calls, to make sure they are restored, regardless of the
find_package outcome.

(cherry picked from commit 51bceb4)
…llvm#132367)

-Wnon-trivial-memcall was incorrectly added to modified flags instead of
added flags. This commit moves it to the added compiler flags.
The directive temporarily switches to the .sxdata section to emit data,
and then calls `insert`, which makes `CurFrag` out of sync of the
current section. Call push/switch/pop instead.

Related to llvm#132464

(cherry picked from commit ece72e2)
This attempts to put limits onto CombineBaseUpdate for degenerate cases
like llvm#127477. The biggest change is to add a limit to the number of base
updates to check in CombineBaseUpdate. 64 is hopefully plenty high
enough for most runtime unrolled loops to generate postinc where they
are beneficial.

It also moves the check for isValidBaseUpdate later so that it only
happens if we will generate a valid instruction. The 1024 limit to
hasPredecessorHelper comes from the X86 backend, which uses the same
limit.

I haven't added a test case as it would need to be very big and my
attempts at generating a smaller version did not show anything useful.

Fixes llvm#127477.

(cherry picked from commit 86cf4ed)
…H module file refs (llvm#105994) (llvm#132802)

The File ID is incorrectly calculated, resulting in an out-of-bounds
access. The test code is more complex because the File fetching only
happens in specific scenarios.

---------

Co-authored-by: ShaderKeeper <[email protected]>
Co-authored-by: Chuanqi Xu <[email protected]>
(cherry picked from commit cca0f81)
Reviewed By: SixWeining

Pull Request: llvm#133224

(cherry picked from commit 725a7b6)
This also fixes errors when using Clang with step-by-step compilation.
Because the optimization will pass relocation information to memory
access instructions. For example:
t.c:
```
float f = 0.1;
float foo() { return f;}
```
```
clang --target=loongarch64 -O2 -c t.c --save-temps
```

Reviewed By: tangaac, SixWeining

Pull Request: llvm#133225

(cherry picked from commit d055e58)
While building llvm (clang, lld) against emscripten we see this
[error](https://github.com/emscripten-forge/recipes/actions/runs/13803029307/job/38608794602#step:9:1715)

```
 │ │ In file included from $SRC_DIR/llvm/lib/Frontend/OpenACC/ACC.cpp:9:
 │ │ $SRC_DIR/build/include/llvm/Frontend/OpenACC/ACC.h.inc:192:1: error: unknown type name 'LLVM_ABI'
 │ │   192 | LLVM_ABI Directive getOpenACCDirectiveKind(llvm::StringRef Str);
 │ │       | ^
 │ │ $SRC_DIR/build/include/llvm/Frontend/OpenACC/ACC.h.inc:192:19: error: expected ';' after top level declarator
 │ │   192 | LLVM_ABI Directive getOpenACCDirectiveKind(llvm::StringRef Str);
 │ │       |                   ^
 ```

Now this was happening because we weren't defining LLVM_ABI correctly when building against emscripten. If you see [llvm/Support/Compiler.h](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/Compiler.h#L206-L210), the condition only checked for the platform __WASM__ . Now Emscripten targets WebAssembly but doesn't imply the platform by default so the check isn't complete to define LLVM_ABI.

The successful build after using this patch can be seen [here](https://github.com/emscripten-forge/recipes/actions/runs/13805214092/job/38614585621)

(cherry picked from commit e57cd10)
@owenca owenca changed the title Release/20.x: [clang-format] Set C11 instead of C17 for LK_C (#134472) Release/20.x: [clang-format] Set C11 instead of C17 for LK_C Apr 5, 2025
@owenca owenca closed this Apr 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.