@svkeerthy
Contributor

No description provided.

IgWod-IMG and others added 30 commits May 11, 2025 19:45
…lvm#138685)

With the current implementation, only one attribute is attached to the
argument, and the deserializer fails if more decorations are specified.
However, I believe the spec does not prohibit having both
`Aliased`/`Restrict` and `RelaxedPrecision`. I am not sure how to attach
multiple attributes to a single argument with the current code, and I do
not have a use case for it, so I think the patch in its current state is
a good starting point that can be extended in the future.
…hruVMV_V_V (llvm#138847)

Without clearing kill flags, this pass will generate bad machine code.

```
*** Bad machine code: Using a killed virtual register ***
- function:    main
- basic block: %bb.0 entry (0x437ef928)
- instruction: %12:vrn7m1 = INSERT_SUBREG %11:vrn7m1(tied-def 0), %0:vr, %subreg.sub_vrm1_0
- operand 2:   %0:vr
```
)

Copy the minnum and maxnum tests into versions with minimum/maximum
and minimumnum/maximumnum.
With the IEEE bit disabled, the hardware instructions have the
same behavior as these operations.
…vm#137849)

Do not suppress the pointer overflow check for the `(i8*) nullptr + N`
idiom.

Related issue: llvm#137833
Add a new Cygwin toolchain that just goes through the motions to
initialize the Generic_GCC base properly. This allows removing some old,
almost certainly wrong hard-coded paths from Lex/InitHeaderSearch.cpp.

MSYS2 (GCC triple (arch)-pc-msys) is a fork of Cygwin (GCC triple
(arch)-pc-cygwin), and this driver can be used for either.

Add a simple test for this driver.
…138879)

We often see initializers like

unsigned a = 10;

which take an integer literal and immediately cast it to another type.
Recognize this pattern and omit the cast, simply emitting the value as a
different type directly.

This reduces the instruction count by up to 0.13%:
http://llvm-compile-time-tracker.com/compare.php?from=303436c6d16518b35288d63a859506ffcc1681e4&to=648f5202f906d1606390b2d1081e4502dc74acc2&stat=instructions:u
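
As a rough illustration of the pattern described above, the toy emitter below produces either a constant followed by a cast, or a single constant of the target type. The function `emitLiteralInit`, the opcode spellings, and the type names are all hypothetical and are not taken from the patch. The sketch also assumes the literal value is representable in the target type and does not model wrapping.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy instruction stream: each string stands for one emitted operation.
// Opcode names are made up for illustration only.
std::vector<std::string> emitLiteralInit(std::int64_t Value,
                                         const std::string &LiteralTy,
                                         const std::string &TargetTy,
                                         bool FoldCast) {
  std::vector<std::string> Ops;
  if (FoldCast || LiteralTy == TargetTy) {
    // Emit the value directly as the target type: one operation.
    Ops.push_back("Const" + TargetTy + " " + std::to_string(Value));
    return Ops;
  }
  // Unoptimized form: emit the literal in its own type, then cast it.
  Ops.push_back("Const" + LiteralTy + " " + std::to_string(Value));
  Ops.push_back("Cast" + LiteralTy + "To" + TargetTy);
  return Ops;
}
```

For `unsigned a = 10;`, calling the sketch with `FoldCast` set to false yields two operations, and with `FoldCast` set to true it yields one.
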
…8673)"

This reverts commit d35ad58.

This breaks the clang build:
https://lab.llvm.org/buildbot/#/builders/132/builds/1033

/home/buildbot-worker/bbroot/clang-riscv-rva23-evl-vec-2stage/stage2/lib/Target/RISCV/RISCVGenGlobalISel.inc:1512:44: note: cannot allocate array; evaluated array bound 2431270 exceeds the limit (1048576); use '-fconstexpr-steps' to increase this limit
The context string can be added to indicate the source of the duplicate
definition. E.g. if the context is set to "foo2.o", then:

"Duplicate definition of symbol 'foo'"

becomes

"In foo2.o, duplicate definition of symbol 'foo'".

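The message change shown above could be sketched as follows. `duplicateDefinitionMessage` is a hypothetical free function written for illustration; it is not the ORC API, and the real error class may format the text differently.

```cpp
#include <string>

// Hypothetical helper: builds the duplicate-definition message, prefixing
// it with the context string when one is supplied.
std::string duplicateDefinitionMessage(const std::string &SymbolName,
                                       const std::string &Context = "") {
  if (Context.empty())
    return "Duplicate definition of symbol '" + SymbolName + "'";
  return "In " + Context + ", duplicate definition of symbol '" +
         SymbolName + "'";
}
```
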
The JITDylib::defineImpl method is updated to use the name of the
MaterializationUnit being added as the context string for duplicate definition
errors. The JITDylib::defineMaterializing method is updated to use
"defineMaterializing operation" as the context string.
…lvm#138419)

The shuffle needn't be twice the original number of vector elements, so
the intermediate type used between the shuffle and the intrinsic should
use the ShuffleDstTy number of elements.

I found this when looking at shuffle costs and do not have a test where it
alters the output, but have added some cases where the shuffle output is
not twice the size of the input.
FEAT_FP8DOT4 and FEAT_FP8FMA are supported by FUJITSU-MONAKA. These were
previously enabled due to dependencies, but now require explicit
activation due to modifications in the dependencies.
This is needed for forced unwind, for some testcases in
libunwind/libcxxabi.

This adds an aarch64 case for extracting the LanguageHandler and
HandlerData fields from unwind info, in UnwindCursor::getInfoFromSEH,
corresponding to the existing case for x86_64.

This uses the struct IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY_XDATA; this only
became available in WinSDK 10.0.19041.0 and mingw-w64 v11.0 (or a
mingw-w64 git snapshot after April 2023).

(This is only a build-time requirement though; the format for the unwind
data has been fixed since the start of Windows 10 on ARM64, so this
doesn't impose any runtime requirement.)
The SCEV multiply by 1 doesn't make sense, because SCEV would fold it:
therefore, the OrigPtr == Ptr branch effectively rejects a multiply.
However, in this branch, we have a pointer SCEV that cannot be a
multiply, and hence the code is dead. Strip it.
Now we define FMAXNUM and FMINNUM as IEEE754-2008 with +0.0>-0.0.
LoongArch's fmax/fmin follow these rules fully.

FMAXNUM_IEEE and FMINNUM_IEEE will be removed in the future once:

FMAXNUM/FMINNUM are fixed for all targets
FMAXNUM_IEEE/FMINNUM_IEEE are no longer used by the middle end.
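
For reference, the semantics named above (IEEE754-2008 maxNum/minNum with +0.0 ordered above -0.0) can be sketched in plain C++. The function names `fmaxnum2008` and `fminnum2008` are hypothetical, and the sketch treats every NaN as quiet; signaling-NaN behavior is deliberately not modeled.

```cpp
#include <cmath>

// Sketch of maxNum: a single NaN operand is ignored, and +0.0 is treated
// as greater than -0.0. Signaling NaNs are not modeled.
double fmaxnum2008(double A, double B) {
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  if (A == B) // Also true for +0.0 vs -0.0, so pick by sign bit.
    return std::signbit(A) ? B : A;
  return A > B ? A : B;
}

// Sketch of minNum with the mirrored zero ordering.
double fminnum2008(double A, double B) {
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  if (A == B)
    return std::signbit(A) ? A : B;
  return A < B ? A : B;
}
```
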
…8875)

This is a fix for the issue llvm#137126
that turned out to be a driver issue.

FrontendActions has a loop to process multiple input files, and `flang -fc1`
accepts multiple files, but the semantic, lowering, and LLVM codegen
actions were not re-entrant, so crashes or unexpected behavior occurred
when processing multiple files with `-fc1`.

This patch makes the actions re-entrant by cleaning up the contexts/modules
if needed on entry.
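
The "clean up on entry" idea can be illustrated with a toy action that is run once per input file. `ToyModule` and `ToyCodeGenAction` are hypothetical stand-ins and do not correspond to the flang classes; the sketch only shows that resetting per-run state at the start of each run keeps one input from leaking into the next.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical per-input state, standing in for a context or module.
struct ToyModule {
  std::vector<std::string> Symbols;
};

class ToyCodeGenAction {
  std::unique_ptr<ToyModule> Module;

public:
  // Processes one file and returns how many symbols the module holds.
  std::size_t run(const std::string &File) {
    // Clean up on entry: discard whatever a previous input left behind.
    Module = std::make_unique<ToyModule>();
    Module->Symbols.push_back(File + "::main");
    return Module->Symbols.size();
  }
};
```

Without the reset at the top of `run`, the second file would see the first file's symbol as well.
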
It failed on armv7 with "Architecture not supported", which is due to
StubTests not being supported on ARM.


/builds/fossdd/aports/main/llvm20/src/llvm-project-20.1.0.src/llvm/unittests/ExecutionEngine/Orc/ReOptimizeLayerTest.cpp:140:
Failure
	Value of: llvm::detail::TakeError(RM.takeError())
	Expected: succeeded
Actual: failed (Architecture not supported) (of type
llvm::detail::ErrorHolder)
Resolves llvm#137162

For cases when there isn't any `XOR` in the transformation, replace with
a zero register.
When diagnosing scheduling issues it can be useful to know which
heuristics are driving the scheduler. This adds pre-RA and post-RA
statistics for all heuristics.
Ankur-0429 and others added 18 commits May 11, 2025 19:46
Update recipe construction to use VPBBs to look up masks, in preparation
for llvm#128420.
This consolidates node definitions into one place and enables automatic
node verification.

Part of llvm#119709.
Prefer DenseMap::lookup over DenseMap::find.
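
To show why a lookup-style accessor reads more simply than find, here is a sketch on `std::unordered_map`. `lookupOrDefault` is a hypothetical helper that mimics the behavior of returning a default-constructed value when the key is absent; it is not LLVM's implementation, and the cost functions are invented for the example.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: returns the mapped value, or a default-constructed
// value when the key is missing.
template <typename MapT>
typename MapT::mapped_type
lookupOrDefault(const MapT &Map, const typename MapT::key_type &Key) {
  auto It = Map.find(Key);
  return It == Map.end() ? typename MapT::mapped_type() : It->second;
}

// find-based form: the iterator and the end() check are spelled out.
int costViaFind(const std::unordered_map<std::string, int> &Costs,
                const std::string &Name) {
  auto It = Costs.find(Name);
  if (It == Costs.end())
    return 0;
  return It->second;
}

// lookup-based form: same result in a single expression.
int costViaLookup(const std::unordered_map<std::string, int> &Costs,
                  const std::string &Name) {
  return lookupOrDefault(Costs, Name);
}
```

The two forms only agree when the desired fallback equals the default-constructed value, which is the case this kind of cleanup targets.
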
Directly compute costs for binary ops and GEPs in
VPReplicateRecipe::computeCost. This simply ports the legacy cost
computation for uniform/replicating binary ops to the VPlan cost model.
This patch uses consume_back while changing the type of TrailingDot to
bool, indicating whether we have consumed "." or not.
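
A minimal sketch of the idea, using `std::string_view` in place of `llvm::StringRef`. `consumeBack` is a stand-in written for this example that strips a suffix if present and reports whether it did; `stripTrailingDot` and `StrippedName` are hypothetical names, and the code that the patch actually touches is not shown in this message.

```cpp
#include <string_view>

// Stand-in for a consume_back-style helper: removes Suffix from the end
// of S if present and reports whether anything was removed.
bool consumeBack(std::string_view &S, std::string_view Suffix) {
  if (S.size() < Suffix.size() ||
      S.substr(S.size() - Suffix.size()) != Suffix)
    return false;
  S.remove_suffix(Suffix.size());
  return true;
}

struct StrippedName {
  std::string_view Name;
  bool TrailingDot; // True if a "." was consumed from the end.
};

// The bool records directly whether the trailing "." was consumed.
StrippedName stripTrailingDot(std::string_view Name) {
  bool TrailingDot = consumeBack(Name, ".");
  return {Name, TrailingDot};
}
```
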
Extract values state and operands analysis/building into a separate
class. This class makes it possible to localize instruction state and
operand building for future support of copyable elements vectorization.

Recommit after revert 10f5120

Recommit after revert 6a2a8eb

Reviewers: HanKuanChen, RKSimon

Reviewed By: HanKuanChen

Pull Request: llvm#138724
…m#139455)

StringRef::substr is shorter here because we can rely on its default
second parameter.
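
The point about the default second parameter can be sketched with `std::string_view::substr`, whose length argument also defaults to "rest of the string". The helper names are hypothetical, and the exact expression replaced by the patch is not shown in this message. The sketch assumes `Pos <= S.size()`, because `std::string_view::substr` throws for a larger start position.

```cpp
#include <cstddef>
#include <string_view>

// Explicit length: spells out "everything from Pos to the end".
std::string_view tailExplicit(std::string_view S, std::size_t Pos) {
  return S.substr(Pos, S.size() - Pos);
}

// Default length: the second parameter already means "to the end".
std::string_view tailDefault(std::string_view S, std::size_t Pos) {
  return S.substr(Pos);
}
```
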
@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@svkeerthy svkeerthy closed this May 11, 2025