aemerson
Contributor

[clang-reorder-fields] Check for flexible array member (#160262)

A flexible array member must remain the last field in the struct.
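The sketch below illustrates the invariant by checking a proposed field order. `Field` and `keepsFlexibleArrayLast` are hypothetical names. The real tool works on Clang AST declarations, not on this simplified record.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a record field.
struct Field {
  std::string name;
  bool isFlexibleArray; // e.g. `int data[];` as the last member of a C struct
};

// Returns true if applying `newOrder` (indices into `fields`) keeps any
// flexible array member in the last position.
bool keepsFlexibleArrayLast(const std::vector<Field>& fields,
                            const std::vector<std::size_t>& newOrder) {
  for (std::size_t pos = 0; pos < newOrder.size(); ++pos) {
    const Field& f = fields[newOrder[pos]];
    if (f.isFlexibleArray && pos + 1 != newOrder.size())
      return false; // the flexible array member would no longer be last
  }
  return true;
}
```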

[CAS][CMake] Fix rhel bots missing symbol failure from #114100 (#161283)

Link LLVM_PTHREAD_LIB from LLVMCAS library to fix rhel bots.

[clang-scan-deps] Remove const from ModuleDeps loop to enable move. (#161109)

This changes the iteration from const to non-const so that std::move
results in a true move rather than a copy.
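Here is a minimal sketch of the pitfall. It uses a hypothetical `Tracked` type in place of `ModuleDeps`. With `const auto&`, `std::move(t)` produces a `const Tracked&&`, which cannot bind to the move constructor, so overload resolution selects the copy constructor.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Counts copies and moves so the effect of `const` is observable.
struct Tracked {
  static inline int copies = 0;
  static inline int moves = 0;
  std::string payload;
  explicit Tracked(std::string p) : payload(std::move(p)) {}
  Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
  Tracked(Tracked&& other) noexcept : payload(std::move(other.payload)) { ++moves; }
};

// const loop variable: std::move yields const Tracked&&, so this copies.
std::vector<Tracked> drainConst(const std::vector<Tracked>& in) {
  std::vector<Tracked> out;
  out.reserve(in.size());
  for (const auto& t : in)
    out.push_back(std::move(t)); // silently a copy
  return out;
}

// non-const loop variable: std::move yields Tracked&&, so this really moves.
std::vector<Tracked> drain(std::vector<Tracked>& in) {
  std::vector<Tracked> out;
  out.reserve(in.size());
  for (auto& t : in)
    out.push_back(std::move(t)); // a true move
  return out;
}
```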

Create function declaration in the proper module (#161281)

Using memref.dealloc in the gpu module would add a function declaration
for @free in the top-level module instead of the gpu module. The
fix is to do what is already done for memref.alloc, which is to use
op->getParentWithTrait<OpTrait::SymbolTable>() instead of
op->getParentOfType<ModuleOp>() to create the call in the proper
module.
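The difference between the two lookups can be sketched with a simplified op tree. This tree is hypothetical and is not the MLIR API. Starting from an op nested inside a `gpu.module`, the nearest module op is the top-level module, but the nearest symbol table is the `gpu.module` itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified op: it records its parent, whether it is
// a ModuleOp, and whether it carries the SymbolTable trait.
struct Op {
  std::string name;
  Op* parent = nullptr;
  bool isModule = false;
  bool isSymbolTable = false;
};

// Analogue of getParentOfType<ModuleOp>(): nearest enclosing module op.
Op* nearestModule(Op* op) {
  for (Op* p = op->parent; p; p = p->parent)
    if (p->isModule) return p;
  return nullptr;
}

// Analogue of getParentWithTrait<OpTrait::SymbolTable>(): nearest enclosing
// op that owns a symbol table, which may be a gpu.module.
Op* nearestSymbolTable(Op* op) {
  for (Op* p = op->parent; p; p = p->parent)
    if (p->isSymbolTable) return p;
  return nullptr;
}
```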

[lld][WebAssembly] Fix visibility of __stack_pointer global (#161284)

The stack pointer should be global, not hidden / dso-local. Marking it
as global allows it to be exported from the main module and imported
into side modules.

[CIR] Add GlobalOp ctor and dtor regions (#160779)

This adds support for ctor and dtor regions in cir::GlobalOp. These
regions are used to capture the code that initializes and cleans up the
variable, keeping this initialization and cleanup code with the variable
definition.

This change only adds the CIR dialect support for these regions. Support
for generating the code in these regions from source and lowering these
to LLVM IR will be added in a later change, as will LoweringPrepare
support to move the code into the __cxx_global_var_init() function.

[Clang] Avoid null deref in lambda attribute compat warning (#161096)

Fixes #161070


This PR addresses the issue in ext_decl_attrs_on_lambda by using
%0=attribute name and %1=selector, which prevents a null
IdentifierInfo*.

  Diag(Tok, getLangOpts().CPlusPlus23
                ? diag::warn_cxx20_compat_decl_attrs_on_lambda
                : diag::ext_decl_attrs_on_lambda)
      << Tok.getIdentifierInfo() << Tok.isRegularKeywordAttribute();

  def ext_decl_attrs_on_lambda : ExtWarn<
    "%select{an attribute specifier sequence|%0}1 in this position "
    "is a C++23 extension">, InGroup<CXX23AttrsOnLambda>;

  def warn_cxx20_compat_decl_attrs_on_lambda : Warning<
    "%select{an attribute specifier sequence|%1}0 in this position "
    "is incompatible with C++ standards before C++23">,
    InGroup<CXXPre23Compat>, DefaultIgnore;

[lld][macho][NFC] Add release note for #158720 (#161295)

I forgot to add a release note for
#158720 so I'll add it here.

[llvm][mustache] Support setting delimiters in templates (#159187)

The base mustache spec allows setting custom delimiters, which slightly
change parsing of partials. This patch implements that feature by adding
a new token type, and changing the tokenizer's behavior to allow setting
custom delimiters.
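For reference, the mustache spec's Set Delimiter tag has the form `{{=<% %>=}}`. The sketch below shows one way a tokenizer could honor it. `parseSetDelimiter` and `tagBodies` are hypothetical helpers, not the llvm::mustache API. The sketch ignores partials, sections, and standalone-whitespace rules.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// If `tagBody` (text between the current delimiters) is a Set Delimiter tag
// such as "=<% %>=", returns the new {open, close} pair.
std::optional<std::pair<std::string, std::string>>
parseSetDelimiter(const std::string& tagBody) {
  if (tagBody.size() < 2 || tagBody.front() != '=' || tagBody.back() != '=')
    return std::nullopt;
  std::istringstream in(tagBody.substr(1, tagBody.size() - 2));
  std::string open, close, extra;
  if (!(in >> open >> close) || (in >> extra))
    return std::nullopt; // need exactly two whitespace-separated delimiters
  return std::make_pair(open, close);
}

// Returns the bodies of all ordinary tags in `tmpl`, switching delimiters
// whenever a Set Delimiter tag is encountered.
std::vector<std::string> tagBodies(const std::string& tmpl) {
  std::string open = "{{", close = "}}";
  std::vector<std::string> out;
  std::size_t pos = 0;
  while (true) {
    std::size_t start = tmpl.find(open, pos);
    if (start == std::string::npos) break;
    std::size_t bodyStart = start + open.size();
    std::size_t end = tmpl.find(close, bodyStart);
    if (end == std::string::npos) break; // unterminated tag: stop here
    std::string body = tmpl.substr(bodyStart, end - bodyStart);
    pos = end + close.size();
    if (auto delims = parseSetDelimiter(body)) {
      open = delims->first;
      close = delims->second;
    } else {
      out.push_back(body);
    }
  }
  return out;
}
```

After the delimiter change, `{{c}}` in the test below is plain text rather than a tag.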

[profcheck] Add unknown branch weight for inlined memchr calls. (#160964)

The memchr inliner creates new switch branches but was failing to add
profile metadata. This patch fixes the issue by explicitly adding unknown
branch weights to these branches.

Issue #147390

[llvm][mustache] Refactor tokenizer for clarity (#159188)

This patch refactors the Mustache tokenizer by breaking the logic up
with helper functions to improve clarity and simplify the code.

[LoongArch][NFC] Pre-commit tests for xvinsve0.{w/d} (#160829)

[mlir][GPU] Generalize gpu.printf to not need gpu.module (#161266)

In order to make the gpu.printf => [various LLVM calls] passes less
order-dependent and to allow downstreams that don't use gpu.module to
use gpu.printf, allow the lowerings for such prints to target the
nearest SymbolTable instead.

[llvm][mustache] Refactor template rendering (#159189)

Move the rendering logic into the ASTNode, and break the logic down into
individual methods.

[LoongArch] Custom legalize vector_shuffle to xvinsve0.{w/d} when possible (#161156)

[analyzer] Use sed from the ToolBox on AIX (NFC) (#161242)

The change in commit 30402c7 breaks the tests on AIX. This patch
switches to the sed from the AIX Toolbox instead of the default one,
which does not support -r or -E.

[LoongArch] Add R_LARCH_MARK_LA relocation for la.abs

Match gas behavior: generate R_LARCH_MARK_LA relocation for la.abs.

Reviewers: heiher, SixWeining

Reviewed By: SixWeining, heiher

Pull Request: #161062

[llvm][mustache] Remove out parameters from processTags() (#159190)

We can construct the return values directly and simplify the interface.

[sanitizer] Handle nullptr name in prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME) (#160824)

Summary

This PR resolves #160562

[CUDA] Enable variadic argument support in front-end (#161305)

Variadic arguments for NVPTX have been supported since
486d00e, so we can enable them in the front-end.

Co-authored-by: Yuanke Luo

ELF: Rename Relocations.cpp functions and rewrite the file-level comment. NFC

Pull Request: #161229

ELF: Use preprocessed relocations for EhInputSection scanning

.eh_frame sections require special sub-section processing, specifically,
CIEs are de-duplicated and FDEs are garbage collected. Create a
specialized scanEhSection() function utilizing the just-added
EhInputSection::rels. OffsetGetter is moved to scanEhSection.

This improves separation of concerns between InputSection and
EhInputSection processing.

This removes another relsOrRelas call using supportsCrel=false.
DWARF.cpp now has the last call.

Pull Request: #161091

[llvm][mustache] Introduce MustacheContext to simplify mustache APIs (#159191)

[DAGCombiner] Remove most NoSignedZerosFPMath uses (#161180)

The two remaining uses are related to fneg and foldFPToIntToFP. Some
AMDGPU tests are duplicated and regenerated.

[clang-format] Fix a bug in wrapping { after else (#161048)

Fixes #160775

[clang][Diags] Automatically format AP(S)Int values with separators (#161047)

This adds an operator<< overload for StreamingDiagnostic that takes
an APInt/APSInt and formats it with default options, including
adding separators.

This is still an opt-in mechanism since all callers that want to use
this feature need to be changed from

  Diag() << toString(MyInt, 10);

to

  Diag() << MyInt;

This patch contains one example of a diagnostic making use of this.
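The sketch below illustrates the kind of output this enables, using decimal formatting with digit separators for a plain `int64_t`. The `'` separator and the groups of three digits are assumptions made for this sketch. The APInt/APSInt overload uses whatever its default formatting options specify.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Formats a signed 64-bit value in base 10, inserting `sep` between groups
// of three digits counted from the least significant digit.
std::string formatWithSeparators(int64_t value, char sep = '\'') {
  // Work on the magnitude as unsigned so INT64_MIN is handled safely.
  uint64_t mag = value < 0 ? 0 - static_cast<uint64_t>(value)
                           : static_cast<uint64_t>(value);
  std::string digits; // built least-significant digit first
  int count = 0;
  do {
    if (count == 3) {
      digits.push_back(sep);
      count = 0;
    }
    digits.push_back(static_cast<char>('0' + mag % 10));
    mag /= 10;
    ++count;
  } while (mag != 0);
  if (value < 0) digits.push_back('-');
  return std::string(digits.rbegin(), digits.rend());
}
```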

[RISCV] Add commutative support for Qualcomm uC Xqcicm extension (#160653)

This is a follow-up to #145643. See
#145643 (comment).

[MLIR][Python] Fix stubgen/PYTHONPATH collision/bug (#161307)

If PYTHONPATH is set and points to the build location of the python
bindings package, then when stubgen runs, _mlir will get imported twice
and bad things will happen (e.g., Assertion '!instance && "PyGlobals
already constructed"'). This happens because mlir is a namespace
package and the importer/loader can't distinguish between
mlir._mlir_libs._mlir and _mlir_libs._mlir imported from CWD. Or
something like that. The fix is to filter out any entries in
PYTHONPATH that end in MLIR_BINDINGS_PYTHON_INSTALL_PREFIX/.. (e.g.,
python_packages/mlir_core/).

[clang][libc++] Fix spelling of "synthesize" (#158523)

There is a tradition to use U.S. English spellings for APIs. For
example, it's uninitialized_fill and not uninitialised_fill,
specialization not specialisation, etcetera.

[clangd] Fix off-by-one error in CommandMangler (#160029)

SawInput() is intended to be called for every argument after a --, but
it was mistakenly being called for the -- itself.
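The following hypothetical sketch shows the intended behavior. The `--` argument only switches modes, and every later argument is treated as an input, which is where clangd would call `SawInput()`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplification: collects the arguments that count as inputs,
// i.e. every argument after "--", but not "--" itself.
std::vector<std::string> inputsAfterDashDash(const std::vector<std::string>& args) {
  std::vector<std::string> inputs;
  bool afterDashDash = false;
  for (const std::string& arg : args) {
    if (afterDashDash) {
      inputs.push_back(arg); // corresponds to calling SawInput(arg)
      continue;
    }
    if (arg == "--")
      afterDashDash = true; // the separator itself is not an input
  }
  return inputs;
}
```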

Partially fixes clangd/clangd#1850

[WebAssembly] Use partial_reduce_mla ISD nodes (#161184)

Addressing issue #160847.

Move away from combining the intrinsic call and instead lower the ISD
nodes, using tablegen for pattern matching.

[clang][Tooling] Support 'c++latest' in InterpolatingCompilationDatabase (#160030)

Fixes clangd/clangd#527
Fixes clangd/clangd#1850

[Modules] Make -module-file-info print macro names in deterministic order (#161332)

Developers reported non-deterministic output from -module-file-info,
thinking this reflected non-determinism in the .pcm files themselves.
However, it turned out it was the printing that was non-deterministic:

$ cat /tmp/a.h
#define FOO 1
#define BAR 2

$ build/bin/clang -cc1 -std=c++20 -x c++ -emit-header-unit /tmp/a.h -o /tmp/a.pcm

$ build/bin/clang -cc1 -module-file-info /tmp/a.pcm | grep -A2 Definitions
   Macro Definitions:
     FOO
     BAR

$ build/bin/clang -cc1 -module-file-info /tmp/a.pcm | grep -A2 Definitions
   Macro Definitions:
     BAR
     FOO

Making the output deterministic also simplifies the test.
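If the names come from a container with unspecified iteration order, one deterministic approach is to copy them out and sort them before printing. The helper name below is hypothetical, and the sketch does not reproduce clang's actual printing code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// The iteration order of a hash container is unspecified, so printing directly
// from it may vary. Copy the names out and sort them to get a stable order.
std::vector<std::string> sortedMacroNames(const std::unordered_set<std::string>& names) {
  std::vector<std::string> out(names.begin(), names.end());
  std::sort(out.begin(), out.end());
  return out;
}
```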

This is a follow-up to 360c5fe

[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in LevelZeroRuntimeWrappers.cpp (NFC)

[MLIR] Apply clang-tidy fixes for readability-container-size-empty in Vectorization.cpp (NFC)

[VectorCombine] Preserve cast flags in foldBitOpOfCastConstant (#161237)

Follow-up of #157822.

[Uniformity] Remove unused PhiInput definition (NFC) (#161116)

This appears to have no users.

[mlir][memref-to-spirv]: Remap Image Load Coordinates (#160495)

When converting a memref.load from the image address space to a
spirv.ImageFetch, ensure that we correctly map the load indices to
width, height and depth.

The lowering currently assumes a linear image tiling, that is, a row-major
memory layout. This allows us to support any memref layout that is a
permutation of the dimensions; more complex layouts are not currently
supported. Because the ordering of the dimensions in the vector passed
to image fetch is the opposite of the memref's dimension order, a final
reversal of the mapped dimensions is always required.

Signed-off-by: Jack Frankland

[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in ReifyValueBounds.cpp (NFC)

[LV] Add ExtNegatedMulAccReduction expression type (#160154)

This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

  1. [LV] Keep duplicate recipes in VPExpressionRecipe #156976
  2. -> [LV] Add ExtNegatedMulAccReduction expression type #160154
  3. [LV] Bundle partial reductions inside VPExpressionRecipe #147302

[AArch64] Simplify some masked integer comparisons. (#153783)

Specifically, X & M ?= C --> (C << clz(M)) ?= (X << clz(M)) where M is
a non-empty sequence of ones starting at the least significant bit with
the remainder zero and C is a constant subset of M that cannot be
materialised into a SUBS (immediate). Proof:
https://alive2.llvm.org/ce/z/haqdJ4.

This improves the comparison in isinf, for example:

int isinf(float x) {
  return __builtin_isinf(x);
}

Before:

isinf:
  fmov    w9, s0
  mov     w8, #2139095040
  and     w9, w9, #0x7fffffff
  cmp     w9, w8
  cset    w0, eq
  ret

After:

isinf:
  fmov    w9, s0
  mov     w8, #-16777216
  cmp     w8, w9, lsl #1
  cset    w0, eq
  ret

[x86][AVX-VNNI] Fix VPDPBXXD Argument Type (#159222)

Fixed the argument types of the VPDP[SS,SU,UU]D[,S]_128/256/512 intrinsics to match the ISA.
Fixes part of #97271.

[BOLT] Refactor MCInstReference and move it to Core (NFC) (#155846)

Refactor MCInstReference class and move it from PAuth gadget scanner to
Core.

MCInstReference is a class representing a reference to a constant
instruction inside a parent entity - either inside a basic block (which
has a reference to its parent function) or directly inside a function
(when CFG information is not available).

This patch reapplies #138655 with a fix for iterator usage and multiple
minor issues fixed during the second round of review.

[LV] Don't preserve LCSSA in SCEVExpander for runtime checks. (#159556)

LV does not preserve LCSSA, it constructs it just before processing a
loop to vectorize. Runtime check expressions are invariant to that loop,
so expanding them should not break LCSSA form for the loop we are about
to vectorize.

This fixes a crash when discarding instructions generated when expanding
runtime checks, if the expansion introduces LCSSA phis for values from
other loops which are not in LCSSA form: we would introduce new LCSSA
phis and update all outside users, some of which are not created by the
expander and cannot be cleaned up.

Fixes #158259.

PR: #159556

[Flang] Add perfect-nest and rectangular-loop semantic tests (#160283)

Add semantic tests of currently unsupported OpenMP canonical loops:

  • non-perfectly nested canonical loop nests
  • non-rectangular canonical loop nests

Both were introduced in OpenMP 5.0 and are not yet supported by Flang.

The message "Trip count must be computable and invariant" is the same
one that OpenACC emits for non-rectangular loops in
AccAttributeVisitor::CheckAssociatedLoop. I considered reusing the
code, but it calls OpenACC-only methods and behaves differently (e.g.,
it performs symbol resolution and does not check the step operand).

[LV] Preserve GEP nusw when widening memory (#160885)

AMDGPU: Fix s_barrier_leave to write to scc (#161221)

s_barrier_leave implicitly defines $scc
and does not use the immediate that represents the barrier type,
so the isel pattern ignores the immediate operand of the LLVM intrinsic.
Also add a test checking that SIInsertWaitcnts tracks this scc write.

[clang-repl] Teach clang-repl how to load PCHs (reprise) (#157359)

This is an updated version of @vgvassilev's PR from last year here:
#94166

In short, it includes:

  1. The fix for a blocking issue where clang::Interpreter (and thus
    clang-repl) cannot resolve symbols defined in a PCH
  2. A test to prove this is working
  3. A new hidden flag for clang-repl so that llvm-lit can match the
    host JIT triple between the PCH and clang-repl; previously, they may
    differ in some cases
  4. Everything based on the latest LLVM main

Shout out to @kylc for finding a logic issue which had us stumped for a
while (and securing the bounty).

Co-authored-by: Vassil Vassilev
Co-authored-by: Kyle Cesare

[Docs][RISCV] Remove experimental from Smctr, Ssctr, Sdext and Sdtrig (#161058)

AMDGPU: Fix gcc build break (#161354)

[LLVM] Fix a bug in Intrinsic::getFnAttributes (#161248)

[OpenACC][CIR] Generate private recipe pointer/array 'alloca's (#160911)

As a next step to generating pointer/array recipes, this patch generates
just the 'alloca' lines that are necessary. Copying pointers over to
restore the structure is deferred to the next patch.

In the case of a pointer, we need to allocate the level 'below' it (if
we index into it), then copy the values into the pointers. In the case
of an array, we skip the alloca (since the array's alloca contains the
value).

After this, we'll need a patch that copies the pointers into place, and
finally one that does the initialization of these values.

[Clang][PowerPC] Add __dmr2048 type and DMF crypto builtins (#157152)

Define the __dmr2048 type to represent the DMR pair introduced by the
Dense Math Facility on PowerPC, and add three Clang builtins
corresponding to DMF cryptography:

__builtin_mma_dmsha2hash
__builtin_mma_dmsha3hash
__builtin_mma_dmxxshapad

The __dmr2048 type is required for the dmsha3hash crypto builtin, and,
as with other PPC MMA and DMR types, its use is strongly restricted.

[RISCV][NFC] Update ratified extensions list in riscv-target-features.c

[SLPVectorizer] Remove align 16 in a test. (#161251)

It is not necessary.

[flang][OpenMP] Move semantic checks for ALLOCATE to check-omp-structure (#161249)

The checks were previously in resolve-directives, which is mostly
intended for determining symbol properties, not performing semantic
checks.

[SLPVectorizer] Clear TreeEntryToStridedPtrInfoMap. (#160544)

We need to clear TreeEntryToStridedPtrInfoMap in deleteTree.

[flang][debug] Generate splitDebugFilename field in DICompileUnitAttr. (#161214)

This PR builds on #160540 and
allows us to set the splitDebugFilename field in DICompileUnitAttr.
The changes are mostly mechanical.

I saw some spurious white space in a test that I have cleaned up.

[MLIR] Apply clang-tidy fixes for readability-container-size-empty in LinalgTransformOps.cpp (NFC)

[AArch64][SVE2p1] Allow more uses of mask in performActiveLaneMaskCombine (#159360)

The combine replaces a get_active_lane_mask used by two extract
subvectors with a single paired whilelo intrinsic. When the instruction
is used for control flow in a vector loop, an additional extract of element
0 may introduce other uses of the intrinsic such as ptest and reinterpret
cast, which is currently not supported.

This patch changes performActiveLaneMaskCombine to count the number
of extract subvectors using the mask instead of the total number of uses,
and returns the concatenated results of get_active_lane_mask.

[IR] Don't create ptrtoint expression to determine alignment (NFCI) (#161364)

We try to determine the alignment of a constant by creating a ptrtoint
expression and seeing if it folds. I believe the only case this can
actually handle is where the constant is an inttoptr expression. Handle
that directly instead of going through another ptrtoint expression.

I ran into this while trying to clean up our isEliminableCastPair()
mess, which is going to disable ptrtoint(inttoptr) folding without
DataLayout, breaking this code.

AMDGPU: Use srcvalue and delete Ignore complex pattern (#161359)

[MLIR] Add sincos op to math dialect (#160772)

Now that sincos is a supported intrinsic in the LLVM dialect
(#160561) we are able to add the corresponding operation in
the math dialect and add conversion patterns for LLVM and NVVM.

We have several benchmarks that use sine and cosine in hot-loops, and
saving some calculations by performing them together can benefit
performance. We would like to have a way to represent sincos in the math
dialect.

[analyzer][NFC] Explain why operator new/delete should never be eval-called (#161370)

Downstream, some change triggered an investigation into whether we could move a
checker callback from check::PostCall to eval::Call. After a lengthy
investigation that led to ExprEngine::VisitCXXNewExpr, we realized that
CXXNewExprs only trigger a PreCall and PostCall, but never an EvalCall.
It also had a FIXME suggesting that maybe it should trigger it.

Remember, it called defaultEvalCall, which either inlines or
conservatively evaluates (aka. invalidates) the call, but never probes the
checkers' eval-calls to see if any would step in.

After implementing the changes to trigger the eval call for the
checkers, I realized that it doesn't really make sense, because we are
eval-calling user-provided functions whose semantics we can't really be
sure about, so there is no generic way to properly implement the
eval call callback.
This touches on an important point: it only ever makes sense to eval
call functions that have a clear spec, such as standard functions, as
implementing the callback would prevent the inlining of that function,
risking regressing analysis quality if the implemented model is not
complete/correct enough.

As a conclusion, I opted for not exposing the eval call event to
checkers, in other words, keep everything as-is, but document my
journey.

CPP-6585

[NFC][LV] Fix warning of unused SubConst variable

#160154 added an assertion
using a new variable, which caused a warning in builds without asserts.
This patch adds [[maybe_unused]] to prevent that warning.

Revert "[flang] Simplify hlfir.index in a few limited cases. (#157883)" (#161387)

This reverts commit edca510 due to reported failures.

[AArch64][SME] Fix typo in docs "block" -> "bundle" (NFC) (#161383)

[AMDGPU][GlobalISel] Add RegBankLegalize support for buffer loads with formats (#161291)

[flang][debug] Improve name generation for basic types. (#161361)

For basic types, currently, we use the type name (e.g., integer,
real) as the debug name. This results in types of different sizes
having the same name. This patch improves the naming by appending the
size in bytes to the type name (e.g., integer*8, real*8).

Fixes #160890

[SPIR-V] Implement SPV_KHR_float_controls2 (#146941)

Implementation of the SPV_KHR_float_controls2 extension, and
corresponding tests.

Some of the tests make use of !spirv.ExecutionMode LLVM named
metadata. This is because some SPIR-V instructions don't have a direct
equivalent in LLVM IR, so the SPIR-V Target uses different LLVM named
metadata to convey the necessary information. Below, you will find an
example from one of the newly added tests:

!spirv.ExecutionMode = !{!19, !20, !21, !22, !23, !24, !25, !26, !27}
!19 = !{ptr @k_float_controls_float, i32 6028, float poison, i32 131079}
!20 = !{ptr @k_float_controls_all, i32 6028, float poison, i32 131079}
!21 = !{ptr @k_float_controls_float, i32 31}
!22 = !{ptr @k_float_controls_all, i32 31}
!23 = !{ptr @k_float_controls_float, i32 4461, i32 32}
!24 = !{ptr @k_float_controls_all, i32 4461, i32 16}
!25 = !{ptr @k_float_controls_all, i32 4461, i32 32}
!26 = !{ptr @k_float_controls_all, i32 4461, i32 64}
!27 = !{ptr @k_float_controls_all, i32 4461, i32 128}

!spirv.ExecutionMode contains a list of metadata nodes, and each of
them specifies the required operands for expressing a particular
OpExecutionMode instruction in SPIR-V. For example, !19 = !{ptr @k_float_controls_float, i32 6028, float poison, i32 131079} will be
lowered to OpExecutionMode [[k_float_controls_float_ID]] FPFastMathDefault [[float_type_ID]] 131079.


Co-authored-by: Dmitry Sidorov

[CodeGen] Fix performance regression introduced in b05101b

The isNormalValueType = false flag was not set for this pseudo value
type, which caused significant size increases for some classes. For
example, TargetLoweringBase grew to 1.5 MB, because the size of that
class is quadratic in MVT::VALUETYPE_SIZE, and this commit increased
that from 256 to 504.
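As a rough check of the scale: if the dominant part of the class size is proportional to the square of MVT::VALUETYPE_SIZE, then growing it from 256 to 504 multiplies that part by about four.

```math
\frac{504^2}{256^2} = \frac{254016}{65536} \approx 3.88
```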

Reported by: abadams
Fixes: b05101b ("[TableGen, CodeGen, CHERI] Add support for the cPTR wildcard value type.")

Reviewed By: nikic

Pull Request: #161313

Aarch64: Emit a minimal SEH prologue when needed (#158173)

In some cases, with very simple thunks, it is possible that the
.seh_endprologue is not emitted. This causes issues in the assembler
because the epilogue ends up starting before the prologue has ended.

Bug: swiftlang#11377

PeepholeOpt: Try to constrain uses to support subregister (#161338)

This allows removing a special case hack in ARM. ARM's implementation
of getExtractSubregLikeInputs has the strange property that it reports
a register with a class that does not support the reported subregister
index. We can however reconstrain the register to support this usage.

This is an alternative to #159600. I've included the test, but
the output is different: in this version, the VMOVSR is
replaced with an ordinary subregister extract copy.

[libc++][test] Use ASSERT_WITH_LIBRARY_INTERNAL_ALLOCATIONS in more places (#144339)

ASSERT_WITH_LIBRARY_INTERNAL_ALLOCATIONS allows waiving asserts, for
cases when we can't count allocations that happen within the libc++
shared library.

When compiling with optimization, it is possible that some calls end up
generated inline, where the overridden operator new/delete do get called
(counting those calls), whereas the compiler may decide to leave some
calls to the external definition (inside the shared library, where we
can't count the calls).

In particular, in one case, a non-optimized build calls
_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev from
the DLL, while it gets inlined (including direct calls to operator
delete) when built with optimization.

Therefore, for the cases where we can't count allocations internally
within the library, waive these asserts.

This fixes all testcases in mingw mode, when built with optimization
enabled.

[MemorySanitizer] Generate some test checks (NFC)

[libc++][istream] P3223R2: Making std::istream::ignore less surprising (#147007)

Implements https://wg21.link/P3223R2 as a DR, as recommended in
cplusplus/papers#1871 (comment).
Resolves -1L ambiguity.

Closes #148178

[clang-tidy] New Option Invalid Enum Default Initialization (#159220)

Added a new option, IgnoredEnums, to bugprone-invalid-enum-default-initialization
to limit the scope of the analysis. This is needed to remove warnings on
enums like std::errc, where the enum doesn't define a value of 0 but is
still used to check whether function calls like std::from_chars
executed correctly.

The C++ Standard, section 22.13.2, states: "[...] If the
member ec of the return value is such that the value is equal to the
value of a value-initialized errc, the conversion was successful [...]"

This means that a value-initialized std::errc (std::errc{}) has a
meaning defined by the standard, and using it should not raise any
warning under this check.
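For context, the following sketch shows the idiom the option is meant to accept. `std::from_chars` signals success by returning a value-initialized `std::errc`. `parseInt` is a hypothetical wrapper, and the sketch does not show how `IgnoredEnums` is configured.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Parses the whole of `text` as an int. Success is reported through
// std::errc{}, even though std::errc has no enumerator with value 0.
bool parseInt(std::string_view text, int& out) {
  auto [ptr, ec] = std::from_chars(text.data(), text.data() + text.size(), out);
  return ec == std::errc{} && ptr == text.data() + text.size();
}
```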

[flang][OpenMP] Remove unused DECLARE REDUCTION from openmp-utils.h, NFC (#161390)

DECLARE REDUCTION is now handled by the generic code, and the special
handling no longer applies.

[ADT] Add const to AllocatorList::{empty,size} (#161320)

While I am at it, this patch adds [[nodiscard]].

[ADT] Add const to operator== in ArrayRef.h (#161321)

While I am at it, this patch adds [[nodiscard]].

[ADT] Make non-const functions forward to const versions (NFC) (#161323)

These functions all correspond to their respective const versions.
This patch uses the "const_cast" trick to forward to the const
versions.
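The "const_cast trick" usually looks like the sketch below. It uses a hypothetical `IntStack` class rather than the ADT code touched by this patch. The cast is safe because the non-const overload can only be reached through a non-const object. In C++17, `std::as_const(*this)` can replace the `static_cast`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container: the const overload holds the logic and the
// non-const overload forwards to it instead of duplicating it.
class IntStack {
  std::vector<int> data;

public:
  void push(int v) { data.push_back(v); }

  const int& top() const {
    assert(!data.empty() && "top() on empty stack");
    return data.back();
  }

  // Safe: *this is known to be non-const in this overload.
  int& top() {
    return const_cast<int&>(static_cast<const IntStack&>(*this).top());
  }
};
```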

[llvm] Proofread CIBestPractices.rst (#161324)

[SPIRV] Avoid OpQuantizeToF16 in SPIR-V kernel test (#158086)

This PR resolves the current failure in the integer-casts.ll SPIR-V
test during CI runs in llvm-project.
The failure occurs because the SPIR-V instruction OpQuantizeToF16
requires Capability::Shader. However, the function in
integer-casts.ll is written as a kernel function and executed in a
kernel environment. Therefore, Capability::Kernel is emitted instead
of Capability::Shader. To fix this, we remove the QuantizeToF16 test
from integer-casts.ll in this PR.

[LAA] Make blockNeedsPredication arguments const (NFC).

The arguments aren't modified, mark them as const. This prepares for new
users in a follow-up, which only have access to const versions of the
arguments.

[PowerPC] Implement Elliptic Curve Cryptography (ECC) Instructions (#158362)

New instructions added:

  * xxmulmul
  * xxmulmulhiadd
  * xxmulmulloadd
  * xxssumudm
  * xxssumudmc
  * xxssumudmcext
  * xsaddadduqm
  * xsaddaddsuqm
  * xsaddsubuqm
  * xsaddsubsuqm
  * xsmerge2t1uqm
  * xsmerge2t2uqm
  * xsmerge2t3uqm
  * xsmerge3t1uqm
  * xsrebase2t1uqm
  * xsrebase2t2uqm
  * xsrebase2t3uqm
  * xsrebase2t4uqm
  * xsrebase3t1uqm
  * xsrebase3t2uqm
  * xsrebase3t3uqm

[lld][macho][NFC] Factor count zeros into helper function (#161241)

Move llvm::countr_zero() into a helper function to reduce code
duplication between CStringSection and DeduplicatedCStringSection.
More importantly, this moves a giant comment to that helper function
since it pertains to both classes.

[CIR] Upstream ParenExpr for AggregateExpr (#160998)

Upstream ParenExpr support for AggregateExpr

[MLIR][SCF] Add loops as parameter to LoopTerminator callback when using CustomOp. (#161386)

This PR adds to the generateLoopTerminatorFn callback the loops
generated by GenerateLoopHeaderFn. This is needed to correctly set the
insertion point with scf.forall ops.

[AArch64] shouldFoldMaskToVariableShiftPair should be true for scalars up to the biggest legal type (#158069)

For AArch64, we want to do this up to 64-bits. Otherwise this results in
bloated code.

[llvm-readobj][NFC] Restore and disable clang-format for machine type list (#160122)

The original code was more readable, just disable clang-format for
this code.

See #159793

Signed-off-by: Sarnie, Nick

[lld][MachO] Use llvm::Align and remove StringOffset type (#161253)

Use llvm::Align instead of directly storing the shift amount for
clarity. Also remove the DeduplicatedCStringSection::StringOffset in
favor of simply storing the uint64_t offset since trailingZeros is
not used outside of finalizeContents(). These two changes allow us to
refactor finalizeContents().

No functional change intended.

Depends on #161241.

[MLIR][XeVM] Add XeVM special id ops. (#160735)

Add special GPU id, index ops.

[CIR] Implement ChooseExpr for AggregateExpr (#160999)

Implement the ChooseExpr for aggregate expr

[flang] Emit error on impossible-to-implement construct (#160384)

An assignment to a whole polymorphic allocatable changes its dynamic
type to the type of the right-hand side expression. But when the
assignment is under control of a WHERE statement, or a FORALL / DO
CONCURRENT with a mask expression, there is no interpretation of the
assignment, as the type of a variable must be the same for all of its
elements.

There is no restriction in the standard against this usage, and no other
Fortran compiler complains about it. But it is not possible to implement
it in general, and the behavior produced by other compilers is not
reasonable, much less worthy of emulating. It's best to simply disallow
it with an error message.

Fixes #133669, or more
accurately, resolves it.

[flang][runtime] Let more list-directed child input advance (#160590)

Whether list-directed child READ statements should be allowed to advance
to further records is neither explicit in the standard nor consistent in
existing Fortran implementations. We allow child namelist READ
statements to advance, but not other list-directed child input.

This patch refines our interpretation of this case. Child namelist READ
statements continue to be able to advance; in addition, non-namelist
child READ statements can now advance if their parent READ statement is
a list-directed input statement at the top level, or a child that could.
But non-namelist list-directed child input taking place in a context
with explicit format control won't advance to following records, so that
the format-controlled parent READ statement can retain control over
record advancement.

Also corrects two cases of record repositioning in numeric input
editing, which were failing under child input because they weren't
allowing for left tab limits.

Fixes #160351.

[flang][runtime] Expand IOTYPE and V_LIST (#160744)

The IOTYPE and V_LIST dummy arguments to a defined formatted I/O
subroutine are extracted from a DT edit descriptor in a FORMAT. They are
currently stored in the DataEdit structure, and their maximum sizes are
rather small since DataEdits are sometimes returned or passed by value.

This patch moves their storage into the FormattedIoStatementState
structure and enlarges them a bit.

Fixes #154954.

[flang] Fix crash in structure constructor lowering (#160769)

MLIR types created by lowering for structure constructors appear to be
sensitive to the ordering of their components in the typed expression
representation used for structure constructors and derived type constant
values.

At present, the components appear in source position order. When some
ancestral types are defined in modules, this ordering can cause their
components to be ordered after components defined in extended derived
types. This can lead to crashes from incompatible MLIR types.

To avoid this issue, sort structure constructor components first in
ascending order of derived type extension depth; retain source position
ordering for components in the same derived type and for error recovery
situations.
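The sketch below illustrates that ordering with a hypothetical `Component` record. It assumes the input vector is already in source-position order. Sorting by extension depth with `std::stable_sort` puts ancestor components first and keeps the source order within each type.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical component record. Depth 0 means the component is declared in
// the root ancestor type, 1 means its first extension, and so on.
struct Component {
  std::string name;
  int extensionDepth;
};

// Sort by extension depth only. std::stable_sort preserves the incoming
// (source-position) order among components with equal depth.
void orderComponents(std::vector<Component>& comps) {
  std::stable_sort(comps.begin(), comps.end(),
                   [](const Component& a, const Component& b) {
                     return a.extensionDepth < b.extensionDepth;
                   });
}
```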

Fixes #143740.

[llvm] Use the VFS to make path absolute (#161271)

For the redirecting VFS, the 'overlay-relative' option controls
whether external paths should be appended to the overlay directory. This
didn't always work as expected: when the overlay file path itself was
relative, its absolute path was decided by the real FS, not the
underlying VFS, and the resulting external path didn't exist in the
underlying VFS. This PR fixes this issue.

[flang] Don't retain FIXED/FREE compiler directives (#160780)

Some old code in the prescanner, antedating the current -E output
mechanisms, retains the !DIR$ FIXED and !DIR$ FREE directives in the
input, and will even generate them to append to the scanned source from
source and include files to restore the fixed/free source form
distinction. But these directives have not been needed since the -E
output generator began generating source form insensitive output, and
they can confuse the parser's error recovery when the appended
directives follow the END statement. Change their handling so that
they're read and respected by the prescanner but no longer retained in
either the -E output or the cooked character stream passed on to the
parser.

Fixes a regression reported by @DanielCChen after PR 159834.

[flang] Catch calls to impure intrinsics from PURE subprograms (#160947)

The code in expression semantics that catches a call to an impure
procedure in a PURE context misses calls to impure intrinsics, since
their designators have a SpecificIntrinsic rather than a Symbol. Replace
the current check with a new one that uses the characteristics of the
called procedure, which works for both intrinsic and non-intrinsic
cases.
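The difference between the two checks can be sketched with a toy model. Here a call's designator is a std::variant of a Symbol or a SpecificIntrinsic, and both carry procedure characteristics. All type and function names are hypothetical stand-ins, not flang's semantics API.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins (not flang's real types): a called procedure is
// designated either by a Symbol or by a SpecificIntrinsic.
struct Characteristics {
  bool IsPure;
};
struct Symbol {
  std::string Name;
  Characteristics Chars;
};
struct SpecificIntrinsic {
  std::string Name;
  Characteristics Chars;
};
using ProcedureDesignator = std::variant<Symbol, SpecificIntrinsic>;

// Symbol-only check: intrinsic designators are never flagged.
bool isImpureCallSymbolOnly(const ProcedureDesignator &D) {
  if (const auto *S = std::get_if<Symbol>(&D))
    return !S->Chars.IsPure;
  return false;
}

// Characteristics-based check: the same test for both alternatives.
bool isImpureCall(const ProcedureDesignator &D) {
  Characteristics C = std::visit([](const auto &P) { return P.Chars; }, D);
  return !C.IsPure;
}
```

With the impure intrinsic RANDOM_NUMBER modeled as a SpecificIntrinsic, the symbol-only check misses it and the characteristics-based check catches it.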

Testing this change revealed that an explicit INTRINSIC statement wasn't
doing the right thing for extension "dual" intrinsics that can be called
as either a function or as a subroutine; the use of an INTRINSIC
statement would disallow its use as a subroutine. I've fixed that here
as well.

Fixes #157124.

[flang] Improve presentation of errors after last source line (#161391)

We don't emit source file names or line numbers for error messages at
EOF. Detect these and handle them a little better, pointing at the
newline at the end of the last source line instead.

[llvm] Fix build after #161260

The modular build was failing due to a missing include.

[TableGen, CHERI] Make CPtrWildcard test tolerant to unrelated changes (#161406)

Changes to llvm/include/llvm/IR/Intrinsics.td may change the constants
that are embedded in this test. Use wildcards so that unrelated changes
do not cause this test to fail.

Fixes: #158426

[clang][modules] Virtualize module cache pruning (#149113)

This PR virtualizes module cache pruning via the new ModuleCache
interface. Currently this is an NFC, but I left a FIXME in
InProcessModuleCache to make this more efficient for the dependency
scanner.

[MLIR] Fix gpu.launch attribution argument printing (#161408)

This was broken and never tested.
Not only could this crash with a stack-use-after-scope, but it would also
have printed something like:

value <block argument> of type 'memref<7x8xf64, #gpu.address_space<workgroup>>' at index: 12

instead of the SSA value.

It turns out that gpu.func already has a very similar helper that we can
reuse here.

Fixes #161394

[RISCV] Add missing CHECK lines for Zkt to sifive-p450/p470/p670 test. NFC (#161393)

[clang-sycl-linker] Generate SymbolTable for each image (#161287)

This PR adds extraction of kernel names for each image and stores them
in the Image's StringData field.

[clang] Cleanup docs and code for legacy no_sanitize attributes (NFC). (#161311)

Update generated docs for legacy attributes:

  • no_sanitize_(address|thread|memory)
  • no_address_safety_analysis

Those are older forms of the no_sanitize("list", "of", "sanitizers")
attribute. They were previously implemented as various spellings of the
same attribute, which made the auto-generated documentation confusing.

Fix this by explicitly making them three different attributes. This
also allows simplifying the delegation to the new no_sanitize form
slightly, as we can instead rely on auto-generated code to check that
TSan and MSan can't be disabled for globals.



Co-authored-by: Erich Keane

[CIR] Upstream RTTI Builder & RTTI for VTable Definitions (#160002)

Upstream the RTTI builder with helpers and use them in the VTable
definitions.

Issue #154992

[BOLT] Introduce helpers to match MCInsts one at a time (NFC) (#138883)

Introduce a low-level instruction matching DSL to capture and/or match
the operands of an MCInst, one instruction at a time. Unlike the
existing MCPlusBuilder::MCInstMatcher machinery, this DSL is intended
for use cases where precise control over the instruction order is
required. For example, when validating PtrAuth hardening, all registers
are usually considered unsafe after a function call, even though
callee-saved registers should preserve their old values under normal
operation.

Usage example:

// Bring the short names into the local scope:
using namespace LowLevelInstMatcherDSL;
// Declare the registers to capture:
Reg Xn, Xm;
// Capture the 0th and 1st operands, match the 2nd operand against the
// just captured Xm register, match the 3rd operand against literal 0:
if (!matchInst(MaybeAdd, AArch64::ADDXrs, Xm, Xn, Xm, Imm(0)))
  return AArch64::NoRegister;
// Match the 0th operand against Xm:
if (!matchInst(MaybeBr, AArch64::BR, Xm))
  return AArch64::NoRegister;
// Manually check that Xm and Xn did not match the same register:
if (Xm.get() == Xn.get())
  return AArch64::NoRegister;
// Return the matched register:
return Xm.get();
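The listing above relies on BOLT's MCInst and the AArch64 opcode enums. The following self-contained toy model shows the capture-then-match semantics on their own. ToyInst, CaptureReg, MatchImm and matchToyInst are hypothetical names invented for this sketch; they are not the real LowLevelInstMatcherDSL API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy model, not BOLT's API: an opcode plus integer operands.
struct ToyInst {
  unsigned Opcode;
  std::vector<long> Operands;
};

// Matches a literal immediate operand.
struct MatchImm {
  explicit MatchImm(long V) : Expected(V) {}
  bool matches(long V) const { return V == Expected; }
  long Expected;
};

// Captures an operand on first use; later uses must match the captured value.
class CaptureReg {
public:
  bool matches(long V) {
    if (!Captured) {
      Captured = V;
      return true;
    }
    return *Captured == V;
  }
  long get() const { return Captured.value_or(-1); }

private:
  std::optional<long> Captured;
};

// Check the opcode and operand count, then match operands left to right;
// the && fold evaluates (and short-circuits) in order.
template <typename... Matchers>
bool matchToyInst(const ToyInst &I, unsigned Opcode, Matchers &&...Ms) {
  if (I.Opcode != Opcode || I.Operands.size() != sizeof...(Ms))
    return false;
  std::size_t Idx = 0;
  return (Ms.matches(I.Operands[Idx++]) && ...);
}
```

In this sketch, a CaptureReg that is bound during a failed match stays bound. How the real DSL handles that case is not stated above.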

[llvm][NFC] Simplify implementation of isa (#161403)

Using a fold instead of template recursion.
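To illustrate the technique, here is a variadic "is any of these types" check written as a C++17 fold over `||`, on a toy LLVM-style hierarchy with classof(). Shape, Circle, Square, Triangle and isaAnyOf are hypothetical names for this sketch; this is not the actual llvm::isa implementation.

```cpp
#include <cassert>

// Hypothetical toy hierarchy with LLVM-style kind tags and classof().
struct Shape {
  enum Kind { K_Circle, K_Square, K_Triangle };
  explicit Shape(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct Circle : Shape {
  Circle() : Shape(K_Circle) {}
  static bool classof(const Shape *S) { return S->getKind() == K_Circle; }
};
struct Square : Shape {
  Square() : Shape(K_Square) {}
  static bool classof(const Shape *S) { return S->getKind() == K_Square; }
};
struct Triangle : Shape {
  Triangle() : Shape(K_Triangle) {}
  static bool classof(const Shape *S) { return S->getKind() == K_Triangle; }
};

// One fold expression instead of a recursive template that peels off one
// type per instantiation.
template <typename... Tos> bool isaAnyOf(const Shape *S) {
  static_assert(sizeof...(Tos) > 0, "at least one type is required");
  return (Tos::classof(S) || ...);
}
```

The fold expands to `Circle::classof(S) || Square::classof(S) || ...` in a single instantiation.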

[lldb][NFC] Fix spelling of function in log message (#161261)

Fix spelling of GetMemoryRegionInfo function in
log message and comment and reformat code.

[VPlan] Handle scalar-VF in transforms (NFC) (#161365)

[flang] Implemented a warning about contiguity of compile time constant values (#161084)

Implemented common::UsageWarning::ConstantIsContiguous to warn about
the following case:

integer, parameter :: num = 3
integer, parameter :: arr(num)=[(i, i=1,num)]
logical, parameter :: result=is_contiguous(arr(num:1:-1))
end

Here, while the array section is non-contiguous, arr is a compile-time
constant, so the array section created at compile time will end up
being contiguous and result will be "true". If arr weren't a constant,
the result at runtime would have been "false".

Fix run_clang_repl output when not present (#161412)

On the happy path, when clang-repl is present, we will invoke it in
order to determine if the host supports JIT features. That will return a
string containing "true". However, in cases where clang-repl is not
present or we fail to invoke it, we previously returned False, which
would then trigger a failure with our substring check. This PR updates
the function to return "" instead, so the substring check is still
valid.

This is related to #157359,
where the original change was introduced.

[flang] Add #include to fix MSVC build (#161415)

flang/lib/Evaluate/constant.cpp apparently needs this #include for MSVC
builds but somehow not for others.

[NFC] Remove trailing whitespaces from clang/include/clang/Basic/Attr.td

[LAA] Fix picking context instr in evaluatePtrAddRec for multiple preds.

A loop may have more than one predecessor out of the loop. In that case,
just pick the first non-phi instruction in the loop header.

[compiler-rt][asan] Add wcscpy/wcsncpy; enable wcscat/wcsncat on Windows (#160493)

Summary

  • Add ASan interceptors for wcscpy/wcsncpy on all platforms.
  • Enable wcscat/wcsncat on Windows (already enabled on POSIX via
    sanitizer_common).

Motivation

  • Use of wchar string APIs is common on Windows; improve parity with
    char* string checks.

Changes

  • Implement wcscpy/wcsncpy in asan_interceptors.cpp; check overlap and
    mark read/write ranges in bytes.
  • wcsncpy: compute write size in bytes (size * sizeof(wchar_t)) to avoid
    missed overflows when sizeof(wchar_t) != 1.
  • Use MaybeRealWcsnlen when available to bound reads.
  • Register Windows static thunk for wcscpy/wcsncpy/wcscat/wcsncat; rely
    on sanitizer_common interceptors for wcscat/wcsncat.
  • Tests: add wcscpy/wcsncpy/wcscat/wcsncat; flush stdout before crash;
    use resilient FileCheck patterns (reuse [[ADDR]], wildcard for function
    suffixes and paths, flexible line numbers).
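The byte-size computation for wcsncpy(dst, src, n) can be sketched as follows. The count n is in wide characters, while shadow-memory checks need byte ranges. AccessRanges, boundedWcslen and wcsncpyRanges are hypothetical helpers, not the actual ASan interceptor code. The bounded length is written out by hand instead of assuming wcsnlen.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper types, not ASan's interceptor code.
struct AccessRanges {
  std::size_t ReadBytes;
  std::size_t WriteBytes;
};

// Bounded wide-string length: stops at L'\0' or after MaxLen characters.
std::size_t boundedWcslen(const wchar_t *S, std::size_t MaxLen) {
  std::size_t Len = 0;
  while (Len < MaxLen && S[Len] != L'\0')
    ++Len;
  return Len;
}

AccessRanges wcsncpyRanges(const wchar_t *Src, std::size_t N) {
  std::size_t SrcChars = boundedWcslen(Src, N);
  // The terminator is read only if it lies within the first N characters.
  std::size_t ReadChars = SrcChars < N ? SrcChars + 1 : N;
  // wcsncpy always writes exactly N wide characters (padding with L'\0'),
  // so the write range is N * sizeof(wchar_t) bytes, not N bytes.
  return {ReadChars * sizeof(wchar_t), N * sizeof(wchar_t)};
}
```

Using N bytes instead of N * sizeof(wchar_t) would check only a fraction of the destination when sizeof(wchar_t) is 2 or 4. That is the missed-overflow case the change describes.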

Testing

  • AArch64 Linux: new tests pass with check-asan locally.

Follow-up to and based on prior work in PR #90909 (author: branh,
Microsoft); builds on that work and addresses review feedback. Thanks!


Signed-off-by: Yixuan Cao

[clang-doc] Suppress long-name test on windows (#161424)

This seems to have broken some buildbots for a long time, so just
suppress it for now until we determine how/why.

[flang] Attempt to work around MSVC build problem (#161426)

Move a function that seems to be running into an MSVC problem from the
source file where I created it to another one (tools.cpp) that is
already known to be able to access the semantics::Scope type.

[MLIR][Standalone] gate wheel build behind MLIR_ENABLE_BINDINGS_PYTHON=ON (#161427)

Unless MLIR_ENABLE_BINDINGS_PYTHON=ON, StandalonePythonModules
isn't a valid target.

[AMDGPU] Introduce and use NotUseRealTrue16Insts. NFC. (#161373)

This removes ~2000 lines from both AMDGPUGenDAGISel.inc and
AMDGPUGenGlobalISel.inc.

[OpenACC][CIR] Fix transform inclusive scan init parameter (#161428)

This fixes the macOS build, where compilation otherwise fails with the error: no viable conversion from 'bool' to 'typename iterator_traits<const QualType *>::value_type'

[LAA] Add tests for using inbounds flags only used in predicated blocks.

Test for #160912.

Revert "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)"

This reverts commit b4be7ec.

See #161404 for a crash
exposed by the change. Revert while I investigate.

[CIR] Upstream support for generating global ctor regions (#161298)

This adds support for handling global variables with non-trivial
constructors. The constructor call is emitted in CIR as a 'ctor' region
associated with the global definition. This form of global definition
cannot be lowered to LLVM IR yet.

A later change will add support in LoweringPrepare to move the ctor code
into a __cxx_global_var_init() function and add that function to the
list of global ctors, but for now we must stop at the initial CIR
generation.

[HLSL][NFC] Add helper struct to simplify dealing with resource binding attributes (#161254)

Add new ResourceBindingAttrs struct that holds resource binding attributes HLSLResourceBindingAttr and HLSLVkBindingAttr and provides helper methods to simplify dealing with resource bindings. This code is placed in the AST library to be shared between Sema and CodeGen.

This change has been done in preparation for a third binding attribute, coming soon, to represent [[vk::counter_binding()]]. This new attribute and more helper member functions will be added to ResourceBindingAttrs and will be used in both Sema and CodeGen to implement resource counter initialization.

[flang] Add missing #include for MSVC (#161437)

I moved a function to Evaluate/tools.cpp in an attempt to dodge some
MSVC compiler issue but didn't add an include directive for
Evaluate/tools.h to Evaluate/constant.cpp.

[OpenMP] Update 6.1 implementation status. (#161449)

@jhuber6: Please review

[NFC][LLVM] Use ListSeparator in AsmWriter (#161422)

Use ListSeparator instead of manual code when generating comma
separated lists. Also replace FieldSeparator with ListSeparator as
they both provide identical functionality.
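The pattern ListSeparator replaces is the manual "is this the first element?" flag. Below is a self-contained stand-in that mimics its behavior: it yields an empty string the first time it is printed and the separator afterwards. ToyListSeparator and joinOperands are hypothetical names. The real class is llvm::ListSeparator, which converts to StringRef.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::ListSeparator (illustrative only).
class ToyListSeparator {
public:
  explicit ToyListSeparator(std::string Sep = ", ") : Separator(std::move(Sep)) {}
  // Empty on the first call, the separator on every later call.
  const std::string &next() {
    if (First) {
      First = false;
      return Empty;
    }
    return Separator;
  }

private:
  bool First = true;
  std::string Separator;
  std::string Empty;
};

std::ostream &operator<<(std::ostream &OS, ToyListSeparator &LS) {
  return OS << LS.next();
}

// Typical use: no special-casing of the first element in the loop body.
std::string joinOperands(const std::vector<std::string> &Ops) {
  std::ostringstream OS;
  ToyListSeparator LS;
  for (const std::string &Op : Ops)
    OS << LS << Op;
  return OS.str();
}
```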

[NFC] [IndVarSimplify] add overflowing tests (#159877)

Also use UTC (update_test_checks.py) for the test instead.

Fix memory leak in Offloading API (#161430)

Fix the failing sanitizer buildbots from PR
#143342.

[CodingStandard] Require Unix line endings for all files (#161228)

Require all files to use Unix line endings, formalizing an already
followed convention.

[AMDGPU] Precommit test for 160181

[llvm][mustache] Fix failing StandaloneIndentation test (#159192)

When rendering partials, we need to use an indentation stream,
but when part of the partial is an unescaped sequence, we must not
indent it. To address this, we build a common MustacheStream
interface for all the output streams to use. This allows us to
further customize the AddIndentationStream implementation
and opt it out of indenting the UnescapeSequence.
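The idea of an indenting stream that can opt out for verbatim text can be sketched as below. ToyIndentingStream and its methods are hypothetical and do not reflect the actual MustacheStream or AddIndentationStream interfaces. The sketch inserts a prefix after each newline unless indentation is suspended.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical sketch, not LLVM's Mustache classes: prefixes each line with
// an indent unless indentation is suspended (e.g. for unescaped output).
class ToyIndentingStream {
public:
  explicit ToyIndentingStream(std::string Prefix) : IndentPrefix(std::move(Prefix)) {}

  void write(const std::string &Text) {
    for (char C : Text) {
      if (AtLineStart && IndentEnabled)
        Out += IndentPrefix;
      Out += C;
      AtLineStart = (C == '\n');
    }
  }
  void suspendIndentation() { IndentEnabled = false; }
  void resumeIndentation() { IndentEnabled = true; }
  const std::string &str() const { return Out; }

private:
  std::string IndentPrefix;
  std::string Out;
  bool AtLineStart = true;
  bool IndentEnabled = true;
};
```

Text written while indentation is suspended ("x\ny" in the usage below) comes out exactly as given. Lines written before and after it are still prefixed.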

[llvm][mustache] Simplify debug logging (#159193)

The existing logging was inconsistent, and we logged too many things.
This PR introduces a more principled schema and eliminates many
redundant log lines.

[JITLink][MachO] Use Triple::isArm64e consistently.

Revert "Fix memory leak in Offloading API" (#161465)

Reverts #161430

[RISCV] Remove Zicntr from sifive-p450/p470/p670. (#161444)

These cores don't implement the time CSR. They require SBI to trap and
emulate it, which is allowed by RVA20U.

[RISCV] Rename BFloatVectors -> BF16Vectors in tablegen. NFC (#161469)

Part of this rename is taken from #161158, but applies it more
consistently to more variables.

I think using BF16 makes it easier to not confuse BFloat and Float when
reading.

[flang][driver] Accelerate complex division when -ffast-math is specified (#159689)

This patch accelerates complex division by passing
-complex-range=basic to the frontend when the -ffast-math option is
specified. This behavior is the same as -fcomplex-arithmetic=basic. A
warning is issued if a different value is specified for
-fcomplex-arithmetic=. The warning conditions will be unified with
clang.
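For reference, "basic" complex division uses the textbook formula (a + bi) / (c + di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2), with no scaling and no NaN/infinity handling. The sketch below (basicDivide is a hypothetical name) computes it directly in C++. Flang itself lowers to MLIR/LLVM IR, not to this code. The sketch also shows the trade-off: the formula is cheap, but intermediate products can overflow.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Textbook complex division without scaling (illustrative sketch).
// c*c + d*d and the numerator products can overflow for large operands.
std::complex<double> basicDivide(std::complex<double> X, std::complex<double> Y) {
  double a = X.real(), b = X.imag(), c = Y.real(), d = Y.imag();
  double Denom = c * c + d * d;
  return {(a * c + b * d) / Denom, (b * c - a * d) / Denom};
}
```

For example, (1 + 2i) / (3 + 4i) gives 0.44 + 0.08i. Dividing (1e300 + 1e300i) by itself yields NaN here because the intermediate products overflow to infinity. A range-safe algorithm would return 1.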

Reserve R9 on armv6 iOS 2.x (#150835)

The iOS 2.x ABI had R9 as a reserved register; 3.0 made it available,
but support for the 2.x ABI was never added to LLVM. We only use the 2.x
ABI on armv6, since before 3.0 armv6 was the only architecture supported
by iOS.

[HLSL][NFC] Add missing includes for standalone header compilation (#161473)

HLSLResource.h added by #161254 builds in the context of a .cpp file
(e.g. CGHLSLRuntime.cpp) but not when doing a header compilation, e.g.:

clang/include/clang/AST/Attrs.inc:12:45: error: unknown type name 'raw_ostream'; did you mean 'clang::raw_ostream'?
   12 | static inline void DelimitAttributeArgument(raw_ostream& OS, bool& IsFirst) {

[Support] Fix warnings

This patch fixes:

llvm/lib/Support/Mustache.cpp:332:20: error: unused function
'tagKindToString' [-Werror,-Wunused-function]

llvm/lib/Support/Mustache.cpp:344:20: error: unused function
'jsonKindToString' [-Werror,-Wunused-function]

[ADT] Consolidate uninitialized_copy in SmallVector (NFC) (#161043)

This patch consolidates two implementations of uninitialized_copy into
a single template function.

[LVI] Handle constant value lattice in getEdgeValueLocal (#161410)

Closes #161367.

In #157614, we ignored cases
where OpLatticeVal might be a constant or notconstant. Directly
returning the result causes a type mismatch. I apologize for the
oversight in the previous code review.

This patch applies the cast op to constants. For notconstant value
lattices, I'd leave it as a todo (it is similar to the constant case,
except for trunc without nsw/nuw).

[IR] Introduce !captures metadata (#160913)

This introduces !captures metadata on stores, which looks like this:

store ptr %x, ptr %y, !captures !{!"address", !"read_provenance"}

The semantics are the same as replacing the store with a call like this:

call void @llvm.store(ptr captures(address, read_provenance) %x, ptr %y)

This metadata is intended for annotation by frontends -- it's not
something we can feasibly infer at this point, as it would require
analyzing uses of the pointer stored in memory.

The motivating use case for this is Rust's println!() machinery, which
involves storing a reference to the value inside a structure. This means
that printing code (including conditional debugging code) can inhibit
optimizations because the pointer escapes. With the new metadata we can
annotate this as a read-only capture, which has less impact on
optimizations.

[RISCV] Add commutative support for Qualcomm uC Xqcics extension (#161328)

This is a follow-up to #160653 doing similar changes for Xqcics.

[flang] add helper to create descriptor with new base address (#161347)

There is currently no helper to create a descriptor for a copy of a
Fortran entity based on the descriptor of the original entity and the
base address of the copy (most places that are doing this currently are
also doing allocation of the copy at the same time or using the
runtime).
Add a helper for this with a unit test.

[lldb][IRExecutionUnit] Return error on failure to resolve function address (#161363)

Since #148877, we encode
the module ID of the function DIE we are currently
parsing into its AsmLabel in the AST. When the JIT asks LLDB to
resolve our special mangled name, we would locate the module and resolve
the function/symbol we found in it.

If we are debugging with a SymbolFileDWARFDebugMap, the module ID we
encode is that of the .o file that is tracked by the debug-map. To
resolve the address of the DIE in that .o file, we have to ask
SymbolFileDWARFDebugMap::LinkOSOAddress to turn the address of the
.o DIE into a real address in the linked executable. This will only
work if the .o address was actually tracked by the debug-map. However,
if the function definition appears in multiple .o files (which is the
case for functions defined in headers), the linker will most likely
de-duplicate that definition. So most .o files' definition DIEs for that
function won't have a contribution in the debug-map, and thus we fail to
resolve the address.

When debugging Clang on Darwin, e.g., you'd see:

(lldb) expr CXXDecl->getName()

error: Couldn't look up symbols:
  $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv
Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler.

unless you were stopped in the .o file whose definition of getName
made it into the final executable.

The fix here is to error out if we fail to resolve the address, causing
us to fall back on the old flow which did a lookup by mangled name,
which the SymbolFileDWARFDebugMap will handle correctly.
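The resulting control flow is "try the module-ID path, and on failure fall back to the mangled-name lookup". A toy model of it follows. ToyTarget, resolveViaModuleId and resolveFunction are hypothetical names, not LLDB's API, and plain maps stand in for the debug map and the executable's symbol table.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy model, not LLDB's API.
struct ToyTarget {
  // Addresses the debug map can link, keyed by "<module-id>:<mangled-name>".
  std::map<std::string, uint64_t> LinkedDebugMapAddresses;
  // Symbols of the final executable, keyed by mangled name.
  std::map<std::string, uint64_t> SymbolsByMangledName;
};

// Resolve via the encoded module ID; if the linker dropped this .o's copy of
// the function, report failure instead of producing an address.
std::optional<uint64_t> resolveViaModuleId(const ToyTarget &T,
                                           const std::string &ModuleId,
                                           const std::string &Mangled) {
  auto It = T.LinkedDebugMapAddresses.find(ModuleId + ":" + Mangled);
  if (It == T.LinkedDebugMapAddresses.end())
    return std::nullopt;
  return It->second;
}

// On failure, fall back to a lookup by mangled name.
std::optional<uint64_t> resolveFunction(const ToyTarget &T,
                                        const std::string &ModuleId,
                                        const std::string &Mangled) {
  if (std::optional<uint64_t> Addr = resolveViaModuleId(T, ModuleId, Mangled))
    return Addr;
  auto It = T.SymbolsByMangledName.find(Mangled);
  if (It == T.SymbolsByMangledName.end())
    return std::nullopt;
  return It->second;
}
```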

An alternative fix would be to encode the
SymbolFileDWARFDebugMap's module ID and implement
SymbolFileDWARFDebugMap::ResolveFunctionCallLabel by doing a mangled
name lookup. The proposed approach doesn't stop us from implementing
that, so we could choose to do it in a follow-up.

rdar://161393045

[mlir][transform] Add PromoteTensorOp (#158318)

Transform op to request that a tensor value live in a specific memory
space after bufferization.

Co-authored-by: Nicolas Vasilache
Co-authored-by: Alex Zinenko

[MemorySanitizer] Generate check lines for some vararg tests (NFC)

Use UTC_ARGS: --disable to skip the tests with many arguments.

[flang][debug] Change type*N to type(kind=N). (#161432)

It was discussed in #161361.

[MemorySanitizer] Generate test checks for kmsan test (NFC)

[InstCombine] Opt phi(freeze(undef), C) -> phi(C, C) (#161181)

Try to choose a value for freeze that enables the PHI to be replaced
with its input constants if they are equal.
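The choice can be sketched on a toy PHI: freeze(undef) may be given any single fixed value. If all other inputs agree on one constant C, picking C makes the PHI equivalent to C. PhiInput and chooseFreezeValue are hypothetical names, and InstCombine's actual heuristic may consider more cases than this sketch does.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical toy PHI input: either a known constant or freeze(undef).
struct PhiInput {
  bool IsFrozenUndef;
  int Constant; // only meaningful when IsFrozenUndef is false
};

// Returns the constant every non-freeze input agrees on, or std::nullopt if
// the inputs disagree (or there are no constant inputs at all).
std::optional<int> chooseFreezeValue(const std::vector<PhiInput> &Inputs) {
  std::optional<int> Common;
  for (const PhiInput &In : Inputs) {
    if (In.IsFrozenUndef)
      continue;
    if (Common && *Common != In.Constant)
      return std::nullopt;
    Common = In.Constant;
  }
  return Common;
}
```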

[libc][math] Refactor exp10m1f16 implementation to header-only in src/__support/math folder. (#161119)

Part of #147386

in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450

[CIR] Refactor cir.cast to use uniform assembly form w/o parens, commas (#161431)

This mirrors incubator changes from llvm/clangir#1922

[GlobalOpt] Check if users are CallBase when changing CC (#161399)

Fixes #156656
hasChangeableCCImpl guarantees the address of the function is not
taken, but it ignores assume-like calls.
This patch ignores assume-like calls when changing CC.

[AMDGPU][InsertWaitCnts] Refactor some helper functions, NFC (#161160)

  • Remove one-line wrappers around a simple function call when they're
    only used once or twice.
  • Move very generic helpers into SIInstrInfo
  • Delete unused functions

The goal is simply to reduce the noise in SIInsertWaitCnts without
hiding functionality. I focused on moving trivial helpers, or helpers
with very descriptive/verbose names (so it doesn't hide too much logic
away from the pass), and that have some reusability potential.

I'm also trying to make the code style more consistent. It doesn't make
sense to see a function call TII->isXXX then suddenly call a random
isY method that just wraps around TII->isY.

The context of this work is that I'm trying to learn how this pass
works, and while going through the code I noticed some little things
here and there that I thought would be good to fix.

[AMDGPU][SIInsertWaitCnts] De-duplicate code (NFC) (#161161)

I'm reading through the pass over and over again to try and learn how it works. I noticed some code duplication here and there while doing that.

[DAGCombine] Support (shl %x, constant) in foldPartialReduceMLAMulOp. (#160663)

Support shifts in foldPartialReduceMLAMulOp by treating (shl %x, %c) as
(mul %x, (shl 1, %c)).

PR: #160663
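The rewrite rests on a simple identity in modular (unsigned) arithmetic: x << c equals x * (1 << c) for any shift amount c below the bit width. shlAsMul is a hypothetical name, and this is a numeric check of the identity, not the DAGCombiner code.

```cpp
#include <cassert>
#include <cstdint>

// x << c == x * (1 << c) in 32-bit wrapping arithmetic, for c < 32.
uint32_t shlAsMul(uint32_t X, unsigned C) {
  uint32_t Multiplier = uint32_t{1} << C; // requires C < 32
  return X * Multiplier;
}
```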

[AMDGPU] Remove duplicate definition of isGFX12CacheInvOrWBInst

Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC. (#161496)

[lldb][Mangled][NFC] Remove redundant const-qualifier on llvm::StringRef argument

[AMDGPU][SIInsertWaitCnts] Remove redundant TII/TRI/MRI arguments (NFC) (#161357)

WaitCntBrackets already has a pointer to its SIInsertWaitCnt instance.
With a small change, it can directly access TII/TRI/MRI that way.
This simplifies many call sites, which makes the code easier to
follow.

[lldb][TypeSystemClang] Added unique builtins types for __bf16 and _Float16 (#157674)

While debugging an application with __bf16 and _Float16 float types, it
was discovered that lldb creates the same CompilerType for them. This can
cause an infinite recursion error if one tries to create two struct
specializations with these types and then inherit one specialization
from another.

[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in OpenMPDialect.cpp (NFC)

[BOLT] Gadget scanner: optionally assume auth traps on failure (#139778)

On AArch64 it is possible for an auth instruction to either return an
invalid address value on failure (without FEAT_FPAC) or generate an
error (with FEAT_FPAC). It thus may be possible to never emit explicit
pointer checks, if the target CPU is known to support FEAT_FPAC.

This commit implements an --auth-traps-on-failure command line option,
which essentially makes "safe-to-dereference" and "trusted" register
properties identical and disables scanning for authentication oracles
completely.

[BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (#141665)

Perform trivial syntactical cleanups:

  • make use of structured binding declarations
  • use LLVM utility functions when appropriate
  • omit braces around single expression inside single-line LLVM_DEBUG()

This patch is NFC aside from minor debug output changes.

[MLIR] Apply clang-tidy fixes for performance-move-const-arg in SimplifyAffineMinMax.cpp (NFC)

[AArch64] Some tests for cbz/tbz with wzr. NFC

[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in Rewrite.cpp (NFC)

[MLIR] Apply clang-tidy fixes for performance-unnecessary-copy-initialization in InferIntRangeCommon.cpp (NFC)

[clang][bytecode] Fix integral cast edge case (#161506)

We were converting the APSInt to a sign-less APInt too early and
losing the sign information.
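This class of bug can be illustrated with native integers standing in for APSInt/APInt. Once signedness is dropped, a widening cast zero-extends instead of sign-extending. The function names are hypothetical, and this is not clang's bytecode interpreter code.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extending an 8-bit value keeps its numeric value.
int64_t widenKeepingSign(int8_t V) { return static_cast<int64_t>(V); }

// Dropping the sign first (reinterpreting as unsigned) and only then widening
// zero-extends instead, so -1 becomes 255.
int64_t widenAfterDroppingSign(int8_t V) {
  return static_cast<int64_t>(static_cast<uint8_t>(V));
}
```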

[mlir][memref] Introduce memref.distinct_objects op (#156913)

The distinct_objects operation takes a list of memrefs and returns a
list of memrefs of the same types, with the additional assumption that
accesses to these memrefs will never alias with each other. This means
that loads and stores to different memrefs in the list can be safely
reordered.

See the discussion at
https://discourse.llvm.org/t/rfc-introducing-memref-aliasing-attributes/88049

[lldb][CPlusPlusLanguage] Avoid redundant const char* -> StringRef roundtrip (#161499)

We've been seeing (very sporadic) lifetime issues around this area. Here's
an example backtrace:

[  8] 0x0000000188e56743 libsystem_platform.dylib`_sigtramp + 55
[  9] 0x00000001181e041f LLDB`lldb_private::CPlusPlusLanguage::SymbolNameFitsToLanguage(lldb_private::Mangled) const [inlined] unsigned long std::__1::__constexpr_strlen[abi:nn200100]<char>(char const*) + 7 at constexpr_c_functions.h:63:10
[  9] 0x00000001181e0418 LLDB`lldb_private::CPlusPlusLanguage::SymbolNameFitsToLanguage(lldb_private::Mangled) const [inlined] std::__1::char_traits<char>::length[abi:nn200100](char const*) at char_traits.h:232:12
[  9] 0x00000001181e0418 LLDB`lldb_private::CPlusPlusLanguage::SymbolNameFitsToLanguage(lldb_private::Mangled) const [inlined] llvm::StringRef::StringRef(char const*) at StringRef.h:90:33
[  9] 0x00000001181e0418 LLDB`lldb_private::CPlusPlusLanguage::SymbolNameFitsToLanguage(lldb_private::Mangled) const [inlined] llvm::StringRef::StringRef(char const*) at StringRef.h:92:38
[  9] 0x00000001181e0418 LLDB`lldb_private::CPlusPlusLanguage::SymbolNameFitsToLanguage(lldb_private::Mangled) const + 20 at CPlusPlusLanguage.cpp:68:62

Looks like we're calling strlen on a nullptr. I stared at this
codepath for a while but am still not sure how that could happen unless
the underlying ConstString somehow pointed to corrupted data.

But SymbolNameFitsToLanguage does some roundtripping through a const char* before calling GetManglingScheme. No other callsite does this
and it just seems redundant.

This patch cleans this up.

rdar://161128180

[MLIR] Remove unused debug macros (NFC)

[NFC][LLVM][AsmWriter] Move type printing to WriteAsOperandInternal (#161456)

Add option to WriteAsOperandInternal to print the type and use that to
eliminate explicit type printing code in several places.

[CodeGen] Remove shouldExpandPartialReductionIntrinsic() hook (NFC) (#161498)

This is unused. Targets can lower/expand the PARTIAL_REDUCE_* ISD
nodes.

[NFC][AArch64][ISEL] Remove unnecessary predicates from partial_reduce_*mla patterns.

[OpenACC][CIR] Implement 'alloca copying' for private lowering (#161382)

The previous patch ensured that we correctly got the allocas put in
place. This patch takes the address of each element of each alloca, and
copies it to the previous one. This allows us to re-form the
pointer-structure for a recipe.

[MLIR][Transform][Tune] Introduce transform.tune.alternatives op (#160724)

This op enables expressing uncertainty regarding what should be
happening at particular places in transform-dialect schedules. In
particular, it enables representing a choice among alternative regions.
This choice is resolved through providing a selected_region argument.
When this argument is provided, the semantics are such that it is valid
to rewrite the op through substituting in the selected region -- with
the op's interpreted semantics corresponding to exactly this.

This op represents another piece of the puzzle w.r.t. a toolkit for
expressing autotuning problems with the transform dialect. Note that
this goes beyond tuning knobs on transforms, going further by making
it tunable which (sequences of) transforms are to be applied.

AMDGPU: Add peephole opt baseline tests (#161309)

Add tests which show missed folds of subregister extracts with
intermediate full copies.

[X86] Add test showing failure to remove sign splats from PACKSS intrinsics (#161518)

PACKSS intrinsic calls are only expanded to X86ISD::PACKSS nodes during
legalisation, after which time we fail to handle cases where ASHR sign
splats (now lowered to X86ISD::VSRAI) are unnecessary.

Add additional example of FREEZE(PACKSS()) as that's an issue as well.

[InstCombine] Drop poison-generating flags when reusing existing or instruction (#161504)

Closes #161493.

[InstCombine] Avoid self-replacing in getUndefReplacement (#161500)

Self-replacing has a different meaning in InstCombine. It will replace
all uses with poison.
Closes #161492.

[DFAJumpThreading] Unfold select to the incoming block of phi user (#160987)

Fixes #160250
We previously assumed the select to unfold is defined in the incoming
block of phi user, as isValidSelectInst filters other cases at the
initial stage. However, the selects not defined in the incoming block
may occur after unfolding the arms of the unfolded select.
This patch sinks the select into the incoming block of the phi user and
unfolds it at the incoming block.

[MLIR] Add sincos fusion pass (#161413)

We see performance improvements from using sincos to reuse calculations
in hot loops that compute sin() and cos() of the same operand. Add a
pass to identify sin() and cos() calls in the same block with the same
operand and fast-math flags, and fuse them into a sincos op.
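The matching step can be sketched on a toy block of ops. A sin and a cos with the same operand and the same fast-math flags are fused into one sincos at the position of the earlier op. ToyOp and fuseSinCos are hypothetical names. A real MLIR pass would also rewrite the uses of both results, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy op in a single block: kind, operand name, fast-math flags.
struct ToyOp {
  std::string Kind; // "sin", "cos", "sincos", or anything else
  std::string Operand;
  unsigned FastMathFlags;
  bool Erased = false;
};

// Fuse each sin/cos pair with identical operand and fast-math flags into one
// sincos placed at the earlier op, and mark the later op as erased.
void fuseSinCos(std::vector<ToyOp> &Block) {
  for (std::size_t I = 0; I < Block.size(); ++I) {
    ToyOp &A = Block[I];
    if (A.Erased || (A.Kind != "sin" && A.Kind != "cos"))
      continue;
    const std::string Partner = A.Kind == "sin" ? "cos" : "sin";
    for (std::size_t J = I + 1; J < Block.size(); ++J) {
      ToyOp &B = Block[J];
      if (!B.Erased && B.Kind == Partner && B.Operand == A.Operand &&
          B.FastMathFlags == A.FastMathFlags) {
        A.Kind = "sincos";
        B.Erased = true;
        break;
      }
    }
  }
}
```

Placing the fused op at the earlier position keeps it ahead of every use of either original result within the block.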

Follow-up to:

[AMDGPU] Use common allUsesAvailableAt implementation [nfc] (#161418)

Replace the target specific copy with a call to the generic routine. I
don't spot any differences by eye, and there's nothing in the original
review discussion (#124327) which makes it clear why this was
duplicated.

[AArch64][SME] Precommit tests for LUT4I Chain issues (NFC) (#161505)

These tests show that luti4 intrinsics are currently incorrectly
CSE'd.

[clang] Convert second arg of __builtin_assume_aligned to ConstantExpr (#161314)

Since the second argument must be a constant integer, we may as well
convert it to a ConstantExpr in Sema.

Fixes #161272

[X86] SimplifyDemandedBitsForTargetNode - generalize X86ISD::VSRAI handling when only demanding 'known signbits' (#161523)

If we only demand bits that already match the signbit then we don't need to shift.

Generalizes an existing pattern that just handled signbit-only demanded bits to match what we do for ISD::SRA.
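This rests on the following fact. If every demanded bit lies within the operand's known sign bits, an arithmetic right shift leaves those bits unchanged, so the shift can be dropped. The brute-force check below verifies this for all 8-bit values. numSignBits, ashr and topBitsMask are hypothetical helpers, and this is our reading of the condition, not the X86 lowering code.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit (including the sign bit).
unsigned numSignBits(int8_t V) {
  uint8_t U = static_cast<uint8_t>(V);
  unsigned Sign = U >> 7;
  unsigned Count = 1;
  for (int Bit = 6; Bit >= 0 && ((U >> Bit) & 1u) == Sign; --Bit)
    ++Count;
  return Count;
}

// Portable arithmetic shift right (avoids relying on signed >> semantics).
int8_t ashr(int8_t V, unsigned S) {
  int W = V;
  return static_cast<int8_t>(W >= 0 ? (W >> S) : ~(~W >> S));
}

// Mask selecting the top N bits of an 8-bit value.
uint8_t topBitsMask(unsigned N) {
  return N == 0 ? 0 : static_cast<uint8_t>(0xFFu << (8 - N));
}
```

For any value with K sign bits and any mask of its top N <= K bits, ashr(V, S) & mask equals V & mask for every shift amount S.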

[OpenACC] Remove unnecessary uses of getResult, fix cast tests (#161526)

A previous review comment pointed out that operations with only a single
result implicitly convert to mlir::Value. This patch removes the
explicit use of getResult where it is unnecessary in OpenACC lowering.

However, there ARE a few cases where it is necessary because the
mlir::ValueRange implicit constructor from a single value is being
used, so those are untouched.

Additionally, while the previous patch was being committed (#161382), a
second patch (#161431) changed the format of cir.casts, so this patch
fixes the additional test lines for that as well.

[LLVM][SCEV] udiv (mul nuw a, vscale), (mul nuw b, vscale) -> udiv a, b (#157836)
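The fold is sound because vscale is a positive constant (at least 1), so it cancels exactly in (a * vscale) / (b * vscale) as long as neither product wraps. The nuw flags guarantee that. Both function names are hypothetical. The 8-bit variant shows how wrapping would break the identity.

```cpp
#include <cassert>
#include <cstdint>

// (a * v) / (b * v) == a / b for v >= 1 and b != 0, provided neither product
// wraps (the nuw condition).
bool udivCancelsVScale(uint64_t A, uint64_t B, uint64_t VScale) {
  return (A * VScale) / (B * VScale) == A / B;
}

// The same computation in 8-bit arithmetic, where a product can wrap.
// Callers must keep B * VScale nonzero modulo 256.
bool udivCancelsVScale8(uint8_t A, uint8_t B, uint8_t VScale) {
  uint8_t Num = static_cast<uint8_t>(A * VScale);
  uint8_t Den = static_cast<uint8_t>(B * VScale);
  return Num / Den == A / B;
}
```

For example, with 8-bit values a = 200, b = 1, vscale = 2, the product 400 wraps to 144, so the left side gives 72 instead of 200.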

[lldb][test] Fix bf16 test cases on Arm 32-bit (#161528)

Fixes #157674

On ARM, the presence of a specific bf16 type in the AST is gated by:

bool ARMTargetInfo::hasBFloat16Type() const {
  // The __bf16 type is generally available so long as we have any fp registers.
  return HasBFloat16 || (FPU && !SoftFloat);
}

And the target we use when evaluating symbols (derived from the program
file, I think, haven't found it yet) does not enable any of this.

This means that we fall back to __fp16.

So for parts of the testing we just need to expect __fp16 instead.

When the stars align against stack alignment and we have
frame-pointer=non-leaf, we can incorrectly skip preserving fp/r7 in the prolog.
The fix here first makes sure we're using the right frame pointer register in
the context of preserving the incoming FP, and then makes sure that we save
the FP when re-alignment is known to be necessary.

rdar://162462271

This stack of pull requests is managed by Graphite. Learn more about stacking.

@aemerson aemerson closed this Oct 16, 2025