[clang-reorder-fields] Check for flexible array member (#160262) #163697
+43
−2
[clang-reorder-fields] Check for flexible array member (#160262)
A flexible array member must remain the last field in the struct.
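As a rough illustration of this constraint, the sketch below validates a proposed field order. The helper `keepsFlexibleArrayLast` and its parameters are hypothetical and are not taken from the clang-reorder-fields sources; the order is assumed to be a permutation of the original field indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the tool's actual code.
// newOrder[i] is the original index of the field placed at position i.
// If the struct ends in a flexible array member, that member (original
// index N-1) must still occupy the last position after reordering.
bool keepsFlexibleArrayLast(const std::vector<std::size_t> &newOrder,
                            bool hasFlexibleArrayMember) {
  if (!hasFlexibleArrayMember || newOrder.empty())
    return true;
  return newOrder.back() == newOrder.size() - 1;
}
```

An order such as {1, 0, 2} is accepted for a struct with a trailing flexible array member, while {2, 0, 1} is rejected.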
[CAS][CMake] Fix rhel bots missing symbol failure from #114100 (#161283)
Link LLVM_PTHREAD_LIB from LLVMCAS library to fix rhel bots.
[clang-scan-deps] Remove const from ModuleDeps loop to enable move. (#161109)
This changes the iteration from const to non-const so that std::move
results in a true move rather than a copy.
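The effect described here can be reproduced outside of clang-scan-deps. The sketch below uses a hypothetical `Probe` type (not `ModuleDeps`) that records how it was constructed: `std::move` on a const element produces a `const Probe &&`, which can only bind to the copy constructor.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Small probe type that records whether it was produced by a copy or a move.
struct Probe {
  std::string payload;
  bool copied = false;
  bool moved = false;
  explicit Probe(std::string s) : payload(std::move(s)) {}
  Probe(const Probe &other) : payload(other.payload), copied(true) {}
  Probe(Probe &&other) noexcept
      : payload(std::move(other.payload)), moved(true) {}
};

// Const iteration: std::move(p) is a const rvalue, so push_back copies.
std::vector<Probe> drainConst(std::vector<Probe> &src) {
  std::vector<Probe> out;
  out.reserve(src.size());
  for (const Probe &p : src)
    out.push_back(std::move(p));
  return out;
}

// Non-const iteration: the same std::move now selects the move constructor.
std::vector<Probe> drainMutable(std::vector<Probe> &src) {
  std::vector<Probe> out;
  out.reserve(src.size());
  for (Probe &p : src)
    out.push_back(std::move(p));
  return out;
}
```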
Create function declaration in the proper module (#161281)
Using memref.dealloc in the gpu module would add a function definition
for @free in the top level module instead of the gpu module. The fix is
to do what is already done for memref.alloc, which is to use
op->getParentWithTrait<OpTrait::SymbolTable>() instead of
op->getParentOfType<ModuleOp>() to create the call in the proper module.
[lld][WebAssembly] Fix visibility of __stack_pointer global (#161284)
The stack pointer should be global, not hidden / dso-local. Marking it
as global allows it to be exported from the main module and imported
into side modules.
[CIR] Add GlobalOp ctor and dtor regions (#160779)
This adds support for ctor and dtor regions in cir::GlobalOp. These
regions are used to capture the code that initializes and cleans up the
variable, keeping this initialization and cleanup code with the variable
definition.
This change only adds the CIR dialect support for these regions. Support
for generating the code in these regions from source and lowering these
to LLVM IR will be added in a later change, as will LoweringPrepare
support to move the code into the __cxx_global_var_init() function.
[Clang] Avoid null deref in lambda attribute compat warning (#161096)
Fixes #161070
This PR addresses the issue in ext_decl_attrs_on_lambda by using
%0 = attribute name and %1 = selector, which prevents a null
IdentifierInfo*.
llvm-project/clang/lib/Parse/ParseExprCXX.cpp
Lines 1299 to 1302 in 48a6f2f
llvm-project/clang/include/clang/Basic/DiagnosticParseKinds.td
Lines 1143 to 1145 in 48a6f2f
llvm-project/clang/include/clang/Basic/DiagnosticParseKinds.td
Lines 1149 to 1152 in 48a6f2f
[lld][macho][NFC] Add release note for #158720 (#161295)
I forgot to add a release note for
#158720 so I'll add it here.
[llvm][mustache] Support setting delimiters in templates (#159187)
The base mustache spec allows setting custom delimiters, which slightly
change parsing of partials. This patch implements that feature by adding
a new token type, and changing the tokenizer's behavior to allow setting
custom delimiters.
[profcheck] Add unknown branch weight for inlined memchr calls. (#160964)
The memchr inliner creates new switch branches but was failing to add
profile metadata. This patch fixes the issue by explicitly adding unknown
branch weights to these branches.
Issue #147390
[llvm][mustache] Refactor tokenizer for clarity (#159188)
This patch refactors the Mustache tokenizer by breaking the logic up
with helper functions to improve clarity and simplify the code.
[LoongArch][NFC] Pre-commit tests for xvinsve0.{w/d} (#160829)
[mlir][GPU] Generalize gpu.printf to not need gpu.module (#161266)
In order to make the gpu.printf => [various LLVM calls] passes less
order-dependent and to allow downstreams that don't use gpu.module to
use gpu.printf, allow the lowerings for such prints to target the
nearest SymbolTable instead.
[llvm][mustache] Refactor template rendering (#159189)
Move the rendering logic into the ASTNode, and break the logic down into
individual methods.
[LoongArch] Custom legalize vector_shuffle to xvinsve0.{w/d} when possible (#161156)
[analyzer] Use sed from the ToolBox on AIX (NFC) (#161242)
The change in commit 30402c7 breaks the tests on AIX. This patch
changes the tests to use the sed from the AIX Toolbox instead of the
default one, which does not support -r and -E.
[LoongArch] Add R_LARCH_MARK_LA relocation for la.abs
Match gas behavior: generate R_LARCH_MARK_LA relocation for la.abs.
Reviewers: heiher, SixWeining
Reviewed By: SixWeining, heiher
Pull Request: #161062
[llvm][mustache] Remove out parameters from processTags() (#159190)
We can construct the return values directly and simplify the interface.
[sanitizer] Handle nullptr name in prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME) (#160824)
Summary
This PR resolves #160562
[CUDA] Enable variadic argument support in front-end (#161305)
Variadic argument support for NVPTX was added in 486d00e, so we can
enable it in the front-end.
Co-authored-by: Yuanke Luo [email protected]
ELF: Rename Relocations.cpp functions and rewrite the file-level comment. NFC
Pull Request: #161229
ELF: Use preprocessed relocations for EhInputSection scanning
.eh_frame sections require special sub-section processing, specifically,
CIEs are de-duplicated and FDEs are garbage collected. Create a
specialized scanEhSection() function utilizing the just-added
EhInputSection::rels. OffsetGetter is moved to scanEhSection.
This improves separation of concerns between InputSection and
EhInputSection processing.
This removes another relsOrRelas call using supportsCrel=false.
DWARF.cpp now has the last call.
Pull Request: #161091
[llvm][mustache] Introduce MustacheContext to simplify mustache APIs (#159191)
[DAGCombiner] Remove most NoSignedZerosFPMath uses (#161180)
The two remaining uses are related to fneg and foldFPToIntToFP. Some
AMDGPU tests are duplicated and regenerated.
[clang-format] Fix a bug in wrapping { after else (#161048)
Fixes #160775
[clang][Diags] Automatically format AP(S)Int values with separators (#161047)
This adds an operator<< overload for StreamingDiagnostic that takes
an APInt/APSInt and formats it with default options, including
adding separators.
This is still an opt-in mechanism since all callers that want to use
this feature need to be changed from
to
Diag() << MyInt;
This patch contains one example of a diagnostic making use of this.
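To make "formats it with separators" concrete, here is a standalone sketch of digit grouping. It does not use APInt or the diagnostics engine; `formatWithSeparators` is a hypothetical helper, and the separator character is left as a parameter because the source does not say which one the diagnostic output uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical formatter: groups decimal digits in threes from the right.
std::string formatWithSeparators(std::int64_t value, char sep) {
  const bool negative = value < 0;
  // Work on the magnitude as unsigned so INT64_MIN does not overflow.
  const std::uint64_t magnitude =
      negative ? 0 - static_cast<std::uint64_t>(value)
               : static_cast<std::uint64_t>(value);
  const std::string digits = std::to_string(magnitude);
  std::string out;
  for (std::size_t i = 0; i < digits.size(); ++i) {
    // Insert a separator when the number of remaining digits is a
    // multiple of three (but never before the first digit).
    if (i != 0 && (digits.size() - i) % 3 == 0)
      out += sep;
    out += digits[i];
  }
  return negative ? "-" + out : out;
}
```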
[RISCV] Add commutative support for Qualcomm uC Xqcicm extension (#160653)
This is a follow-up to #145643. See
#145643 (comment).
[MLIR][Python] Fix stubgen/PYTHONPATH collision/bug (#161307)
If PYTHONPATH is set and points to the build location of the python
bindings package then when stubgen runs, _mlir will get imported twice
and bad things will happen (e.g.,
Assertion !instance && "PyGlobals already constructed"). This happens
because mlir is a namespace package and the importer/loader can't
distinguish between mlir._mlir_libs._mlir and _mlir_libs._mlir imported
from CWD. Or something like that. The fix is to filter out any entries
in PYTHONPATH that end in MLIR_BINDINGS_PYTHON_INSTALL_PREFIX/.. (e.g.,
python_packages/mlir_core/).
[clang][libc++] Fix spelling of "synthesize" (#158523)
There is a tradition to use U.S. English spellings for APIs. For
example, it's uninitialized_fill and not uninitialised_fill,
specialization not specialisation, etcetera.
[clangd] Fix off-by-one error in CommandMangler (#160029)
SawInput() is intended to be called for every argument after a --, but
it was mistakenly being called for the -- itself.
Partially fixes clangd/clangd#1850
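The intended behavior can be sketched with a plain argument loop. This is not the CommandMangler code: `collectInputsAfterDashDash` is a hypothetical stand-in that records which arguments would be reported as inputs, skipping the separator itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-argument callback: returns the
// arguments that follow the first "--", excluding the "--" itself.
std::vector<std::string>
collectInputsAfterDashDash(const std::vector<std::string> &args) {
  std::vector<std::string> inputs;
  bool sawDashDash = false;
  for (const std::string &arg : args) {
    if (sawDashDash) {
      // Everything after the separator is an input, even another "--".
      inputs.push_back(arg);
      continue;
    }
    if (arg == "--")
      sawDashDash = true; // the separator itself is not an input
  }
  return inputs;
}
```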
[WebAssembly] Use partial_reduce_mla ISD nodes (#161184)
Addressing issue #160847.
Move away from combining the intrinsic call and instead lower the ISD
nodes, using tablegen for pattern matching.
[clang][Tooling] Support 'c++latest' in InterpolatingCompilationDatabase (#160030)
Fixes clangd/clangd#527
Fixes clangd/clangd#1850
[Modules] Make -module-file-info print macro names in deterministic order (#161332)
Developers reported non-deterministic output from -module-file-info,
thinking this reflected non-determinism in the .pcm files themselves.
However, it turned out it was the printing that was non-deterministic:
Making the output deterministic also simplifies the test.
This is a follow-up to 360c5fe
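A common way to get deterministic printing from a hash-based container is to collect the keys and sort them first. The sketch below assumes a `std::unordered_map` as a stand-in; the source does not state which container Clang uses, and `sortedMacroNames` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the macro names in sorted order, independent of the hash
// table's iteration order.
std::vector<std::string> sortedMacroNames(
    const std::unordered_map<std::string, std::string> &macros) {
  std::vector<std::string> names;
  names.reserve(macros.size());
  for (const auto &entry : macros)
    names.push_back(entry.first);
  std::sort(names.begin(), names.end());
  return names;
}
```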
[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in LevelZeroRuntimeWrappers.cpp (NFC)
[MLIR] Apply clang-tidy fixes for readability-container-size-empty in Vectorization.cpp (NFC)
[VectorCombine] Preserve cast flags in foldBitOpOfCastConstant (#161237)
Follow-up of #157822.
[Uniformity] Remove unused PhiInput definition (NFC) (#161116)
This appears to have no users.
[mlir][memref-to-spirv]: Remap Image Load Coordinates (#160495)
When converting a memref.load from the image address space to a
spirv.ImageFetch ensure that we correctly map the load indices to
width, height and depth.
The lowering currently assumes a linear image tiling, that is row-major
memory layout. This allows us to support any memref layout that is a
permutation of the dimensions, more complex layouts are not currently
supported. Because the ordering of the dimensions in the vector passed
to image fetch is the opposite to that in the memref directions a final
reversal of the mapped dimensions is always required.
Signed-off-by: Jack Frankland [email protected]
[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in ReifyValueBounds.cpp (NFC)
[LV] Add ExtNegatedMulAccReduction expression type (#160154)
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.
Stacked PRs:
[AArch64] Simplify some masked integer comparisons. (#153783)
Specifically, X & M ?= C --> (C << clz(M)) ?= (X << clz(M)) where M is
a non-empty sequence of ones starting at the least significant bit with
the remainder zero and C is a constant subset of M that cannot be
materialised into a SUBS (immediate). Proof:
https://alive2.llvm.org/ce/z/haqdJ4.
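Independently of the Alive2 proof, the equality case of the rewrite can be checked by brute force at a small bit width. The sketch below reads `?=` as `==` (the `!=` case follows by negation) and tests all 8-bit values; the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros of an 8-bit value (returns 8 for zero).
unsigned clz8(std::uint8_t v) {
  unsigned n = 0;
  for (unsigned bit = 0x80; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Exhaustively checks, for 8-bit values, that
//   (X & M) == C   <=>   (C << clz(M)) == (X << clz(M))
// for every low-bit mask M and every constant C that is a subset of M.
bool rewriteHoldsFor8Bit() {
  for (unsigned k = 1; k <= 8; ++k) {
    const unsigned M = (1u << k) - 1;
    const unsigned s = clz8(static_cast<std::uint8_t>(M));
    for (unsigned C = 0; C <= M; ++C)
      for (unsigned X = 0; X <= 0xFF; ++X) {
        const bool masked = (X & M) == C;
        // Truncate the shifted values back to 8 bits, as a fixed-width
        // register would.
        const bool shifted = ((C << s) & 0xFFu) == ((X << s) & 0xFFu);
        if (masked != shifted)
          return false;
      }
  }
  return true;
}
```

The shift discards exactly the bits that the mask would have cleared, which is why the two comparisons agree.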
This improves the comparison in isinf, for example:
Before:
After:
[x86][AVX-VNNI] Fix VPDPBXXD Argument Type (#159222)
Fixed intrinsic VPDP[SS,SU,UU]D[,S]_128/256/512's argument types to match with the ISA.
Fixes part of #97271.
[BOLT] Refactor MCInstReference and move it to Core (NFC) (#155846)
Refactor MCInstReference class and move it from PAuth gadget scanner to
Core.
MCInstReference is a class representing a reference to a constant
instruction inside a parent entity - either inside a basic block (which
has a reference to its parent function) or directly inside a function
(when CFG information is not available).
This patch reapplies #138655 with a fix for iterator usage and multiple
minor issues fixed during the second round of review.
[LV] Don't preserve LCSSA in SCEVExpander for runtime checks. (#159556)
LV does not preserve LCSSA, it constructs it just before processing a
loop to vectorize. Runtime check expressions are invariant to that loop,
so expanding them should not break LCSSA form for the loop we are about
to vectorize.
This fixes a crash when discarding instructions generated when expanding
runtime checks, if the expansion introduces LCSSA phis for values from
other loops which are not in LCSSA form: we would introduce new LCSSA
phis and update all outside users, some of which are not created by the
expander and cannot be cleaned up.
Fixes #158259.
PR: #159556
[Flang] Add perfect-nest and rectangular-loop semantic tests (#160283)
Add semantic tests of currently unsupported OpenMP canonical loops:
Both were introduced in OpenMP 5.0 and are not yet supported by Flang.
The message "Trip count must be computable and invariant" is the same
that OpenACC emits for non-rectangular loops in
AccAttributeVisitor::CheckAssociatedLoop. I considered reusing the
code, but it calls OpenACC-only methods and behaves differently (e.g. in
symbol resolution, and it does not check the step operand).
[LV] Preserve GEP nusw when widening memory (#160885)
AMDGPU: Fix s_barrier_leave to write to scc (#161221)
s_barrier_leave implicitly defines $scc
and does not use imm that represents type of barrier,
isel pattern ignores imm operand from llvm intrinsic.
Test if SIInsertWaitcnts tracks this scc write.
[clang-repl] Teach clang-repl how to load PCHs (reprise) (#157359)
This is an updated version of @vgvassilev's PR from last year here:
#94166
In short, it includes:
clang::Interpreter (and thus clang-repl) cannot resolve symbols defined in a PCH
clang-repl so that llvm-lit can match the host JIT triple between the
PCH and clang-repl; previously, they may differ in some cases
Shout out to @kylc for finding a logic issue which had us stumped for a
while (and securing the
bounty).
Co-authored-by: Vassil Vassilev [email protected]
Co-authored-by: Kyle Cesare [email protected]
[Docs][RISCV]Remove experimental from Smctr, Ssctr,Sdext and Sdtrig (#161058)
AMDGPU: Fix gcc build break (#161354)
[LLVM] Fix a bug in Intrinsic::getFnAttributes (#161248)
[OpenACC][CIR] Generate private recipe pointer/array 'alloca's (#160911)
As a next step to generating pointer/array recipes, this patch generates
just the 'alloca' lines that are necessary. Copying pointers over to
restore the structure is held off to the next patch.
In the case of a pointer, we need to allocate the level 'below' it (if
we index into it), then copy the values into the pointers. In the case
of an array, we skip the alloca (since the array's alloca contains the
value).
After this, we'll need a patch that copies the pointers into place, and
finally one that does the initialization of these values.
[Clang][PowerPC] Add __dmr2048 type and DMF crypto builtins (#157152)
Define the __dmr2048 type to represent the DMR pair introduced by the
Dense Math Facility on PowerPC, and add three Clang builtins
corresponding to DMF cryptography:
__builtin_mma_dmsha2hash
__builtin_mma_dmsha3hash
__builtin_mma_dmxxshapad
The __dmr2048 type is required for the dmsha3hash crypto builtin, and,
as with other PPC MMA and DMR types, its use is strongly restricted.
[RISCV][NFC] Update ratified extensions list in riscv-target-features.c
[SLPVectorizer] Remove align 16 in a test. (#161251)
It is not necessary.
[flang][OpenMP] Move semantic checks for ALLOCATE to check-omp-structure (#161249)
The checks were previously in resolve-directives, which is mostly
intended for determining symbol properties, not performing semantic
checks.
[SLPVectorizer] Clear TreeEntryToStridedPtrInfoMap. (#160544)
We need to clear TreeEntryToStridedPtrInfoMap in deleteTree.
[flang][debug] Generate splitDebugFilename field in DICompileUnitAttr. (#161214)
This PR builds on #160540 and allows us to set the splitDebugFilename
field in DICompileUnitAttr.
The changes are mostly mechanical.
I saw some spurious white space in a test that I have cleaned up.
[MLIR] Apply clang-tidy fixes for readability-container-size-empty in LinalgTransformOps.cpp (NFC)
[AArch64][SVE2p1] Allow more uses of mask in performActiveLaneMaskCombine (#159360)
The combine replaces a get_active_lane_mask used by two extract
subvectors with a single paired whilelo intrinsic. When the instruction
is used for control flow in a vector loop, an additional extract of element
0 may introduce other uses of the intrinsic such as ptest and reinterpret
cast, which is currently not supported.
This patch changes performActiveLaneMaskCombine to count the number
of extract subvectors using the mask instead of the total number of uses,
and returns the concatenated results of get_active_lane_mask.
[IR] Don't create ptrtoint expression to determine alignment (NFCI) (#161364)
We try to determine the alignment of a constant by creating a ptrtoint
expression and seeing if it folds. I believe the only case this can
actually handle is where the constant is an inttoptr expression. Handle
that directly instead of going through another ptrtoint expression.
I ran into this while trying to clean up our isEliminableCastPair()
mess, which is going to disable ptrtoint(inttoptr) folding without
DataLayout, breaking this code.
AMDGPU: Use srcvalue and delete Ignore complex pattern (#161359)
[MLIR] Add sincos op to math dialect (#160772)
Now that sincos is a supported intrinsic in the LLVM dialect
(#160561), we are able to add the corresponding operation in
the math dialect and add conversion patterns for LLVM and NVVM.
We have several benchmarks that use sine and cosine in hot-loops, and
saving some calculations by performing them together can benefit
performance. We would like to have a way to represent sincos in the math
dialect.
[analyzer][NFC] Explain why operator new/delete should never be eval-called (#161370)
Downstream, some change triggered an investigation if we could move a
checker callback from check::PostCall to eval::Call. After a lengthy
investigation that led to ExprEngine::VisitCXXNewExpr we realized that
CXXNewExprs only trigger a PreCall and PostCall, but never an EvalCall.
It also had a FIXME that maybe it should trigger it.
Remember, it called defaultEvalCall, which either inlines or
conservatively evaluates (i.e. invalidates) the call, but never probes
the checker eval-calls to see if any would step in.
After implementing the changes to trigger the eval call for the
checkers, I realized that it doesn't really make sense because we are
eval-calling user-provided functions whose semantics we can't really be
sure about, so there is no generic way to properly implement the
eval call callback.
This touches on an important point. It only ever makes sense to eval
call functions that have a clear specification, such as standard functions, as
implementing the callback would prevent the inlining of that function,
risking regressing analysis quality if the implemented model is not
complete/correct enough.
As a conclusion, I opted for not exposing the eval call event to
checkers, in other words, keep everything as-is, but document my
journey.
CPP-6585
[NFC][LV] Fix warning of unused SubConst variable
#160154 added an assertion
using a new variable, which caused a warning in builds without asserts.
This patch adds [[maybe_unused]] to prevent that warning.
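The pattern is generic C++17 and easy to show in isolation. In the sketch below, `halve` and `isPositive` are hypothetical names; the variable is only read inside `assert`, so without the attribute a build with `NDEBUG` defined would report it as unused.

```cpp
#include <cassert>

// Hypothetical function: isPositive is only read inside assert(), which
// expands to nothing when NDEBUG is defined.
int halve(int value) {
  [[maybe_unused]] const bool isPositive = value > 0;
  assert(isPositive && "halve expects a positive value");
  return value / 2;
}
```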
Revert "[flang] Simplify hlfir.index in a few limited cases. (#157883)" (#161387)
This reverts commit edca510 due to reported failures.
[AArch64][SME] Fix typo in docs "block" -> "bundle" (NFC) (#161383)
[AMDGPU][GlobalISel] Add RegBankLegalize support for buffer loads with formats (#161291)
[flang][debug] Improve name generation for basic types. (#161361)
For basic types, currently, we use the type name (e.g., integer, real)
as the debug name. This results in types of different sizes
having the same name. This patch improves the naming by appending the
size in bytes to the type name (e.g., integer*8, real*8).
Fixes #160890
[SPIR-V] Implement SPV_KHR_float_controls2 (#146941)
Implementation of the SPV_KHR_float_controls2 extension, and
corresponding tests.
Some of the tests make use of !spirv.ExecutionMode LLVM named
metadata. This is because some SPIR-V instructions don't have a direct
equivalent in LLVM IR, so the SPIR-V Target uses different LLVM named
metadata to convey the necessary information. Below, you will find an
example from one of the newly added tests:
!spirv.ExecutionMode contains a list of metadata nodes, and each of
them specifies the required operands for expressing a particular
OpExecutionMode instruction in SPIR-V. For example,
!19 = !{ptr @k_float_controls_float, i32 6028, float poison, i32 131079}
will be lowered to
OpExecutionMode [[k_float_controls_float_ID]] FPFastMathDefault [[float_type_ID]] 131079.
Co-authored-by: Dmitry Sidorov [email protected]
[CodeGen] Fix performance regression introduced in b05101b
The isNormalValueType = false flag was not set for this pseudo value
type, which caused significant size increases for some classes: the
size of the TargetLoweringBase class grew to 1.5 MB, because the size of
that class is quadratic in MVT::VALUETYPE_SIZE, and this commit
increased that from 256 to 504.
Reported by: abadams
Fixes: b05101b ("[TableGen, CodeGen, CHERI] Add support for the cPTR wildcard value type.")
Reviewed By: nikic
Pull Request: #161313
Aarch64: Emit a minimal SEH prologue when needed (#158173)
In some cases, with very simple thunks, it is possible that the
.seh_endprologue is not emitted. This causes issues in the assembler
because the epilogue ends up starting before the prologue has ended.
Bug: swiftlang#11377
PeepholeOpt: Try to constrain uses to support subregister (#161338)
This allows removing a special case hack in ARM. ARM's implementation
of getExtractSubregLikeInputs has the strange property that it reports
a register with a class that does not support the reported subregister
index. We can however reconstrain the register to support this usage.
This is an alternative to #159600. I've included the test, but
the output is different. In this version the VMOVSR is
replaced with an ordinary subregister extract copy.
[libc++][test] Use ASSERT_WITH_LIBRARY_INTERNAL_ALLOCATIONS in more places (#144339)
ASSERT_WITH_LIBRARY_INTERNAL_ALLOCATIONS allows waiving asserts, for
cases when we can't count allocations that happen within the libc++
shared library.
When compiling with optimization, it is possible that some calls end up
generated inline, where the overridden operator new/delete do get called
(counting those calls), whereas the compiler may decide to leave some
calls to the external definition (inside the shared library, where we
can't count the calls).
In particular, in one case, a non-optimized build calls
_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev from
the DLL, while it gets inlined (including direct calls to operator
delete) when built with optimization.
Therefore, for the cases where we can't count allocations internally
within the library, waive these asserts.
This fixes all testcases in mingw mode, when built with optimization
enabled.
[MemorySanitizer] Generate some test checks (NFC)
[libc++][istream] P3223R2: Making std::istream::ignore less surprising (#147007)
Implements https://wg21.link/P3223R2 as a DR, as recommended in
cplusplus/papers#1871 (comment).
Resolves -1L ambiguity.
Closes #148178
[clang-tidy] New Option Invalid Enum Default Initialization (#159220)
Added a new Option IgnoredEnums to bugprone invalid enum default
initialization to limit the scope of the analysis. This is needed to
remove warnings on enums like std::errc where the enum doesn't define a
value of 0, but is still used to check if some function calls like
std::from_chars are executed correctly.
The C++ Standard section 22.13.2 mentions the following : "[...] If the
member ec of the return value is such that the value is equal to the
value of a value-initialized errc, the conversion was successful [...]"
This means that a call to std::errc{} is clearly defined by the
standard and should not raise any warning under this check.
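The idiom that motivates the option looks like the following C++17 sketch. `parseInt` is a hypothetical wrapper; the comparison against a value-initialized `std::errc` is the standard way to test `std::from_chars` for success.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses a whole string as a decimal int. Success is signalled by ec
// comparing equal to a value-initialized std::errc.
std::optional<int> parseInt(std::string_view text) {
  int value = 0;
  const char *first = text.data();
  const char *last = text.data() + text.size();
  const std::from_chars_result result = std::from_chars(first, last, value);
  if (result.ec != std::errc{} || result.ptr != last)
    return std::nullopt;
  return value;
}
```

Listing `std::errc` in IgnoredEnums would keep the check from flagging the `std::errc{}` expression in code of this shape.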
[flang][OpenMP] Remove unused DECLARE REDUCTION from openmp-utils.h, NFC (#161390)
DECLARE REDUCTION is now handled by the generic code, and the special
handling no longer applies.
[ADT] Add const to AllocatorList::{empty,size} (#161320)
While I am at it, this patch adds [[nodiscard]].
[ADT] Add const to operator== in ArrayRef.h (#161321)
While I am at it, this patch adds [[nodiscard]].
[ADT] Make non-const functions forward to const versions (NFC) (#161323)
These functions all correspond to their respective const versions.
This patch uses the "const_cast" trick to forward to the const
versions.
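For readers unfamiliar with the trick, here is a minimal sketch on a hypothetical `IntBag` class (not one of the ADT containers touched by the patch). The const overload holds the logic; the non-const overload forwards to it and removes const from the result.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical container used only to illustrate the forwarding pattern.
class IntBag {
  std::vector<int> items;

public:
  explicit IntBag(std::vector<int> init) : items(std::move(init)) {}

  // The const overload holds the real logic.
  const int *find(int needle) const {
    for (const int &item : items)
      if (item == needle)
        return &item;
    return nullptr;
  }

  // The non-const overload forwards to the const one and strips const
  // from the result. This is safe because *this is non-const here.
  int *find(int needle) {
    return const_cast<int *>(static_cast<const IntBag &>(*this).find(needle));
  }
};
```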
[llvm] Proofread CIBestPractices.rst (#161324)
[SPIRV] Avoid OpQuantizeToF16 in SPIR-V kernel test (#158086)
This PR resolves the current failure in the integer-casts.ll SPIR-V
test during CI runs in llvm-project.
The failure occurs because the SPIR-V instruction OpQuantizeToF16
requires the Capability::Shader. However, the function in
integer-casts.ll is written as a kernel function and executed in a
kernel environment. Therefore, Capability::Kernel is emitted instead
of Capability::Shader. To fix this, we remove the QuantizeToF16 test
from integer-casts.ll in this PR.
[LAA] Make blockNeedsPredication arguments const (NFC).
The arguments aren't modified, mark them as const. This prepares for new
users in a follow-up, which only have access to const versions of the
arguments.
[PowerPC] Implement Elliptic Curve Cryptography (ECC) Instructions (#158362)
New instructions added:
[lld][macho][NFC] Factor count zeros into helper function (#161241)
Move llvm::countr_zero() into a helper function to reduce code
duplication between CStringSection and DeduplicatedCStringSection.
More importantly, this moves a giant comment to that helper function
since it pertains to both classes.
[CIR] Upstream ParenExpr for AggregateExpr (#160998)
Upstream ParenExpr support for AggregateExpr
[MLIR][SCF] Add loops as parameter to LoopTerminator callback when using CustomOp. (#161386)
This PR adds to the generateLoopTerminatorFn callback the loops
generated by GenerateLoopHeaderFn. This is needed to correctly set the
insertion point with scf.forall ops.
[AArch64] shouldFoldMaskToVariableShiftPair should be true for scalars up to the biggest legal type (#158069)
For AArch64, we want to do this up to 64-bits. Otherwise this results in
bloated code.
[llvm-readobj][NFC] Restore and disable clang-format for machine type list (#160122)
The original code was more readable, just disable clang-format for
this code.
See #159793
Signed-off-by: Sarnie, Nick [email protected]
[lld][MachO] Use llvm::Align and remove StringOffset type (#161253)
Use llvm::Align instead of directly storing the shift amount for
clarity. Also remove the DeduplicatedCStringSection::StringOffset in
favor of simply storing the uint64_t offset since trailingZeros is
not used outside of finalizeContents(). These two changes allow us to
refactor finalizeContents().
No function change intended.
Depends on #161241.
[MLIR][XeVM] Add XeVM special id ops. (#160735)
Add special GPU id, index ops.
[CIR] Implement ChooseExpr for AggregateExpr (#160999)
Implement the ChooseExpr for aggregate expr
[flang] Emit error on impossible-to-implement construct (#160384)
An assignment to a whole polymorphic allocatable changes its dynamic
type to the type of the right-hand side expression. But when the
assignment is under control of a WHERE statement, or a FORALL / DO
CONCURRENT with a mask expression, there is no interpretation of the
assignment, as the type of a variable must be the same for all of its
elements.
There is no restriction in the standard against this usage, and no other
Fortran compiler complains about it. But it is not possible to implement
it in general, and the behavior produced by other compilers is not
reasonable, much less worthy of emulating. It's best to simply disallow
it with an error message.
Fixes #133669, or more
accurately, resolves it.
[flang][runtime] Let more list-directed child input advance (#160590)
Whether list-directed child READ statements should be allowed to advance
to further records is neither explicit in the standard nor consistent in
existing Fortran implementations. We allow child namelist READ
statements to advance, but not other list-directed child input.
This patch refines our interpretation of this case. Child namelist READ
statements continue to be able to advance; in addition, non-namelist
child READ statements can now advance if their parent READ statement is
a list-directed input statement at the top level, or a child that could.
But non-namelist list-directed child input taking place in a context
with explicit format control won't advance to following records, so that
the format-controlled parent READ statement can retain control over
record advancement.
Also corrects two cases of record repositioning in numeric input
editing, which were failing under child input because they weren't
allowing for left tab limits.
Fixes #160351.
[flang][runtime] Expand IOTYPE and V_LIST (#160744)
The IOTYPE and V_LIST dummy arguments to a defined formatted I/O
subroutine are extracted from a DT edit descriptor in a FORMAT. They are
currently stored in the DataEdit structure, and their maximum sizes are
rather small since DataEdits are sometimes returned or passed by value.
This patch moves their storage into the FormattedIoStatementState
structure and enlarges them a bit.
Fixes #154954.
[flang] Fix crash in structure constructor lowering (#160769)
MLIR types created by lowering for structure constructors appear to be
sensitive to the ordering of their components in the typed expression
representation used for structure constructors and derived type constant
values.
At present, the components appear in source position order. When some
ancestral types are defined in modules, this ordering can cause their
components to be ordered after components defined in extended derived
types. This can lead to crashes from incompatible MLIR types.
To avoid this issue, sort structure constructor components first in
ascending order of derived type extension depth; retain source position
ordering for components in the same derived type and for error recovery
situations.
Fixes #143740.
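The ordering rule described above amounts to a two-key comparison. The sketch below uses a hypothetical `Component` record with integer stand-ins for extension depth and source position; it is not flang's representation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record; the field names are illustrative, not flang's.
struct Component {
  std::string name;
  int extensionDepth; // 0 for the root ancestor type
  int sourcePosition; // stand-in for a source location
};

// Sort by ascending extension depth first; within the same derived type,
// keep source position order.
void sortComponents(std::vector<Component> &components) {
  std::sort(components.begin(), components.end(),
            [](const Component &a, const Component &b) {
              if (a.extensionDepth != b.extensionDepth)
                return a.extensionDepth < b.extensionDepth;
              return a.sourcePosition < b.sourcePosition;
            });
}
```

With this comparison, an ancestor's components come first even when the ancestor is defined in a module whose source positions sort after those of the extended type.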
[llvm] Use the VFS to make path absolute (#161271)
For the redirecting VFS, the 'overlay-relative' option controls
whether external paths should be appended to the overlay directory. This
didn't always work as expected: when the overlay file path itself was
relative, its absolute path was decided by the real FS, not the
underlying VFS, and the resulting external path didn't exist in the
underlying VFS. This PR fixes this issue.
[flang] Don't retain FIXED/FREE compiler directives (#160780)
Some old code in the prescanner, antedating the current -E output
mechanisms, retains the !DIR$ FIXED and !DIR$ FREE directives in the
input, and will even generate them to append to the scanned source from
source and include files to restore the fixed/free source form
distinction. But these directives have not been needed since the -E
output generator began generating source form insensitive output, and
they can confuse the parser's error recovery when the appended
directives follow the END statement. Change their handling so that
they're read and respected by the prescanner but no longer retained in
either the -E output or the cooked character stream passed on to the
parser.
Fixes a regression reported by @DanielCChen after PR 159834.
[flang] Catch calls to impure intrinsics from PURE subprograms (#160947)
The code in expression semantics that catches a call to an impure
procedure in a PURE context misses calls to impure intrinsics, since
their designators have a SpecificIntrinsic rather than a Symbol. Replace
the current check with a new one that uses the characteristics of the
called procedure, which works for both intrinsic and non-intrinsic
cases.
Testing this change revealed that an explicit INTRINSIC statement wasn't
doing the right thing for extension "dual" intrinsics that can be called
as either a function or as a subroutine; the use of an INTRINSIC
statement would disallow its use as a subroutine. I've fixed that here
as well.
Fixes #157124.
[flang] Improve presentation of errors after last source line (#161391)
We don't emit source file names or line numbers for error messages at
EOF. Detect these and handle them a little better, pointing at the
newline at the end of the last source line instead.
[llvm] Fix build after #161260
The modular build was failing due to a missing include.
[TableGen, CHERI] Make CPtrWildcard test tolerant to unrelated changes (#161406)
Changes to llvm/include/llvm/IR/Intrinsics.td may change the constants
that are embedded in this test. Use wildcards, so that unrelated changes
do not trip over this test failing.
Fixes: #158426
[clang][modules] Virtualize module cache pruning (#149113)
This PR virtualizes module cache pruning via the new `ModuleCache`
interface. Currently this is an NFC, but I left a FIXME in
`InProcessModuleCache` to make this more efficient for the dependency
scanner.
[MLIR] Fix gpu.launch attribution argument printing (#161408)
This was broken and never tested.
Not only could this crash with a stack-use-after-scope, but it would
also have printed something like:
instead of the SSA value.
It turns out that gpu.func already has a very similar helper that we can
reuse here.
Fixes #161394
[RISCV] Add missing CHECK lines for Zkt to sifive-p450/p470/p670 test. NFC (#161393)
[clang-sycl-linker] Generate SymbolTable for each image (#161287)
This PR adds extraction of kernel names for each image and stores them
to the Image's StringData field.
[clang] Cleanup docs and code for legacy no_sanitize attributes (NFC). (#161311)
Update generated docs for legacy attributes:
Those are older forms of the no_sanitize("list", "of", "sanitizers")
attribute. They were previously defined as various spellings of the same
attribute, which made the auto-generated documentation confusing.
Fix this by explicitly making them three different attributes. This
also lets us slightly simplify the delegation to the new no_sanitize
form, as we can instead rely on auto-generated code to check that
TSan and MSan can't be disabled for globals.
Co-authored-by: Erich Keane
[CIR] Upstream RTTI Builder & RTTI for VTable Definitions (#160002)
Upstream the RTTI builder with helpers and use them in the VTable
definitions.
Issue #154992
[BOLT] Introduce helpers to match `MCInst`s one at a time (NFC) (#138883)
Introduce a low-level instruction matching DSL to capture and/or match
the operands of MCInst, a single instruction at a time. Unlike the
existing `MCPlusBuilder::MCInstMatcher` machinery, this DSL is intended
for use cases where precise control over the instruction order is
required. For example, when validating PtrAuth hardening, all registers
are usually considered unsafe after a function call, even though
callee-saved registers should preserve their old
values under normal operation.
Usage example:
[llvm][NFC] Simplify implementation of `isa` (#161403)
Using a fold instead of template recursion.
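The difference between the two styles can be sketched outside of LLVM. The following is a minimal illustration and not the actual `llvm::isa` implementation: `Shape`, `Circle`, `Square`, `Triangle`, `isaAnyRecursive`, and `isaAnyFold` are hypothetical names, and a plain `classof` kind check stands in for LLVM's casting machinery.

```cpp
#include <cassert>

// Hypothetical hierarchy with LLVM-style classof() predicates.
struct Shape {
  enum Kind { K_Circle, K_Square, K_Triangle };
  explicit Shape(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct Circle : Shape {
  Circle() : Shape(K_Circle) {}
  static bool classof(const Shape *S) { return S->getKind() == K_Circle; }
};
struct Square : Shape {
  Square() : Shape(K_Square) {}
  static bool classof(const Shape *S) { return S->getKind() == K_Square; }
};
struct Triangle : Shape {
  Triangle() : Shape(K_Triangle) {}
  static bool classof(const Shape *S) { return S->getKind() == K_Triangle; }
};

// Template recursion: one instantiation per candidate type.
template <typename First, typename... Rest>
bool isaAnyRecursive(const Shape *S) {
  if (First::classof(S))
    return true;
  if constexpr (sizeof...(Rest) > 0)
    return isaAnyRecursive<Rest...>(S);
  else
    return false;
}

// Fold expression: a single instantiation ORs all checks together.
template <typename... To>
bool isaAnyFold(const Shape *S) {
  assert(S && "isaAnyFold used on a null pointer");
  return (To::classof(S) || ...);
}
```

Both versions short-circuit on the first match. The fold form needs C++17 and avoids instantiating a chain of helper templates.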
[lldb][NFC] Fix spelling of function in log message (#161261)
Fix spelling of `GetMemoryRegionInfo` function in log message and comment
and reformat code.
[VPlan] Handle scalar-VF in transforms (NFC) (#161365)
[flang] Implemented a warning about contiguity of compile time constant values (#161084)
Implemented `common::UsageWarning::ConstantIsContiguous` to warn about
the following case:
Here, while the array section is discontiguous, `arr` is a compile-time
constant, so the array section created at compile time will end up being
contiguous and `result` will be "true". If `arr` wasn't a constant, the
result at runtime would have been "false".
Fix `run_clang_repl` output when not present (#161412)
On the happy path, when `clang-repl` is present, we will invoke it in
order to determine if the host supports JIT features. That will return a
string containing "true". However, in cases where `clang-repl` is not
present or we fail to invoke it, we previously returned `False`, which
would then trigger a failure with our substring check. This PR updates
the function to return `""` instead, so the substring check is still
valid.
This is related to #157359,
where the original change was introduced.
[flang] Add #include to fix MSVC build (#161415)
flang/lib/Evaluate/constant.cpp apparently needs this #include for MSVC
builds but somehow not for others.
[NFC] Remove trailing whitespaces from `clang/include/clang/Basic/Attr.td`
[LAA] Fix picking context instr in evaluatePtrAddRec for multiple preds.
A loop may have more than one predecessor out of the loop. In that case,
just pick the first non-phi instruction in the loop header.
[compiler-rt][asan] Add wcscpy/wcsncpy; enable wcscat/wcsncat on Windows (#160493)
Summary
sanitizer_common).
Motivation
char* string checks.
Changes
mark read/write ranges in bytes.
missed overflows when sizeof(wchar_t) != 1.
on sanitizer_common interceptors for wcscat/wcsncat.
use resilient FileCheck patterns (reuse [[ADDR]], wildcard for function
suffixes and paths, flexible line numbers).
Testing
Follow-up to and based on prior work in PR #90909 (author: branh,
Microsoft); builds on that work and addresses review feedback. Thanks!
Signed-off-by: Yixuan Cao
[clang-doc] Suppress long-name test on windows (#161424)
This seems to have broken some buildbots for a long time, so just
suppress it for now until we determine how/why.
[flang] Attempt to work around MSVC build problem (#161426)
Move a function that seems to be running into an MSVC problem from the
source file where I created it to another one (tools.cpp) that is
already known to be able to access the semantics::Scope type.
[MLIR][Standalone] gate wheel build behind MLIR_ENABLE_BINDINGS_PYTHON=ON (#161427)
If MLIR_ENABLE_BINDINGS_PYTHON is not ON, then `StandalonePythonModules`
isn't a valid target.
[AMDGPU] Introduce and use NotUseRealTrue16Insts. NFC. (#161373)
This removes ~2000 lines from both AMDGPUGenDAGISel.inc and
AMDGPUGenGlobalISel.inc.
[OpenACC][CIR] Fix transform inclusive scan init parameter (#161428)
This fixes the macOS build, where otherwise the compilation yields an error:
no viable conversion from 'bool' to 'typename iterator_traits<const QualType *>::value_type
[LAA] Add tests for using inbounds flags only used in predicated blocks.
Test for #160912.
Revert "[VPlan] Compute cost of more replicating loads/stores in ::computeCost. (#160053)"
This reverts commit b4be7ec.
See #161404 for a crash
exposed by the change. Revert while I investigate.
[CIR] Upstream support for generating global ctor regions (#161298)
This adds support for handling global variables with non-trivial
constructors. The constructor call is emitted in CIR as a 'ctor' region
associated with the global definition. This form of global definition
cannot be lowered to LLVM IR yet.
A later change will add support in LoweringPrepare to move the ctor code
into a __cxx_global_var_init() function and add that function to the
list of global ctors, but for now we must stop at the initial CIR
generation.
[HLSL][NFC] Add helper struct to simplify dealing with resource binding attributes (#161254)
Add new `ResourceBindingAttrs` struct that holds resource binding attributes `HLSLResourceBindingAttr` and `HLSLVkBindingAttr` and provides helper methods to simplify dealing with resource bindings. This code is placed in the AST library to be shared between Sema and CodeGen.
This change has been done in preparation of a third binding attribute coming soon to represent `[[vk::counter_binding()]]`. This new attribute and more helper member functions will be added to `ResourceBindingAttrs` and will be used in both Sema and in CodeGen to implement resource counter initialization.
[flang] Add missing #include for MSVC (#161437)
I moved a function to Evaluate/tools.cpp in an attempt to dodge some
MSVC compiler issue but didn't add an include directive for
Evaluate/tools.h to Evaluate/constant.cpp.
[OpenMP] Update 6.1 implementation status. (#161449)
@jhuber6: Please review
[NFC][LLVM] Use ListSeparator in AsmWriter (#161422)
Use `ListSeparator` instead of manual code when generating comma
separated lists. Also replace `FieldSeparator` with `ListSeparator` as
they both provide identical functionality.
[NFC] [IndVarSimplify] add overflowing tests (#159877)
Also use UTC for test instead.
Fix memory leak in Offloading API (#161430)
Fix for the failing Sanitizer buildbots from PR #143342.
[CodingStandard] Require Unix line endings for all files (#161228)
Require all files to use Unix line endings, formalizing an already
followed convention.
[AMDGPU] Precommit test for 160181
[llvm][mustache] Fix failing StandaloneIndentation test (#159192)
When rendering partials, we need to use an indentation stream,
but when part of the partial is an unescaped sequence, we cannot
indent those. To address this, we build a common MustacheStream
interface for all the output streams to use. This allows us to
further customize the AddIndentationStream implementation
and opt it out of indenting the UnescapeSequence.
[llvm][mustache] Simplify debug logging (#159193)
The existing logging was inconsistent, and we logged too many things.
This PR introduces a more principled schema, and eliminates many
redundant log lines.
[JITLink][MachO] Use Triple::isArm64e consistently.
Revert "Fix memory leak in Offloading API" (#161465)
Reverts #161430
[RISCV] Remove Zicntr from sifive-p450/p470/p670. (#161444)
These cores don't implement the `time` CSR. They require SBI to trap and
emulate it, which is allowed by RVA20U.
[RISCV] Rename BFloatVectors -> BF16Vectors in tablegen. NFC (#161469)
Part of this rename is taken from #161158, but applies it more
consistently to more variables.
I think using BF16 makes it easier to not confuse BFloat and Float when
reading.
[flang][driver] Accelerate complex division when `-ffast-math` is specified (#159689)
This patch accelerates complex division by passing
`-complex-range=basic` to the frontend when the `-ffast-math` option is
specified. This behavior is the same as `-fcomplex-arithmetic=basic`. A
warning is issued if a different value is specified for
`-fcomplex-arithmetic=`. The warning conditions will be unified with
clang.
Reserve R9 on armv6 iOS 2.x (#150835)
The iOS 2.x ABI had R9 as a reserved register, 3.0 made it available,
but support for the 2.x ABI was never added to LLVM. We only use the 2.x
ABI on armv6 since before 3.0 armv6 was the only architecture supported
by iOS.
[HLSL][NFC] Add missing includes for standalone header compilation (#161473)
HLSLResource.h added by #161254 builds in the context of a .cpp file
(e.g. CGHLSLRuntime.cpp) but not when doing a header compilation, e.g.:
[Support] Fix warnings
This patch fixes:
llvm/lib/Support/Mustache.cpp:332:20: error: unused function
'tagKindToString' [-Werror,-Wunused-function]
llvm/lib/Support/Mustache.cpp:344:20: error: unused function
'jsonKindToString' [-Werror,-Wunused-function]
[ADT] Consolidate uninitialized_copy in SmallVector (NFC) (#161043)
This patch consolidates two implementations of uninitialized_copy into
a single template function.
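The patch itself is not reproduced here. As a rough sketch of how a `memcpy` fast path and a generic path can live in one function template, the following uses `if constexpr` to pick the strategy at compile time. The name `uninitializedCopySketch` and the choice of `if constexpr` are assumptions for illustration, not necessarily what SmallVector does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

// Hypothetical helper: copy-construct [First, Last) into raw storage at Dest.
template <typename T>
void uninitializedCopySketch(const T *First, const T *Last, T *Dest) {
  assert(First <= Last && "invalid source range");
  if constexpr (std::is_trivially_copyable_v<T>) {
    // Trivially copyable types can be copied as raw bytes. Skip the call
    // for empty ranges so memcpy never sees a null pointer.
    if (First != Last)
      std::memcpy(static_cast<void *>(Dest), First,
                  static_cast<std::size_t>(Last - First) * sizeof(T));
  } else {
    // Everything else goes through the copy constructor.
    std::uninitialized_copy(First, Last, Dest);
  }
}
```

Only the branch that matches `T` is instantiated, so callers see one entry point regardless of the element type.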
[LVI] Handle constant value lattice in `getEdgeValueLocal` (#161410)
Closes #161367.
In #157614, we ignored cases where OpLatticeVal might be a constant or
notconstant. Directly
returning the result causes a type mismatch. I apologize for the
oversight in the previous code review.
This patch applies the cast op to constants. For notconstant value
lattices, I'd leave it as a todo (it is similar to the constant case,
except for trunc without nsw/nuw).
[IR] Introduce !captures metadata (#160913)
This introduces `!captures` metadata on stores, which looks like this:
The semantics are the same as replacing the store with a call like this:
This metadata is intended for annotation by frontends -- it's not
something we can feasibly infer at this point, as it would require
analyzing uses of the pointer stored in memory.
The motivating use case for this is Rust's `println!()` machinery, which
involves storing a reference to the value inside a structure. This means
that printing code (including conditional debugging code), can inhibit
optimizations because the pointer escapes. With the new metadata we can
annotate this as a read-only capture, which has less impact on
optimizations.
[RISCV] Add commutative support for Qualcomm uC Xqcics extension (#161328)
This is a follow-up to #160653 doing similar changes for Xqcics.
[flang] add helper to create descriptor with new base address (#161347)
There is currently no helper to create a descriptor for a copy of a
Fortran entity based on the descriptor of the original entity and the
base address of the copy (most places that are doing this currently are
also doing allocation of the copy at the same time or using the
runtime).
Add a helper for this with a unit test.
[lldb][IRExecutionUnit] Return error on failure to resolve function address (#161363)
Starting with #148877 we started encoding the module ID of the function
DIE we are currently parsing into its `AsmLabel` in the AST. When the JIT
asks LLDB to resolve our special mangled name, we would locate the module
and resolve the function/symbol we found in it.
If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we
encode is that of the `.o` file that is tracked by the debug-map. To
resolve the address of the DIE in that `.o` file, we have to ask
`SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the `.o`
DIE into a real address in the linked executable. This will only work if
the `.o` address was actually tracked by the debug-map. However, if the
function definition appears in multiple `.o` files (which is the case for
functions defined in headers), the linker will most likely deduplicate
that definition. So most `.o` files' definition DIEs for that function
won't have a contribution in the debug-map, and thus we fail to resolve
the address.
When debugging Clang on Darwin, e.g., you'd see:
unless you were stopped in the `.o` file whose definition of `getName`
made it into the final executable.
The fix here is to error out if we fail to resolve the address, causing
us to fall back on the old flow which did a lookup by mangled name,
which the `SymbolFileDWARFDebugMap` will handle correctly.
An alternative fix to this would be to encode the
`SymbolFileDWARFDebugMap`'s module-id and implement
`SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled
name lookup. The proposed approach doesn't stop us from implementing
that, so we could choose to do it in a follow-up.
rdar://161393045
[mlir][transform] Add PromoteTensorOp (#158318)
Transform op to request a tensor value to live in a specific memory
space after bufferization
Co-authored-by: Nicolas Vasilache
Co-authored-by: Alex Zinenko
[MemorySanitizer] Generate check lines for some vararg tests (NFC)
Use UTC_ARGS: --disable to skip the tests with many arguments.
[flang][debug] Change type*N to type(kind=N). (#161432)
It was discussed in #161361.
[MemorySanitizer] Generate test checks for kmsan test (NFC)
[InstCombine] Opt phi(freeze(undef), C) -> phi(C, C) (#161181)
Try to choose a value for freeze that enables the PHI to be replaced
with its input constants if they are equal.
[libc][math] Refactor exp10m1f16 implementation to header-only in src/__support/math folder. (#161119)
Part of #147386
in preparation for: https://discourse.llvm.org/t/rfc-make-clang-builtin-math-functions-constexpr-with-llvm-libc-to-support-c-23-constexpr-math-functions/86450
[CIR] Refactor cir.cast to use uniform assembly form w/o parens, commas (#161431)
This mirrors incubator changes from llvm/clangir#1922
[GlobalOpt] Check if users are CallBase when changing CC (#161399)
Fixes #156656
`hasChangeableCCImpl` guarantees the address of the function is not
taken, but it ignores assume-like calls.
This patch ignores assume-like calls when changing CC.
[AMDGPU][InsertWaitCnts] Refactor some helper functions, NFC (#161160)
only used once or twice.
The goal is simply to reduce the noise in SIInsertWaitCnts without
hiding functionality. I focused on moving trivial helpers, or helpers
with very descriptive/verbose names (so it doesn't hide too much logic
away from the pass), and that have some reusability potential.
I'm also trying to make the code style more consistent. It doesn't make
sense to see a function call `TII->isXXX` and then suddenly call a random
`isY` method that just wraps around `TII->isY`.
The context of this work is that I'm trying to learn how this pass
works, and while going through the code I noticed some little things
here and there that I thought would be good to fix.
[AMDGPU][SIInsertWaitCnts] De-duplicate code (NFC) (#161161)
I'm reading through the pass over and over again to try and learn how it works. I noticed some code duplication here and there while doing that.
[DAGCombine] Support (shl %x, constant) in foldPartialReduceMLAMulOp. (#160663)
Support shifts in foldPartialReduceMLAMulOp by treating (shl %x, %c) as
(mul %x, (shl 1, %c)).
PR: #160663
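The rewrite rests on a simple identity, which can be checked on plain integers. This is only a scalar sketch of the arithmetic, not the DAG combine itself; `shlDirect` and `shlAsMul` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Left shift by a constant amount C (C must be below the bit width).
constexpr uint32_t shlDirect(uint32_t X, unsigned C) {
  assert(C < 32 && "shift amount out of range");
  return X << C;
}

// The same value written as a multiply by the constant (1 << C).
// Unsigned arithmetic wraps modulo 2^32, so both forms agree even when
// high bits are shifted out.
constexpr uint32_t shlAsMul(uint32_t X, unsigned C) {
  assert(C < 32 && "shift amount out of range");
  return X * (uint32_t{1} << C);
}

static_assert(shlDirect(5, 3) == shlAsMul(5, 3));
```

Once the shift is expressed as a multiply by a constant, code that already recognizes multiplies can handle it without a separate case.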
[AMDGPU] Remove duplicate definition of isGFX12CacheInvOrWBInst
Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFC. (#161496)
[lldb][Mangled][NFC] Remove redundant const-qualifier on llvm::StringRef argument
[AMDGPU][SIInsertWaitCnts] Remove redundant TII/TRI/MRI arguments (NFC) (#161357)
WaitCntBrackets already has a pointer to its SIInsertWaitCnt instance.
With a small change, it can directly access TII/TRI/MRI that way.
This simplifies a lot of call sites which make the code easier to
follow.
[lldb][TypeSystemClang] Added unique builtins types for __bf16 and _Float16 (#157674)
While debugging an application that uses the __bf16 and _Float16 float
types, it was discovered that lldb creates the same CompilerType for them. This can
cause an infinite recursion error, if one tries to create two struct
specializations with these types and then inherit one specialization
from another.
[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in OpenMPDialect.cpp (NFC)
[BOLT] Gadget scanner: optionally assume auth traps on failure (#139778)
On AArch64 it is possible for an auth instruction to either return an
invalid address value on failure (without FEAT_FPAC) or generate an
error (with FEAT_FPAC). It thus may be possible to never emit explicit
pointer checks, if the target CPU is known to support FEAT_FPAC.
This commit implements an --auth-traps-on-failure command line option,
which essentially makes "safe-to-dereference" and "trusted" register
properties identical and disables scanning for authentication oracles
completely.
[BOLT] Gadget scanner: make use of C++17 features and LLVM helpers (#141665)
Perform trivial syntactical cleanups:
This patch is NFC aside from minor debug output changes.
[MLIR] Apply clang-tidy fixes for performance-move-const-arg in SimplifyAffineMinMax.cpp (NFC)
[AArch64] Some tests for cbz/tbz with wzr. NFC
[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in Rewrite.cpp (NFC)
[MLIR] Apply clang-tidy fixes for performance-unnecessary-copy-initialization in InferIntRangeCommon.cpp (NFC)
[clang][bytecode] Fix integral cast edge case (#161506)
We were converting the `APSInt` to a sign-less `APInt` too early and
losing the sign information.
[mlir][memref] Introduce `memref.distinct_objects` op (#156913)
The `distinct_objects` operation takes a list of memrefs and returns a
list of memrefs of the same types, with the additional assumption that
accesses to these memrefs will never alias with each other. This means
that loads and stores to different memrefs in the list can be safely
reordered.
The discussion
https://discourse.llvm.org/t/rfc-introducing-memref-aliasing-attributes/88049
[lldb][CPlusPlusLanguage] Avoid redundant const char* -> StringRef roundtrip (#161499)
We've been seeing (very sporadic) lifetime issues around this area. Here's
an example backtrace:
Looks like we're calling `strlen` on a nullptr. I stared at this
codepath for a while but am still not sure how that could happen unless
the underlying `ConstString` somehow pointed to corrupted data.
But `SymbolNameFitsToLanguage` does some roundtripping through a
`const char*` before calling `GetManglingScheme`. No other callsite does
this and it just seems redundant.
This patch cleans this up.
rdar://161128180
[MLIR] Remove unused debug macros (NFC)
[NFC][LLVM][AsmWriter] Move type printing to `WriteAsOperandInternal` (#161456)
Add option to `WriteAsOperandInternal` to print the type and use that to
eliminate explicit type printing code in several places.
[CodeGen] Remove `shouldExpandPartialReductionIntrinsic()` hook (NFC) (#161498)
This is unused. Targets can lower/expand the `PARTIAL_REDUCE_*` ISD
nodes.
[NFC][AArch64][ISEL] Remove unnecessary predicates from partial_reduce_*mla patterns.
[OpenACC][CIR] Implement 'alloca copying' for private lowering (#161382)
The previous patch ensured that we correctly got the allocas put in
place. This patch takes the address of each element of each alloca, and
copies it to the previous one. This allows us to re-form the
pointer-structure for a recipe.
[MLIR][Transform][Tune] Introduce `transform.tune.alternatives` op (#160724)
This op enables expressing uncertainty regarding what should be
happening at particular places in transform-dialect schedules. In
particular, it enables representing a choice among alternative regions.
This choice is resolved through providing a `selected_region` argument.
When this argument is provided, the semantics are such that it is valid
to rewrite the op through substituting in the selected region -- with
the op's interpreted semantics corresponding to exactly this.
This op represents another piece of the puzzle w.r.t. a toolkit for
expressing autotuning problems with the transform dialect. Note that
this goes beyond tuning knobs on transforms, going further by making
it tunable which (sequences of) transforms are to be applied.
AMDGPU: Add peephole opt baseline tests (#161309)
Add tests which show missed folds of subregister extracts with
intermediate full copies.
[X86] Add test showing failure to remove sign splats from PACKSS intrinsics (#161518)
PACKSS intrinsic calls are only expanded to X86ISD::PACKSS nodes during
legalisation, after which time we fail to handle cases where ASHR sign
splats (now lowered to X86ISD::VSRAI) are unnecessary.
Add additional example of FREEZE(PACKSS()) as that's an issue as well.
[InstCombine] Drop poison-generating flags when reusing existing or instruction (#161504)
Closes #161493.
[InstCombine] Avoid self-replacing in `getUndefReplacement` (#161500)
Self-replacing has a different meaning in InstCombine. It will replace
all uses with poison.
Closes #161492.
[DFAJumpThreading] Unfold select to the incoming block of phi user (#160987)
Fixes #160250
We previously assumed the select to unfold is defined in the incoming
block of phi user, as `isValidSelectInst` filters other cases at the
initial stage. However, the selects not defined in the incoming block
may occur after unfolding the arms of the unfolded select.
This patch sinks the select into the incoming block of the phi user and
unfolds it at the incoming block.
[MLIR] Add sincos fusion pass (#161413)
We see performance improvements from using sincos to reuse calculations
in hot loops that compute sin() and cos() of the same operand. Add a
pass to identify sin() and cos() calls in the same block with the same
operand and fast-math flags, and fuse them into a sincos op.
Follow-up to:
[AMDGPU] Use common allUsesAvailableAt implementation [nfc] (#161418)
Replace the target specific copy with a call to the generic routine. I
don't spot any differences by eye, and there's nothing in the original
review discussion (#124327) which makes it clear why this was
duplicated.
[AArch64][SME] Precommit tests for LUT4I `Chain` issues (NFC) (#161505)
These tests show that `luti4` intrinsics are currently incorrectly
CSE'd.
[clang] Convert second arg of __builtin_assume_aligned to ConstantExpr (#161314)
Since the second argument must be a constant integer, we can as well
convert it to a `ConstantExpr` in Sema.
Fixes #161272
[X86] SimplifyDemandedBitsForTargetNode - generalize X86ISD::VSRAI handling when only demanding 'known signbits' (#161523)
If we only demand bits that already match the signbit then we don't need to shift.
Generalizes an existing pattern that just handled signbit-only demanded bits to match what we do for ISD::SRA.
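The reasoning can be modeled on a single 32-bit lane. This is a scalar sketch under stated assumptions, not the X86 backend code: `sraBits`, `numSignBits`, and `shiftIsRedundant` are hypothetical helpers, and values are handled as raw `uint32_t` bit patterns.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right on a raw 32-bit pattern (C in [0, 31]).
uint32_t sraBits(uint32_t X, unsigned C) {
  assert(C < 32 && "shift amount out of range");
  uint32_t R = X >> C;
  if ((X >> 31) != 0 && C > 0)
    R |= ~uint32_t{0} << (32 - C); // refill the top bits with the sign bit
  return R;
}

// Number of leading bits equal to the sign bit, counting the sign bit.
unsigned numSignBits(uint32_t X) {
  uint32_t Sign = X >> 31;
  unsigned N = 1;
  while (N < 32 && ((X >> (31 - N)) & 1) == Sign)
    ++N;
  return N;
}

// True when every demanded bit lies within the known sign bits of X. In
// that case sraBits(X, C) and X agree on all demanded bits for any C.
bool shiftIsRedundant(uint32_t X, uint32_t Demanded) {
  unsigned S = numSignBits(X);
  uint32_t SignMask = S == 32 ? ~uint32_t{0} : ~uint32_t{0} << (32 - S);
  return (Demanded & ~SignMask) == 0;
}
```

The shift only moves copies of the sign bit into positions that already hold the sign bit, so the unshifted value can be used directly for those demanded bits.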
[OpenACC] Remove unnecessary uses of `getResult`, fix cast tests (#161526)
A previous review comment pointed out that operations with only a single
result implicitly convert to `mlir::Value`. This patch removes the
explicit use of `getResult` where it is unnecessary in OpenACC lowering.
However, there ARE a few cases where it is necessary because the
`mlir::ValueRange` implicit constructor from a single value is being
used, so those are untouched.
Additionally, while the previous patch was being committed (#161382), a
second patch (#161431) changed the format of cir.casts, so this patch
fixes the additional test lines for that as well.
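The reason `getResult` is still needed in the `ValueRange` cases follows from a C++ rule: an implicit conversion sequence may contain at most one user-defined conversion. The sketch below models this with hypothetical stand-in types (`Value`, `ValueRange`, `SingleResultOp`); it is not MLIR code and only mirrors the shape of the conversions described above.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the MLIR types named in the commit message.
struct Value {
  int Id;
};

struct ValueRange {
  // Implicit constructor from a single value.
  constexpr ValueRange(const Value &V) : First(V), Size(1) {}
  Value First;
  unsigned Size;
};

struct SingleResultOp {
  Value Result;
  constexpr Value getResult() const { return Result; }
  // Implicit conversion of a single-result op to its result value.
  constexpr operator Value() const { return Result; }
};

constexpr int idOf(Value V) { return V.Id; }
constexpr unsigned sizeOf(ValueRange R) { return R.Size; }

// Op -> Value needs one user-defined conversion: allowed.
static_assert(std::is_convertible_v<SingleResultOp, Value>);
// Op -> Value -> ValueRange would need two: not allowed implicitly.
static_assert(!std::is_convertible_v<SingleResultOp, ValueRange>);
```

With these stand-ins, `idOf(Op)` compiles for an `Op` of type `SingleResultOp`, while `sizeOf(Op)` does not and has to be written as `sizeOf(Op.getResult())`.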
[LLVM][SCEV] udiv (mul nuw a, vscale), (mul nuw b, vscale) -> udiv a, b (#157836)
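The fold in the title relies on both multiplications being free of unsigned wrap. A quick numeric check of that identity, as a sketch with hypothetical helper names (`mulNUW`, `foldHolds`) and ordinary 64-bit integers standing in for SCEV expressions:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Multiply without unsigned wrap ("nuw"); returns nullopt on overflow.
std::optional<uint64_t> mulNUW(uint64_t A, uint64_t B) {
  if (A != 0 && B > UINT64_MAX / A)
    return std::nullopt;
  return A * B;
}

// Checks (A * VScale) / (B * VScale) == A / B on concrete values.
bool foldHolds(uint64_t A, uint64_t B, uint64_t VScale) {
  assert(B != 0 && VScale != 0 && "divisor and vscale must be nonzero");
  std::optional<uint64_t> Num = mulNUW(A, VScale);
  std::optional<uint64_t> Den = mulNUW(B, VScale);
  if (!Num || !Den)
    return true; // nuw is violated, so the fold makes no claim here
  return *Num / *Den == A / B;
}
```

Without the no-wrap guarantee the identity breaks. In 8-bit arithmetic, for example, a = 200, b = 3, vscale = 2 gives (400 mod 256) / 6 = 144 / 6 = 24, while 200 / 3 = 66.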
[lldb][test] Fix bf16 test cases on Arm 32-bit (#161528)
Fixes #157674
On ARM, the presence of a specific bf16 type in the AST is gated by:
And the target we use when evaluating symbols (derived from the program
file, I think, haven't found it yet) does not enable any of this.
This means that we fall back to __fp16.
So for parts of the testing we just need to expect __fp16 instead