@stumpOS
Owner

@stumpOS stumpOS commented Apr 23, 2025

No description provided.

superbobry and others added 30 commits April 22, 2025 21:53
…#136676)

It claimed to return an `io.StringIO` or an `io.BytesIO`, but it did in
fact return `str` or `bytes`.
…lvm#136762)

Any kill flags that were present for the old register are not valid for
the replacement and the replacement may have extended the live range of
the replacement register.
This is a follow-up of 13aac46.
This commit adjusts the implementation of `hasBooleanRepresentation` to
be somewhat aligned to `hasIntegerRepresentation`.
In particular vector of booleans should be handled in
`hasBooleanRepresentation`, while `_Atomic(bool)` should not.
…136779)

The setter is only used when changing the setting programmatically. When
using the settings command, we need to monitor SetPropertyValue.
…vm#102731)

DAG combiner already does this transformation, but in some cases it does
not have a chance because either CodeGenPrepare or SelectionDAGBuilder
move icmp to a different basic block.

https://alive2.llvm.org/ce/z/ARzh99

Fixes llvm#94829

Pull Request: llvm#102731
Andes N45/NX45 are 32/64bit in-order dual-issue 8-stage pipeline CPU
architecture implementing the RV[32|64]IMAFDC_Zba_Zbb_Zbs ISA
extensions. They are developed by Andes Technology
https://www.andestech.com, a RISC-V IP provider.

The overviews for N45/NX45:
https://www.andestech.com/en/products-solutions/andescore-processors/riscv-n45/
https://www.andestech.com/en/products-solutions/andescore-processors/riscv-nx45/

Scheduling model will be implemented in a later PR.
At the moment, the `CHECK-SAME` lines generated by
"generate-test-checks.py" (i.e. check-lines that correspond to the
preceding `CHECK-LABEL` line) are indented to match the label length.
For example,

```mlir
func.func @batch_reduce_matmul_bcast_k_to_fill_missing_dims_A(%arg0: memref<5xf32>, %arg1: memref<2x5x7xf32>, %arg2: memref<3x7xf32>) {
  linalg.batch_reduce_matmul indexing_maps = (...)
}
```

will lead to the following:

```mlir
// CHECK-LABEL:   func.func @batch_reduce_matmul_bcast_k_to_fill_missing_dims_A(
// CHECK-SAME:                                                                  %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<5xf32>,
// CHECK-SAME:                                                                  %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<2x5x7xf32>,
// CHECK-SAME:                                                                  %[[VAL_2:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<3x7xf32>) {
// CHECK:           linalg.batch_reduce_matmul indexing_maps = (...)
```

This indentation is unnecessarily deep. With this change, for labels
that are longer than 20 chars, the indentation is trimmed to 4 spaces:
```mlir
// CHECK-LABEL:   func.func @batch_reduce_matmul_bcast_k_to_fill_missing_dims_A(
// CHECK-SAME:        %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<5xf32>,
// CHECK-SAME:        %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<2x5x7xf32>,
// CHECK-SAME:        %[[VAL_2:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<3x7xf32>) {
// CHECK:           linalg.batch_reduce_matmul indexing_maps = (...)
```
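
The trimming rule can be sketched as below. This is only an illustration in C++, not the script itself (which is written in Python); `checkSameIndent` and its parameter are hypothetical names, and only the 20-character threshold and the 4-space width are taken from the description above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of spaces to put before a CHECK-SAME operand,
// measured from the column where the CHECK-LABEL text starts.
std::size_t checkSameIndent(const std::string &labelText) {
  constexpr std::size_t kMaxLabelLength = 20; // threshold from the description
  constexpr std::size_t kTrimmedIndent = 4;   // trimmed width from the description
  // Short labels: keep aligning operands with the end of the label.
  // Long labels: fall back to a fixed, shallow indent.
  return labelText.size() > kMaxLabelLength ? kTrimmedIndent : labelText.size();
}
```

With this sketch, a short label such as `func.func @f(` keeps its full-length alignment, while the long label from the example above gets the fixed 4-space indent.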
 * Only show for blocks 10 lines or taller (including braces)
 * Add parens for function call: "// if foo" -> "// if foo()" or "// if foo(...)"
 * Print literal nullptr
 * Escaping for abbreviated strings

Fixes clangd/clangd#1807.

Based on the original PR at llvm#72345.

Co-authored-by: daiyousei-qz <[email protected]>
…5596)

InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does, in AMDGPU, used value_or, which has been replaced
by an isValid() check.
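
A minimal sketch of the resulting calling pattern, using a toy class rather than the real `llvm::InstructionCost` (the names `ToyCost` and `costOr` are made up for illustration, and the real class has a richer interface):

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a cost type with a built-in Invalid state.
// This is NOT llvm::InstructionCost; names and layout are illustrative only.
class ToyCost {
  std::int64_t Value = 0;
  bool Valid = true;

public:
  explicit ToyCost(std::int64_t V) : Value(V) {}

  static ToyCost getInvalid() {
    ToyCost C(0);
    C.Valid = false;
    return C;
  }

  bool isValid() const { return Valid; }

  // Returns the raw value directly; callers are expected to have checked
  // isValid() (or to know the cost is valid) beforehand.
  std::int64_t getValue() const {
    assert(Valid && "getValue() called on an invalid cost");
    return Value;
  }
};

// What a former `getValue().value_or(Fallback)` call site turns into:
// an explicit isValid() check in front of the plain getValue().
std::int64_t costOr(const ToyCost &C, std::int64_t Fallback) {
  return C.isValid() ? C.getValue() : Fallback;
}
```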
GCC on Cygwin and MSYS2 are built with --enable-__cxa_atexit.

Adjust test to expect this change.
)"

This reverts commit 8fc8a84, which caused a
regression.

Fixes llvm#136675.
DAGCombiner::hoistLogicOpWithSameOpcodeHands will hoist

(or disjoint (ext a), (ext b)) -> (ext (or disjoint a, b))

So this adds patterns to match vwadd[u].v{v,x} in this case.

We have to teach the combine to preserve the disjoint flag.
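
Why a disjoint `or` can be matched as a widening add can be checked on scalars. The sketch below only covers the unsigned (zero-extend) case on 8-bit inputs, and the helper names are hypothetical: when the operands share no set bits, there are no carries, so `or` and `add` give the same result.

```cpp
#include <cstdint>

// True when no bit is set in both operands, i.e. the `or` is "disjoint".
bool isDisjoint(std::uint8_t A, std::uint8_t B) { return (A & B) == 0; }

// (or (zext a), (zext b)): extend first, then or.
std::uint16_t orOfExts(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint16_t>(static_cast<std::uint16_t>(A) |
                                    static_cast<std::uint16_t>(B));
}

// (zext (or a, b)): or in the narrow type, then extend (the hoisted form).
std::uint16_t extOfOr(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint16_t>(static_cast<std::uint8_t>(A | B));
}

// Widening unsigned add: extend both operands, then add in the wide type.
std::uint16_t wideningAdd(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint16_t>(static_cast<std::uint16_t>(A) +
                                    static_cast<std::uint16_t>(B));
}
```

For `0xF0` and `0x0F` all three forms agree. For `0xF0` and `0x30` the operands overlap, so the add differs from the or, which is why the disjoint flag has to survive the hoisting.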
Fix a reference to getValue() being optional in InlineSizeEstimatorAnalysis, a
file that is not included in the default build. A "warning: enumerated and
non-enumerated type in conditional expression" warning is fixed in AMDGPU too.
…ules Doc (llvm#136719)

"Dependant BMI" / "Dependent BMI" was used incorrectly in the
documentation:
"Dependent BMI" refers to a BMI that depends on the current TU, but it
was used for the BMI that current TU depends on.

I replaced all the mentions with "BMI dependency".
…put operands. (llvm#135961)

It looks like this code is only considering buildvector inputs,
expecting the inputs to have at least 16 operands. This adds a check to
make sure that is true.

Fixes llvm#135950
After upgrading the default code model from small to medium on
LoongArch, function calls using expression may fail. This is because the
function call instruction has changed from `bl` to `pcalau18i + jirl`,
but `RuntimeDyld` does not handle out-of-range jumps for this
instruction sequence.

This patch fixes: llvm#136561

Reviewed By: SixWeining

Pull Request: llvm#136563
Dear developer:

I have recently been working with LLVM IR and wanted to isolate basic blocks
using the command "llvm-extract". However, I found that the command
option "llvm-extract --bb func_name:bb_name" only works when the source
code is compiled to IR with the option "-fno-discard-value-names".
That is to say, the "llvm-extract" command cannot handle unnamed basic
blocks, which are the default output of the compiler. So, I made these
changes and hope they will make LLVM better.

Best regards,

Co-authored-by: Yilin Li <[email protected]>
mrkajetanp and others added 26 commits April 24, 2025 16:09
…136856)

In order for precompiled headers to work with ccache, a specific flag
needs to be passed to the compiler and ccache's sloppiness configuration
option needs to be set appropriately.

Due to issues with configuring CMake on certain Windows platforms, set
the required ccache option only on non-Windows systems for the time
being.

-----

Signed-off-by: Kajetan Puchalski <[email protected]>
… isStore and a memory VT. (llvm#137080)

This removes the need to explicitly set isTruncStore on truncstorei8 and
other similar PatFrags that include truncstore in their frags DAG.

This allows some new patterns to be imported for AMDGPU as you can see
in the changed test.

The extra isTruncStore were added in ae2b36e, along with some
other tablegen changes to look for MemoryVT along with isTruncStore. I
did not remove the code, because I'm not sure if any out of tree users
have become dependent on it. It's no longer exercised in tree.
llvm#136363)

These were added to the migration from v4 to v5 and should be removed now
that the default has changed.
Static analysis flagged this code b/c we are copying the temp variable
back in when we could move it instead.
…vm#136733)

We're duplicating uses here, so we need to freeze the inputs.

---------

Co-authored-by: Luke Lau <[email protected]>
Add some intrinsics and LIT tests for PPC dmr insert/extract
instructions.
…ignExtLoad/isZeroExtLoad for IsAtomic in SelectionDAG. (llvm#137096)

Support isAnyExtLoad() for IsAtomic in GISel.

Modify atomic_load_az* to check for extload or zextload. And rename to
atomic_load_azext*

Add atomic_load_asext* and use in RISC-V. I used "asext" rather than
"as" so it wouldn't be confused with the word "as".
)

Handle the case where the base expression is a pointer to a vector type.

rdar://149223362
…vm#130781)

MASM supports some built-in macro-type functions.

We start our support for these with `@CatStr`, one of the more commonly used.
…lvm#135074)

This PR is a second attempt for issue llvm#111743 to finish reverted PR
llvm#113925.

Added option "--unify-instantiations" to llvm-cov export to combine branch execution counts of C++ template instantiations.  Fix non-deterministic behavior.
PR llvm#131756 introduced a patch to fix a deadlock between LSan and ASan.

The relevant deadlock only occurs when LSan is enabled and
`dl_iterate_phdr` is used for Stop-the-World, i.e., under the condition
`CAN_SANITIZE_LEAKS && (SANITIZER_LINUX || SANITIZER_NETBSD)`.

Therefore, this commit also sets the effective condition of this patch
to the above condition, avoiding unnecessary problems in other
environments, e.g., stack overflow on MSVC/Windows.
llvm#136747)

- It was determined to define the parsing methods much more in line with
a recursive descent parser to follow the EBNF notation better
- As part of this change, we decided to go with a calling convention to
the parse.* methods of returning an optional rather than a bool and a
reference to the parsed struct

This is a clean-up task from
llvm#133800
…on (llvm#137073)

When an asynchronous allocation is made, we call `cudaMallocAsync` with
a stream. For deallocation, we need to call `cudaFreeAsync` with the
same stream. In order to achieve that, we need to track the allocations
and their respective streams.

This patch adds a simple sorted array of asynchronous allocations. A
binary search is performed to retrieve the allocation when deallocation
is needed.
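
A minimal sketch of such a table, assuming a `std::vector` kept sorted by address and an integer standing in for the stream handle. The type and method names are hypothetical, and the actual patch may use a different container and layout.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record: an allocation address and an opaque stream handle.
struct AsyncAlloc {
  std::uintptr_t Address;
  int Stream;
};

class AsyncAllocTable {
  std::vector<AsyncAlloc> Entries; // kept sorted by Address

  static bool addressLess(const AsyncAlloc &E, std::uintptr_t Addr) {
    return E.Address < Addr;
  }

public:
  // Insert at the position that keeps Entries sorted.
  void insert(std::uintptr_t Address, int Stream) {
    auto It = std::lower_bound(Entries.begin(), Entries.end(), Address,
                               addressLess);
    Entries.insert(It, AsyncAlloc{Address, Stream});
  }

  // Binary-search for Address; on a hit, remove the entry and return the
  // stream that must be used for the matching asynchronous free.
  std::optional<int> take(std::uintptr_t Address) {
    auto It = std::lower_bound(Entries.begin(), Entries.end(), Address,
                               addressLess);
    if (It == Entries.end() || It->Address != Address)
      return std::nullopt;
    int Stream = It->Stream;
    Entries.erase(It);
    return Stream;
  }
};
```

Lookup is logarithmic in the number of live allocations, while insertion and removal shift elements, which is the usual trade-off of a sorted array over a hash map.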
Avoid baking in absolute paths in check lines generated for DIFile
metadata. Generated test checks cannot be sensitive to absolute paths
anyway, as those vary with the environment, but there could be
situations where some sensitivity to partial paths is required for
certain tests. This implementation just assumes such tests aren't worth
the effort to support, but it could be supported in the future.

This is most useful for update_cc_test_checks with debug info enabled,
where the test writer cannot manipulate the paths within the generated
IR directly.
…3231)

Below are two examples of "narrow" `vector.stores`. The first example
  does not require partial stores and hence no RMW stores. This is
  currently emulated correctly.
  ```mlir
  func.func @example_1(%arg0: vector<4xi2>) {
      %0 = memref.alloc() : memref<13xi2>
      %c4 = arith.constant 4 : index
      vector.store %arg0, %0[%c4] : memref<13xi2>, vector<4xi2>
      return
  }
  ```

  The second example requires a partial (and hence RMW) store due to the
  offset pointing outside the emulated type boundary (`%c3`).
  ```mlir
  func.func @example_2(%arg0: vector<4xi2>) {
      %0 = memref.alloc() : memref<13xi2>
      %c3 = arith.constant 3 : index
      vector.store %arg0, %0[%c3] : memref<13xi2>, vector<4xi2>
      return
  }
  ```

  This is currently incorrectly emulated as a single "full" store (note
  that the offset is incorrect) instead of partial stores:
  ```mlir
  func.func @example_2(%arg0: vector<4xi2>) {
    %alloc = memref.alloc() : memref<4xi8>
    %0 = vector.bitcast %arg0 : vector<4xi2> to vector<1xi8>
    %c0 = arith.constant 0 : index
    vector.store %0, %alloc[%c0] : memref<4xi8>, vector<1xi8>
    return
  }
  ```

  The incorrect emulation stems from this simplified (i.e. incomplete)
  calculation of the front padding:
  ```cpp
      std::optional<int64_t> foldedNumFrontPadElems =
          isDivisibleInSize ? 0
                            : getConstantIntValue(linearizedInfo.intraDataOffset);
  ```

  Since `isDivisibleInSize` is `true` (i8 / i2 = 4):
    * front padding is set to `0` and, as a result,
    * the input offset (`%c3`) is ignored, and
    * we incorrectly assume that partial stores won't be needed.

  Note that in both examples we are storing `vector<4xi2>` into
  `memref<13xi2>` (note _different_ trailing dims) and hence partial
  stores might in fact be required. The condition above is updated to:
  ```cpp
      std::optional<int64_t> foldedNumFrontPadElems =
          (isDivisibleInSize && trailingDimsMatch)
              ? 0
              : getConstantIntValue(linearizedInfo.intraDataOffset);
  ```

  This change ensures that the input offset is properly taken into
  account, which fixes the issue. It doesn't affect `@example_1`.

  Additional comments are added to clarify the current logic.
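
The arithmetic behind the two examples can be sketched with plain integers. The helpers below are hypothetical and are not the MLIR implementation; they only show how the element offset inside the container type decides whether a partial store is needed.

```cpp
#include <cassert>
#include <cstdint>

// Number of narrow elements that precede `elemOffset` inside its container
// element (e.g. i2 elements inside an i8). Hypothetical helper.
std::int64_t frontPadElems(std::int64_t elemOffset, std::int64_t elemBits,
                           std::int64_t containerBits) {
  assert(elemBits > 0 && containerBits % elemBits == 0);
  std::int64_t elemsPerContainer = containerBits / elemBits; // i8 / i2 = 4
  return elemOffset % elemsPerContainer;
}

// A store needs read-modify-write when it does not both start and end on a
// container boundary.
bool needsPartialStore(std::int64_t elemOffset, std::int64_t numElems,
                       std::int64_t elemBits, std::int64_t containerBits) {
  std::int64_t elemsPerContainer = containerBits / elemBits;
  return frontPadElems(elemOffset, elemBits, containerBits) != 0 ||
         (elemOffset + numElems) % elemsPerContainer != 0;
}
```

For `@example_1` (offset 4, four `i2` elements) the front padding is 0 and no partial store is needed. For `@example_2` (offset 3) the front padding is 3, so the store straddles two `i8` containers.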
Fixes parsing of an ObjC type encoding such as `{?="a""b"}`. Parsing of such a type
encoding would lead to an assert. This was observed when running `language objc
class-table dump`.

The function `ReadQuotedString` consumes the closing quote, however one of its two
callers (`ReadStructElement`) was also consuming a quote. For the above type encoding,
where two quoted strings occur back to back, the parser would unintentionally consume
the opening quote of the second quoted string - leaving the remaining text with an
unbalanced quote.

This change fixes `ReadStructElement` to not consume a quote after calling
`ReadQuotedString`.

For callers to know whether a string was successfully parsed, `ReadQuotedString` now
returns an optional string.
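
The parsing contract can be illustrated with a small standalone sketch. It is not the LLDB code: `readQuotedString` and `readAllQuoted` are hypothetical free functions, and unlike the real helper this toy version also consumes the opening quote itself.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Reads one quoted string from the front of `Input`, consuming both the
// opening and the closing quote. Returns std::nullopt (and leaves `Input`
// untouched) if there is no well-formed quoted string at the front.
std::optional<std::string> readQuotedString(std::string_view &Input) {
  if (Input.empty() || Input.front() != '"')
    return std::nullopt;
  std::size_t Close = Input.find('"', 1);
  if (Close == std::string_view::npos)
    return std::nullopt;
  std::string Result(Input.substr(1, Close - 1));
  Input.remove_prefix(Close + 1); // the closing quote is consumed here
  return Result;
}

// Reads consecutive quoted strings. The caller must not skip another quote
// after each call, otherwise the next string loses its opening quote.
std::vector<std::string> readAllQuoted(std::string_view Input) {
  std::vector<std::string> Out;
  while (auto S = readQuotedString(Input))
    Out.push_back(*S);
  return Out;
}
```

Applied to the back-to-back strings `"a""b"` from the type encoding above, the loop yields `a` and `b`. Returning an optional lets the caller tell an empty string apart from a failed parse.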
This PR makes another piece of the
`CompilerInstance::cloneForModuleCompile()` result thread-safe: the
module build stack. This data structure is used to detect cyclic
dependencies between modules. The problem is that it uses
`FullSourceLoc` which refers to the `SourceManager` of the parent
`CompilerInstance`: if two threads happen to execute `CompilerInstance`s
cloned from the same parent concurrently, and both discover a dependency
cycle, they may concurrently access the parent `SourceManager` when
emitting the diagnostic, creating a data race.

In this PR, we prevent this by keeping the stack empty and moving the
responsibility of cycle detection to the client. The client can recreate
the same module build stack externally and ensure thread-safety by
enforcing mutual exclusion.
…_TAG_lexical_blocks (llvm#136205)

During the discussion under
llvm#119001, it was noticed that
concrete DW_TAG_lexical_blocks should refer to corresponding abstract
DW_TAG_lexical_blocks by having DW_AT_abstract_origin, to avoid
ambiguity. This behavior is implemented in GCC
(https://godbolt.org/z/Khrzdq1Wx), but not in LLVM.

Fixes llvm#49297.
…llvm#136413)

Make sure the builtin header sqrts work with
-fno-hip-f32-correctly-rounded-divide-sqrt, and we end up with
properly annotated sqrt intrinsic callsites.
stumpOS pushed a commit that referenced this pull request May 6, 2025
`clang-repl --cuda` was previously crashing with a segmentation fault,
instead of reporting a clean error
```
(base) anutosh491@Anutoshs-MacBook-Air bin % ./clang-repl --cuda
#0 0x0000000111da4fbc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x150fbc)
#1 0x0000000111da31dc llvm::sys::RunSignalHandlers() (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x14f1dc)
#2 0x0000000111da5628 SignalHandler(int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x151628)
#3 0x000000019b242de4 (/usr/lib/system/libsystem_platform.dylib+0x180482de4)
#4 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0)
#5 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0)
#6 0x0000000107f6bac8 clang::Interpreter::createWithCUDA(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x2173ac8)
#7 0x000000010206f8a8 main (/opt/local/libexec/llvm-20/bin/clang-repl+0x1000038a8)
#8 0x000000019ae8c274
Segmentation fault: 11
```


The underlying issue was that the `DeviceCompilerInstance` (used for
device-side CUDA compilation) was never initialized with a `Sema`, which
is required before constructing the `IncrementalCUDADeviceParser`.


https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/DeviceOffload.cpp#L32


https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/IncrementalParser.cpp#L31

Unlike the host-side `CompilerInstance` which runs `ExecuteAction`
inside the Interpreter constructor (thereby setting up Sema), the
device-side CI was passed into the parser uninitialized, leading to an
assertion or crash when accessing its internals.

To fix this, I refactored the `Interpreter::create` method to include an
optional `DeviceCI` parameter. If provided, we know we need to take care
of this instance too. Only then do we construct the
`IncrementalCUDADeviceParser`.
stumpOS pushed a commit that referenced this pull request May 6, 2025
llvm#138091)

Check this error for more context
(https://github.com/compiler-research/CppInterOp/actions/runs/14749797085/job/41407625681?pr=491#step:10:531)

This fails with 
```
* thread #1, name = 'CppInterOpTests', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x55500356d6d3)
  * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99
    frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830
    frame #2: 0x00007fffee20917a libclangCppInterOp.so.21.0gitstd::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 58
    frame #3: 0x00007fffee224796 libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 838
    frame #4: 0x00007fffee22494d libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 13
    frame #5: 0x00007fffed95ec62 libclangCppInterOp.so.21.0gitclang::IncrementalCUDADeviceParser::~IncrementalCUDADeviceParser() + 98
    frame #6: 0x00007fffed9551b6 libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 102
    frame #7: 0x00007fffed95598d libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 13
    frame #8: 0x00007fffed9181e7 libclangCppInterOp.so.21.0gitcompat::createClangInterpreter(std::vector<char const*, std::allocator<char const*>>&) + 2919
```

Problem : 

1) The destructor currently does not clean up the DeviceParser
and the DeviceAct. We currently only have this

https://github.com/llvm/llvm-project/blob/976493822443c52a71ed3c67aaca9a555b20c55d/clang/lib/Interpreter/Interpreter.cpp#L416-L419

2) DeviceCI is currently owned by IncrementalCUDADeviceParser. It should
instead be managed the same way the Interpreter manages the combination
of hostCI, hostAction and hostParser. On master, DeviceAct and
DeviceParser are managed by the Interpreter, but DeviceCI is not. This is
problematic because IncrementalParser holds a Sema& which points into
the DeviceCI. On master, DeviceCI is destroyed before the base class
~IncrementalParser() runs, causing Parser::reset() to access a dangling
Sema. Since Sema holds a reference to the Preprocessor, which owns the
PragmaNamespace, we see this
```
  * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99
    frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830
    
```
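
The destruction-order hazard from point 2 can be reproduced with toy types. In the sketch below every name is hypothetical and nothing touches the real clang classes; it only records the order in which C++ runs the destructors when a derived class owns an object that its base class still refers to.

```cpp
#include <memory>
#include <string>
#include <vector>

// Global event log used to record the order of destructor calls.
std::vector<std::string> &eventLog() {
  static std::vector<std::string> Log;
  return Log;
}

// Stand-in for the device CompilerInstance.
struct ToyCI {
  ~ToyCI() { eventLog().push_back("~ToyCI"); }
};

// Stand-in for the base parser, which in the real code keeps a reference
// into the CompilerInstance and uses it during destruction.
struct ToyParserBase {
  virtual ~ToyParserBase() { eventLog().push_back("~ToyParserBase"); }
};

// Stand-in for the device parser that owns the CompilerInstance.
struct ToyDeviceParser : ToyParserBase {
  std::unique_ptr<ToyCI> CI = std::make_unique<ToyCI>();
  ~ToyDeviceParser() override { eventLog().push_back("~ToyDeviceParser"); }
};

// Creates and destroys a device parser, returning the destructor order.
std::vector<std::string> destroyDeviceParser() {
  eventLog().clear();
  { ToyDeviceParser P; }
  return eventLog();
}
```

The members of the derived class are destroyed before the base-class destructor runs, so `~ToyCI` is logged before `~ToyParserBase`. Any reference the base still holds into the owned object is dangling by then, which is the situation described above.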
stumpOS pushed a commit that referenced this pull request May 6, 2025
Fix for:
`Assertion failed: (false && "Architecture or OS not supported"),
function CreateRegisterContextForFrame, file
/usr/src/contrib/llvm-project/lldb/source/Plugins/Process/elf-core/ThreadElfCore.cpp,
line 182.
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and
include the crash backtrace.
#0 0x000000080cd857c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&,
int)
/usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
#1 0x000000080cd85ed4
/usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:797:3
#2 0x000000080cd82ae8 llvm::sys::RunSignalHandlers()
/usr/src/contrib/llvm-project/llvm/lib/Support/Signals.cpp:104:5
#3 0x000000080cd861f0 SignalHandler
/usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:403:3 #4
0x000000080f159644 handle_signal
/usr/src/lib/libthr/thread/thr_sig.c:298:3
`