forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 0
[wip] support bf16 #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
stumpOS
wants to merge
245
commits into
main
Choose a base branch
from
stumpos/bf16Fix
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…#136676) It claimed to return an `io.StringIO` or an `io.BytesIO`, but it did in fact return `str` or `bytes`.
…lvm#136762) Any kill flags that were present for the old register are not valid for the replacement and the replacement may have extended the live range of the replacement register.
This is a follow-up of 13aac46. This commit adjusts the implementation of `hasBooleanRepresentation` to be somewhat aligned to `hasIntegerRepresentation`. In particular vector of booleans should be handled in `hasBooleanRepresentation`, while `_Atomic(bool)` should not.
…_MATH_SKIP_ACCURATE_PASS is set. (llvm#130968)
…136779) The setter is only used when changing the setting programmatically. When using the settings command, we need to monitor SetPropertyValue.
Mirrors incubator changes from llvm/clangir#1582
…vm#102731) DAG combiner already does this transformation, but in some cases it does not have a chance because either CodeGenPrepare or SelectionDAGBuilder move icmp to a different basic block. https://alive2.llvm.org/ce/z/ARzh99 Fixes llvm#94829 Pull Request: llvm#102731
…llvm#136714) Otherwise, add the missing diagnostic.
Andes N45/NX45 are 32/64bit in-order dual-issue 8-stage pipeline CPU architecture implementing the RV[32|64]IMAFDC_Zba_Zbb_Zbs ISA extensions. They are developed by Andes Technology https://www.andestech.com, a RISC-V IP provider. The overviews for N45/NX45: https://www.andestech.com/en/products-solutions/andescore-processors/riscv-n45/ https://www.andestech.com/en/products-solutions/andescore-processors/riscv-nx45/ Scheduling model will be implemented in a later PR.
At the moment, the `CHECK-SAME` lines generated by
"generate-test-checks.py" (i.e. check-lines that correspond to the
preceeding `CHECK-LABEL` line) are indented to match the label length.
For example,
```mlir
func.func @batch_reduce_matmul_bcast_k_to_fill_missing_dims_A(%arg0: memref<5xf32>, %arg1: memref<2x5x7xf32>, %arg2: memref<3x7xf32>) {
linalg.batch_reduce_matmul indexing_maps = (...)
}
```
will lead to the following:
```mlir
// CHECK-LABEL: func.func @batch_reduce_matmul_bcast_k_to_fill_missing_dims_A(
// CHECK-SAME: %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<5xf32>,
// CHECK-SAME: %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<2x5x7xf32>,
// CHECK-SAME: %[[VAL_2:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<3x7xf32>) {
// CHECK: linalg.batch_reduce_matmul indexing_maps = (...)
```
This indentation is unnecasarilly deep. With this change, for labales
that are longer than 20 chars, the indentation is trimmed to 4 spaces:
```mlir
// CHECK-LABEL: func.func @batch_reduce_matmul_bcast_k_to_fill_missing_dims_A(
// CHECK-SAME: %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<5xf32>,
// CHECK-SAME: %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<2x5x7xf32>,
// CHECK-SAME: %[[VAL_2:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<3x7xf32>) {
// CHECK: linalg.batch_reduce_matmul indexing_maps = (...)
```
* Only show for blocks 10 lines or taller (including braces) * Add parens for function call: "// if foo" -> "// if foo()" or "// if foo(...)" * Print literal nullptr * Escaping for abbreviated strings Fixes clangd/clangd#1807. Based on the original PR at llvm#72345. Co-authored-by: daiyousei-qz <[email protected]>
…5596) InstructionCost is already an optional value, containing an Invalid state that can be checked with isValid(). There is little point in returning another optional from getValue(). Most uses do not make use of it being a std::optional, dereferencing the value directly (either isValid has been checked previously or the Cost is assumed to be valid). The one case that does in AMDGPU used value_or which has been replaced by a isValid() check.
GCC on Cygwin and MSYS2 are built with --enable-__cxa_atexit. Adjust test to expect this change.
)" This reverts commit 8fc8a84, which caused a regression. Fixes llvm#136675.
Fix for port of e112dcc.
DAGCombiner::hoistLogicOpWithSameOpcodeHands will hoist
(or disjoint (ext a), (ext b)) -> (ext (or disjoint a, b))
So this adds patterns to match vwadd[u].v{v,x} in this case.
We have to teach the combine to preserve the disjoint flag.
Fix a reference to getValue() being optional in InlineSizeEstimatorAnalysis, a file that is not included in the default build. A "warning: enumerated and non-enumerated type in conditional expression" warning is fixed in AMDGPU too.
…ules Doc (llvm#136719) "Dependant BMI" / "Dependent BMI" was used incorrectly in the documentation: "Dependent BMI" refers to a BMI that depends on the current TU, but it was used for the BMI that current TU depends on. I replaced all the mentions with "BMI dependency".
…put operands. (llvm#135961) It looks like this code is only considering buildvector inputs, expecting the inputs to have at least 16 operands. This adds a check to make sure that is true. Fixes llvm#135950
After upgrading the default code model from small to medium on LoongArch, function calls using expression may fail. This is because the function call instruction has changed from `bl` to `pcalau18i + jirl`, but `RuntimeDyld` does not handle out-of-range jumps for this instruction sequence. This patch fixes: llvm#136561 Reviewed By: SixWeining Pull Request: llvm#136563
Dear developer: I have recently working with LLVM IR and I want to isolate basic blocks using the command "llvm-extract". However, I found that the command option "llvm-extract --bb func_name:bb_name" will only function when dumping source code into IRs with options "-fno-discard-value-names". That is to say, the "llvm-extract" command cannot support unnamed basic blocks, which is a default output of the compiler. So, I made these changes and hope they will make LLVM better. Best regards, Co-authored-by: Yilin Li <[email protected]>
…136856) In order for precompiled headers to work with ccache, a specific flag needs to be passed to the compiler and ccache's sloppiness configuration option needs to be set appropriately. Due to issues with configuring CMake on certain Windows platforms, set the required ccache option only on non-Windows systems for the time being. ----- Signed-off-by: Kajetan Puchalski <[email protected]>
… isStore and a memory VT. (llvm#137080) This removes the need to explicitly set isTruncStore on truncstorei8 and other similar PatFrags that include truncstore in their frags DAG. This allows some new patterns to be imported for AMDGPU as you can see in the changed test. The extra isTruncStore were added in ae2b36e, along with some other tablegen changes to look for MemoryVT along with isTruncStore. I did not remove the code, because I'm not sure if any out of tree users have become dependent on it. It's no longer exercised in tree.
llvm#136363) These were added to the migration from v4 to v5 and should be removed now that the default has changed.
Static analysis flagged this code b/c we are copying the temp variable back in when we could move it instead.
…vm#136733) We're duplicating uses here, so we need to freeze the inputs. --------- Co-authored-by: Luke Lau <[email protected]>
Add some intrinsics and LIT tests for PPC dmr insert/extract instructions.
…ignExtLoad/isZeroExtLoad for IsAtomic in SelectionDAG. (llvm#137096) Support isAnyExtLoad() for IsAtomic in GISel. Modify atomic_load_az* to check for extload or zextload. And rename to atomic_load_azext* Add atomic_load_asext* and use in RISC-V. I used "asext" rather than "as" so it wouldn't be confused with the word "as".
…vm#130781) MASM supports some built-in macro-type functions. We start our support for these with `@CatStr`, one of the more commonly used.
…lvm#135074) This PR is a second attempt for issue llvm#111743 to finish reverted PR llvm#113925. Added option "--unify-instantiations" to llvm-cov export to combine branch execution counts of C++ template instantiations. Fix non-deterministic behavior.
PR llvm#131756 introduced a patch to fix a deadlock between LSan and ASan. The relevant deadlock only occurs when LSan is enabled and `dl_iterate_phdr` is used for Stop-the-World, i.e., under the condition `CAN_SANITIZE_LEAKS && (SANITIZER_LINUX || SANITIZER_NETBSD)`. Therefore, this commit also sets the effective condition of this patch to the above condition, avoiding unnecessary problems in other environments, e.g., stack overflow on MSVC/Windows.
llvm#136747) - It was determined to define the parsing methods much more inline with a recursive descent parser to follow the EBNF notation better - As part of this change, we decided to go with a calling convention to the parse.* methods of returning an optional rather than a bool and a reference to the parsed struct This is a clean-up task from llvm#133800
…on (llvm#137073) When an asynchronous allocation is made, we call `cudaMallocAsync` with a stream. For deallocation, we need to call `cudaFreeAsync` with the same stream. in order to achieve that, we need to track the allocation and their respective stream. This patch adds a simple sorted array of asynchronous allocations. A binary search is performed to retrieve the allocation when deallocation is needed.
Avoid baking in absolute paths in check lines generated for DIFile metadata. Generated test checks cannot be sensitive to absolute paths anyway, as those vary with the environment, but there could be situations where some sensitivity to partial paths is required for certain tests. This implementation just assumes such tests aren't worth the effort to support, but it could be supported in the future. This is most useful for update_cc_test_checks with debug info enabled, where the test writer cannot manipulate the paths within the generated IR directly.
…3231) Below are two examples of "narrow" `vector.stores`. The first example does not require partial stores and hence no RMW stores. This is currently emulated correctly. ```mlir func.func @example_1(%arg0: vector<4xi2>) { %0 = memref.alloc() : memref<13xi2> %c4 = arith.constant 4 : index vector.store %arg0, %0[%c4] : memref<13xi2>, vector<4xi2> return } ``` The second example requires a partial (and hence RMW) store due to the offset pointing outside the emulated type boundary (`%c3`). ```mlir func.func @example_2(%arg0: vector<4xi2>) { %0 = memref.alloc() : memref<13xi2> %c3 = arith.constant 3 : index vector.store %arg0, %0[%c3] : memref<13xi2>, vector<4xi2> return } ``` This is currently incorrectly emulated as a single "full" store (note that the offset is incorrect) instead of partial stores: ```mlir func.func @example_2(%arg0: vector<4xi2>) { %alloc = memref.alloc() : memref<4xi8> %0 = vector.bitcast %arg0 : vector<4xi2> to vector<1xi8> %c0 = arith.constant 0 : index vector.store %0, %alloc[%c0] : memref<4xi8>, vector<1xi8> return } ``` The incorrect emulation stems from this simplified (i.e. incomplete) calculation of the front padding: ```cpp std::optional<int64_t> foldedNumFrontPadElems = isDivisibleInSize ? 0 : getConstantIntValue(linearizedInfo.intraDataOffset); ``` Since `isDivisibleInSize` is `true` (i8 / i2 = 4): * front padding is set to `0` and, as a result, * the input offset (`%c3`) is ignored, and * we incorrectly assume that partial stores won't be needed. Note that in both examples we are storing `vector<4xi2>` into `memref<13xi2>` (note _different_ trailing dims) and hence partial stores might in fact be required. The condition above is updated to: ```cpp std::optional<int64_t> foldedNumFrontPadElems = (isDivisibleInSize && trailingDimsMatch) ? 0 : getConstantIntValue(linearizedInfo.intraDataOffset); ``` This change ensures that the input offset is properly taken into account, which fixes the issue. It doesn't affect `@example1`. Additional comments are added to clarify the current logic.
Fixes parsing of an ObjC type encoding such as `{?="a""b"}`. Parsing of such a type
encoding would lead to an assert. This was observed when running `language objc
class-table dump`.
The function `ReadQuotedString` consumes the closing quote, however one of its two
callers (`ReadStructElement`) was also consuming a quote. For the above type encoding,
where two quoted strings occur back to back, the parser would unintentionally consume
the opening quote of the second quoted string - leaving the remaining text with an
unbalanced quote.
This changes fixes `ReadStructElement` to not consume a quote after calling
`ReadQuotedString`.
For callers to know whether a string was successfully parsed, `ReadQuotedString` now
returns an optional string.
This PR makes another piece of the `CompilerInstance::cloneForModuleCompile()` result thread-safe: the module build stack. This data structure is used to detect cyclic dependencies between modules. The problem is that it uses `FullSourceLoc` which refers to the `SourceManager` of the parent `CompilerInstance`: if two threads happen to execute `CompilerInstance`s cloned from the same parent concurrently, and both discover a dependency cycle, they may concurrently access the parent `SourceManager` when emitting the diagnostic, creating a data race. In this PR, we prevent this by keeping the stack empty and moving the responsibility of cycle detection to the client. The client can recreate the same module build stack externally and ensure thread-safety by enforcing mutual exclusion.
…_TAG_lexical_blocks (llvm#136205) During the discussion under llvm#119001, it was noticed that concrete DW_TAG_lexical_blocks should refer to corresponding abstract DW_TAG_lexical_blocks by having DW_AT_abstract_origin, to avoid ambiguity. This behavior is implemented in GCC (https://godbolt.org/z/Khrzdq1Wx), but not in LLVM. Fixes llvm#49297.
…llvm#136413) Make sure the builtin header sqrts work with -fno-hip-f32-correctly-rounded-divide-sqrt, and we end up with properly annotated sqrt intrinsic callsites.
1a9f4c5 to
b85acb2
Compare
stumpOS
pushed a commit
that referenced
this pull request
May 6, 2025
`clang-repl --cuda` was previously crashing with a segmentation fault, instead of reporting a clean error ``` (base) anutosh491@Anutoshs-MacBook-Air bin % ./clang-repl --cuda #0 0x0000000111da4fbc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x150fbc) #1 0x0000000111da31dc llvm::sys::RunSignalHandlers() (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x14f1dc) #2 0x0000000111da5628 SignalHandler(int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x151628) #3 0x000000019b242de4 (/usr/lib/system/libsystem_platform.dylib+0x180482de4) llvm#4 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0) llvm#5 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0) llvm#6 0x0000000107f6bac8 clang::Interpreter::createWithCUDA(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x2173ac8) llvm#7 0x000000010206f8a8 main (/opt/local/libexec/llvm-20/bin/clang-repl+0x1000038a8) llvm#8 0x000000019ae8c274 Segmentation fault: 11 ``` The underlying issue was that the `DeviceCompilerInstance` (used for device-side CUDA compilation) was never initialized with a `Sema`, which is required before constructing the `IncrementalCUDADeviceParser`. https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/DeviceOffload.cpp#L32 https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/IncrementalParser.cpp#L31 Unlike the host-side `CompilerInstance` which runs `ExecuteAction` inside the Interpreter constructor (thereby setting up Sema), the device-side CI was passed into the parser uninitialized, leading to an assertion or crash when accessing its internals. To fix this, I refactored the `Interpreter::create` method to include an optional `DeviceCI` parameter. If provided, we know we need to take care of this instance too. Only then do we construct the `IncrementalCUDADeviceParser`.
stumpOS
pushed a commit
that referenced
this pull request
May 6, 2025
llvm#138091) Check this error for more context (https://github.com/compiler-research/CppInterOp/actions/runs/14749797085/job/41407625681?pr=491#step:10:531) This fails with ``` * thread #1, name = 'CppInterOpTests', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x55500356d6d3) * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99 frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830 frame #2: 0x00007fffee20917a libclangCppInterOp.so.21.0gitstd::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 58 frame #3: 0x00007fffee224796 libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 838 frame llvm#4: 0x00007fffee22494d libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 13 frame llvm#5: 0x00007fffed95ec62 libclangCppInterOp.so.21.0gitclang::IncrementalCUDADeviceParser::~IncrementalCUDADeviceParser() + 98 frame llvm#6: 0x00007fffed9551b6 libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 102 frame llvm#7: 0x00007fffed95598d libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 13 frame llvm#8: 0x00007fffed9181e7 libclangCppInterOp.so.21.0gitcompat::createClangInterpreter(std::vector<char const*, std::allocator<char const*>>&) + 2919 ``` Problem : 1) The destructor currently handles no clearance for the DeviceParser and the DeviceAct. We currently only have this https://github.com/llvm/llvm-project/blob/976493822443c52a71ed3c67aaca9a555b20c55d/clang/lib/Interpreter/Interpreter.cpp#L416-L419 2) The ownership for DeviceCI currently is present in IncrementalCudaDeviceParser. But this should be similar to how the combination for hostCI, hostAction and hostParser are managed by the Interpreter. As on master the DeviceAct and DeviceParser are managed by the Interpreter but not DeviceCI. This is problematic because : IncrementalParser holds a Sema& which points into the DeviceCI. On master, DeviceCI is destroyed before the base class ~IncrementalParser() runs, causing Parser::reset() to access a dangling Sema (and as Sema holds a reference to Preprocessor which owns PragmaNamespace) we see this ``` * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99 frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830 ```
stumpOS
pushed a commit
that referenced
this pull request
May 6, 2025
Fix for: `Assertion failed: (false && "Architecture or OS not supported"), function CreateRegisterContextForFrame, file /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/elf-core/ThreadElfCore.cpp, line 182. PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace. #0 0x000000080cd857c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13 #1 0x000000080cd85ed4 /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:797:3 #2 0x000000080cd82ae8 llvm::sys::RunSignalHandlers() /usr/src/contrib/llvm-project/llvm/lib/Support/Signals.cpp:104:5 #3 0x000000080cd861f0 SignalHandler /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:403:3 llvm#4 0x000000080f159644 handle_signal /usr/src/lib/libthr/thread/thr_sig.c:298:3 `
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.