[wip] support bf16 #3

stumpOS · 2025-04-23T18:06:48Z

No description provided.

This is a follow-up of 13aac46. This commit adjusts the implementation of `hasBooleanRepresentation` so that it is more closely aligned with `hasIntegerRepresentation`. In particular, vectors of booleans should be handled in `hasBooleanRepresentation`, while `_Atomic(bool)` should not.

…vm#102731) The DAG combiner already does this transformation, but in some cases it does not get the chance because either CodeGenPrepare or SelectionDAGBuilder moves the icmp to a different basic block. https://alive2.llvm.org/ce/z/ARzh99 Fixes llvm#94829 Pull Request: llvm#102731

Andes N45/NX45 are 32/64-bit in-order, dual-issue CPUs with an 8-stage pipeline, implementing the RV[32|64]IMAFDC_Zba_Zbb_Zbs ISA extensions. They are developed by Andes Technology (https://www.andestech.com), a RISC-V IP provider. Overviews of the N45/NX45: https://www.andestech.com/en/products-solutions/andescore-processors/riscv-n45/ https://www.andestech.com/en/products-solutions/andescore-processors/riscv-nx45/ The scheduling model will be implemented in a later PR.

At the moment, the `CHECK-SAME` lines generated by "generate-test-checks.py" (i.e. check lines that correspond to the preceding `CHECK-LABEL` line) are indented to match the label length. For example,

```mlir
func.func @batch_reduce_matmul_bcast_k_to_fill_missing_dims_A(%arg0: memref<5xf32>, %arg1: memref<2x5x7xf32>, %arg2: memref<3x7xf32>) {
  linalg.batch_reduce_matmul indexing_maps = (...)
}
```

will lead to the following:

```mlir
// CHECK-LABEL: func.func @batch_reduce_matmul_bcast_k_to_fill_missing_dims_A(
// CHECK-SAME: %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<5xf32>,
// CHECK-SAME: %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<2x5x7xf32>,
// CHECK-SAME: %[[VAL_2:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<3x7xf32>) {
// CHECK: linalg.batch_reduce_matmul indexing_maps = (...)
```

This indentation is unnecessarily deep. With this change, for labels that are longer than 20 chars, the indentation is trimmed to 4 spaces:

```mlir
// CHECK-LABEL: func.func @batch_reduce_matmul_bcast_k_to_fill_missing_dims_A(
// CHECK-SAME: %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<5xf32>,
// CHECK-SAME: %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<2x5x7xf32>,
// CHECK-SAME: %[[VAL_2:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: memref<3x7xf32>) {
// CHECK: linalg.batch_reduce_matmul indexing_maps = (...)
```
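The following C++ sketch illustrates the rule described above by computing the indentation width for `CHECK-SAME` lines. It is an illustration only. The helper names `checkSameIndent` and `makeCheckSame` are hypothetical, and the real script is Python. Treating "label length" as the character count of the label text is an assumption. Only the 20-character threshold and the 4-space width come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: indentation width (in spaces) for CHECK-SAME lines that
// follow a CHECK-LABEL whose label text is `label`. Assumes "label length"
// means the number of characters in the label text.
std::size_t checkSameIndent(std::string_view label) {
  constexpr std::size_t kMaxAlignedLabel = 20; // threshold from the description
  constexpr std::size_t kTrimmedIndent = 4;    // width used for longer labels
  return label.size() > kMaxAlignedLabel ? kTrimmedIndent : label.size();
}

// Hypothetical helper: builds one CHECK-SAME line using the indentation above.
std::string makeCheckSame(std::string_view label, std::string_view body) {
  return "// CHECK-SAME:" + std::string(checkSameIndent(label), ' ') +
         std::string(body);
}
```

With this rule, short labels keep their aligned layout, and long function signatures no longer push every argument line far to the right.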

* Only show for blocks 10 lines or taller (including braces)
* Add parens for function call: "// if foo" -> "// if foo()" or "// if foo(...)"
* Print literal nullptr
* Escaping for abbreviated strings

Fixes clangd/clangd#1807.

Based on the original PR at llvm#72345.

Co-authored-by: daiyousei-qz <[email protected]>

…5596) InstructionCost is already an optional value, containing an Invalid state that can be checked with isValid(). There is little point in returning another optional from getValue(). Most uses do not make use of it being a std::optional, dereferencing the value directly (either isValid has been checked previously or the Cost is assumed to be valid). The one case that does, in AMDGPU, used value_or, which has been replaced by an isValid() check.
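The C++ sketch below shows the idea using a simplified, hypothetical `Cost` class. It is not LLVM's actual `InstructionCost`. The type carries its own invalid state, `getValue()` returns the raw value instead of a `std::optional`, and a former `value_or` call site becomes an explicit `isValid()` check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a cost type that already encodes an
// "invalid" state; this is not LLVM's InstructionCost.
class Cost {
public:
  static Cost getInvalid() { return Cost(0, /*IsValid=*/false); }
  explicit Cost(std::int64_t V) : Value(V), Valid(true) {}

  bool isValid() const { return Valid; }

  // Returns the value directly rather than std::optional<std::int64_t>;
  // callers are expected to have checked isValid() first.
  std::int64_t getValue() const {
    assert(Valid && "querying the value of an invalid cost");
    return Value;
  }

private:
  Cost(std::int64_t V, bool IsValid) : Value(V), Valid(IsValid) {}
  std::int64_t Value;
  bool Valid;
};

// A former `C.getValue().value_or(Default)` call site, rewritten with an
// explicit isValid() check.
std::int64_t costOrDefault(const Cost &C, std::int64_t Default) {
  return C.isValid() ? C.getValue() : Default;
}
```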

DAGCombiner::hoistLogicOpWithSameOpcodeHands will hoist `(or disjoint (ext a), (ext b))` -> `(ext (or disjoint a, b))`, so this adds patterns to match vwadd[u].v{v,x} in this case. We have to teach the combine to preserve the disjoint flag.
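A widening add can match the hoisted form because of a simple identity. When `a` and `b` share no set bits, `a | b == a + b` with no carries, so extending the `or disjoint` gives the same wide value as adding the extended operands. The exhaustive C++ check below sketches that arithmetic for 8-bit to 16-bit extension. It is not part of the patch, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks every pair of 8-bit values with disjoint set bits
// (a & b == 0), i.e. every pair for which `or disjoint` is legal, and verifies
//   zext(a) | zext(b) == zext(a | b) == zext(a) + zext(b)   (unsigned widening)
//   sext(a) | sext(b) == sext(a | b) == sext(a) + sext(b)   (signed widening)
// The "+" forms are what a widening add (vwaddu / vwadd) computes.
bool disjointOrMatchesWideningAdd() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      if ((a & b) != 0)
        continue; // not a legal "or disjoint" operand pair
      const unsigned orAB = a | b;

      // Zero extension from 8 to 16 bits.
      const std::uint16_t za = static_cast<std::uint8_t>(a);
      const std::uint16_t zb = static_cast<std::uint8_t>(b);
      const std::uint16_t zOr = static_cast<std::uint8_t>(orAB);
      if (static_cast<std::uint16_t>(za | zb) != zOr ||
          static_cast<std::uint16_t>(za + zb) != zOr)
        return false;

      // Sign extension from 8 to 16 bits.
      const std::int16_t sa = static_cast<std::int8_t>(a);
      const std::int16_t sb = static_cast<std::int8_t>(b);
      const std::int16_t sOr = static_cast<std::int8_t>(orAB);
      if (static_cast<std::int16_t>(sa | sb) != sOr ||
          static_cast<std::int16_t>(sa + sb) != sOr)
        return false;
    }
  }
  return true;
}
```

This is also why the combine must keep the `disjoint` flag. Without it, `ext (or a, b)` is no longer known to equal a widening add.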

Fix a reference to getValue() being optional in InlineSizeEstimatorAnalysis, a file that is not included in the default build. A "warning: enumerated and non-enumerated type in conditional expression" warning is fixed in AMDGPU too.

…ules Doc (llvm#136719) "Dependant BMI" / "Dependent BMI" was used incorrectly in the documentation: "Dependent BMI" refers to a BMI that depends on the current TU, but it was used for the BMI that current TU depends on. I replaced all the mentions with "BMI dependency".

After upgrading the default code model from small to medium on LoongArch, function calls in expressions may fail. This is because the function call instruction sequence has changed from `bl` to `pcalau18i + jirl`, but `RuntimeDyld` does not handle out-of-range jumps for this instruction sequence. This patch fixes llvm#136561. Reviewed By: SixWeining Pull Request: llvm#136563

Dear developer: I have recently been working with LLVM IR and wanted to isolate basic blocks using the "llvm-extract" command. However, I found that the option "llvm-extract --bb func_name:bb_name" only works when the IR is emitted with the "-fno-discard-value-names" option. That is to say, "llvm-extract" cannot handle unnamed basic blocks, which is what the compiler emits by default. So I made these changes, and I hope they will make LLVM better. Best regards,

Co-authored-by: Yilin Li <[email protected]>

…136856) In order for precompiled headers to work with ccache, a specific flag needs to be passed to the compiler and ccache's sloppiness configuration option needs to be set appropriately. Due to issues with configuring CMake on certain Windows platforms, set the required ccache option only on non-Windows systems for the time being. ----- Signed-off-by: Kajetan Puchalski <[email protected]>

… isStore and a memory VT. (llvm#137080) This removes the need to explicitly set isTruncStore on truncstorei8 and other similar PatFrags that include truncstore in their frags DAG. This allows some new patterns to be imported for AMDGPU, as you can see in the changed test. The extra isTruncStore were added in ae2b36e, along with some other tablegen changes to look for MemoryVT along with isTruncStore. I did not remove the code, because I'm not sure if any out-of-tree users have become dependent on it. It's no longer exercised in-tree.

…ignExtLoad/isZeroExtLoad for IsAtomic in SelectionDAG. (llvm#137096) Support isAnyExtLoad() for IsAtomic in GISel. Modify atomic_load_az* to check for extload or zextload, and rename them to atomic_load_azext*. Add atomic_load_asext* and use them in RISC-V. I used "asext" rather than "as" so it wouldn't be confused with the word "as".

…lvm#135074) This PR is a second attempt at issue llvm#111743, finishing the reverted PR llvm#113925. It adds the option "--unify-instantiations" to llvm-cov export to combine branch execution counts of C++ template instantiations, and fixes non-deterministic behavior.

PR llvm#131756 introduced a patch to fix a deadlock between LSan and ASan. The relevant deadlock only occurs when LSan is enabled and `dl_iterate_phdr` is used for Stop-the-World, i.e., under the condition `CAN_SANITIZE_LEAKS && (SANITIZER_LINUX || SANITIZER_NETBSD)`. Therefore, this commit also sets the effective condition of this patch to the above condition, avoiding unnecessary problems in other environments, e.g., stack overflow on MSVC/Windows.

llvm#136747)

- It was determined that the parsing methods should be defined much more in line with a recursive descent parser, to follow the EBNF notation more closely.
- As part of this change, we decided to go with a calling convention for the parse.* methods of returning an optional, rather than a bool and a reference to the parsed struct.

This is a clean-up task from llvm#133800.
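Below is a minimal C++ sketch of the optional-returning calling convention. It uses a small hypothetical grammar (`List ::= '(' Number { ',' Number } ')'`), not the parser touched by this PR. Each EBNF production maps to one `parse*` method, and `std::nullopt` replaces the old `bool` result plus out-parameter.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical grammar, used only to illustrate the calling convention:
//   List   ::= '(' Number { ',' Number } ')'
//   Number ::= Digit { Digit }
// Each parse* method returns std::optional<T>; std::nullopt means "no parse".
class Parser {
public:
  explicit Parser(std::string_view Text) : Src(Text) {}

  std::optional<std::vector<unsigned>> parseList() {
    if (!consume('('))
      return std::nullopt;
    std::vector<unsigned> Elems;
    std::optional<unsigned> First = parseNumber();
    if (!First)
      return std::nullopt;
    Elems.push_back(*First);
    while (consume(',')) { // { ',' Number }
      std::optional<unsigned> Next = parseNumber();
      if (!Next)
        return std::nullopt;
      Elems.push_back(*Next);
    }
    if (!consume(')'))
      return std::nullopt;
    return Elems;
  }

private:
  std::optional<unsigned> parseNumber() {
    if (Pos >= Src.size() ||
        !std::isdigit(static_cast<unsigned char>(Src[Pos])))
      return std::nullopt;
    unsigned Value = 0;
    while (Pos < Src.size() &&
           std::isdigit(static_cast<unsigned char>(Src[Pos])))
      Value = Value * 10 + static_cast<unsigned>(Src[Pos++] - '0');
    return Value;
  }

  // Token match helper: advances past `C` if it is the next character.
  bool consume(char C) {
    if (Pos < Src.size() && Src[Pos] == C) {
      ++Pos;
      return true;
    }
    return false;
  }

  std::string_view Src;
  std::size_t Pos = 0;
};
```

Because the result and the success flag travel together, a caller cannot accidentally read a half-filled struct after a failed parse.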

…on (llvm#137073) When an asynchronous allocation is made, we call `cudaMallocAsync` with a stream. For deallocation, we need to call `cudaFreeAsync` with the same stream. To achieve that, we need to track each allocation and its respective stream. This patch adds a simple sorted array of asynchronous allocations. A binary search is performed to retrieve the allocation when deallocation is needed.
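The bookkeeping can be sketched in C++ as follows. The class name and `StreamHandle` (a stand-in for a CUDA stream handle) are hypothetical, and the sketch omits the actual `cudaMallocAsync`/`cudaFreeAsync` calls. It only shows a vector kept sorted by address, with `std::lower_bound` used both to insert an entry and to find it again at deallocation time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a CUDA stream handle.
using StreamHandle = std::uintptr_t;

// Tracks asynchronous allocations in a vector sorted by address, so the stream
// of an allocation can be found with a binary search when it is freed.
class AsyncAllocTracker {
public:
  // Records that `Ptr` was allocated asynchronously on `Stream`.
  void record(const void *Ptr, StreamHandle Stream) {
    auto Key = reinterpret_cast<std::uintptr_t>(Ptr);
    Entries.insert(lowerBound(Key), Entry{Key, Stream});
  }

  // Returns and forgets the stream used for `Ptr`, or std::nullopt if `Ptr`
  // was not recorded as an asynchronous allocation.
  std::optional<StreamHandle> take(const void *Ptr) {
    auto Key = reinterpret_cast<std::uintptr_t>(Ptr);
    auto It = lowerBound(Key);
    if (It == Entries.end() || It->Address != Key)
      return std::nullopt;
    StreamHandle Stream = It->Stream;
    Entries.erase(It);
    return Stream;
  }

private:
  struct Entry {
    std::uintptr_t Address;
    StreamHandle Stream;
  };

  // Binary search for the first entry whose address is not less than Key.
  std::vector<Entry>::iterator lowerBound(std::uintptr_t Key) {
    return std::lower_bound(
        Entries.begin(), Entries.end(), Key,
        [](const Entry &E, std::uintptr_t K) { return E.Address < K; });
  }

  std::vector<Entry> Entries; // kept sorted by Address
};
```

A sorted vector keeps lookups at O(log n). The trade-off is that insertion shifts elements, which is reasonable when the number of live asynchronous allocations is modest.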

Avoid baking in absolute paths in check lines generated for DIFile metadata. Generated test checks cannot be sensitive to absolute paths anyway, as those vary with the environment, but there could be situations where some sensitivity to partial paths is required for certain tests. This implementation just assumes such tests aren't worth the effort to support, but it could be supported in the future. This is most useful for update_cc_test_checks with debug info enabled, where the test writer cannot manipulate the paths within the generated IR directly.

…3231) Below are two examples of "narrow" `vector.store`s. The first example does not require partial stores and hence no RMW stores. This is currently emulated correctly.

```mlir
func.func @example_1(%arg0: vector<4xi2>) {
  %0 = memref.alloc() : memref<13xi2>
  %c4 = arith.constant 4 : index
  vector.store %arg0, %0[%c4] : memref<13xi2>, vector<4xi2>
  return
}
```

The second example requires a partial (and hence RMW) store due to the offset pointing outside the emulated type boundary (`%c3`).

```mlir
func.func @example_2(%arg0: vector<4xi2>) {
  %0 = memref.alloc() : memref<13xi2>
  %c3 = arith.constant 3 : index
  vector.store %arg0, %0[%c3] : memref<13xi2>, vector<4xi2>
  return
}
```

This is currently incorrectly emulated as a single "full" store (note that the offset is incorrect) instead of partial stores:

```mlir
func.func @example_2(%arg0: vector<4xi2>) {
  %alloc = memref.alloc() : memref<4xi8>
  %0 = vector.bitcast %arg0 : vector<4xi2> to vector<1xi8>
  %c0 = arith.constant 0 : index
  vector.store %0, %alloc[%c0] : memref<4xi8>, vector<1xi8>
  return
}
```

The incorrect emulation stems from this simplified (i.e. incomplete) calculation of the front padding:

```cpp
std::optional<int64_t> foldedNumFrontPadElems =
    isDivisibleInSize ? 0 : getConstantIntValue(linearizedInfo.intraDataOffset);
```

Since `isDivisibleInSize` is `true` (i8 / i2 = 4):

* front padding is set to `0` and, as a result,
* the input offset (`%c3`) is ignored, and
* we incorrectly assume that partial stores won't be needed.

Note that in both examples we are storing `vector<4xi2>` into `memref<13xi2>` (note the _different_ trailing dims), and hence partial stores might in fact be required. The condition above is updated to:

```cpp
std::optional<int64_t> foldedNumFrontPadElems =
    (isDivisibleInSize && trailingDimsMatch)
        ? 0
        : getConstantIntValue(linearizedInfo.intraDataOffset);
```

This change ensures that the input offset is properly taken into account, which fixes the issue. It doesn't affect `@example_1`.

Additional comments are added to clarify the current logic.
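To make the arithmetic concrete, the hypothetical C++ helper below applies the updated condition to the two examples. It assumes that `intraDataOffset` is the linearized element offset modulo the number of narrow elements per container element (4 for i2 in i8). That modelling is an assumption of this sketch, not a copy of the MLIR code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the updated condition for a sub-byte store.
// `linearOffset` is the constant linearized element offset into the narrow
// memref; `elemsPerByte` is how many narrow elements fit into one emulated
// container element (4 for i2 stored in i8).
std::int64_t frontPadElems(std::int64_t linearOffset, std::int64_t elemsPerByte,
                           bool isDivisibleInSize, bool trailingDimsMatch) {
  assert(elemsPerByte > 0);
  // Position of the first stored element inside its container byte
  // (assumed to be what intraDataOffset represents).
  std::int64_t intraDataOffset = linearOffset % elemsPerByte;
  return (isDivisibleInSize && trailingDimsMatch) ? 0 : intraDataOffset;
}
```

For `@example_1` the offset 4 lands on a byte boundary, so no front padding is needed either way. For `@example_2` the offset 3 yields 3 padding elements, so a partial (RMW) store is required. The old condition, which effectively ignored `trailingDimsMatch`, would have returned 0.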

Fixes parsing of an ObjC type encoding such as `{?="a""b"}`. Parsing of such a type encoding would lead to an assert. This was observed when running `language objc class-table dump`. The function `ReadQuotedString` consumes the closing quote, however one of its two callers (`ReadStructElement`) was also consuming a quote. For the above type encoding, where two quoted strings occur back to back, the parser would unintentionally consume the opening quote of the second quoted string, leaving the remaining text with an unbalanced quote. This change fixes `ReadStructElement` to not consume a quote after calling `ReadQuotedString`. For callers to know whether a string was successfully parsed, `ReadQuotedString` now returns an optional string.
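Here is a minimal C++ sketch of the fixed contract. It is not LLDB's actual parser. `readQuotedString` consumes both quotes and returns `std::optional<std::string>`, and its caller does not consume anything extra, so back-to-back strings such as `"a""b"` parse cleanly. The function names are hypothetical, and the sketch handles only a sequence of quoted strings.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Reads a quoted string starting at `Pos`, consuming BOTH the opening and the
// closing quote. Returns std::nullopt on a missing or unbalanced quote.
std::optional<std::string> readQuotedString(std::string_view Text,
                                            std::size_t &Pos) {
  if (Pos >= Text.size() || Text[Pos] != '"')
    return std::nullopt;
  std::size_t Close = Text.find('"', Pos + 1);
  if (Close == std::string_view::npos)
    return std::nullopt; // unbalanced quote
  std::string Result(Text.substr(Pos + 1, Close - Pos - 1));
  Pos = Close + 1; // the closing quote is consumed here...
  return Result;
}

// ...so the caller must not consume another quote afterwards; doing so would
// swallow the opening quote of the next string in input such as "a""b".
std::optional<std::vector<std::string>> readQuotedStrings(std::string_view Text) {
  std::vector<std::string> Names;
  std::size_t Pos = 0;
  while (Pos < Text.size()) {
    std::optional<std::string> Name = readQuotedString(Text, Pos);
    if (!Name)
      return std::nullopt;
    Names.push_back(std::move(*Name));
  }
  return Names;
}
```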

This PR makes another piece of the `CompilerInstance::cloneForModuleCompile()` result thread-safe: the module build stack. This data structure is used to detect cyclic dependencies between modules. The problem is that it uses `FullSourceLoc` which refers to the `SourceManager` of the parent `CompilerInstance`: if two threads happen to execute `CompilerInstance`s cloned from the same parent concurrently, and both discover a dependency cycle, they may concurrently access the parent `SourceManager` when emitting the diagnostic, creating a data race. In this PR, we prevent this by keeping the stack empty and moving the responsibility of cycle detection to the client. The client can recreate the same module build stack externally and ensure thread-safety by enforcing mutual exclusion.

…_TAG_lexical_blocks (llvm#136205) During the discussion under llvm#119001, it was noticed that concrete DW_TAG_lexical_blocks should refer to corresponding abstract DW_TAG_lexical_blocks by having DW_AT_abstract_origin, to avoid ambiguity. This behavior is implemented in GCC (https://godbolt.org/z/Khrzdq1Wx), but not in LLVM. Fixes llvm#49297.

`clang-repl --cuda` was previously crashing with a segmentation fault instead of reporting a clean error:

```
(base) anutosh491@Anutoshs-MacBook-Air bin % ./clang-repl --cuda
#0 0x0000000111da4fbc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x150fbc)
#1 0x0000000111da31dc llvm::sys::RunSignalHandlers() (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x14f1dc)
#2 0x0000000111da5628 SignalHandler(int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x151628)
#3 0x000000019b242de4 (/usr/lib/system/libsystem_platform.dylib+0x180482de4)
#4 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0)
#5 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0)
#6 0x0000000107f6bac8 clang::Interpreter::createWithCUDA(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x2173ac8)
#7 0x000000010206f8a8 main (/opt/local/libexec/llvm-20/bin/clang-repl+0x1000038a8)
#8 0x000000019ae8c274
Segmentation fault: 11
```

The underlying issue was that the `DeviceCompilerInstance` (used for device-side CUDA compilation) was never initialized with a `Sema`, which is required before constructing the `IncrementalCUDADeviceParser`.

https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/DeviceOffload.cpp#L32
https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/IncrementalParser.cpp#L31

Unlike the host-side `CompilerInstance`, which runs `ExecuteAction` inside the Interpreter constructor (thereby setting up Sema), the device-side CI was passed into the parser uninitialized, leading to an assertion or crash when accessing its internals. To fix this, I refactored the `Interpreter::create` method to include an optional `DeviceCI` parameter. If it is provided, we know we need to take care of this instance too. Only then do we construct the `IncrementalCUDADeviceParser`.

llvm#138091) Check this error for more context (https://github.com/compiler-research/CppInterOp/actions/runs/14749797085/job/41407625681?pr=491#step:10:531). This fails with:

```
* thread #1, name = 'CppInterOpTests', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x55500356d6d3)
  * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0git`clang::PragmaNamespace::~PragmaNamespace() + 99
    frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0git`clang::Preprocessor::~Preprocessor() + 3830
    frame #2: 0x00007fffee20917a libclangCppInterOp.so.21.0git`std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 58
    frame #3: 0x00007fffee224796 libclangCppInterOp.so.21.0git`clang::CompilerInstance::~CompilerInstance() + 838
    frame #4: 0x00007fffee22494d libclangCppInterOp.so.21.0git`clang::CompilerInstance::~CompilerInstance() + 13
    frame #5: 0x00007fffed95ec62 libclangCppInterOp.so.21.0git`clang::IncrementalCUDADeviceParser::~IncrementalCUDADeviceParser() + 98
    frame #6: 0x00007fffed9551b6 libclangCppInterOp.so.21.0git`clang::Interpreter::~Interpreter() + 102
    frame #7: 0x00007fffed95598d libclangCppInterOp.so.21.0git`clang::Interpreter::~Interpreter() + 13
    frame #8: 0x00007fffed9181e7 libclangCppInterOp.so.21.0git`compat::createClangInterpreter(std::vector<char const*, std::allocator<char const*>>&) + 2919
```

Problem:

1. The destructor currently does no cleanup of the DeviceParser and the DeviceAct. We currently only have this: https://github.com/llvm/llvm-project/blob/976493822443c52a71ed3c67aaca9a555b20c55d/clang/lib/Interpreter/Interpreter.cpp#L416-L419
2. Ownership of DeviceCI currently lies with IncrementalCudaDeviceParser. It should instead mirror how the combination of hostCI, hostAction and hostParser is managed by the Interpreter. On master, the DeviceAct and DeviceParser are managed by the Interpreter, but DeviceCI is not.

This is problematic because IncrementalParser holds a Sema& which points into the DeviceCI. On master, DeviceCI is destroyed before the base class ~IncrementalParser() runs, causing Parser::reset() to access a dangling Sema. Since Sema holds a reference to the Preprocessor, which owns PragmaNamespace, we see this:

```
  * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0git`clang::PragmaNamespace::~PragmaNamespace() + 99
    frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0git`clang::Preprocessor::~Preprocessor() + 3830
```
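The ownership problem comes down to C++ destruction order: a derived class's members are destroyed before its base-class destructor runs. The hypothetical classes below (not Clang's) log their destruction to illustrate the issue. They show why a base holding a reference into a derived-owned object sees a dangling reference. They also show how an outer owner that declares the instance before the parser, as proposed for `DeviceCI`, reverses the order.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical classes that record the order in which they are destroyed.
std::vector<std::string> DestructionLog;

struct Instance { // plays the role of DeviceCI (and the Sema it owns)
  ~Instance() { DestructionLog.push_back("Instance"); }
};

struct BaseParser { // plays the role of IncrementalParser, holding a reference
  explicit BaseParser(Instance &I) : Ref(I) {}
  // A real parser would use Ref during cleanup; this sketch only logs.
  ~BaseParser() { DestructionLog.push_back("BaseParser"); }
  Instance &Ref;
};

// Problematic layout: the derived parser owns the instance, so the instance is
// destroyed before ~BaseParser() runs and Ref dangles during base cleanup.
struct OwningParser : BaseParser {
  explicit OwningParser(std::unique_ptr<Instance> I)
      : BaseParser(*I), Owned(std::move(I)) {}
  std::unique_ptr<Instance> Owned;
};

// Fixed layout: an outer owner (like the Interpreter) declares the instance
// before the parser; members are destroyed in reverse declaration order.
struct Owner {
  Owner()
      : Inst(std::make_unique<Instance>()),
        Parser(std::make_unique<BaseParser>(*Inst)) {}
  std::unique_ptr<Instance> Inst;     // declared first, destroyed last
  std::unique_ptr<BaseParser> Parser; // declared last, destroyed first
};
```

With `OwningParser` the log reads `Instance`, `BaseParser`, meaning the referenced object is already gone when the base destructor runs. With `Owner` it reads `BaseParser`, `Instance`.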

Fix for:

```
Assertion failed: (false && "Architecture or OS not supported"), function CreateRegisterContextForFrame, file /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/elf-core/ThreadElfCore.cpp, line 182.
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace.
#0 0x000000080cd857c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
#1 0x000000080cd85ed4 /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:797:3
#2 0x000000080cd82ae8 llvm::sys::RunSignalHandlers() /usr/src/contrib/llvm-project/llvm/lib/Support/Signals.cpp:104:5
#3 0x000000080cd861f0 SignalHandler /usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:403:3
#4 0x000000080f159644 handle_signal /usr/src/lib/libthr/thread/thr_sig.c:298:3
```

superbobry and others added 30 commits April 22, 2025 21:53

[MLIR] [python] Fixed the signature of _OperationBase.get_asm (llvm…

4e679ea

…#136676) It claimed to return an `io.StringIO` or an `io.BytesIO`, but it did in fact return `str` or `bytes`.

[RISCV] Clear kill flags after replaceRegWith in RISCVFoldMemOffset. (l…

2484060

…lvm#136762) Any kill flags that were present for the old register are not valid for the replacement and the replacement may have extended the live range of the replacement register.

gn build: Port d1cce66 more

122e515

[clang-format] Fix a bug in parsing C-style cast of lambdas (llvm#136099

4f71655

) Fix llvm#135959

[clang-format] Fix a bug in lexing C++ UDL ending in $ (llvm#136476)

9efabbb

Fix llvm#61612

[clang-format] Correctly annotate kw_operator in using decls (llvm#13…

037657d

…6545) Fix llvm#136541

[clang-format] Don't test stability if JS format test fails (llvm#136662

afc030d

)

[RISCV] Remove stale comment. NFC

68d89e9

[LoongArch] Pre-commit for widen shuffle mask (llvm#136544)

141c14c

[libc][math] Skip checking for exceptional values in expm1f when LIBC…

7547ad3

…_MATH_SKIP_ACCURATE_PASS is set. (llvm#130968)

[mlir][bazel] Port e112dcc.

439f16a

[lldb] Make sure changing the separator takes immediate effect (llvm#…

3ccfbc8

…136779) The setter is only used when changing the setting programmatically. When using the settings command, we need to monitor SetPropertyValue.

[CIR] Infer MLIRContext in attr builders when possible (llvm#136741)

7b68015

Mirrors incubator changes from llvm/clangir#1582

[clang][bytecode] Allow casts from void* only in std::allocator calls (…

1a78ef9

…llvm#136714) Otherwise, add the missing diagnostic.

[Clang] [Driver] use __cxa_atexit by default on Cygwin. (llvm#135701)

ca3a5d3

GCC on Cygwin and MSYS2 are built with --enable-__cxa_atexit. Adjust test to expect this change.

[RISCV] Add tests for fixed-length vwadd[u].{w,v}v with disjoint or. NFC

1a99f79

Revert "[clang-format] Allow breaking before kw___attribute (llvm#128623

da8f2d5

)" This reverts commit 8fc8a84, which caused a regression. Fixes llvm#136675.

[mlir][bazel] Also add SideEffectInterfaces dep to PtrDialect.

dfc60b2

Fix for port of e112dcc.

[AArch64] Fix tryToConvertShuffleOfTbl2ToTbl4 with non-buildvector in…

d0cd6f3

…put operands. (llvm#135961) It looks like this code is only considering buildvector inputs, expecting the inputs to have at least 16 operands. This adds a check to make sure that is true. Fixes llvm#135950

mrkajetanp and others added 26 commits April 24, 2025 16:09

AMDGPU: Remove amdhsa_code_object_version module flags from most tests (

4f5cfa8

llvm#136363) These were added to the migration from v4 to v5 and should be removed now that the default has changed.

[Clang][NFC] Move temp variable back into the source (llvm#137095)

72cc868

Static analysis flagged this code b/c we are copying the temp variable back in when we could move it instead.

[RISCV] Make xrivosvizip interleave2 and deinterleave2 undef safe (ll…

b278aa3

…vm#136733) We're duplicating uses here, so we need to freeze the inputs. --------- Co-authored-by: Luke Lau <[email protected]>

[PowerPC] Intrinsics and tests for dmr insert/extract (llvm#135653)

a903c7b

Add some intrinsics and LIT tests for PPC dmr insert/extract instructions.

Fix a crash in constant evaluation of ExtVectorElementExprs (llvm#136771

feaa5aa

) Handle the case where the base expression is a pointer to a vector type. rdar://149223362

[clang] fix typo in CHECK line

c7fbaba

[ms] [llvm-ml] Add support for @CatStr built-in function symbol (ll…

0ab330b

…vm#130781) MASM supports some built-in macro-type functions. We start our support for these with `@CatStr`, one of the more commonly used.

clang/HIP: Add tests that shows fpmath metadata ends up on sqrt calls (…

3e7e23d

…llvm#136413) Make sure the builtin header sqrts work with -fno-hip-f32-correctly-rounded-divide-sqrt, and we end up with properly annotated sqrt intrinsic callsites.

support bf16

c386cc0

add test

a5a3fe8

revert unintentional white space changes

d1e77e0

also guard against v2bf16

9c20a8f

format

33a1785

remove vector type from conditional

b85acb2

stumpOS force-pushed the stumpos/bf16Fix branch from 1a9f4c5 to b85acb2 Compare April 24, 2025 17:51
