
Conversation

@Prince781 (Contributor) commented Dec 19, 2024

  • Add support for @llvm.exp2():

    • LLVM: float -> PTX: ex2.approx{.ftz}.f32
    • LLVM: half -> PTX: ex2.approx.f16
    • LLVM: <2 x half> -> PTX: ex2.approx.f16x2
    • LLVM: bfloat -> PTX: ex2.approx.ftz.bf16
    • LLVM: <2 x bfloat> -> PTX: ex2.approx.ftz.bf16x2
    • Any operations with non-native vector widths are expanded. On
      targets not supporting f16/bf16, values are promoted to f32.
  • Add CONDITIONAL support for @llvm.log2() [^1]:

    • LLVM: float -> PTX: lg2.approx{.ftz}.f32
    • Support for f16/bf16 is emulated by promoting values to f32.

[^1]: CUDA implements exp2() with ex2.approx but log2() is
implemented differently, so this is off by default. To enable it, use the
flag -nvptx-approx-log2f32.
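
The .ftz modifier that appears in several of these mappings flushes subnormal f32 inputs and results to zero. As a rough reference model of that flush-to-zero behavior only (a Python sketch; the real ex2.approx is a reduced-precision hardware approximation, modeled here with exact arithmetic, and the function names are mine, not PTX):

```python
import math

F32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal float32

def ftz(x: float) -> float:
    # The .ftz modifier flushes subnormal f32 values to (signed) zero.
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x

def ex2_approx_ftz(x: float) -> float:
    # Reference model of ex2.approx.ftz.f32: flush a subnormal input,
    # compute 2^x, then flush a subnormal result. The real instruction is a
    # reduced-precision approximation; this sketch uses exact arithmetic.
    return ftz(2.0 ** ftz(x))
```

For instance, ex2_approx_ftz(-140.0) yields 0.0 because 2^-140 is subnormal in f32, whereas the non-.ftz form would keep the subnormal value.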

@llvmbot (Member) commented Dec 19, 2024

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-nvptx

Author: Princeton Ferro (Prince781)

Changes

Lower llvm.exp2 to ex2.approx for f32 and all vectors of f32.


Full diff: https://github.com/llvm/llvm-project/pull/120519.diff

3 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+2-1)
  • (modified) llvm/lib/Target/NVPTX/NVPTXInstrInfo.td (+15)
  • (added) llvm/test/CodeGen/NVPTX/fexp2.ll (+47)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 5c1f717694a4c7..a922ce0ae104f1 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -968,7 +968,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   setOperationAction(ISD::CopyToReg, MVT::i128, Custom);
   setOperationAction(ISD::CopyFromReg, MVT::i128, Custom);
 
-  // No FEXP2, FLOG2.  The PTX ex2 and log2 functions are always approximate.
+  setOperationAction(ISD::FEXP2, MVT::f32, Legal);
+  // No FLOG2. The PTX log2 function is always approximate.
   // No FPOW or FREM in PTX.
 
   // Now deduce the information based on the above mentioned
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index abaf8e0b0ec1f8..6677a29e0d07d0 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -518,6 +518,19 @@ multiclass F3_fma_component<string OpcStr, SDNode OpNode> {
                Requires<[hasBF16Math, noFMA]>;
 }
 
+// Template for operations which take one f32 operand.  Provides two
+// instructions: <OpcStr>.f32, and <OpcStr>.ftz.f32 (flush subnormal inputs and
+// results to zero).
+multiclass F1<string OpcStr, SDNode OpNode> {
+   def f32_ftz : NVPTXInst<(outs Float32Regs:$dst), (ins Float32Regs:$a),
+                           !strconcat(OpcStr, ".ftz.f32 \t$dst, $a;"),
+                           [(set Float32Regs:$dst, (OpNode Float32Regs:$a))]>,
+                           Requires<[doF32FTZ]>;
+   def f32 :     NVPTXInst<(outs Float32Regs:$dst), (ins Float32Regs:$a),
+                           !strconcat(OpcStr, ".f32 \t$dst, $a;"),
+                           [(set Float32Regs:$dst, (OpNode Float32Regs:$a))]>;
+}
+
 // Template for operations which take two f32 or f64 operands.  Provides three
 // instructions: <OpcStr>.f64, <OpcStr>.f32, and <OpcStr>.ftz.f32 (flush
 // subnormal inputs and results to zero).
@@ -1204,6 +1217,8 @@ defm FNEG_H: F2_Support_Half<"neg", fneg>;
 
 defm FSQRT : F2<"sqrt.rn", fsqrt>;
 
+defm FEXP2 : F1<"ex2.approx", fexp2>;
+
 //
 // F16 NEG
 //
diff --git a/llvm/test/CodeGen/NVPTX/fexp2.ll b/llvm/test/CodeGen/NVPTX/fexp2.ll
new file mode 100644
index 00000000000000..247629865cdd74
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/fexp2.ll
@@ -0,0 +1,47 @@
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx86 | FileCheck --check-prefixes=CHECK %s
+; RUN: %if ptxas-12.6 %{ llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx86 | %ptxas-verify -arch=sm_52 %}
+source_filename = "fexp2.ll"
+target datalayout = "e-p:64:64:64-p3:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-f128:128:128-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64-a:8:8"
+target triple = "nvptx64-nvidia-cuda"
+
+; CHECK-LABEL: exp2_test
+define ptx_kernel void @exp2_test(ptr %a, ptr %res) local_unnamed_addr {
+entry:
+  %in = load float, ptr %a, align 4
+  ; CHECK: ex2.approx.f32 [[D1:%f[0-9]+]], [[S1:%f[0-9]+]]
+  %exp2 = call float @llvm.exp2.f32(float %in)
+  ; CHECK: st.global.f32 {{.*}}, [[D1]]
+  store float %exp2, ptr %res, align 4
+  ret void
+}
+
+; CHECK-LABEL: exp2_ftz_test
+define ptx_kernel void @exp2_ftz_test(ptr %a, ptr %res) local_unnamed_addr #0 {
+entry:
+  %in = load float, ptr %a, align 4
+  ; CHECK: ex2.approx.ftz.f32 [[D1:%f[0-9]+]], [[S1:%f[0-9]+]]
+  %exp2 = call float @llvm.exp2.f32(float %in)
+  ; CHECK: st.global.f32 {{.*}}, [[D1]]
+  store float %exp2, ptr %res, align 4
+  ret void
+}
+
+; CHECK-LABEL: exp2_test_v
+define ptx_kernel void @exp2_test_v(ptr %a, ptr %res) local_unnamed_addr {
+entry:
+  %in = load <4 x float>, ptr %a, align 16
+  ; CHECK: ex2.approx.f32 [[D1:%f[0-9]+]], [[S1:%f[0-9]+]]
+  ; CHECK: ex2.approx.f32 [[D2:%f[0-9]+]], [[S2:%f[0-9]+]]
+  ; CHECK: ex2.approx.f32 [[D3:%f[0-9]+]], [[S3:%f[0-9]+]]
+  ; CHECK: ex2.approx.f32 [[D4:%f[0-9]+]], [[S4:%f[0-9]+]]
+  %exp2 = call <4 x float> @llvm.exp2.v4f32(<4 x float> %in)
+  ; CHECK: st.global.v4.f32 {{.*}}, {{[{]}}[[D4]], [[D3]], [[D2]], [[D1]]{{[}]}}
+  store <4 x float> %exp2, ptr %res, align 16
+  ret void
+}
+
+declare float @llvm.exp2.f32(float %val)
+
+declare <4 x float> @llvm.exp2.v4f32(<4 x float> %val)
+
+attributes #0 = {"denormal-fp-math"="preserve-sign"}

@AlexMaclean requested a review from Artem-B on December 19, 2024
@AlexMaclean (Member) left a comment

Nice; barring some minor stylistic issues to clean up, this looks good to me. Any chance you could add the (b)f16 variants as well?

@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch 4 times, most recently from 5aeb9f8 to d61ba61 on December 19, 2024
@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch from d61ba61 to 99ddd72 on December 19, 2024
@AlexMaclean (Member) left a comment

Nice, LGTM

@Artem-B (Member) left a comment

I'm not sure that lowering fexp2 to ex2.approx is a good idea.

At the very least it should've been conditional to some sort of fast math flag allowing reduced precision.

@Prince781 (Contributor, Author) commented Dec 19, 2024

I'm not sure that lowering fexp2 to ex2.approx is a good idea.

At the very least it should've been conditional to some sort of fast math flag allowing reduced precision.

I think it's not a bad idea since there is no non-approximate implementation in PTX, which is something users of NVPTX should know. Making the lowering only work for fast-math would break unoptimized code.

Having to use inline PTX to access exp2() is too cumbersome, especially when using vectors.

@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch from 99ddd72 to e1a68bf on December 19, 2024
@Prince781 changed the title from "[NVPTX] Support llvm.exp2 for f32 and vector of f32" to "[NVPTX] Support llvm.{exp2,log2} for f32 and vector of f32" on December 19, 2024
@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch 2 times, most recently from 3761255 to bcc74fa on December 19, 2024
@Artem-B (Member) commented Dec 19, 2024

We have explicit flags to enable approximate reciprocal and sqrt, and these instructions should follow a similar pattern.

"nvptx-prec-divf32", cl::Hidden,

I agree that enabling them automatically for fast-math may be confusing (though it may be worth checking whether we have similar situations on other platforms that could give us some guidelines on how to handle this).

Letting the user enable these instructions explicitly should work.

Letting the compiler generate low-precision results will likely break things at runtime (there's a lot of existing code assuming that host/device compilations will produce nearly identical results). I'd prefer things to fail early, in a painfully obvious way, if the compiler can't do something correctly.

@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch from bcc74fa to 322982b on December 19, 2024
@Prince781 (Contributor, Author)

Okay, this feature is now behind the flags -nvptx-approx-exp2f32 and -nvptx-approx-log2f32, which are off by default.

@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch from 322982b to 1e5be93 on December 19, 2024
@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch 2 times, most recently from 0fc5638 to 5619291 on December 19, 2024
@Prince781 (Contributor, Author) commented Dec 19, 2024

Updates:

  • Added f16 and bf16 variants, which promote to f32.
  • Support is off by default. Users turn it on with either -nvptx-approx-{log2,exp2}f32 or -enable-unsafe-fp-math.
  • Added expected-failure tests for when support is not requested.

@Prince781 changed the title from "[NVPTX] Support exp2 and log2 for f32/f16/bf16 and vectors" to "[NVPTX] Improve support for {ex2,lg2}.approx" on December 24, 2024
@Prince781 (Contributor, Author)

Updated with more improvements. Please see commit message / first comment for more details!

@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch from faaca98 to 72f468e on December 24, 2024
@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch 3 times, most recently from 53aae32 to b42a67d on December 25, 2024
@Prince781 (Contributor, Author)

@AlexMaclean I tried with the afn flag on @llvm.log2(). This works only if you don't also have non-native operations that get expanded; e.g., f16 = flog2 afn t0 will be expanded to f16 = fptrunc (f32 flog2 (f32 fpextend t0)), where SelectionDAG drops afn on the new f32 flog2 node, causing a crash.

It would be nice if SelectionDAG supported something like "preserve afn".

Anyway, I think these changes can be merged now.
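
The promote-compute-truncate expansion described above can be sketched roughly as follows (a Python model, not SelectionDAG code; to_f16 and log2_f16_via_promotion are illustrative names, and lg2.approx's reduced precision is not modeled, with math.log2 standing in for it):

```python
import math
import struct

def to_f16(x: float) -> float:
    # Round a Python float to IEEE half precision and back; struct's 'e'
    # format performs the f16 rounding.
    return struct.unpack('e', struct.pack('e', x))[0]

def log2_f16_via_promotion(x: float) -> float:
    # Model of the f16 lowering: promote to f32, apply the f32 log2
    # (lg2.approx.f32 on the real target), truncate back to f16.
    return to_f16(math.log2(x))
```

This mirrors the f16 = fptrunc (f32 flog2 (f32 fpextend t0)) shape of the expansion; any fast-math flags on the original node would have to be reattached to the new f32 node, which is the part SelectionDAG drops.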

@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch from b42a67d to ba0caf9 on January 3, 2025
@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch from ba0caf9 to f711117 on January 3, 2025
@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch from f711117 to 6e95b75 on January 3, 2025
@Prince781 force-pushed the dev/pferro/nvptx-fexp2 branch from 6e95b75 to 71d90aa on January 3, 2025
@Prince781 (Contributor, Author)

Ping

@AlexMaclean (Member) left a comment

LGTM

@Prince781 (Contributor, Author)

Pinging one of the code owners to merge this.

@Artem-B merged commit 3ba339b into llvm:main on January 16, 2025
8 checks passed
@Prince781 (Contributor, Author)

Thanks @Artem-B!

@llvm-ci (Collaborator) commented Jan 16, 2025

LLVM Buildbot has detected a new failure on builder clang-armv8-quick running on linaro-clang-armv8-quick while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/10390

Here is the relevant piece of the build log, for reference:
Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'lit :: googletest-timeout.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 9
not env -u FILECHECK_OPTS "/usr/bin/python3.10" /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout    --param gtest_filter=InfiniteLoopSubTest --timeout=1 > /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/Output/googletest-timeout.py.tmp.cmd.out
# executed command: not env -u FILECHECK_OPTS /usr/bin/python3.10 /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit.py -j1 --order=lexical -v Inputs/googletest-timeout --param gtest_filter=InfiniteLoopSubTest --timeout=1
# .---command stderr------------
# | lit.py: /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1 seconds was requested on the command line. Forcing timeout to be 1 seconds.
# `-----------------------------
# RUN: at line 11
FileCheck --check-prefix=CHECK-INF < /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/Output/googletest-timeout.py.tmp.cmd.out /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py
# executed command: FileCheck --check-prefix=CHECK-INF /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py
# .---command stderr------------
# | /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py:34:14: error: CHECK-INF: expected string not found in input
# | # CHECK-INF: Timed Out: 1
# |              ^
# | <stdin>:13:29: note: scanning from here
# | Reached timeout of 1 seconds
# |                             ^
# | <stdin>:37:2: note: possible intended match here
# |  Timed Out: 2 (100.00%)
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/googletest-timeout.py
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |             8:  
# |             9:  
# |            10: -- 
# |            11: exit: -9 
# |            12: -- 
# |            13: Reached timeout of 1 seconds 
# | check:34'0                                 X error: no match found
# |            14: ******************** 
# | check:34'0     ~~~~~~~~~~~~~~~~~~~~~
# |            15: TIMEOUT: googletest-timeout :: DummySubDir/OneTest.py/1/2 (2 of 2) 
# | check:34'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |            16: ******************** TEST 'googletest-timeout :: DummySubDir/OneTest.py/1/2' FAILED ******************** 
# | check:34'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |            17: Script(shard): 
# | check:34'0     ~~~~~~~~~~~~~~~
...

@Prince781 (Contributor, Author)

@Artem-B

Also, I thought I'd ask here: do you know how I can gain write access? I emailed Chris Lattner but he didn't respond.

@jhuber6 (Contributor) commented Jan 16, 2025

@Artem-B

Also, I thought I'd ask here: do you know how I can gain write access? I emailed Chris Lattner but he didn't respond.

See https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access, though I believe they're in the process of updating that to something like requiring 5 commits and two existing contributors to +1.

@Prince781 deleted the dev/pferro/nvptx-fexp2 branch on January 17, 2025
@Artem-B (Member) commented Jan 21, 2025

@Prince781 It appears that the tests are generating 32-bit PTX, which is no longer supported by recent CUDA versions.

[  1] ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
[  2] ; RUN: llc < %s -mcpu=sm_20 -mattr=+ptx32 | FileCheck --check-prefixes=CHECK %s [OK]
llc < third_party/llvm/llvm-project/llvm/test/CodeGen/NVPTX/f32-lg2.ll -mcpu=sm_20 -mattr=+ptx32 | third_party/llvm/llvm-project/llvm/FileCheck --allow-unused-prefixes --check-prefixes=CHECK third_party/llvm/llvm-project/llvm/test/CodeGen/NVPTX/f32-lg2.ll
[  3] ; RUN: %if ptxas %{ llc < %s -mcpu=sm_20 -mattr=+ptx32 | %ptxas-verify %} [FAIL]
 llc < third_party/llvm/llvm-project/llvm/test/CodeGen/NVPTX/f32-lg2.ll -mcpu=sm_20 -mattr=+ptx32 | third_party/gpus/cuda/_virtual_includes/_stage_runtime/third_party/gpus/cuda/bin/ptxas -arch=sm_60 -c -o /dev/null - 
ptxas warning :  64 Bit host architecture (--machine) being used mismatches with .address_size of 32 bits
ptxas fatal   :  32-Bit compilation is no longer supported
Command failed: exit status 255

You can reproduce it by running the tests with LLVM_PTXAS_EXECUTABLE=/path/to/cuda-12.6.0/bin/ptxas

@jhuber6 (Contributor) commented Jan 21, 2025

The triple is just missing 64; I can probably fix it along with something else.

@Prince781 (Contributor, Author)

@jhuber6 Thank you!
