
Commit cba8b9d

[release/2.8][ROCm][inductor] Improved fast_tanh code generation (#2803)
In the ROCm fork of PyTorch 2.8, Inductor currently has codegen support for fast_tanhf. However, the original Triton implementation of fast_tanhf had some NaN issues. Upstream Triton now has an improved fast_tanhf in which the NaN issues are fixed, and that upstream commit has been backported to the ROCm fork of Triton (see code comments). A bump in the Triton commit is also needed.

Other notes:
- In support of [SWDEV-560271](https://ontrack-internal.amd.com/browse/SWDEV-560271)
- Triton 3.4 backport of upstream Triton commit ROCm/triton#900
- Similar to #2802, #2804
- Related to pytorch#162052
1 parent 63e525b commit cba8b9d

2 files changed: +8 -3 lines changed

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-21876a4bbaf371bcb83df8e6ee4f43a92f524dfe
+0cace8d2336a9dc399effbb11522eea7f7b8c0b2

torch/_inductor/codegen/triton.py

Lines changed: 7 additions & 2 deletions
@@ -26,7 +26,7 @@
 from torch._prims_common import is_integer_dtype
 from torch.utils._ordered_set import OrderedSet
 from torch.utils._sympy.functions import CeilDiv, FloorDiv, ModularIndexing
-from torch.utils._triton import has_triton_package
+from torch.utils._triton import has_triton_package, get_triton_version
 
 from ...utils._sympy.symbol import free_symbol_is_type, prefix_str, symbol_is_type, SymT
 from ...utils._sympy.value_ranges import ValueRanges
@@ -1232,7 +1232,12 @@ def tan(x):
     @staticmethod
     @maybe_upcast_float32()
     def tanh(x):
-        return f"libdevice.fast_tanhf({x})"
+        if torch.version.hip and get_triton_version() > (3, 2):
+            # On ROCm, use fast_tanhf depending on Triton version
+            # Requires ROCm fork of Triton 3.3, 3.4, 3.5 or upstream Triton 3.6+
+            return f"libdevice.fast_tanhf({x})"
+        else:
+            return f"libdevice.tanh({x})"
 
     @staticmethod
     @maybe_upcast_float32()
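
The version gate in the tanh override above can be tried in isolation. The following is a minimal sketch, not the Inductor code itself: `select_tanh_expr` is a hypothetical helper, its `hip_version` parameter stands in for `torch.version.hip`, and `triton_version` stands in for the result of `get_triton_version()`, which is assumed here to be a `(major, minor)` tuple, as the comparison against `(3, 2)` in the diff suggests.

```python
from typing import Optional, Tuple


def select_tanh_expr(
    x: str,
    hip_version: Optional[str],
    triton_version: Tuple[int, int],
) -> str:
    """Return the libdevice call string that the gate in the diff would emit.

    Hypothetical stand-in for the patched tanh override:
      - hip_version mimics torch.version.hip (None on non-ROCm builds).
      - triton_version mimics get_triton_version(), assumed to be (major, minor).
    """
    # Tuples compare element by element, so (3, 3) > (3, 2) and (4, 0) > (3, 2),
    # while (3, 2) > (3, 2) is False.
    if hip_version and triton_version > (3, 2):
        return f"libdevice.fast_tanhf({x})"
    # Non-ROCm builds, or ROCm with Triton 3.2 and older, use the regular tanh.
    return f"libdevice.tanh({x})"


if __name__ == "__main__":
    print(select_tanh_expr("tmp0", "6.4", (3, 4)))  # ROCm, Triton 3.4
    print(select_tanh_expr("tmp0", "6.4", (3, 2)))  # ROCm, Triton 3.2
    print(select_tanh_expr("tmp0", None, (3, 6)))   # non-ROCm build
```

As written, the gate checks only the version number, so it relies on ROCm builds with Triton 3.3 through 3.5 using the ROCm fork that carries the backport; it does not itself distinguish the fork from upstream Triton.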
