[release/2.7][ROCm][inductor] Improved fast_tanh code generation (#2802)

naromero77amd · web-flow · commit 9dc91205f1ac · 2025-11-17T12:16:03.000-06:00
In the ROCm fork of PyTorch 2.7, Inductor already has codegen support for `fast_tanhf`. However, it is guarded by the `TORCHINDUCTOR_USE_FAST_MATH` environment variable because of NaN issues in the original Triton implementation of `fast_tanhf`. Upstream Triton now has an improved `fast_tanhf` that fixes these NaN issues, and that upstream commit has been backported to the ROCm fork of Triton (see the code comments). Thus, I have removed the `TORCHINDUCTOR_USE_FAST_MATH` guard and relaxed the Triton version check from `> (3, 4)` to `> (3, 2)`, so that `libdevice.fast_tanhf` is emitted on ROCm for Triton 3.3 and later. The Triton commit pin in `.ci/docker/ci_commit_pins/triton.txt` is bumped accordingly.

Other notes:
- In support of [SWDEV-560271](https://ontrack-internal.amd.com/browse/SWDEV-560271)
- Triton 3.3 backport of upstream Triton commit ROCm/triton#902
- Similar to #2803, #2804
- Related to pytorch#162052
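The new dispatch in `tanh()` can be read as a small pure function. The sketch below is hypothetical: `tanh_codegen`, `is_hip` and `triton_version` are illustrative stand-ins for the real method, `torch.version.hip` and `get_triton_version()`, and it omits the `@maybe_upcast_float32()` decorator. It assumes the version is a `(major, minor)` pair.

```python
# Hypothetical standalone sketch of the tanh lowering choice after this change.
# `is_hip` stands in for `torch.version.hip` being set, and `triton_version`
# for `get_triton_version()`, assumed here to return a (major, minor) tuple.


def tanh_codegen(x: str, is_hip: bool, triton_version: tuple[int, int]) -> str:
    """Return the Triton expression string emitted for tanh(x)."""
    # On ROCm with Triton newer than 3.2 (i.e. 3.3+), emit the fast variant.
    # TORCHINDUCTOR_USE_FAST_MATH no longer participates in the decision.
    if is_hip and triton_version > (3, 2):
        return f"libdevice.fast_tanhf({x})"
    # CUDA builds and older Triton versions keep the accurate libdevice tanh.
    return f"libdevice.tanh({x})"
```

For example, `tanh_codegen("tmp0", True, (3, 3))` returns `"libdevice.fast_tanhf(tmp0)"`, while `tanh_codegen("tmp0", True, (3, 2))` and `tanh_codegen("tmp0", False, (3, 6))` both return `"libdevice.tanh(tmp0)"`. Python compares tuples lexicographically, so the `> (3, 2)` check relies on the version being a two-element tuple: a three-element value such as `(3, 2, 0)` would compare greater than `(3, 2)`.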
diff --git a/.ci/docker/ci_commit_pins/triton.txt b/.ci/docker/ci_commit_pins/triton.txt
@@ -1 +1 @@
-9c7bc0a3d41407bff948b40cd0e9c793147e49bc
+80ed7f41e4b5d6e71651847e4725f4e7c2999a08
diff --git a/torch/_inductor/codegen/triton.py b/torch/_inductor/codegen/triton.py
@@ -1217,11 +1217,10 @@ def tan(x):
     @staticmethod
     @maybe_upcast_float32()
     def tanh(x):
-        if config.use_fast_math and torch.version.hip:
-            if get_triton_version() > (3, 4):
-                return f"libdevice.fast_tanhf({x})"
-            else:
-                return f"libdevice.tanh({x})"
+        if torch.version.hip and get_triton_version() > (3, 2):
+            # On ROCm, use fast_tanhf depending on Triton version
+            # Requires ROCm fork of Triton 3.3, 3.4, 3.5 or upstream Triton 3.6+
+            return f"libdevice.fast_tanhf({x})"
         else:
             return f"libdevice.tanh({x})"
 
