[release/2.9][ROCm][inductor] Improved fast_tanh code generation #2804

naromero77amd · 2025-11-13T02:07:09Z

In the ROCm fork of PyTorch 2.9, Inductor currently has codegen support for fast_tanhf. However, there were some NaN issues in the original Triton implementation of fast_tanhf.

Upstream Triton has an improved fast_tanhf in which the NaN issues are fixed. This upstream commit has been backported to the ROCm fork of Triton (see code comments).

A bump in the Triton commit is also needed.
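For readers unfamiliar with how a fast tanh can produce NaN, the following is a minimal sketch of one common failure mode and one common fix. It is an assumption for illustration, not the actual Triton code: the helper names (`exp_ieee`, `naive_tanh`, `stable_tanh`) are hypothetical, and the real fast_tanhf may use a different formulation. The sketch emulates IEEE overflow-to-infinity in plain Python.

```python
import math


def exp_ieee(x):
    """exp() that saturates to +inf on overflow, as IEEE hardware does.

    Hypothetical helper: Python's math.exp raises OverflowError instead.
    """
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def naive_tanh(x):
    """tanh via (e^{2x} - 1) / (e^{2x} + 1); breaks for large positive x."""
    e = exp_ieee(2.0 * x)
    # When e overflows to inf this becomes inf / inf, which is NaN.
    return (e - 1.0) / (e + 1.0)


def stable_tanh(x):
    """tanh via exp(-2|x|), whose argument is never positive."""
    e = exp_ieee(-2.0 * abs(x))  # lies in [0, 1], so it cannot overflow
    t = (1.0 - e) / (1.0 + e)
    return math.copysign(t, x)   # restore the sign of the input


if __name__ == "__main__":
    for value in (0.5, -3.0, 1000.0):
        print(value, naive_tanh(value), stable_tanh(value), math.tanh(value))
```

The design point is that the exponential is only ever evaluated at a non-positive argument, so large inputs saturate cleanly to ±1 instead of hitting an inf/inf division.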

Other notes:

- In support of SWDEV-560271
- Triton 3.5 backport of upstream Triton commit "[AMD] reimplement fast_tanhf() to avoid overflow (#8551)": ROCm/triton#901
- Similar to [release/2.7][ROCm][inductor] Improved fast_tanh code generation #2802 and [release/2.8][ROCm][inductor] Improved fast_tanh code generation #2803
- Related to [ROCm][inductor] Codegen support for fast_tanhf pytorch/pytorch#162052

(cherry picked from commit 7c5277f)

rocm-repo-management-api · 2025-11-13T02:33:27Z

Jenkins build for 0b59f1c2c8cbe8aeb86ce9a5d6aa471f75e76091 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

naromero77amd · 2025-11-14T16:43:05Z

I have confirmed that it resolves the reproducer in the Jira.

torch/_inductor/codegen/triton.py

(cherry picked from commit f416c71)

rocm-repo-management-api · 2025-11-14T20:47:09Z

Jenkins build for 427f9b0052f4e799d68a25f40715c4cb0cd3bac3 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

In the ROCm fork of PyTorch 2.7, Inductor currently has codegen support for fast_tanhf. However, it is currently guarded by the `TORCHINDUCTOR_USE_FAST_MATH` environment variable due to some NaN issues in the original Triton implementation of fast_tanhf. Upstream Triton has an improved fast_tanhf in which the NaN issues are fixed. This upstream commit has been backported to the ROCm fork of Triton (see code comments). Thus, I have removed the conditionalization on Triton versions as well. A bump in the Triton commit is also needed.

Other notes:

- In support of [SWDEV-560271](https://ontrack-internal.amd.com/browse/SWDEV-560271)
- Triton 3.3 backport of upstream Triton commit ROCm/triton#902
- Similar to #2803, #2804
- Related to pytorch#162052

In the ROCm fork of PyTorch 2.8, Inductor currently has codegen support for fast_tanhf. However, there were some NaN issues in the original Triton implementation of fast_tanhf. Upstream Triton has an improved fast_tanhf in which the NaN issues are fixed. This upstream commit has been backported to the ROCm fork of Triton (see code comments). A bump in the Triton commit is also needed.

Other notes:

- In support of [SWDEV-560271](https://ontrack-internal.amd.com/browse/SWDEV-560271)
- Triton 3.4 backport of upstream Triton commit ROCm/triton#900
- Similar to #2802, #2804
- Related to pytorch#162052

naromero77amd added 2 commits November 13, 2025 02:05

On ROCm, always use fast_tanhf for triton codegen.

be87a96

(cherry picked from commit 7c5277f)

Bump up Triton commit to support fast_tanhf.

0b59f1c

naromero77amd requested review from jataylo, jeffdaily, jithunnair-amd and pruthvistony as code owners November 13, 2025 02:07

This was referenced Nov 13, 2025

[release/2.8][ROCm][inductor] Improved fast_tanh code generation #2803

Merged

[release/2.7][ROCm][inductor] Improved fast_tanh code generation #2802

Merged

naromero77amd changed the title ~~[release /2.9][ROCm][inductor] Improved fast_tanh code generation~~ [release/2.9][ROCm][inductor] Improved fast_tanh code generation Nov 13, 2025

jataylo requested changes Nov 14, 2025

torch/_inductor/codegen/triton.py (outdated)

Conditionalize fast_tanhf on triton_version.

427f9b0

(cherry picked from commit f416c71)
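As a rough illustration of what gating fast_tanhf on the Triton version could look like, here is a self-contained sketch. It is not the code in torch/_inductor/codegen/triton.py: the function names, the `is_rocm` flag, the emitted expression strings, and the `(3, 5)` threshold are all assumptions chosen for the example.

```python
import re


def parse_version(version):
    """Turn a string such as '3.5.0+gitabc' into a tuple like (3, 5, 0).

    Hypothetical helper: keeps the leading digits of each dot-separated
    piece and stops at the first piece that does not start with a digit.
    """
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def tanh_expr(x, is_rocm, triton_version, min_fixed=(3, 5)):
    """Return the expression string a codegen backend might emit for tanh.

    min_fixed is an assumed placeholder for the first Triton version that
    carries the reimplemented fast_tanhf.
    """
    if is_rocm and parse_version(triton_version) >= min_fixed:
        return f"libdevice.fast_tanhf({x})"
    # Fall back to the regular tanh on other platforms or older Triton.
    return f"libdevice.tanh({x})"


if __name__ == "__main__":
    print(tanh_expr("tmp0", True, "3.5.0+git1234"))
    print(tanh_expr("tmp0", True, "3.2.0"))
    print(tanh_expr("tmp0", False, "3.5.0"))
```

Comparing parsed tuples rather than raw strings avoids lexicographic mistakes such as treating "3.10" as older than "3.5".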

pruthvistony approved these changes Nov 17, 2025

pruthvistony merged commit 78640c9 into release/2.9 Nov 17, 2025
6 of 8 checks passed

pruthvistony deleted the release_/2.9_new_fast_tanh branch November 17, 2025 18:31
