@naromero77amd naromero77amd commented Nov 13, 2025

In the ROCm fork of PyTorch 2.9, Inductor currently has codegen support for fast_tanhf. However, the original Triton implementation of fast_tanhf had some NaN issues.

Upstream Triton has an improved fast_tanhf in which the NaN issues are fixed. This upstream commit has been backported to the ROCm fork of Triton (see code comments).

A bump in the Triton commit is also needed.
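For readers unfamiliar with how a fast tanh can produce NaN, here is a minimal NumPy sketch of one common failure mode. It is an assumption for illustration only: this thread does not show the Triton code, so neither `naive_tanh` nor `stable_tanh` (both hypothetical names) should be read as the original or the improved fast_tanhf. The naive form overflows `exp(2x)` in float32 and then evaluates `inf / inf`; the rewritten form keeps the exponential in (0, 1].

```python
import numpy as np


def naive_tanh(x):
    """tanh(x) = (exp(2x) - 1) / (exp(2x) + 1), evaluated in float32."""
    x = np.asarray(x, dtype=np.float32)
    # Silence the overflow / invalid-value warnings so the NaN is visible.
    with np.errstate(over="ignore", invalid="ignore"):
        e = np.exp(np.float32(2.0) * x)  # overflows to inf for large x
        return (e - np.float32(1.0)) / (e + np.float32(1.0))  # inf / inf -> NaN


def stable_tanh(x):
    """Same identity, rewritten so the exponential never overflows."""
    x = np.asarray(x, dtype=np.float32)
    e = np.exp(np.float32(-2.0) * np.abs(x))  # always in (0, 1]
    return np.sign(x) * (np.float32(1.0) - e) / (np.float32(1.0) + e)


x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0], dtype=np.float32)
print("naive :", naive_tanh(x))   # last element is NaN
print("stable:", stable_tanh(x))  # saturates to -1 and 1 at the extremes
```

A check along these lines (large-magnitude inputs, compared against `np.tanh`) is a cheap way to confirm that a given fast tanh saturates to ±1 instead of returning NaN.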

Other notes:

rocm-repo-management-api bot commented Nov 13, 2025

Jenkins build for 0b59f1c2c8cbe8aeb86ce9a5d6aa471f75e76091 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@naromero77amd naromero77amd changed the title from "[release /2.9][ROCm][inductor] Improved fast_tanh code generation" to "[release/2.9][ROCm][inductor] Improved fast_tanh code generation" Nov 13, 2025
@naromero77amd (Author)

I have confirmed that it resolves the reproducer in the Jira.

rocm-repo-management-api bot commented Nov 14, 2025

Jenkins build for 427f9b0052f4e799d68a25f40715c4cb0cd3bac3 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

pruthvistony pushed a commit that referenced this pull request Nov 17, 2025
In the ROCm fork of PyTorch 2.7, Inductor currently has codegen support
for fast_tanhf. However, it is currently guarded by the
`TORCHINDUCTOR_USE_FAST_MATH` environment variable due to some NaN
issues in the original Triton implementation of fast_tanhf.

Upstream Triton has an improved fast_tanhf in which the NaN issues are
fixed. This upstream commit has been backported to the ROCm fork of
Triton (see code comments).

Thus, I have removed the conditionalization on Triton versions as well.
A bump in the Triton commit is also needed.

Other notes:
- In support of [SWDEV-560271](https://ontrack-internal.amd.com/browse/SWDEV-560271)
- Triton 3.3 backport of upstream Triton commit ROCm/triton#902
- Similar to #2803, #2804
- Related to pytorch#162052
pruthvistony pushed a commit that referenced this pull request Nov 17, 2025
In the ROCm fork of PyTorch 2.8, Inductor currently has codegen support
for fast_tanhf. However, the original Triton implementation of
fast_tanhf had some NaN issues.

Upstream Triton has an improved fast_tanhf in which the NaN issues are
fixed. This upstream commit has been backported to the ROCm fork of
Triton (see code comments).

A bump in the Triton commit is also needed.

Other notes:

- In support of [SWDEV-560271](https://ontrack-internal.amd.com/browse/SWDEV-560271)
- Triton 3.4 backport of upstream Triton commit ROCm/triton#900
- Similar to #2802, #2804
- Related to pytorch#162052
@pruthvistony pruthvistony merged commit 78640c9 into release/2.9 Nov 17, 2025
6 of 8 checks passed
@pruthvistony pruthvistony deleted the release_/2.9_new_fast_tanh branch November 17, 2025 18:31
jeffdaily pushed a commit that referenced this pull request Nov 17, 2025
In the ROCm fork of PyTorch 2.9, Inductor currently has codegen support
for fast_tanhf. However, the original Triton implementation of
fast_tanhf had some NaN issues.

Upstream Triton has an improved fast_tanhf in which the NaN issues are
fixed. This upstream commit has been backported to the ROCm fork of
Triton (see code comments).

A bump in the Triton commit is also needed.

Other notes:

- In support of [SWDEV-560271](https://ontrack-internal.amd.com/browse/SWDEV-560271)
- Triton 3.5 backport of upstream Triton commit ROCm/triton#901
- Similar to #2802, #2803
- Related to pytorch#162052