Conversation

@wangye805
Collaborator

Description

Enable triton-based flash-attn in TE

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@wangye805 marked this pull request as ready for review on May 1, 2025, 16:34
else:
    if _flash_attn_version_required <= _flash_attn_version <= _flash_attn_max_version:
        from flash_attn_2_cuda import varlen_bwd as flash_attn_cuda_bwd
        if IS_HIP_EXTENSION and (os.getenv("FLASH_ATTENTION_TRITON_AMD_ENABLE", "FALSE") == "TRUE"):
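The gating condition in the diff above can be sketched as a standalone predicate. This is a minimal illustration, not the actual TE code; only the env variable name `FLASH_ATTENTION_TRITON_AMD_ENABLE` and its `"TRUE"` comparison come from the diff, while the function name is hypothetical:

```python
import os

def triton_backend_requested() -> bool:
    """Mirror the diff's condition: the Triton AMD path is taken only
    when the env variable is explicitly set to the string "TRUE"."""
    return os.getenv("FLASH_ATTENTION_TRITON_AMD_ENABLE", "FALSE") == "TRUE"

# Unset (or any value other than "TRUE") keeps the default CUDA import path.
os.environ.pop("FLASH_ATTENTION_TRITON_AMD_ENABLE", None)
print(triton_backend_requested())  # False

os.environ["FLASH_ATTENTION_TRITON_AMD_ENABLE"] = "TRUE"
print(triton_backend_requested())  # True
```

Note the string comparison is exact, so values like "true" or "1" would not enable the Triton path under this condition.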
Collaborator

It looks like flash_attn_cuda_bwd is only used on the NV platform, and only when explicitly requested by the user, so this change is not needed.

Collaborator Author

It's used in our version as well:

Collaborator

It exists in our version but is not used: all its usages are under the use_FAv2_bwd condition.

Collaborator Author

use_FAv2_bwd is a parameter of the forward function in the FusedAttnFunc_qkvpacked, FusedAttnFunc_kvpacked, and FusedAttnFunc classes, which a user could set to true, I guess? See

as an example. Then this flash_attn_cuda_bwd could be used elsewhere by customers.

Collaborator

In the normal code path, use_FAv2_bwd is read from NVTE_FUSED_ATTN_USE_FAv2_BWD at

self.use_FAv2_bwd = os.getenv(

and it is always set to False for AMD devices there and later at

Indeed, users may call the FusedAttnFunc* classes directly, which bypasses a lot of other logic and checks, so I wouldn't worry about this parameter specifically. For sanity, this feature could be explicitly disabled for ROCm, but keeping things as-is seems less evil than relying on an independent project's env variables.
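The behavior described above can be sketched as follows. This is a hedged illustration of the comment's description, not the TE implementation: the env variable name NVTE_FUSED_ATTN_USE_FAv2_BWD comes from the thread, but the function name, the default value "0", and the is_hip parameter are assumptions for the example:

```python
import os

def read_use_fav2_bwd(is_hip: bool) -> bool:
    """Sketch of the normal code path: the flag is read from the
    NVTE_FUSED_ATTN_USE_FAv2_BWD env variable (default "0" assumed)
    and, per the comment above, forced to False on AMD devices."""
    flag = os.getenv("NVTE_FUSED_ATTN_USE_FAv2_BWD", "0") == "1"
    if is_hip:
        flag = False  # always set to False for AMD devices
    return flag

os.environ["NVTE_FUSED_ATTN_USE_FAv2_BWD"] = "1"
print(read_use_fav2_bwd(is_hip=False))  # True
print(read_use_fav2_bwd(is_hip=True))   # False
```

Calling FusedAttnFunc* directly bypasses this read, which is why the parameter can still reach the backward path on ROCm.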

@wenchenvincent
Collaborator

@wangye805 Could you rebase?
