
Conversation

@Micky774
Contributor

@Micky774 Micky774 commented Nov 5, 2025

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@ipanfilo
Collaborator

ipanfilo commented Nov 6, 2025

Which functionality not covered by existing tests does it cover?

@wangye805
Collaborator

> Which functionality not covered by existing tests does it cover?

Previously our JAX and PyTorch distributed fused-attn only enabled the v2 CK backends, not v3.
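
For concreteness, here is a rough sketch (not the exact CI invocation) of what the new coverage runs: the v3 CK kernels are opted into through the NVTE_CK_USES_FWD_V3 / NVTE_CK_USES_BWD_V3 environment variables that the ci/jax.sh change below sets, e.g.:

```python
# Rough sketch only; the env-var names come from the ci/jax.sh change in this PR.
import os
import subprocess

env = dict(
    os.environ,
    # Hang workaround already used for the distributed JAX tests in ci/jax.sh
    XLA_FLAGS="--xla_gpu_enable_nccl_comm_splitting=false",
    # Opt the CK fused-attn forward/backward passes into the v3 kernels
    NVTE_CK_USES_FWD_V3="1",
    NVTE_CK_USES_BWD_V3="1",
)
subprocess.run(
    ["pytest", "-v", "test_distributed_fused_attn.py",
     "-k", "not test_context_parallel_ring_attn"],
    env=env,
    check=True,
)
```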

@ipanfilo
Collaborator

ipanfilo commented Nov 6, 2025

> > Which functionality not covered by existing tests does it cover?
>
> Previously our JAX and PyTorch distributed fused-attn only enabled the v2 CK backends, not v3.

Yes, but does it run different fused-attn backend configs/kernels than the non-distributed ones? Or is there a functional concern about their coexistence with RCCL?

@wangye805
Collaborator

> > > Which functionality not covered by existing tests does it cover?
> >
> > Previously our JAX and PyTorch distributed fused-attn only enabled the v2 CK backends, not v3.
>
> Yes, but does it run different fused-attn backend configs/kernels than the non-distributed ones? Or is there a functional concern about their coexistence with RCCL?

In the distributed fused-attn (CP) pytest suite, the reference run is usually a single-GPU fused-attn over the full seqlen (for example sq=skv=8192) using the default attn backend. The target run decomposes that single full-size fused-attn into 4 or 8 smaller fused-attn calls (for example sq=skv=4096), runs those smaller fused-attn instances with the default backend, and then "glues" the results back together in the CP fashion.

In my opinion, there are two reasons we need to enable v3 for distributed fused-attn:
1) The decomposition may create new fused-attn configs not covered by our single-GPU fused-attn pytests.
2) The "glue" step actually exercises the softmax_lse generated in the forward pass: if the softmax_lse results are incorrect, the glued results will be wrong, and our single-GPU fused-attn pytests do not test softmax_lse at all (see the sketch below).
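
To make the "glue" step concrete, here is a minimal NumPy sketch (illustrative only, not the actual TE kernels or the test code) of how per-chunk attention outputs are combined using their softmax LSE; an incorrect softmax_lse would break the final check:

```python
# Illustrative only: naive single-head attention, full-sequence reference vs.
# chunked KV with LSE-based combination (the "glue" step used in CP tests).
import numpy as np

def attn_chunk(q, k, v):
    """Attention over one KV chunk; returns the output and per-row softmax LSE."""
    s = q @ k.T / np.sqrt(q.shape[-1])            # [sq, skv] scores
    lse = np.log(np.exp(s).sum(axis=-1))          # [sq] log-sum-exp per query row
    return np.exp(s - lse[:, None]) @ v, lse      # [sq, d] output, [sq] LSE

rng = np.random.default_rng(0)
sq, skv, d = 8, 16, 4
q = rng.normal(size=(sq, d))
k = rng.normal(size=(skv, d))
v = rng.normal(size=(skv, d))

# Reference: one attention call over the full KV sequence.
ref, _ = attn_chunk(q, k, v)

# "Distributed" path: attend to each KV chunk separately, then glue with the LSE.
outs, lses = zip(*(attn_chunk(q, k[a:b], v[a:b]) for a, b in [(0, 8), (8, 16)]))
lses = np.stack(lses)                              # [n_chunks, sq]
glued_lse = np.log(np.exp(lses).sum(axis=0))       # combined LSE per query row
weights = np.exp(lses - glued_lse)                 # per-chunk rescaling factors
glued = sum(w[:, None] * o for w, o in zip(weights, outs))

assert np.allclose(glued, ref)  # an incorrect softmax_lse would break this check
```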

ci/jax.sh Outdated
*0.4.35*)
# Workaround for distributed tests hang with xla_flag
XLA_FLAGS="--xla_gpu_enable_nccl_comm_splitting=false" run 3 test_distributed_fused_attn.py -k 'not test_context_parallel_ring_attn'
XLA_FLAGS="--xla_gpu_enable_nccl_comm_splitting=false" NVTE_CK_USES_FWD_V3=1 NVTE_CK_USES_BWD_V3=1 run 3 test_distributed_fused_attn.py -k 'not test_context_parallel_ring_attn'
Collaborator


This will run it with AOTriton too

Contributor Author


Updated with a guard in the JAX CI script.

Collaborator


With those changes the env variables are not seen by the run method; they are applied to the test call only. Use run_default_fa_lbl instead. All V3 calls should be labelled with "v3" to distinguish them from the regular test_distributed_fused_attn calls.

@wenchenvincent
Collaborator

@Micky774 Could you rebase onto the latest dev to incorporate the hotfix for the core sgpu tests?

@wenchenvincent
Collaborator

@ipanfilo Could you check if all your comments have been addressed?


