Swa padding brcm by sudhakarsingh27 · Pull Request #4 · sudhakarsingh27/TransformerEngine

sudhakarsingh27 · 2025-12-02T23:35:31Z

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes # (issue)

Type of change

Documentation change (change only to the documentation, either a fix or new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Change A
Change B

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

…VIDIA#1358)
* draft implementation of fsdp2 fp8 all gather
* fix the convergence issue
* Add warning
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* disable lint error
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix the lint error
* fix lint error
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix lint error
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix lint error
* add comments
* add ref
* add related tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks
---------
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add util functions to attn_mask_type
* Add util functions to qkv_layout
* Fix THD cross reference code
* Remove explicit segment_pad, encoding it to segment_ids
* Add jax.jit, replace _token with segment_ids, rename bias shape enum
* Add comment for make_mask
* Clean code
* Add docstrings for the added functions
* Remove cache for fa deterministic, which caused UT failures
* Rename fixture to avoid conflict
---------
Signed-off-by: Reese Wang <rewang@nvidia.com>
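
The "encode segment_pad into segment_ids" change can be illustrated with a minimal sketch. It assumes a convention in which segment id 0 marks a padding token and each packed sequence gets its own positive id; under that assumption a separate padding array is redundant, because the attention mask can be derived from segment_ids alone. The helpers `segment_ids_from_seqlens` and `mask_from_segment_ids` are hypothetical illustrations, not Transformer Engine APIs, and True means "masked out" in this sketch.

```python
from typing import List


def segment_ids_from_seqlens(seqlens: List[int], max_len: int) -> List[int]:
    """Pack sequences into one buffer of length max_len.

    Hypothetical convention: sequence k gets id k + 1, and 0 marks padding.
    """
    ids: List[int] = []
    for seg_id, n in enumerate(seqlens, start=1):
        ids.extend([seg_id] * n)
    if len(ids) > max_len:
        raise ValueError("sequences do not fit in max_len tokens")
    # Remaining slots are padding, encoded directly as segment id 0.
    ids.extend([0] * (max_len - len(ids)))
    return ids


def mask_from_segment_ids(q_seg: List[int], kv_seg: List[int]) -> List[List[bool]]:
    """Return a (len(q_seg), len(kv_seg)) mask; True means masked out.

    A query attends to a key only if both are real tokens (id != 0) in the
    same segment, so no separate padding array is needed.
    """
    return [[not (qs != 0 and qs == ks) for ks in kv_seg] for qs in q_seg]
```

For two packed sequences of lengths 2 and 3 in a 6-token buffer, the ids are `[1, 1, 2, 2, 2, 0]`; the last token is padding, so both its row and its column are fully masked.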

* WIP: fix get_swa_mask for padding
* fix mask type setting
* fix the order of checking valid swa and changing mask type
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix lint
* revamp to get full mask
* [pre-commit.ci] auto fixes from pre-commit.com hooks
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

…sal (NVIDIA#1378)
* add swa (left,0) + padding + brcm support
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* final fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* upgrade to FE 1.9-rc
* fix jax tests
* skip thd + CP + fused attn tests for cuDNN 9.6+ due to different stats shapes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
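
SWA here is sliding window attention, and BRCM is the bottom-right causal mask, where the causal diagonal is aligned to the bottom-right corner of the attention matrix when the query and key/value lengths differ. The following is a minimal pure-Python sketch of how a window (left, right), per-sequence padding, and bottom-right alignment might combine into one boolean mask. It assumes that True means "masked out", that a negative window bound means unbounded, and that query i is aligned with key i + (actual_kv - actual_q). `swa_padding_mask` is a hypothetical helper for illustration, not the `get_swa_mask` function changed in these commits.

```python
from typing import List, Tuple


def swa_padding_mask(
    actual_q: int,
    actual_kv: int,
    max_q: int,
    max_kv: int,
    window: Tuple[int, int],
    bottom_right: bool = True,
) -> List[List[bool]]:
    """Build a (max_q, max_kv) mask where True means masked out.

    window = (left, right); a negative bound is treated as unbounded
    (an assumption of this sketch).
    """
    left, right = window
    # Bottom-right alignment shifts the diagonal so the last valid query
    # lines up with the last valid key; top-left alignment uses offset 0.
    offset = actual_kv - actual_q if bottom_right else 0
    mask: List[List[bool]] = []
    for i in range(max_q):
        diag = i + offset  # key index on the (shifted) diagonal for query i
        row: List[bool] = []
        for j in range(max_kv):
            # Padded query rows and padded key columns are always masked.
            allowed = i < actual_q and j < actual_kv
            if left >= 0:
                allowed = allowed and j >= diag - left
            if right >= 0:
                allowed = allowed and j <= diag + right
            row.append(not allowed)
        mask.append(row)
    return mask
```

With window (-1, 0) and `bottom_right=False`, the sketch reduces to an ordinary top-left causal mask plus padding; a finite `left` additionally limits how far back each query can look.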

cyanguwa and others added 30 commits December 11, 2024 21:30

WIP: add support for SWA (left,0) + THD/BSHD/SBHD + padding + CM/BRCM

165f99c

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

e2d9ffe

for more information, see https://pre-commit.ci

enable more support

8572c1f

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

681ffbe

for more information, see https://pre-commit.ci

[JAX] Bug fix for distributed normalization (NVIDIA#1366)

0e1d9fa

* fix ctx.aval_out indexing for workspace
* add cudnn init to prepare phase of norm custom calls
* add thread_local for norm registry instance
---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

Add user to CI (NVIDIA#1371)

e7bfc0c

Add Jeremy to ci users
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Fix an invalid reference in the doc (NVIDIA#1362)

1ae8190

WIP: fix up swa

956570f

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

8d17e10

for more information, see https://pre-commit.ci

[JAX] Bug Fix: Softmax FFIs with correct Encapsulates (NVIDIA#1375)

1975ace

* softmax custom calls with correct encapsulates
* rm jax deprecated features
---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

add left_bound/right_bound

9a09edb

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

277dd60

for more information, see https://pre-commit.ci

tweak tests

fc6e338

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

dff7e09

for more information, see https://pre-commit.ci

fix C swa and tests

e64a291

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

fix get_swa_mask

a8ca89c

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

dd9159b

for more information, see https://pre-commit.ci

fix lint

8c4d836

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

6f677da

for more information, see https://pre-commit.ci

[common] Add max_t support for KV in THD (NVIDIA#1370)

f4f35c2

add max_t for KV
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

Merge branch 'main' into swa_padding_brcm

6a8e073

Merge branch 'main' into swa_padding_brcm

0998f9e

[PyTorch] Add weights_only=False for torch.load (NVIDIA#1374)

83dac8c

add weights_only=False for torch.load
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

[JAX] Move parallel encoder tests to L0 distributed test set. (NVIDIA#1356)

a3b32ec

* Move test distributed encoder to L0 distributed test suite
---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Co-authored-by: Reese Wang <rewang@nvidia.com>

Update copyright to include 2025 (NVIDIA#1388)

c9ea6be

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Merge branch 'main' into swa_padding_brcm

cd529f9

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

swa (left, right) after merging with main

c72108e

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

sudhakarsingh27 wants to merge 31 commits into main_baseline_for_swa_padding_brcm from swa_padding_brcm
