IFU release v2.6 #406

wangye805 · 2026-01-03T16:27:03Z

Description

IFU of upstream release_v2.6 (at commit c90a720), based on dev commit 669b556.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Resolve several conflicts

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes


* exclude 9.10.0/.1 for certain configs Signed-off-by: Charlene Yang <[email protected]>
* fix kv_channels Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add get_backend to tests Signed-off-by: Charlene Yang <[email protected]>
* add init files Signed-off-by: Charlene Yang <[email protected]>
* fix numerics and cuda graph tests Signed-off-by: Charlene Yang <[email protected]>
* fix jax tests Signed-off-by: Charlene Yang <[email protected]>
* remove prints Signed-off-by: Charlene Yang <[email protected]>
* minor changes after renaming Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix import structure and rename get_attention_backends Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix docs and benchmarks Signed-off-by: Charlene Yang <[email protected]>
* fix get backend calls Signed-off-by: Charlene Yang <[email protected]>
* Revert "fix get backend calls" This reverts commit 653cbb51c697bc2f975416bb3aac1d85f76c36dc. Signed-off-by: Charlene Yang <[email protected]>
* Revert "fix docs and benchmarks" This reverts commit 98cd52e04ff7c53e26b412195f5744e39f7ed0e9. Signed-off-by: Charlene Yang <[email protected]>
* fix docs, benchmarks and pre-commit ci Signed-off-by: Charlene Yang <[email protected]>
* fix dpa/mha flash attn selection Signed-off-by: Charlene Yang <[email protected]>
* fix rng states Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix ModelConfig Signed-off-by: Charlene Yang <[email protected]>
* fix backend selection on Ampere Signed-off-by: Charlene Yang <[email protected]>
* fix issues from last merge Signed-off-by: Charlene Yang <[email protected]>
* Update tests/pytorch/utils.py Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Charlene Yang <[email protected]>
* remove initialization of rng_states to None Signed-off-by: Charlene Yang <[email protected]>
* redefine ModelConfig Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix typo Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix ModelConfig Signed-off-by: Charlene Yang <[email protected]>
* fix seed for CP tests Signed-off-by: Charlene Yang <[email protected]>
* Update tests/pytorch/test_sanity.py Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* move fixture from utils to individual tests Signed-off-by: Charlene Yang <[email protected]>
* fix CI Signed-off-by: Charlene Yang <[email protected]>
---------
Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>

…ug quantizer (#1963)
* Debug linear layer when saving original input and using debug quantizer Signed-off-by: Tim Moon <[email protected]>
* Workaround bugs with quantizing with only column-wise usage Signed-off-by: Tim Moon <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Remove unused imports Signed-off-by: Tim Moon <[email protected]>
* Avoid unnecessary row-wise data Signed-off-by: Tim Moon <[email protected]>
* Workaround bugs with quantizing with only column-wise usage FP8 does not support transpose-only cast. Signed-off-by: Tim Moon <[email protected]>
---------
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed conflicts Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Minor code refactoring to avoid unnecessary checks Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fixed typo Signed-off-by: Oleg Goncharov <[email protected]>
* Fixed dBias accumulation error due to initialization. Minor code refactoring Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Test case to reproduce the init error Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fixed rowwise dbias error Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Changed ptx API Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Added a struct for two packed FP8 values Signed-off-by: Oleg Goncharov <[email protected]>
* Rolled back to scalar code for columnwise scaling due to its better performance Signed-off-by: Oleg Goncharov <[email protected]>
* Minor corrections Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Rebased on main Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fixes per code review Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Removed constexpr in C++ test suite to build faster Signed-off-by: Oleg Goncharov <[email protected]>
* Computed activations are now numerically truncated to InputType before scaling. Improved test suite. Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Minor refactoring Signed-off-by: Oleg Goncharov <[email protected]>
* Minor refactoring Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Modified mismatches checks of MXFP8 to address FP8 numerics Signed-off-by: Oleg Goncharov <[email protected]>
* Implemented Jeremy's fixes to JAX test suite with an intermediate downcast Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Reduced the dims of the test tensors to improve CI runtime Signed-off-by: Oleg Goncharov <[email protected]>
* Fixed memory alignment issue. Compute dbias without downcast. Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fixed misaligned memory issue also in gated kernels. Reduced size of MXFP8 gated tests Signed-off-by: Oleg Goncharov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Oleg Goncharov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix current device for cuDNN/cuBLAS handles Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add unit test Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* use weight device and improve tests Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>


…on-MXFP8 recipes. (#1962)
* add manage_primitives() helper
* disable GEMM primitives for non-MXFP8 recipes
* implement the NVTE_JAX_CUSTOM_CALLS + deprecate NVTE_JAX_CUSTOM_CALLS_RE
* replace NVTE_JAX_CUSTOM_CALLS_RE with NVTE_JAX_CUSTOM_CALLS in TE tests and examples
* fix use_jax_gemm contextmanager Signed-off-by: Phuong Nguyen <[email protected]>
---------
Signed-off-by: Phuong Nguyen <[email protected]>


…elism correctly for sequence-parallel inputs (#1980)
* updated GemmPrimitive partitioning rules to explicitly control all-reduce vs. reduce-scatter for sequence-parallelism Signed-off-by: Alp Dener <[email protected]>
* corrected handling of FSDP sharding for the RHS operand Signed-off-by: Alp Dener <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* use correct logical axes variable to identify sequence-parallel dim in LayerNormDenseGeneral Signed-off-by: Alp Dener <[email protected]>
* fixed linting issues Signed-off-by: Alp Dener <[email protected]>
* added assert on sequence-parallel options when GemmPrimitive is disabled Signed-off-by: Alp Dener <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Alp Dener <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* optimize static grad outputs Signed-off-by: Robin Zhang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Robin Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>


KshitijLakhani and others added 16 commits July 20, 2025 12:43

Changed VERSION to 2.6.0

bf5b217

Signed-off-by: Kshitij Janardan Lakhani <[email protected]>

[PyTorch] Remove GH pinned deps (#1961)

c7d0271

* Remove GH pinned deps Signed-off-by: Kirthi Shankar Sivamani <[email protected]> * Pin onnxscript Signed-off-by: Kirthi Shankar Sivamani <[email protected]> --------- Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

[PyTorch] Reset FP8 weight workspace if usages are invalid (#1972)

787acff

Reset FP8 weight workspace if usages are invalid Signed-off-by: Tim Moon <[email protected]>

Fix the condition error when checking fp8 attn in `get_attention_back…

9926245

…end` (#1965) Update utils.py Fix the condition error of the FP8 attention in `get_attention_backend` Signed-off-by: yuzhongw-nvidia <[email protected]> Co-authored-by: Xiaowei Ren <[email protected]>

[JAX] Fix current scaling test_helper.py and enable test_helper.py in…

928dfa8

… L0 (#1990) Fix current scaling test_helper.py and enable test_helper.py in L0 Signed-off-by: Jeremy Berchtold <[email protected]>

Fix runtime lib loading for cuDNN (#1989)

e02e289

Fix cuDNN lib runtime loading and simplify Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

Fix cudnn versioning support in PyTorch DPA and Fused attn (#1991)

21d7410

Fix cudnn versioning in support in PyTorch DPA and Fused attn Signed-off-by: Kshitij Janardan Lakhani <[email protected]>

Fix the use-after-free bug in unfused normalization (#2002)

c90a720

Signed-off-by: Przemek Tredak <[email protected]>

Merge remote-tracking branch 'upstream/release_v2.6' into yewang12/if…

966a4ac

…u_release_v2.6_rocm

wangye805 requested review from ipanfilo and wenchenvincent as code owners January 3, 2026 16:27

wangye805 force-pushed the release_v2.6_rocm branch from 40b85a9 to 669b556 Compare January 6, 2026 23:04

[ROCm] Resolve conflicts

97556c6

wangye805 force-pushed the yewang12/ifu_release_v2.6_rocm branch from a90850f to 97556c6 Compare January 9, 2026 19:37
