@wangye805
Collaborator

Description

IFU of upstream release_v2.6 (commit c90a720), based on dev commit 669b556.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Resolved several merge conflicts.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

KshitijLakhani and others added 16 commits July 20, 2025 12:43
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
* Remove GH pinned deps

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Pin onnxscript

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

---------

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Reset FP8 weight workspace if usages are invalid

Signed-off-by: Tim Moon <[email protected]>
…end` (#1965)

Update utils.py

Fix the condition error of the FP8 attention in `get_attention_backend`

Signed-off-by: yuzhongw-nvidia <[email protected]>
Co-authored-by: Xiaowei Ren <[email protected]>
* exclude 9.10.0/.1 for certain configs

Signed-off-by: Charlene Yang <[email protected]>

* fix kv_channels

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add get_backend to tests

Signed-off-by: Charlene Yang <[email protected]>

* add init files

Signed-off-by: Charlene Yang <[email protected]>

* fix numerics and cuda graph tests

Signed-off-by: Charlene Yang <[email protected]>

* fix jax tests

Signed-off-by: Charlene Yang <[email protected]>

* remove prints

Signed-off-by: Charlene Yang <[email protected]>

* minor changes after renaming

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix import structure and rename get_attention_backends

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix docs and benchmarks

Signed-off-by: Charlene Yang <[email protected]>

* fix get backend calls

Signed-off-by: Charlene Yang <[email protected]>

* Revert "fix get backend calls"

This reverts commit 653cbb51c697bc2f975416bb3aac1d85f76c36dc.
Signed-off-by: Charlene Yang <[email protected]>

* Revert "fix docs and benchmarks"

This reverts commit 98cd52e04ff7c53e26b412195f5744e39f7ed0e9.
Signed-off-by: Charlene Yang <[email protected]>

* fix docs, benchmarks and pre-commit ci

Signed-off-by: Charlene Yang <[email protected]>

* fix dpa/mha flash attn selection

Signed-off-by: Charlene Yang <[email protected]>

* fix rng states

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ModelConfig

Signed-off-by: Charlene Yang <[email protected]>

* fix backend selection on Ampere

Signed-off-by: Charlene Yang <[email protected]>

* fix issues from last merge

Signed-off-by: Charlene Yang <[email protected]>

* Update tests/pytorch/utils.py

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>

* remove initialization of rng_states to None

Signed-off-by: Charlene Yang <[email protected]>

* redefine ModelConfig

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ModelConfig

Signed-off-by: Charlene Yang <[email protected]>

* fix seed for CP tests

Signed-off-by: Charlene Yang <[email protected]>

* Update tests/pytorch/test_sanity.py

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move fixture from utils to individual tests

Signed-off-by: Charlene Yang <[email protected]>

* fix CI

Signed-off-by: Charlene Yang <[email protected]>

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
…ug quantizer (#1963)

* Debug linear layer when saving original input and using debug quantizer

Signed-off-by: Tim Moon <[email protected]>

* Workaround bugs with quantizing with only column-wise usage

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: Tim Moon <[email protected]>

* Avoid unnecessary row-wise data

Signed-off-by: Tim Moon <[email protected]>

* Workaround bugs with quantizing with only column-wise usage

FP8 does not support transpose-only cast.

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fixed conflicts

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Minor code refactoring to avoid unnecessary checks

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed typo

Signed-off-by: Oleg Goncharov <[email protected]>

* Fixed dBias accumulation error due to initialization. Minor code refactoring

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Test case to reproduce the init error

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed rowwise dbias error

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed ptx API

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added a struct for two packed FP8 values

Signed-off-by: Oleg Goncharov <[email protected]>

* Rolled back to scalar code for columnwise scaling due to its better performance

Signed-off-by: Oleg Goncharov <[email protected]>

* Minor corrections

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Rebased on main

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes per code review

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed constexpr in C++ test suite to build faster

Signed-off-by: Oleg Goncharov <[email protected]>

* Computed activations are now numerically truncated to InputType before scaling. Improved test suite.

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Minor refactoring

Signed-off-by: Oleg Goncharov <[email protected]>

* Minor refactoring

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Modified mismatches checks of MXFP8 to address FP8 numerics

Signed-off-by: Oleg Goncharov <[email protected]>

* Implemented Jeremy's fixes to JAX test suite with an intermediate downcast

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reduced the dims of the test tensors to improve CI runtime

Signed-off-by: Oleg Goncharov <[email protected]>

* Fixed memory alignment issue. Compute dbias without downcast.

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed misaligned memory issue also in gated kernels. Reduced size of MXFP8 gated tests

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Oleg Goncharov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
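
One commit in the group above fixes a "dBias accumulation error due to initialization". The log does not say what exactly was mis-initialized, so the following is only a minimal sketch of that general class of bug, assuming dbias is the column-wise sum of the incoming gradient. The function `accumulate_dbias` and the plain-list data are hypothetical and are not taken from the kernels changed here.

```python
from typing import List, Optional


def accumulate_dbias(
    grad_rows: List[List[float]],
    num_cols: int,
    buffer: Optional[List[float]] = None,
) -> List[float]:
    """Accumulate the column-wise sum of grad_rows into buffer.

    Hypothetical helper: if no buffer is passed, a zero-initialized one is
    created, which is what a correct accumulation needs as its start value.
    """
    if buffer is None:
        buffer = [0.0] * num_cols  # correct: the accumulator starts at zero
    for row in grad_rows:
        for col, value in enumerate(row):
            buffer[col] += value
    return buffer


grads = [[1.0, 2.0], [3.0, 4.0]]

# Accumulator starts from zeros: the result is the true column sum.
correct = accumulate_dbias(grads, num_cols=2)

# Accumulator reused without being cleared: stale values leak into the sum.
stale = [10.0, 10.0]
wrong = accumulate_dbias(grads, num_cols=2, buffer=stale)
```

Here `correct` is the true column sum, while `wrong` is offset by whatever the buffer held before the accumulation started.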
* fix current device for cuDNN/cuBLAS handles

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unit test

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use weight device and improve tests

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
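
The commits above concern which device the cuDNN/cuBLAS handles belong to, with one of them switching to the weight's device. As a rough illustration of the general idea only, and not the code changed in this PR, a handle cache keyed by device index might look like the sketch below. `HandleCache`, `fake_handle` and `weight_device` are hypothetical names, and a string stands in for a real library handle.

```python
from typing import Callable, Dict


class HandleCache:
    """Hypothetical cache that keeps one library handle per device index."""

    def __init__(self, make_handle: Callable[[int], object]) -> None:
        self._make_handle = make_handle
        self._handles: Dict[int, object] = {}

    def get(self, device_index: int) -> object:
        # Create the handle lazily, the first time a device is requested.
        if device_index not in self._handles:
            self._handles[device_index] = self._make_handle(device_index)
        return self._handles[device_index]


def fake_handle(device_index: int) -> str:
    # Stand-in for creating a real handle on the given device.
    return f"handle-for-device-{device_index}"


cache = HandleCache(fake_handle)

# Look the handle up by the device that owns the weight (assumed to be
# device 1 here) rather than by an implicit "current" device.
weight_device = 1
handle = cache.get(weight_device)
```

Keying the lookup on an explicit device index avoids depending on whichever device happens to be current when the call is made.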
… L0 (#1990)

Fix current scaling test_helper.py and enable test_helper.py in L0

Signed-off-by: Jeremy Berchtold <[email protected]>
…on-MXFP8 recipes. (#1962)

* add manage_primitives() helper

* disable GEMM primitives for non-MXFP8 recipes

* implement the NVTE_JAX_CUSTOM_CALLS + deprecate NVTE_JAX_CUSTOM_CALLS_RE

* replace NVTE_JAX_CUSTOM_CALLS_RE with NVTE_JAX_CUSTOM_CALLS in TE tests and examples

* fix use_jax_gemm contextmanager

Signed-off-by: Phuong Nguyen <[email protected]>

---------

Signed-off-by: Phuong Nguyen <[email protected]>
Fix cuDNN lib runtime loading and simplify

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Fix cuDNN versioning support in PyTorch DPA and Fused attn

Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
…elism correctly for sequence-parallel inputs (#1980)

* updated GemmPrimitive partitioning rules to explicitly control all-reduce vs. reduce-scatter for sequence-parallelism

Signed-off-by: Alp Dener <[email protected]>

* corrected handling of FSDP sharding for the RHS operand

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use correct logical axes variable to identify sequence-parallel dim in LayerNormDenseGeneral

Signed-off-by: Alp Dener <[email protected]>

* fixed linting issues

Signed-off-by: Alp Dener <[email protected]>

* added assert on sequence-parallel options when GemmPrimitive is disabled

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alp Dener <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* optimize static grad outputs

Signed-off-by: Robin Zhang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Robin Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
@wangye805 force-pushed the yewang12/ifu_release_v2.6_rocm branch from a90850f to 97556c6 on January 9, 2026 19:37