-
Notifications
You must be signed in to change notification settings - Fork 22
IFU release v2.6 #406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
wangye805
wants to merge
17
commits into
release_v2.6_rocm
Choose a base branch
from
yewang12/ifu_release_v2.6_rocm
base: release_v2.6_rocm
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
IFU release v2.6 #406
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
* Remove GH pinned deps Signed-off-by: Kirthi Shankar Sivamani <[email protected]> * Pin onnxscript Signed-off-by: Kirthi Shankar Sivamani <[email protected]> --------- Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Reset FP8 weight workspace if usages are invalid Signed-off-by: Tim Moon <[email protected]>
…end` (#1965) Update utils.py Fix the condition error of the FP8 attention in `get_attention_backend` Signed-off-by: yuzhongw-nvidia <[email protected]> Co-authored-by: Xiaowei Ren <[email protected]>
* exclude 9.10.0/.1 for certain configs Signed-off-by: Charlene Yang <[email protected]> * fix kv_channels Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add get_backend to tests Signed-off-by: Charlene Yang <[email protected]> * add init files Signed-off-by: Charlene Yang <[email protected]> * fix numerics and cuda graph tests Signed-off-by: Charlene Yang <[email protected]> * fix jax tests Signed-off-by: Charlene Yang <[email protected]> * remove prints Signed-off-by: Charlene Yang <[email protected]> * minor changes after renaming Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix import structure and rename get_attention_backends Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix docs and benchmarks Signed-off-by: Charlene Yang <[email protected]> * fix get backend calls Signed-off-by: Charlene Yang <[email protected]> * Revert "fix get backend calls" This reverts commit 653cbb51c697bc2f975416bb3aac1d85f76c36dc. Signed-off-by: Charlene Yang <[email protected]> * Revert "fix docs and benchmarks" This reverts commit 98cd52e04ff7c53e26b412195f5744e39f7ed0e9. Signed-off-by: Charlene Yang <[email protected]> * fix docs, benchmarks and pre-commit ci Signed-off-by: Charlene Yang <[email protected]> * fix dpa/mha flash attn selection Signed-off-by: Charlene Yang <[email protected]> * fix rng states Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ModelConfig Signed-off-by: Charlene Yang <[email protected]> * fix backend selection on Ampere Signed-off-by: Charlene Yang <[email protected]> * fix issues from last merge Signed-off-by: Charlene Yang <[email protected]> * Update tests/pytorch/utils.py Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Charlene Yang <[email protected]> * remove initialization of rng_states to None Signed-off-by: Charlene Yang <[email protected]> * redefine ModelConfig Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ModelConfig Signed-off-by: Charlene Yang <[email protected]> * fix seed for CP tests Signed-off-by: Charlene Yang <[email protected]> * Update tests/pytorch/test_sanity.py Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move fixture from utils to individual tests Signed-off-by: Charlene Yang <[email protected]> * fix CI Signed-off-by: Charlene Yang <[email protected]> --------- Signed-off-by: Charlene Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Tim Moon <[email protected]>
…ug quantizer (#1963) * Debug linear layer when saving original input and using debug quantizer Signed-off-by: Tim Moon <[email protected]> * Workaround bugs with quantizing with only column-wise usage Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: Tim Moon <[email protected]> * Avoid unnecessary row-wise data Signed-off-by: Tim Moon <[email protected]> * Workaround bugs with quantizing with only column-wise usage FP8 does not support transpose-only cast. Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fixed conflicts Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor code refactoring to avoid unnecessary checks Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed typo Signed-off-by: Oleg Goncharov <[email protected]> * Fixed dBias accumulation error due to initialization. Minor code refactoring Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Test case to reproduce the init error Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed rowwise dbias error Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changed ptx API Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added a struct for two packed FP8 values Signed-off-by: Oleg Goncharov <[email protected]> * Rolled back to scalar code for columnwise scaling due to its better performance Signed-off-by: Oleg Goncharov <[email protected]> * Minor corrections Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Rebased on main Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes per code review Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Removed constexpr in C++ test suite to build faster Signed-off-by: Oleg Goncharov <[email protected]> * Computed activations are now numerically truncated to InputType before scaling. Improved test suite. Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor refactoring Signed-off-by: Oleg Goncharov <[email protected]> * Minor refactoring Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Modified mismatches checks of MXFP8 to address FP8 numerics Signed-off-by: Oleg Goncharov <[email protected]> * Implemented Jeremy's fixes to JAX test suite with an intermediate downcast Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reduced the dims of the test tensors to improve CI runtime Signed-off-by: Oleg Goncharov <[email protected]> * Fixed memory alignment issue. Compute dbias without downcast. Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed misaligned memory issue also in gated kernels. Reduced size of MXFP8 gated tests Signed-off-by: Oleg Goncharov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Oleg Goncharov <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* fix current device for cuDNN/cuBLAS handles Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add unit test Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use weight device and improve tests Signed-off-by: Charlene Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Charlene Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
… L0 (#1990) Fix current scaling test_helper.py and enable test_helper.py in L0 Signed-off-by: Jeremy Berchtold <[email protected]>
…on-MXFP8 recipes. (#1962) * add manage_primitives() helper * disable GEMM primitives for non-MXFP8 recipes * implement the NVTE_JAX_CUSTOM_CALLS + deprecate NVTE_JAX_CUSTOM_CALLS_RE * replace NVTE_JAX_CUSTOM_CALLS_RE with NVTE_JAX_CUSTOM_CALLS in TE tests and examples * fix use_jax_gemm contextmanager Signed-off-by: Phuong Nguyen <[email protected]> --------- Signed-off-by: Phuong Nguyen <[email protected]>
Fix cuDNN lib runtime loading and simplify Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Fix cudnn versioning in support in PyTorch DPA and Fused attn Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
…elism correctly for sequence-parallel inputs (#1980) * updated GemmPrimitive partitioning rules to explicitly control all-reduce vs. reduce-scatter for sequence-parallelism Signed-off-by: Alp Dener <[email protected]> * corrected handling of FSDP sharding for the RHS operand Signed-off-by: Alp Dener <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use correct logical axes variable to identify sequence-parallel dim in LayerNormDenseGeneral Signed-off-by: Alp Dener <[email protected]> * fixed linting issues Signed-off-by: Alp Dener <[email protected]> * added assert on sequence-parallel options when GemmPrimitive is disabled Signed-off-by: Alp Dener <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alp Dener <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* optimize static grad outputs Signed-off-by: Robin Zhang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
…u_release_v2.6_rocm
40b85a9 to
669b556
Compare
a90850f to
97556c6
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
upstream release_v2.6 (with commit c90a720) IFU based on dev commit (669b556)
Type of change
Changes
Resolve several conflicts
Checklist: