Generation support in TE for Gemma model #2
Open
sudhakarsingh27 wants to merge 276 commits into main from
Conversation
…ax.jit (NVIDIA#785)

* fixed static argnums for jax.jit in single gpu encoder test, changed warning filtering for pytest
* propagating the fix to the JAX mnist example
* fixed missing space between flags in QA scripts
* added TE warnings into the ignore list

Signed-off-by: Alp Dener <adener@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
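For context, the fix concerns `jax.jit`'s `static_argnums`: arguments that drive Python-level control flow must be marked static so they are resolved at trace time rather than traced. A minimal illustrative sketch (the function below is hypothetical, not the test code):

```python
import functools

import jax
import jax.numpy as jnp

# `use_bias` drives a Python-level `if`, so it must be a static
# (compile-time) argument; tracing it instead would fail inside jit.
@functools.partial(jax.jit, static_argnums=(1,))
def dense(x, use_bias):
    y = x @ jnp.ones((x.shape[-1], 4))
    if use_bias:  # resolved at trace time; one compiled variant per value
        y = y + 1.0
    return y

out = dense(jnp.ones((2, 8)), True)
```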
* Add NVRTC kernels for cast-transpose
* Update copyright year
* Add noop flag to NVRTC cast-transpose kernel
* Apply suggestions from code review

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Support noop concat without providing full tensor; stop storing fused buffers in linear modules
* Debug noop cat func
* Construct TE modules in tests with correct dtypes
* Add tolerances to numerical tests
* Use plain PyTorch concat when exporting to ONNX

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
…#780)

* Allow multi-dims for dgamma and dbeta in LN descriptor
* Fix the jit error in examples/jax

Signed-off-by: Ming Huang <mingh@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Remove unnecessary Pylint overrides
* Fixes to lint

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* combined layernorm_geglu with layernorm_gelu into fused_layernorm
* fixes to pass all unit tests in test_custom_call_compute.py, test_layer.py, and test_praxis_layer.py
* cleaning and formatting
* renaming based on reviewers' suggestions
* implemented partial fused layernorm
* geglu + bias passed tests
* added partial fused calculation for dbias_1
* clean up

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Signed-off-by: Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
Co-authored-by: Alp Dener <adener@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Try using global buffer for cu_seqlens
* Avoid using functools.lru_cache
* fixes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
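For background, `cu_seqlens` is the cumulative-sequence-lengths tensor that varlen attention kernels consume; caching it in a global buffer avoids rebuilding and reallocating it on every step. A minimal sketch of its construction (illustrative, not the cached TE version; the helper name is made up):

```python
import torch

def make_cu_seqlens(seqlens: torch.Tensor) -> torch.Tensor:
    # cu_seqlens marks each sequence's start offset in a packed batch,
    # e.g. lengths [3, 5, 2] -> [0, 3, 8, 10], as varlen kernels expect.
    zero = torch.zeros(1, dtype=torch.int32, device=seqlens.device)
    return torch.cat([zero, torch.cumsum(seqlens, dim=0, dtype=torch.int32)])
```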
Added HF Nanotron to integrations and updated GTC 24 video to the on-demand link

Signed-off-by: Santosh Bhavani <santosh@semantic.md>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Implemented swiglu and silu
* Renamed nvte-*silu to nvte-*swish + generalized GetDBiasDact functions

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
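For reference, SiLU (a.k.a. Swish) and its gated SwiGLU variant follow the standard definitions; a minimal sketch of the math, not the TE kernel itself:

```python
import jax
import jax.numpy as jnp

def silu(x):
    # SiLU / Swish: x * sigmoid(x)
    return x * jax.nn.sigmoid(x)

def swiglu(x):
    # Gated variant: split the last axis in half; the SiLU-activated
    # half gates the other half elementwise.
    a, b = jnp.split(x, 2, axis=-1)
    return silu(a) * b
```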
* make FusedAttn with CP support bias
* assert Alibi cannot work with CP
* syntax fix
* fix variable name
* fix tensor shapes
* a typo fix
* fix bias indexing for CP
* bug fix
* add attn bias tests
* change dbias update location
* fix CP test model configs
* change CP test sequence length
* make AttnFuncWithCP support qkv format of sbhd
* make sure qkv are contiguous for CP in cuDNN fused attn
* change assert message
* fix code format

Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Add support for MoE with FP8
* Fix unittest
* Fix error in linear backward

Signed-off-by: Dennis Liu <denliu@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Add module-level filter for deprecation warning in common
* Fix module

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
remove tp_size/tp_group as amax reduction is handled by fp8_group()

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
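For background, FP8 amax values are synchronized with a max all-reduce over the FP8 process group, which is what makes a separate tensor-parallel group redundant for this purpose. A hedged sketch of the pattern (not TE's internal code; the helper name is made up):

```python
import torch
import torch.distributed as dist

def reduce_amax(amax: torch.Tensor, fp8_group: dist.ProcessGroup) -> torch.Tensor:
    # Every rank contributes its local absolute max; the MAX all-reduce
    # leaves all ranks with the global amax, so FP8 scales agree everywhere.
    dist.all_reduce(amax, op=dist.ReduceOp.MAX, group=fp8_group)
    return amax
```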
…IDIA#799)

restrict context parallel tests to sm80+ as fused/flash attn backends require sm80+

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Fix linter warnings from unused args
* Update .gitignore

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Added pull request template
* Changes from the review

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
…nite scale (NVIDIA#786)

* Handle the scaling factor when amax is so tiny that it leads to an infinite scale
* revert formatting changes
* fix comments
* apply review suggestions
* add test_recipe.py to qa/L0_pytorch_unittest/test.sh; fix unittest for is_first_microbatch=False
* revert changes to update_weight_scale_inv
* Debug test failures

Signed-off-by: Jinze Xue <jinzex@nvidia.com>
Signed-off-by: Jinze Xue <155670984+jinzex@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Jinze Xue <jinzex@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
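The underlying issue: the FP8 scale is derived by dividing the format's max representable value by amax, so a zero or near-zero amax drives the division to infinity. A minimal sketch of the guard, assuming the usual margin-based scale-update formula (illustrative, not TE's exact code):

```python
from typing import Optional

import torch

def compute_scale(amax: torch.Tensor, fp8_max: float, margin: int = 0,
                  prev_scale: Optional[torch.Tensor] = None) -> torch.Tensor:
    # scale = fp8_max / amax, shifted down by 2**margin for headroom.
    scale = (fp8_max / amax) / (2.0 ** margin)
    # A tiny or zero amax makes the division blow up to inf/NaN; fall back
    # to the previous scale (or 1.0) so quantization stays finite.
    fallback = prev_scale if prev_scale is not None else torch.ones_like(scale)
    return torch.where(torch.isfinite(scale), scale, fallback)
```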
…> 1 on Paxml (NVIDIA#774)

* Support FP8 Meta Dtype (FM32) and Align FP8 Scale Update with PyTorch
* Modify with the feedback of code review
* Hiding FlaxFloatMeta32 inside fp8.py
* Make functions JAX-traceable objects
* Rebased with main
* Update jax images for github workflow

Signed-off-by: Ming Huang <mingh@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* initialize tp_group for FP8 DPA
* fix cuDNN version in unit tests for cuDNN v9
* add hook to ignore missing fused_attn._extra_states if training from old checkpoints
* remove test and redundant implementation from last commit
* remove warning message and replace with docstring
* remove tp_size/tp_group in FusedAttention; amax reduction is handled with fp8_group
* move core_attention.fused_attention._extra_state to core_attention._extra_state
* simplify post_state_dict_hooks between FU and DPA
* add temporary test
* remove previous attempts to move core_attention.fused_attention to core_attention; keep the test
* remove the test
* disable the pylint check on the self arg, which the hook requires

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Add layernorm_fp8_dot unit test
* Update the softmax primitives support conditions
* Add tests for the softmax primitives
* Round 1 refactor of test_layer
* Split dropout arguments of ref code and add hidden/intermediate dropout elementwise comparison
* Add dropout_broadcast_dim and self_attn_mask tests and clean up a few pieces of code
* Abstract test layer and fix a rope reference code diff
* Add bias tests
* Add epsilon and float32 tests
* Add relpos_bias and attention dropout tests
* Loosen the atol
* Move common fixtures to conftest.py
* Add doc string for test_layer
* Fix conflicts of test_layer
* Avoid leaving bias parameters in the graph when use_bias=False

Signed-off-by: Reese Wang <rewang@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* templated primitives and respective C++ functions
* fixes for LayerNormMLP; tests in test_custom_compute all passed
* added default arg for pybind get_workspace_size funcs
* fixes for TestTransformer with non-gated act tests
* renamed gelu to act
* improved enum implementation, avoiding magic numbers
* Exposed C++ ActivationEnum to python side
* Changed error messages
* changed conditional check on input shape for dbias_cast_transpose
* changed dtype (tol) for bias grad tests
* fixes so that layer_norm_fp8_mlp can take bias = None
* Set bias = None in flax modules

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Update FP8 recipe test to handle recipe changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Bump FA version to 2.5.8

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* fixes for ActLuPrimitive in PAXML
* changed indices for arg_infos in sharding func in dbias_cast_transpose primitive

Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
for more information, see https://pre-commit.ci
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Description
Reviving NVIDIA#829, but without the tutorial code, which for now lives in a separate branch, te_gemma_generation_tutorial.
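For readers unfamiliar with the feature: autoregressive generation needs a KV cache so each new token attends to previously computed keys/values instead of recomputing them. A hedged sketch of driving a TE layer step by step with TE's `InferenceParams` (treat argument names and the manual offset bookkeeping as assumptions about the API at this point in time; the real example lives in the tutorial branch):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.pytorch.attention import InferenceParams

hidden, heads = 2048, 16
layer = te.TransformerLayer(hidden, 4 * hidden, heads).cuda().eval()

# Pre-allocate KV-cache capacity for the whole generation run:
# (max batch size, max sequence length).
inference_params = InferenceParams(1, 128)

x = torch.randn(4, 1, hidden, device="cuda")  # prompt, [seq, batch, hidden]
with torch.no_grad():
    for _ in range(8):
        out = layer(x, inference_params=inference_params)
        # Advance the cache offset by the tokens just consumed
        # (Megatron-style manual bookkeeping; assumed here).
        inference_params.sequence_len_offset += x.size(0)
        # Illustration only: a real loop would project to logits, sample a
        # token, and embed it before feeding the next position back in.
        x = out[-1:].clone()
```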