
Commit 5d8baa4

BoxiangW, Laplasjan107, akoumpa, ko3n1g, and ashors1 authored
Add non-mcore fsdp2 strategy (NVIDIA-NeMo#11525)
* Add fsdp2 strategy Signed-off-by: Boxiang Wang <boxiangw@nvidia.com>
* Apply isort and black reformatting Signed-off-by: BoxiangW <BoxiangW@users.noreply.github.com>
* Add imports Signed-off-by: Boxiang Wang <boxiangw@nvidia.com>
* Apply isort and black reformatting Signed-off-by: BoxiangW <BoxiangW@users.noreply.github.com>
* Add init import Signed-off-by: Boxiang Wang <boxiangw@nvidia.com>
* Apply isort and black reformatting Signed-off-by: BoxiangW <BoxiangW@users.noreply.github.com>
* Fix mixtral export for NeMo 2.0 (#11532)
* Initial commit Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
---------
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com> Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com> Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com> Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Make HFDatasetDataModule a datasets.load_dataset wrapper (#11500)
* Make HfDatasetDataModule a datasets.load_dataset wrapper Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add logging Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Update HFDatasetDataModule Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* refactor Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* refactor fixup Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* refactor fixup #2 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* do not expand Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* doc Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* doc Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add synonym Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* typo Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Add train/val/test attributes Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add test for hf-datamodule Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Import lazily to avoid breaking with older megatron versions Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* bot happy Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* bot happy2 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add doc-strings and collate-fn arg Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* ci: Bump release workflow (#11544) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* ci: Use SHA for cut-off (#11545) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* link to mcore documentation (#11538) Signed-off-by: ashors1 <ashors@nvidia.com>
* ci: Adjust inputs for code-freeze workflow (#11550) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* ci: Bump release freeze (#11551) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Ko3n1g/ci/commit sha for cutoff (#11553)
* ci: Remove token from checkout Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* bump version Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
---------
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* ci: Bump code-freeze workflow (#11554) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* ci: Bump code freeze workflow (#11557) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Fix deploy conflicts in llm.api (#11367)
* Fix llm.deploy api Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* PR feedback Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com> Signed-off-by: hemildesai <hemildesai@users.noreply.github.com> Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* perf summary docs link (#11262) Signed-off-by: Malay Nagda <malayn@nvidia.com> Co-authored-by: oliver könig <okoenig@nvidia.com>
* Add vlm nemo run scripts (#11394)
* update recipe Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix mllama mock ds Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update to use attention bias Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* remove example Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix docstring mock.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix docstring language.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix docstring language.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix docstring mllama/base.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix docstring mllama/language.py Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* bump mcore Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Add scripts for mllama Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix pylint Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* revert Dockerfile.ci Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* add scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add vlm training test in ci Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix docstring issues Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update script match recipe Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update recipes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Update mllama_train.py Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* update mllama 90b recipe Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update to use tmp in ci tests Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update default llava config Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add nemo run scripts Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix vpp issue Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix cicd Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix cicd Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* remove duplicated script Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* ci: Add HF cache Signed-off-by: oliver könig <okoenig@nvidia.com>
* update to use SP in recipe Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* upgrade Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Revert "upgrade" This reverts commit f6ad2cd76abcdd9258cb53a25c788fd658189150.
* update neva api Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* update neva api Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix neva processing Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix lint Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix data fields Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* few fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Signed-off-by: Oliver Koenig <okoenig@nvidia.com> Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Signed-off-by: oliver könig <okoenig@nvidia.com> Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Co-authored-by: Oliver Koenig <okoenig@nvidia.com>
* Add from_dict to HFDatasetDataModule (#11559)
* Add from_dict method Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add test_load_from_dict Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add test_load_from_dict Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Prevent llama3.1 from using Linear interpolation (#11548)
* prevent llama3.1 from using linear interpolation
* Apply isort and black reformatting Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com>
---------
Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com> Co-authored-by: suiyoubi <suiyoubi@users.noreply.github.com>
* [TTS] Add audio and mel codec HF models to docs (#11526) Signed-off-by: Ryan <rlangman@nvidia.com>
* Update for NEST release (#11537)
* update for nest release Signed-off-by: stevehuang52 <heh@nvidia.com>
* make pylint happier Signed-off-by: stevehuang52 <heh@nvidia.com>
* fix for lhotse dataloader Signed-off-by: stevehuang52 <heh@nvidia.com>
* update yaml Signed-off-by: stevehuang52 <heh@nvidia.com>
* minor refactor Signed-off-by: stevehuang52 <heh@nvidia.com>
* clean up Signed-off-by: stevehuang52 <heh@nvidia.com>
* clean up Signed-off-by: stevehuang52 <heh@nvidia.com>
---------
Signed-off-by: stevehuang52 <heh@nvidia.com>
* Merging SpeechLLM development branch (#11462)
* Port changes related to SFT text+speech dataloading Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Revert changes from Canary(nonLLM) code Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Add joint text/audio dataloading capability to speechllm Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* include text-only into fprop of training and eval; TODO: text-only predict Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* Actually working forward step Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Support for source-target text file pair training for MT+speech Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Include supervision text tokens in audio example's num tokens Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Disable conformer seq len NCCL sync Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Preliminary sampler fusion stragies support: mux/zip/round_robin/randomized_round_robin Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Working V2 version of multimodal dataloading. Each modality gets its own batch settings that can be merged with zip sampler to enjoy max batch sizes for both modalities in a single training step. Each modality runs fwd+bwd in turn to save GPU memory (instead of running fwd separately and bwd together). Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Add missing config Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Revert multimodal grad accum and fix mask padding issue Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Add modality weights support via cfg.model.modality_weights Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Fix for V2 dataloader shuffling CRITICAL Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Restore multimodal grad accum Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Fix unit tests for multi-sampler configurations Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Apply isort and black reformatting Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* nemo gemma to hf conversion (#9629)
* adding script for gemma nemo to hf Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
* adding verification for convert_gemma_nemo_to_hf Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
* Apply isort and black reformatting Signed-off-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com>
---------
Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com>
* support FSDP (thank Yifan for early trying) (#10062) Note: as of now, this is still not fully working on the cluster. See above doc for details. Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* Fix unit tests after rebasing on recent main Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* support megatron_amp_O2 and tp (#10599)
* Port changes related to SFT text+speech dataloading Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Revert changes from Canary(nonLLM) code Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Add joint text/audio dataloading capability to speechllm Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* include text-only into fprop of training and eval; TODO: text-only predict Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* Actually working forward step Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Support for source-target text file pair training for MT+speech Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Include supervision text tokens in audio example's num tokens Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Disable conformer seq len NCCL sync Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Preliminary sampler fusion stragies support: mux/zip/round_robin/randomized_round_robin Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Working V2 version of multimodal dataloading. Each modality gets its own batch settings that can be merged with zip sampler to enjoy max batch sizes for both modalities in a single training step. Each modality runs fwd+bwd in turn to save GPU memory (instead of running fwd separately and bwd together). Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Add missing config Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Revert multimodal grad accum and fix mask padding issue Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Add modality weights support via cfg.model.modality_weights Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Fix for V2 dataloader shuffling CRITICAL Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Restore multimodal grad accum Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Fix unit tests for multi-sampler configurations Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Apply isort and black reformatting Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* nemo gemma to hf conversion (#9629)
* adding script for gemma nemo to hf Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
* adding verification for convert_gemma_nemo_to_hf Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com>
* Apply isort and black reformatting Signed-off-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com>
---------
Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com>
* support FSDP (thank Yifan for early trying) Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* debug TP deadlock Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* some fixes for fsdp and tp /lustre/fsw/portfolios/llmservice/users/zhehuaic/results/canary-v0_speechllm/prompt_lhmerge5_p2b_oci_FC-GPT_llama_canaryset_b6s4kf-sunolong_noCC_langtemp0.5_dsettemp0.5_lr1e-4wd1e-3_CosineAnnealing_warmup2500_minlr1e-6_gbs2048_mbs16_ep200/error-1417621-0.out /lustre/fsw/portfolios/llmservice/users/zhehuaic/results/canary-v0_speechllm/prompt_lhmerge5_p2b_tp_oci_FC-GPT_llama_canaryset_b6s4kf-sunolong_noCC_langtemp0.5_dsettemp0.5_lr1e-4wd1e-3_CosineAnnealing_warmup2500_minlr1e-6_gbs128_mbs16_ep200/error-1421103-3.out Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* nit fix Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* fix for llama3.1 Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* for llama3.1 Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* fix for inference Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* fix inference Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* fix grad accu Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* fix inference Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* initial impl to support megatron_amp_O2 in salm, bestow, salm-t5 Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
---------
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: pzelasko <pzelasko@users.noreply.github.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com> Co-authored-by: Piotr Żelasko <pzelasko@nvidia.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com> Co-authored-by: pzelasko <pzelasko@users.noreply.github.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com>
* minor change in dataloader (#10601)
* Speechllm dataset basic unit test (#10631)
* Basic unit test for speechllm lhotse dataset Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* cleanup Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
---------
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Unit test for existing speechllm dataset with llama2 prompt format (#10634) Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* [speechllm] Replace TextProcessing with PromptFormatter (#10639)
* [speechllm] Replace TextProcessing with PromptFormatter Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Test for tokens_to_generate Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Padding optimization for speechlm dataset Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
---------
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Multimodal conversation format dataloading (#10683)
* Draft implementation of NeMo Multimodal Conversation format Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fully working data parsing and iteration Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fully working dataloading with tokenization + prompting Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Collapse consecutive user turns into single turn Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* a few fixes for the new prompt template based dataloader and lora+distributed fused adam (#10701)
* Draft implementation of NeMo Multimodal Conversation format Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fully working data parsing and iteration Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fully working dataloading with tokenization + prompting Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Collapse consecutive user turns into single turn Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* compatible with previous expts Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* support gemma Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* handle the case max_seq_length is smaller than input_id length Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* fix max seq case Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* fix lora ckpt storing and loading Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* temp fix for distributed fused adam Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* revert changes in nemo_adapters.py Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
* Fix tokenize_with_prompt Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com> Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Co-authored-by: Piotr Żelasko <petezor@gmail.com>
* Mechanism to insert BOS/EOS at the beginning/end of dialog (#10923)
* Mechanism to insert BOS/EOS at the beginning/end of dialog Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix Gemma prompt formatter test Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Add a test specifically for multiturn insertion of bos/eos Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Add options to override default map/iterable dataset style selection in lhotse dataloader Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Feature/conversations tarred (#11086)
* Multimodal conversation tarring script Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix sharding logic Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix dir creation Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* EMMeTT support in SpeechLLM + tutorial for Lhotse Multimodal Dataloading (#10927)
* Preliminary support for oomptimizer Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* OOMptimizer for SpeechLLM Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Initial version of estimate token bins script Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Initial support for multimodal 2d bucketing Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Extend to text-to-text oomptimizer Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Preliminary support for Llama2 prompt format in ast+mt Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Support for 1D estimate token bins Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Support for 1D estimate token bins Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Minor tweaks Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Add min/max tokens filter Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Change to bisect_left for bucket idx selection Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Add reconfigure_num_microbatches_calculator at the start of train epoch for modular models Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Update lhotse multi-sampler config and make validation datasets finite Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Initial implementation of text+audio training for T5 modular models Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* megatron t5 nmt prompt formatter Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for MT+AST T5 oomptimizer and training Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* configs, fixes, token-per-token filtering
* Support text modality in predict_step Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Support text data in val/test dl Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix infinite Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* prompt format fixes Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes in audio supervision Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* remove superficial padding Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* test config and prompt context fetching fixes Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* support text-only decoding for salm/bestow Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Add unit tests for EMMETT / refactor prompt_format_fn Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* make t5nmt prompt formatter auto discoverable Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* include token count / tpt filtering in estimate_token_bins Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix max token filter Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* some fixes Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* custom mixin for text adapters Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Warmup in oomptimizer-speechlm Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Move oomptimizer-speechllm to separate directory Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Initial cleanup Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Refactoring of prompt format fn and length measurement and filtering for data types; improved unit test coverage Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Refactor sampler constraints / filters into sampling.py Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Tests and support for sampler length measurement of multimodal conversations Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Update estimate_token_bins.py Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Move estimate_token_bins.py to speech_llm scripts Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Minor tweaks Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for SpeechLLM dataset Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Apply isort and black reformatting Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* Add missing emmett tests Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Add tutorial about multimodal lhotse dataloading Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Updated documentation for multimodal dataloading Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Prompt Formatter tutorial Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Review comments Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fixes for sampling filters None values Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Changes requested by Steve: moving some args to main config namespace in multi config sampler Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Update default configs to the modified config schema Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix omegaconf use issue Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Update the docs to the modified multi config format Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: pzelasko <pzelasko@users.noreply.github.com> Co-authored-by: pzelasko <pzelasko@users.noreply.github.com>
* Remove old TODO comments Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Remove prompts/fn.py Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Copyright notices Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Make linter happy Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Make linter happy Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix megatron test Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix megatron test Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Disable plugin for high entropy strings in secrets detector Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix CodeQL errors Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix unit tests Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix another unit test Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix multimodal tests Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
* Apply isort and black reformatting Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
* fixes after merging canary2 pr to main Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix headers Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix canary integration test + formatting Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Address reviews - add sync_max_audio_length flag for conformer encoder Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Revert change in secrets detector Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Revert change in secrets detector Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Revert change in secrets detector Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Address code review Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Address Steve's review Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: pzelasko <pzelasko@users.noreply.github.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com> Co-authored-by: zhehuaichen <dian.chenzhehuai@gmail.com> Co-authored-by: pzelasko <pzelasko@users.noreply.github.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: krishnacpuvvada <krishnacpuvvada@users.noreply.github.com> Co-authored-by: zhehuaichen <139396994+zhehuaichen@users.noreply.github.com>
* Sync validation metrics for ASRModel (#11533)
* Sync validation metrics for ASRModel Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* support sync for single-dataloader case Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* NeMo 2.0 In-framework deployment support (#11523)
* nemo 2 support Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Remove unwanted params in DDP init in Megatron Parallel Signed-off-by: Hemil Desai <hemild@nvidia.com>
* nemo2 working with query Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* multigpu deployment with nemo2 works Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* add max output lenght Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Remove prints Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Fix merge conflicts Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* readded this file Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
---------
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com> Signed-off-by: Hemil Desai <hemild@nvidia.com> Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com> Co-authored-by: Hemil Desai <hemild@nvidia.com> Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* Add SFT/PEFT HF tests (#11519)
* Add SFT/PEFT HF tests Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* move hf examples to examples dir Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* bot Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use mini_squad Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use mini_squad Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* add 2gpu DDP Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* refactor Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use labels as passed by the user Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update samples/ tests Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rm unused imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add tests with subset split names, e.g. train[:100] Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* add --disable-ckpt Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use self-hosted-azure-gpus-1 for single-gpu test Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add TRANSFORMERS_OFFLINE=1 to hf tests Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Fix typo: LocalNonpersitentObject -> LocalNonpersistentObject (#11546) Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
* Adding documentation for packed dataset preparation with context para… (#11564)
* adding documentation for packed dataset preparation with context parallel Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
* addressing Anna Shor's comment Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
---------
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
* have micro_batch_size and global_batch_size as class attributes in mock datamodule (#11563)
* Revert "Fix the names of two sets of weight and bias in mcore_to_nemo_mapping" (#11560)
* Revert "Fix the names of two sets of weight and bias in mcore_to_nemo_mapping (#9628)" This reverts commit 6784db56a03f19f37bc4f37bdf87dabb3fc1acee.
* keep underscores Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: ashors1 <ashors@nvidia.com>
* add huggingface-based tokenizer support for mixtral HF -> .nemo (#11572)
* add huggingface-based tokenizer support Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
* Github Actions tests for Llava Next and modify pretrain recipe to have language model path (#11424)
* modified pretrain recipe to have language_model_from_pretrained
* ci test for llava next
* fixed indent/lint issue in cicd yml file
* fix lint issues
* Apply isort and black reformatting Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com>
* Update .github/workflows/cicd-main.yml Co-authored-by: oliver könig <okoenig@nvidia.com> Signed-off-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
* Update .github/workflows/cicd-main.yml Co-authored-by: oliver könig <okoenig@nvidia.com> Signed-off-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
---------
Signed-off-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> Signed-off-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com> Co-authored-by: yashaswikarnati <yashaswikarnati@users.noreply.github.com> Co-authored-by: oliver könig <okoenig@nvidia.com>
* Fix SingleDeviceStrategy support in Nsys callback (#11574)
* fix for SingleDeviceStrategy Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* mini refactor Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* typo Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* remove dialogue scripts and docs (#11577)
* remove deprecated scripts Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated docs Signed-off-by: dimapihtar <dpihtar@gmail.com>
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add JitTransform (#11131)
* add JitTransform Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fixes Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add JiT CB test Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove stale imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* typo Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* cleanup Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add jit callback test Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* fix param passing Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use sgd in test_nemo_jit_cb Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add thunder call Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Use .compile method to avoid changing module structure Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Use JitConfig Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* thunder setting Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* avoid reentry Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove optional Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rewrite Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* refactor & module_selector Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* NeMo 2.0 documentation upgrade (#11235)
* update attention Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update docs to NeMo 2.0 Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update usage Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update parallelism Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update parallelism docs Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update parallelism docs Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix style Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update to NeMo 2.0 Signed-off-by: dimapihtar <dpihtar@gmail.com>
* NeMo 2.0 update Signed-off-by: dimapihtar <dpihtar@gmail.com>
* NeMo 2.0 update Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated file Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update in respect to NeMo 2.0 Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix hyperlinks Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update documentation to NeMo 2.0 Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix typo Signed-off-by: dimapihtar <dpihtar@gmail.com>
* fix punctuation Signed-off-by: dimapihtar <dpihtar@gmail.com>
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Remove auto-import of lhotse when importing nemo.collections.common.data (#11578)
* Remove auto-import of lhotse when importing nemo.collections.common.data Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix test import Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Fix example configs (#11571)
* Fix example configs Signed-off-by: Boxiang Wang <boxiangw@nvidia.com>
* Fix line length Signed-off-by: Boxiang Wang <boxiangw@nvidia.com>
---------
Signed-off-by: Boxiang Wang <boxiangw@nvidia.com>
* fix (#11575) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* NIM supporting changes for nemo.export for NeMo 2.0 (#11488)
* Move torch_dtype_from_precision for independent export module Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Apply isort and black reformatting Signed-off-by: janekl <janekl@users.noreply.github.com> Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Remove unused imports Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Fix too long lines Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Apply isort and black reformatting Signed-off-by: janekl <janekl@users.noreply.github.com> Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Fix signature and default for megatron_amp_O2 Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: janekl <janekl@users.noreply.github.com> Co-authored-by: Bobby Chen <bobchen@nvidia.com> Co-authored-by: janekl <janekl@users.noreply.github.com>
* AED greedy confidence estimation (#11573)
* upload Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
* Apply isort and black reformatting Signed-off-by: GNroy <GNroy@users.noreply.github.com>
* set prompt confidence dtype at initialization Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
---------
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Signed-off-by: GNroy <GNroy@users.noreply.github.com> Co-authored-by: GNroy <GNroy@users.noreply.github.com>
* gemma fix (#11587)
* Update T5 DataModule regarding Pretrain/Finetune validate (#11584)
* update datamodule to have mbs/gbs
* update datamodule to have mbs/gbs
* Apply isort and black reformatting Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
---------
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com> Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com> Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com>
* fix llama3 (#11580)
* Add Hf nemorun tests (#11566)
* minor fixes for recipe Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* add peft nemorun script Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* add sft script and data module Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Apply isort and black reformatting Signed-off-by: HuiyingLi <HuiyingLi@users.noreply.github.com>
* clean up Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* add disable ckpt and data config for tests Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Apply isort and black reformatting Signed-off-by: HuiyingLi <HuiyingLi@users.noreply.github.com>
* add tests to cicd yaml Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* cleanup Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
---------
Signed-off-by: HuiyingLi <willwin.lee@gmail.com> Signed-off-by: HuiyingLi <HuiyingLi@users.noreply.github.com> Co-authored-by: HuiyingLi <HuiyingLi@users.noreply.github.com>
* [🤖]: Howdy folks, let's bump NeMo-Toolkit to `2.2.0rc0` ! (#11555) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Pass the number of experts to modelopt layer spec (#11607)
* Pass number of experts to modelopt layer spec Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Fix too long lines Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Adding changes to asr documentation (#11397) Signed-off-by: Ssofja <sofiakostandian@gmail.com>
* Support Cosmos tokenizer TensorRT inference (#11472)
* Add cosmos TRT
* Add trt run script
* Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com>
* Clean code
* Fix CodeQL
---------
Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> Co-authored-by: meatybobby <meatybobby@users.noreply.github.com>
* Neva updates to latest mcore and some fixes (#11565)
* api updates and fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix arg Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com> Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* add nemo2-sft-peft to readme (#11613) Signed-off-by: Huiying Li <willwin.lee@gmail.com>
* Set Minitron width pruning batch size 1 (#11603) Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Disable CP for running Inference using megatron_gpt_eval (#11547)
* Disable CP for megatron_gpt_eval
* Apply isort and black reformatting Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com>
* Update examples/nlp/language_modeling/megatron_gpt_eval.py Co-authored-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Ao Tang <mike.tang96@gmail.com>
---------
Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com> Signed-off-by: Ao Tang <mike.tang96@gmail.com> Co-authored-by: suiyoubi <suiyoubi@users.noreply.github.com> Co-authored-by: Chen Cui <chcui@nvidia.com>
* ci: Add `no-fail-fast` mode (#11608) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Chat dataset support (#11423)
* chat dataset support Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* add ci test Signed-off-by: Chen Cui <chcui@nvidia.com>
* address comment Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* address comment Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: cuichenx <cuichenx@users.noreply.github.com> Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
* Sortformer Diarizer 4spk v1 model PR Part 2: Unit-tests for Sortformer Diarizer. (#11336)
* Adding the first pr files models and dataset Signed-off-by: taejinp <tango4j@gmail.com>
* Tested all unit-test files Signed-off-by: taejinp <tango4j@gmail.com>
* Name changes on yaml files and train example Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Reflecting comments and removing unnecessary parts for this PR Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Adding docstrings to reflect the PR comments Signed-off-by: taejinp <tango4j@gmail.com>
* removed the unused find_first_nonzero Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Fixed all pylint issues Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Resolving pylint issues Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Removing unused varialbe in audio_to_diar_label.py Signed-off-by: taejinp <tango4j@gmail.com>
* Fixed docstrings in training script Signed-off-by: taejinp <tango4j@gmail.com>
* Line-too-long issue from Pylint fixed Signed-off-by: taejinp <tango4j@gmail.com>
* Adding get_subsegments_scriptable to prevent jit.script error Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Addressed Code-QL issues Signed-off-by: taejinp <tango4j@gmail.com>
* Resolved conflicts on bce_loss.py Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Adding all the diarization reltated unit-tests Signed-off-by: taejinp <tango4j@gmail.com>
* Moving speaker task related unit test files to speaker_tasks folder Signed-off-by: taejinp <tango4j@gmail.com>
* Fixed uninit variable issue in bce_loss.py spotted by codeQL Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Fixing code-QL issues Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Reflecting PR comments from weiqingw Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Line too long pylint issue resolved in e2e_diarize_speech.py Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Resovled unused variable issue in model test Signed-off-by: taejinp <tango4j@gmail.com>
* Reflecting the comment on Nov 21st 2024. Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Unused variable import time Signed-off-by: taejinp <tango4j@gmail.com>
* Adding docstrings to score_labels() function in der.py Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Reflecting comments on YAML files and model file variable changes. Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Added get_subsegments_scriptable for legacy get_subsegment functions Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Resolved line too long pylint issues Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Added training and inference CI-tests Signed-off-by: taejinp <tango4j@gmail.com>
* Added the missing parse_func in preprocessing/collections.py Signed-off-by: taejinp <tango4j@gmail.com>
* Adding the missing parse_func in preprocessing/collections.py Signed-off-by: taejinp <tango4j@gmail.com>
* Fixed an indentation error Signed-off-by: taejinp <tango4j@gmail.com>
* Resolved multi_bin_acc and bce_loss issues Signed-off-by: taejinp <tango4j@gmail.com>
* Resolved line-too-long for msdd_models.py Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* Code QL issues and fixed test errors Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* line too long in audio_to_diar_label.py Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tango4j <tango4j@users.noreply.github.com>
* resolving CICD test issues Signed-off-by: taejinp <tango4j@gmail.com>
* Fixing codeQL issues Signed-off-by: taejinp <tango4j@gmail.com>
* Fixed pin memory False for inference Signed-off-by: taejinp <tango4j@gmail.com>
---------
Signed-off-by: taejinp <tango4j@gmail.com> Signed-off-by: tango4j <tango4j@users.noreply.github.com> Co-authored-by: tango4j <tango4j@users.noreply.github.com>
* 2x more memory efficient Graph-based RNN-T (#11169)
* Optimized Graph-Transducer implementation Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
---------
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: artbataev <artbataev@users.noreply.github.com> Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Use explicit subpaths in io for exporting a checkpoint (#11352)
* Fix llm.export_ckpt Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Remove triton requirement (#11627)
* Specify pytorch-triton instead of triton Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Remove triton Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
---------
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* ci: Remove comment if no changes required anymore (#11624) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Jit with peft (#11586)
* move jitransform at the end Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add docstring & post-init Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add remove_extra_batch_keys and remove align_labels Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Run JitTransform on_train_epoch_start Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add --use-torch-jit option Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add docstrings Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* pep8 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* NeMo-UX: add Hf's AutoModelForImageTextToText (#11321)
* init commit Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* wip Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* peft examp;le Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* move peft example to multimodal_llm Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* surface HFAutoModelForImageTextToText Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add hf vlm dataset Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* move processor Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* train_log -> train_loss Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* vlm.HFDatasetDataModule pass collate_fn as argument Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Update peft example Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* typo Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused var Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Move example Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* remove unused Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Small change Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Fix loss calculation Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add extract_skipped_token_ids Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Use vlm.HFAutoModelForImageTextToText.extract_skipped_token_ids Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add test Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Update logits/labels handling Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add trust_remote_code to configure_processor Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* mini refactor Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add LLAMA_TOKENS Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update hf_dataset Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add lora_dtype for models with non-FP weights Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add load_in_4bit option Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add default_dtype Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add load_in_4bit to llm collection Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rm import Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix asset path Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* move vlm test Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* move data offline Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use signel gpu Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* pylint fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* pylint Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* pylint Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* drop align_labels Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove align_labels from llm too Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use loss * mask instead of loss[mask == 1] Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix path Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* ci: Bump release workflow (#11635) Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Add fix docstring for speech commands (#11638) Signed-off-by: smajumdar <titu1994@gmail.com>
* Fixing Multi_Task_Adapters.ipynb by replacing canary2 with canary_custom (#11641) Signed-off-by: Weiqing Wang <weiqingw@nvidia.com>
* fixed config name in online augmentation tutorial (#11628) Signed-off-by: Rauf <rnasretdinov@nvidia.com>
* fix default nodes (#11632)
* add renormalize_blend_weights param (#11647) Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Sortformer Diarizer 4spk v1 model PR Part 3: Speaker Diarization Mixin (#11511)
* Adding diarization mixin for one click inference Signed-off-by: taejinp <tango4j@gmail.com>
* Apply isort and black reformatting Signed-off-by: tan…
1 parent e04e345 commit 5d8baa4

File tree: 11 files changed (+643, -4 lines)


.github/workflows/cicd-main.yml

Lines changed: 24 additions & 0 deletions
@@ -3701,6 +3701,17 @@ jobs:
           TRANSFORMERS_OFFLINE=1 python tests/collections/llm/hf/sft.py --model /home/TestData/nlp/hf_gemma/hf_gemma_2b --max-steps 10 --devices 2 --strategy ddp
         AFTER_SCRIPT: |
           rm -rf nemo_experiments
+
+  L2_HF_Transformer_SFT_FSDP2_2gpu:
+    needs: [ cicd-test-container-setup ]
+    uses: ./.github/workflows/_test_template.yml
+    if: contains(fromJSON(needs.cicd-test-container-setup.outputs.test_to_run), 'L2_HF_Transformer_SFT_FSDP2_2gpu') || needs.cicd-test-container-setup.outputs.all == 'true'
+    with:
+      RUNNER: self-hosted-azure
+      SCRIPT: |
+        TRANSFORMERS_OFFLINE=1 python tests/collections/llm/hf/sft_fsdp2.py --model /home/TestData/nlp/hf_gemma/hf_gemma_2b --max-steps 10 --devices 2
+      AFTER_SCRIPT: |
+        rm -rf nemo_experiments

   L2_HF_Transformer_PT_2gpu:
     needs: [ cicd-test-container-setup ]
@@ -3723,6 +3734,17 @@ jobs:
           TRANSFORMERS_OFFLINE=1 python tests/collections/llm/hf/sft_nemorun.py --model /home/TestData/nlp/hf_gemma/hf_gemma_2b --max-steps 10 --devices 2 --strategy ddp
         AFTER_SCRIPT: |
           rm -rf nemo_experiments
+
+  L2_HF_Transformer_SFT_2gpu_nemorun_fsdp2:
+    needs: [ cicd-test-container-setup ]
+    uses: ./.github/workflows/_test_template.yml
+    if: contains(fromJSON(needs.cicd-test-container-setup.outputs.test_to_run), 'L2_HF_Transformer_SFT_2gpu_nemorun_fsdp2') || needs.cicd-test-container-setup.outputs.all == 'true'
+    with:
+      RUNNER: self-hosted-azure
+      SCRIPT: |
+        TRANSFORMERS_OFFLINE=1 python tests/collections/llm/hf/sft_nemorun_fsdp2.py --model /home/TestData/nlp/hf_gemma/hf_gemma_2b --max-steps 10 --devices 2
+      AFTER_SCRIPT: |
+        rm -rf nemo_experiments

   L2_HF_Transformer_PT_2gpu_nemorun:
     needs: [ cicd-test-container-setup ]
@@ -5048,6 +5070,8 @@ jobs:
       - L2_NeMo_2_PTQ_Llama2_FP8
       - L2_NeMo_2_jit_callback
       - L2_NeMo_2_LLAVA_NEXT_MOCK_TRAINING
+      - L2_HF_Transformer_SFT_FSDP2_2gpu
+      - L2_HF_Transformer_SFT_2gpu_nemorun_fsdp2
     if: always()
     runs-on: ubuntu-latest
     steps:
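
The two new jobs invoke dedicated FSDP2 entry points (tests/collections/llm/hf/sft_fsdp2.py and sft_nemorun_fsdp2.py) whose contents are not part of this diff. As a rough, hypothetical sketch of what such an entry point could look like when wired to the new strategy (argument names mirror the CI invocations above; the model and dataset choices are placeholders, not taken from this commit):

# Hypothetical sketch only; the real sft_fsdp2.py is not shown in this commit.
import argparse

from nemo import lightning as nl
from nemo.collections import llm

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)          # local HF checkpoint, e.g. Gemma 2B
    parser.add_argument("--max-steps", type=int, default=10)
    parser.add_argument("--devices", type=int, default=2)
    args = parser.parse_args()

    trainer = nl.Trainer(
        devices=args.devices,
        max_steps=args.max_steps,
        accelerator="gpu",
        strategy=nl.FSDP2Strategy(data_parallel_size=args.devices, tensor_parallel_size=1),
    )
    model = llm.HFAutoModelForCausalLM(model_name=args.model)
    data = llm.HFDatasetDataModule("rajpurkar/squad", split="train")  # placeholder dataset
    trainer.fit(model, data)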

nemo/collections/llm/gpt/model/hf_auto_model_for_causal_lm.py

Lines changed: 6 additions & 1 deletion
@@ -20,6 +20,7 @@
 from nemo.collections.common.tokenizers.huggingface.auto_tokenizer import AutoTokenizer
 from nemo.collections.llm import fn
 from nemo.lightning import io
+from nemo.lightning.pytorch.strategies.utils import fsdp2_strategy_parallelize
 from nemo.utils import logging


@@ -91,6 +92,10 @@ def configure_model(self):
             config, torch_dtype=dtype, trust_remote_code=self.trust_remote_code
         )

+        # Apply FSDP2 and TP to the model
+        if self.device_mesh is not None:
+            fsdp2_strategy_parallelize(self.model, device_mesh=self.device_mesh)
+
         if self.model_accelerator is not None:
             self.model_accelerator(self.model)

@@ -99,7 +104,7 @@ def configure_model(self):
     def forward(self, batch):
         return self.model(**batch)

-    def training_step(self, batch):
+    def training_step(self, batch, batch_idx=None):
         labels = batch.pop('labels').to(self.model.device)
         loss_mask = batch.pop('loss_mask', None)
nemo/lightning/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -31,7 +31,7 @@
 from nemo.lightning.pytorch.optim import LRSchedulerModule, MegatronOptimizerModule, OptimizerModule, lr_scheduler
 from nemo.lightning.pytorch.plugins import MegatronDataSampler, MegatronMixedPrecision
 from nemo.lightning.pytorch.plugins import data_sampler as _data_sampler
-from nemo.lightning.pytorch.strategies import FSDPStrategy, MegatronStrategy
+from nemo.lightning.pytorch.strategies import FSDP2Strategy, FSDPStrategy, MegatronStrategy
 from nemo.lightning.pytorch.strategies.utils import RestoreConfig
 from nemo.lightning.pytorch.trainer import Trainer, configure_no_restart_validation_training_loop
 from nemo.lightning.resume import AutoResume
@@ -60,6 +60,7 @@ def _is_slurm_interactive_mode():
     "MegatronMixedPrecision",
     "MegatronOptimizerModule",
     "FSDPStrategy",
+    "FSDP2Strategy",
     "RestoreConfig",
     "lr_scheduler",
     "NeMoLogger",

nemo/lightning/pytorch/strategies/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -12,11 +12,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+from nemo.lightning.pytorch.strategies.fsdp2_strategy import FSDP2Strategy
 from nemo.lightning.pytorch.strategies.fsdp_strategy import FSDPStrategy
 from nemo.lightning.pytorch.strategies.megatron_strategy import MegatronStrategy

-
 __all__ = [
     "FSDPStrategy",
+    "FSDP2Strategy",
     "MegatronStrategy",
 ]
nemo/lightning/pytorch/strategies/fsdp2_strategy.py

Lines changed: 276 additions & 0 deletions (new file)

# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import shutil
from collections import OrderedDict
from pathlib import Path
from typing import Any, Dict, Literal, Optional, Union

import lightning.pytorch as pl
import torch
from lightning.fabric.plugins import CheckpointIO
from lightning.fabric.strategies.fsdp import _get_sharded_state_dict_context
from lightning.pytorch.strategies.model_parallel import ModelParallelStrategy as PLModelParallelStrategy
from lightning.pytorch.trainer.states import TrainerFn
from lightning.pytorch.utilities.types import STEP_OUTPUT
from torch.distributed.checkpoint.state_dict import (  # get_state_dict,
    StateDictOptions,
    get_optimizer_state_dict,
    set_state_dict,
)
from torch.utils.data import DataLoader
from typing_extensions import override

from nemo.lightning import io
from nemo.lightning.pytorch.strategies.utils import (
    ckpt_to_dir,
    create_checkpoint_io,
    fix_progress_bar,
    init_model_parallel,
    mcore_to_pyt_sharded_state_dict,
    pyt_to_mcore_state_dict,
    setup_data_sampler,
    setup_parallel_ranks,
)


class FSDP2Strategy(PLModelParallelStrategy, io.IOMixin):
    """FSDP2 strategy for PyTorch Lightning.

    This strategy implements FSDP2 using PyTorch's native FSDP2 methods. Compared with
    MegatronStrategy, FSDP2Strategy is designed to be more lightweight, with minimal
    modifications over Lightning's ModelParallelStrategy (which supports FSDP2 + TP
    parallelization), while preserving the features needed for compatibility with NeMo
    and Megatron Core. By default, this strategy wraps FSDP2 per TransformerLayer.

    Note:
        This strategy is designed to work with NVIDIA's Megatron-LM framework and requires
        model implementations that are compatible with Megatron's parallelism techniques.
    Note:
        Due to the different optimizer structure (FSDP2 uses only torch-native optimizers),
        MegatronStrategy cannot resume training from checkpoints saved by FSDP2Strategy, and
        vice versa. However, the model-weight layout is kept compatible, so switching
        strategies is possible if users need only the weights, not the optimizer states
        (e.g., run pretraining with Megatron 4D parallelism, then run SFT with FSDP2).
    """

    def __init__(
        self,
        data_parallel_size: Union[Literal["auto"], int] = "auto",
        tensor_parallel_size: Union[Literal["auto"], int] = "auto",
        ckpt_load_optimizer: bool = True,
        ckpt_save_optimizer: bool = True,
        data_sampler=None,
        **kwargs,
    ):
        super().__init__(data_parallel_size=data_parallel_size, tensor_parallel_size=tensor_parallel_size, **kwargs)

        self.data_sampler = data_sampler
        self.ckpt_load_optimizer = ckpt_load_optimizer
        self.ckpt_save_optimizer = ckpt_save_optimizer

    @override
    def setup_environment(self) -> None:
        setup_parallel_ranks(self)
        super().setup_environment()
        init_model_parallel(self.model)

    @override
    def setup(self, trainer: pl.Trainer) -> None:
        self.trainer = trainer
        setup_data_sampler(self.trainer)
        fix_progress_bar(trainer)
        super().setup(trainer)

    def _get_loss_reduction(self, step_type: str):
        for fn_name in [f"{step_type}_loss_reduction", "loss_reduction"]:
            if hasattr(self.lightning_module, fn_name):
                return getattr(self.lightning_module, fn_name)
        return None

    def _step_proxy(self, step_type, batch, batch_idx=None):
        method_name = f"{step_type}_step"
        if self.model != self.lightning_module:
            loss = self._forward_redirection(self.model, self.lightning_module, method_name, batch, batch_idx)
        else:
            loss = getattr(self.lightning_module, method_name)(batch, batch_idx)

        _loss_reduction = self._get_loss_reduction(step_type)
        if _loss_reduction:
            return _loss_reduction.forward(batch, loss)
        return loss, {'avg': loss}

    @override
    def training_step(self, batch, batch_idx=None) -> STEP_OUTPUT:
        assert self.lightning_module is not None
        assert self.model is not None
        with self.precision_plugin.train_step_context():
            loss, reduced = self._step_proxy("training", batch, batch_idx)

            self.lightning_module.log(
                'global_step',
                self.trainer.global_step,
                prog_bar=True,
                rank_zero_only=True,
                batch_size=1,
            )

            self.lightning_module.log(
                'step',
                self.trainer.global_step,
            )
            self.lightning_module.log(
                'reduced_train_loss', reduced['avg'], prog_bar=True, rank_zero_only=True, batch_size=1
            )

            # returns unreduced loss for backward
            return loss

    @override
    def validation_step(self, batch, batch_idx=None) -> Any:
        assert self.lightning_module is not None
        assert self.model is not None
        with self.precision_plugin.val_step_context():
            loss, reduced = self._step_proxy("validation", batch, batch_idx)
            self.lightning_module.log('val_loss', reduced['avg'], rank_zero_only=True, batch_size=1)
            return loss

    @override
    def test_step(self, batch, batch_idx=None) -> STEP_OUTPUT:
        assert self.lightning_module is not None
        assert self.model is not None
        with self.precision_plugin.test_step_context():
            loss, reduced = self._step_proxy("test", batch, batch_idx)
            self.lightning_module.log('test_loss', reduced['avg'], rank_zero_only=True, batch_size=1)
            return loss

    @override
    def predict_step(self, batch, batch_idx=None) -> STEP_OUTPUT:
        assert self.lightning_module is not None
        assert self.model is not None
        with self.precision_plugin.predict_step_context():
            loss, reduced = self._step_proxy("predict", batch, batch_idx)
            return reduced

    @override
    def process_dataloader(self, dataloader: DataLoader) -> DataLoader:
        if self.data_sampler:
            return self.data_sampler.transform_dataloader(dataloader)
        return dataloader

    @property
    @override
    def checkpoint_io(self) -> CheckpointIO:
        if not self._checkpoint_io:
            self._checkpoint_io = create_checkpoint_io()
        return self._checkpoint_io

    @checkpoint_io.setter
    def checkpoint_io(self, io: CheckpointIO) -> None:
        self._checkpoint_io = io

    @property
    def current_epoch_step(self) -> int:
        """Get the value of step within an epoch."""
        return max(
            self.trainer.fit_loop.epoch_loop.automatic_optimization.optim_progress.optimizer.step.current.completed,
            self.trainer.fit_loop.epoch_loop.manual_optimization.optim_step_progress.current.completed,
        )

    @override
    def remove_checkpoint(self, filepath: Union[str, Path]) -> None:
        # Taken from MegatronStrategy
        ckpt = ckpt_to_dir(filepath)
        if self.is_global_zero:
            if os.path.islink(ckpt):
                os.unlink(ckpt)
            else:
                shutil.rmtree(ckpt)

    @override
    def save_checkpoint(
        self, checkpoint: Dict[str, Any], filepath: Union[str, Path], storage_options: Optional[Any] = None
    ) -> None:
        """Converts PyT checkpoints to MCore format and saves them using the MCore dist-ckpt library."""
        checkpoint["sharded_state_dict"] = pyt_to_mcore_state_dict(
            checkpoint.pop("state_dict"), device_mesh=self.device_mesh
        )
        checkpoint["state_dict"] = OrderedDict([])

        if "optimizer_states" in checkpoint and self.trainer.state.fn == TrainerFn.FITTING:
            # Clear the optimizer states. This handles the case where ckpt_save_optimizer=False
            # Ideally, the optimizer state dicts should not be generated in this case
            checkpoint["optimizer_states"] = {}

            ## replace unsharded optimizer_states with sharded dict.
            ## note that if trainer.save_checkpoint(path, save_weights_only=True) is called,
            ## the checkpoint will contain only model weights. optimizer states will be omitted.
            if self.ckpt_save_optimizer:
                checkpoint['optimizer'] = get_optimizer_state_dict(self.model, self.optimizers)
                pyt_to_mcore_state_dict(
                    checkpoint['optimizer']['state'], prefix="optimizer.state.", device_mesh=self.device_mesh
                )

        self.checkpoint_io.save_checkpoint(checkpoint, filepath, storage_options=storage_options)

    @override
    def load_checkpoint(self, checkpoint_path: str | Path) -> Dict[str, Any]:
        """PTL method which we override to integrate distributed checkpoints for FSDP models.
        Different from MegatronStrategy, both model and optimizer states are restored within
        this method.

        The logic here is slightly more complicated:
        1. Obtain PyT state dicts (sharded & unflattened) for model and optim -> torch::ShardedTensor
        2. Convert to MCore state dicts -> mcore::ShardedTensor
        3. Load from checkpoint using MCore dist ckpt API -> torch::Tensor
        4. Convert to PyT state dicts (sharded & unflattened) -> torch::ShardedTensor
        5. Load into model and optim using PyT dist ckpt API
        6. Return the loaded checkpoint for lightning to load other metadata
        """
        path = Path(self.broadcast(checkpoint_path))
        torch.cuda.empty_cache()

        # TODO: the elegant way to load both state dicts. Need pytorch 2.3.1
        # msd, osd = get_state_dict(self.model, self.optimizers, options=StateDictOptions(cpu_offload=True))
        sharded_state_dict = {}
        with _get_sharded_state_dict_context(self.model):
            msd = self.model.state_dict()
            pyt_to_mcore_state_dict(msd, device_mesh=self.device_mesh)
            sharded_state_dict["sharded_state_dict"] = msd

        if self.ckpt_load_optimizer and self.trainer.state.fn == TrainerFn.FITTING:
            osd = get_optimizer_state_dict(self.model, self.optimizers, options=StateDictOptions(cpu_offload=True))
            pyt_to_mcore_state_dict(osd['state'], prefix="optimizer.state.", device_mesh=self.device_mesh)
            sharded_state_dict["optimizer"] = osd

        checkpoint = self.checkpoint_io.load_checkpoint(path, sharded_state_dict=sharded_state_dict)
        mcore_to_pyt_sharded_state_dict(checkpoint['sharded_state_dict'], msd)

        if self.ckpt_load_optimizer and self.trainer.state.fn == TrainerFn.FITTING:
            mcore_to_pyt_sharded_state_dict(checkpoint['optimizer']['state'], osd['state'])

        set_state_dict(
            self.model,
            self.optimizers if self.ckpt_load_optimizer else [],
            model_state_dict=checkpoint['sharded_state_dict'],
            optim_state_dict=checkpoint['optimizer'] if self.ckpt_load_optimizer else None,
        )

        return checkpoint
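
To make the checkpoint options concrete, here is a minimal sketch of a weights-only configuration; the flag semantics follow the save_checkpoint and load_checkpoint code above, while the surrounding values are illustrative assumptions:

from nemo import lightning as nl

# Weights-only checkpointing: skip optimizer state on save and on resume.
# Useful when the checkpoint will seed a run under a different strategy
# (see the class docstring on MegatronStrategy <-> FSDP2Strategy compatibility).
strategy = nl.FSDP2Strategy(
    data_parallel_size="auto",
    tensor_parallel_size="auto",
    ckpt_save_optimizer=False,  # save_checkpoint() clears optimizer_states
    ckpt_load_optimizer=False,  # load_checkpoint() restores model weights only
)

Per the inline comments in save_checkpoint, calling trainer.save_checkpoint(path, save_weights_only=True) achieves the same effect on a per-call basis.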

nemo/lightning/pytorch/strategies/megatron_strategy.py

Lines changed: 1 addition & 1 deletion
@@ -116,7 +116,7 @@ class MegatronStrategy(DDPStrategy, io.IOMixin):
             across GPU ranks. Defaults to 1.
         virtual_pipeline_model_parallel_size (Optional[int]): Interleaved pipeline parallelism used to
             improve performance by reducing the pipeline bubble. Defaults to None.
-        microbatch_group_size_per_vp_stageOptional[int]: the number of micro-batches that are executed
+        microbatch_group_size_per_vp_stage (Optional[int]): the number of micro-batches that are executed
             at a time for a given virtual stage (both forward and backward). Defaults to None and convert
             to pipeline_parallel_size. which specifies a depth-first schedule.
         context_parallel_size (int): Splits network input along sequence dimension across GPU ranks.
