[TTS] MagpieTTS inference: Add command line option to select a subset of datasets to run inference on#15212
Merged
blisc merged 3 commits into NVIDIA-NeMo:main on Dec 31, 2025
Conversation
New command line argument: --datasets <dataset1,dataset2,...>, where dataset1, dataset2, ... are the names of the datasets to process in the datasets_json_path file. If not specified, all datasets in the datasets_json_path file will be processed. If specified, only the datasets in the list will be processed. Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Correctly handle comma-separated list of dataset names in the --datasets argument.
* Help text
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
blisc
approved these changes
Dec 30, 2025
Collaborator
Closing in favour of #15242
Collaborator
Will merge this instead of #15242
blisc
added a commit
that referenced
this pull request
Jan 7, 2026
…15212 Signed-off-by: Jason <jasoli@nvidia.com>
3 tasks
blisc
added a commit
that referenced
this pull request
Jan 8, 2026
* move inference params to checkpoint and make do_tts apply prior
  Signed-off-by: Jason <jasoli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: blisc <blisc@users.noreply.github.com>
* Enable LT in do_tts; add docstrings
  Signed-off-by: Jason <jasoli@nvidia.com>
* update defaults; merge inference dataclasses
  Signed-off-by: Jason <jasoli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: blisc <blisc@users.noreply.github.com>
* update epsilon value
  Signed-off-by: Jason <jasoli@nvidia.com>
* add defaults for inference; fix longform mode; fix bug introduced in #15212
  Signed-off-by: Jason <jasoli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: blisc <blisc@users.noreply.github.com>
* fix field key usage
  Signed-off-by: Jason <jasoli@nvidia.com>
---------
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: blisc <blisc@users.noreply.github.com>
Co-authored-by: blisc <blisc@users.noreply.github.com>
Added a command line option to select a subset of datasets to run inference on.
Reasoning: for day-to-day work we need a way to select a subset of datasets to run inference on. Directly editing the JSON file leads to non-reproducible local testing, because the file gets edited over and over in a non-traceable way. This PR therefore adds a programmatic way to choose datasets. The new command line argument is entirely optional; if it is not specified, all datasets in the JSON file will be processed.
New command line argument format:
--datasets <dataset1,dataset2,...> where dataset1, dataset2, ... are the names of datasets to process in the datasets_json_path file.
If not specified, all datasets in the datasets_json_path will be processed.
If specified, only the datasets in the list will be processed.
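
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `build_parser`, `parse_dataset_names`, and `select_datasets` are hypothetical, and the sketch assumes the datasets JSON has been loaded as a mapping from dataset name to its entry. Raising an error on an unknown name is a design choice made here for illustration; the actual script may handle that case differently.

```python
import argparse


def build_parser():
    """Build a parser with the two options discussed in this PR."""
    parser = argparse.ArgumentParser(description="Dataset filtering sketch")
    parser.add_argument("--datasets_json_path", required=True)
    parser.add_argument(
        "--datasets",
        default=None,
        help="Comma-separated dataset names to process; default is all datasets.",
    )
    return parser


def parse_dataset_names(raw):
    """Split 'a,b,c' into ['a', 'b', 'c']; return None when the option is unset."""
    if raw is None:
        return None
    # Tolerate stray whitespace and empty items such as a trailing comma.
    return [name.strip() for name in raw.split(",") if name.strip()]


def select_datasets(all_datasets, requested):
    """Return only the requested datasets, or every dataset if none were requested.

    `all_datasets` is assumed to be a dict keyed by dataset name.
    """
    if requested is None:
        return dict(all_datasets)
    unknown = [name for name in requested if name not in all_datasets]
    if unknown:
        # Assumption: fail loudly rather than silently skipping a misspelled name.
        raise ValueError(f"Unknown dataset(s): {', '.join(unknown)}")
    return {name: all_datasets[name] for name in requested}
```

With this sketch, passing `--datasets dataset1,dataset2` keeps only those two entries, while omitting the option leaves the full mapping untouched, so the JSON file itself never needs to be edited for a local run.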