[TTS] MagpieTTS inference: Add command line option to select a subset of datasets to run inference on #15212

Merged
blisc merged 3 commits into NVIDIA-NeMo:main from rfejgin:magpietts_datasets_inferece_command_line
Dec 31, 2025

@rfejgin rfejgin commented Dec 20, 2025

Added a command line option to select a subset of datasets to run inference on.

Reasoning: day-to-day work requires a way to run inference on only a subset of datasets. Editing the JSON file directly makes local testing non-reproducible, because the file is changed over and over with no record of what was changed. This PR therefore adds a programmatic way to choose datasets. The new command line argument is entirely optional; if it is not specified, all datasets in the JSON file are processed.

New command line argument format: --datasets <dataset1,dataset2,...> where
dataset1, dataset2, ... are the names of datasets to process in the
datasets_json_path file.

If not specified, all datasets in the datasets_json_path will be processed.
If specified, only the datasets in the list will be processed.
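
For readers who want a concrete picture, here is a minimal sketch of how such a filter could work. It is not the code from this PR. The helper names (`parse_dataset_names`, `select_datasets`, `build_parser`, `main`) are hypothetical, and the sketch assumes the file at `datasets_json_path` is a JSON object whose top-level keys are dataset names. Raising an error for unknown names is a choice made for this sketch, not documented behavior.

```python
import argparse
import json
from typing import Any, Dict, List, Optional


def parse_dataset_names(value: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated string into dataset names.

    Returns None when the option was not given, which means "no filter".
    Surrounding whitespace and empty entries (e.g. "a,,b") are dropped.
    """
    if value is None:
        return None
    return [name.strip() for name in value.split(",") if name.strip()]


def select_datasets(all_datasets: Dict[str, Any], names: Optional[List[str]]) -> Dict[str, Any]:
    """Return only the requested datasets, or all of them when names is None."""
    if names is None:
        return dict(all_datasets)
    # Assumption of this sketch: a misspelled name should fail loudly.
    unknown = [name for name in names if name not in all_datasets]
    if unknown:
        raise ValueError(f"Unknown dataset(s): {', '.join(unknown)}")
    return {name: all_datasets[name] for name in names}


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser with the two options discussed above."""
    parser = argparse.ArgumentParser(description="Dataset selection sketch")
    parser.add_argument("--datasets_json_path", required=True,
                        help="Path to the JSON file describing the datasets.")
    parser.add_argument("--datasets", default=None,
                        help="Comma-separated dataset names to process. "
                             "If omitted, all datasets in the JSON file are processed.")
    return parser


def main(argv: Optional[List[str]] = None) -> List[str]:
    """Parse arguments, load the JSON file, and return the selected dataset names."""
    args = build_parser().parse_args(argv)
    with open(args.datasets_json_path, encoding="utf-8") as f:
        all_datasets = json.load(f)
    selected = select_datasets(all_datasets, parse_dataset_names(args.datasets))
    for name in selected:
        print(f"Selected dataset: {name}")
    return list(selected)
```

Calling `main(["--datasets_json_path", "datasets.json", "--datasets", "dataset1,dataset2"])` would select only those two entries, while omitting `--datasets` selects every entry in the file. Keeping the selection on the command line leaves the JSON file untouched, so the exact invocation can be recorded and repeated.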

New command line argument: --datasets <dataset1,dataset2,...> where
dataset1, dataset2, ... are the names of the datasets to process in the
datasets_json_path file.

If not specified, all datasets in the datasets_json_path will be processed.
If specified, only the datasets in the list will be processed.

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
* Correctly handle comma-separated list of dataset names in the --datasets argument.
* Help text

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
@github-actions github-actions bot added the TTS label Dec 20, 2025
@rfejgin rfejgin changed the title from "[TTS] MagpieTTS inference: Add command line to select a subset of datasets to run inference on" to "[TTS] MagpieTTS inference: Add command line option to select a subset of datasets to run inference on" Dec 20, 2025

blisc commented Dec 31, 2025

Closing in favour of #15242

@blisc blisc closed this Dec 31, 2025
@blisc blisc reopened this Dec 31, 2025

blisc commented Dec 31, 2025

Will merge this instead of #15242

@blisc blisc merged commit 316aea2 into NVIDIA-NeMo:main Dec 31, 2025
108 of 111 checks passed
blisc added a commit that referenced this pull request Jan 7, 2026
blisc added a commit that referenced this pull request Jan 8, 2026
* move inference params to checkpoint and make do_tts apply prior

Signed-off-by: Jason <jasoli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: blisc <blisc@users.noreply.github.com>

* Enable LT in do_tts; add docstrings

Signed-off-by: Jason <jasoli@nvidia.com>

* update defaults; merge inference dataclasses

Signed-off-by: Jason <jasoli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: blisc <blisc@users.noreply.github.com>

* update epsilon value

Signed-off-by: Jason <jasoli@nvidia.com>

* add defaults for inference; fix longform mode; fix bug introduced in #15212

Signed-off-by: Jason <jasoli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: blisc <blisc@users.noreply.github.com>

* fix field key usage

Signed-off-by: Jason <jasoli@nvidia.com>

---------

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: blisc <blisc@users.noreply.github.com>
Co-authored-by: blisc <blisc@users.noreply.github.com>
AkCodes23 pushed a commit to AkCodes23/NeMo that referenced this pull request Jan 28, 2026
… of datasets to run inference on (NVIDIA-NeMo#15212)

* Added datasets filtering to the inference script

New command line argument: --datasets <dataset1,dataset2,...> where
dataset1, dataset2, ... are the names of the datasets to process in the
datasets_json_path file.

If not specified, all datasets in the datasets_json_path will be processed.
If specified, only the datasets in the list will be processed.

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>

* Refined datasets filtering in the inference script

* Correctly handle comma-separated list of dataset names in the --datasets argument.
* Help text

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>

---------

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Signed-off-by: Akhil Varanasi <akhilvaranasi23@gmail.com>
AkCodes23 pushed a commit to AkCodes23/NeMo that referenced this pull request Jan 28, 2026
nune-tadevosyan pushed a commit to nune-tadevosyan/NeMo that referenced this pull request Mar 13, 2026
nune-tadevosyan pushed a commit to nune-tadevosyan/NeMo that referenced this pull request Mar 13, 2026