Commit 676a368

[DRAFT][TTS] Magpietts Simple API and loading audiocodec from Huggingface (#15172)
* Modularize magpie inference code; move inference code from scripts to examples
* Modify magpie CI with inference changes
* Rename magpietts inference scripts from magpie to magpietts
* infer_batch returns a dataclass object
* Fix context embedding without context encoder
* Remove unnecessary configurations: removed multiple long manifest configurations from evalset_config.py
* Remove unused imports
* Apply Copilot-suggested changes
* Add do_tts method; load audiocodec from Hugging Face
* Move inference helper modules from examples to the tts collection
* Review changes, including changes suggested in compute_mean_with_confidence_interval
* Fix linting issues
* register_tokenizer_artifacts to store tokenizer files in the .nemo file
* Resolve rebase-with-main issues
* Change datasets to JSON input; move the JSON file to examples/tts
* Remove unwanted dataconfig
* Make the utmos import optional; add text_normalization cache and check; update tests
* Refactor prepare_context_tensors; remove dummy context audio/text from do_tts
* Remove utmos; make the dataset path required
* Enable loading MagpieTTS from HF
* Support speaker index in the do_tts API
* Apply isort and black reformatting (repeated throughout)

---------

Signed-off-by: subhankar-ghosh <[email protected]>
Signed-off-by: subhankar-ghosh <[email protected]>
Signed-off-by: Subhankar Ghosh <[email protected]>
Signed-off-by: Subhankar Ghosh <[email protected]>
Co-authored-by: subhankar-ghosh <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent 61aa919 commit 676a368

File tree

8 files changed: +731, -248 lines


nemo/collections/tts/modules/magpietts_inference/evalset_config.json renamed to examples/tts/evalset_config.json

File renamed without changes.

examples/tts/magpietts_inference.py

13 additions, 24 deletions

@@ -21,15 +21,15 @@
     # Inference only (from .nemo file) - default behavior
     python examples/tts/magpietts_inference.py \\
         --nemo_files /path/to/model.nemo \\
-        --datasets libritts_test_clean \\
+        --datasets_json_path /path/to/evalset_config.json \\
         --out_dir /path/to/output \\
         --codecmodel_path /path/to/codec.nemo

     # Inference with evaluation (from checkpoint)
     python examples/tts/magpietts_inference.py \\
         --hparams_files /path/to/hparams.yaml \\
         --checkpoint_files /path/to/model.ckpt \\
-        --datasets libritts_test_clean,vctk \\
+        --datasets_json_path /path/to/evalset_config.json \\
         --out_dir /path/to/output \\
         --codecmodel_path /path/to/codec.nemo \\
         --run_evaluation \\
@@ -65,14 +65,8 @@
     load_magpie_model,
 )
 from nemo.collections.tts.modules.magpietts_inference.visualization import create_combined_box_plot, create_violin_plot
-
 from nemo.utils import logging

-# Default evaluation datasets
-EVALUATION_DATASETS = (
-    "riva_hard_digits,riva_hard_letters,riva_hard_money,riva_hard_short,vctk,libritts_seen,libritts_test_clean"
-)
-

 def parse_layer_list(layer_str: Optional[str]) -> Optional[List[int]]:
     """Parse a comma-separated list of layer indices."""
@@ -127,7 +121,7 @@ def run_inference_and_evaluation(
     model_config: ModelLoadConfig,
     inference_config: InferenceConfig,
     eval_config: EvaluationConfig,
-    datasets: List[str],
+    dataset_meta_info: dict,
     out_dir: str,
     num_repeats: int = 1,
     confidence_level: float = 0.95,
@@ -142,7 +136,7 @@ def run_inference_and_evaluation(
         model_config: Configuration for loading the model.
         inference_config: Configuration for inference.
         eval_config: Configuration for evaluation.
-        datasets: List of dataset names to evaluate.
+        dataset_meta_info: Dictionary containing dataset metadata.
         out_dir: Output directory for results.
         num_repeats: Number of times to repeat inference (for CI estimation).
         confidence_level: Confidence level for CI calculation.
@@ -176,7 +170,7 @@ def run_inference_and_evaluation(
     runner = MagpieInferenceRunner(model, inference_config)

     # Tracking metrics across datasets
-    dataset_meta_info = load_evalset_config()
+    datasets = list(dataset_meta_info.keys())
     ssim_per_dataset = []
     cer_per_dataset = []
     all_datasets_filewise_metrics = {}
@@ -193,10 +187,6 @@ def run_inference_and_evaluation(
     for dataset in datasets:
         logging.info(f"Processing dataset: {dataset}")

-        if dataset not in dataset_meta_info:
-            logging.warning(f"Dataset '{dataset}' not found in evalset_config.json, skipping.")
-            continue
-
         meta = dataset_meta_info[dataset]
         manifest_records = read_manifest(meta['manifest_path'])
         language = meta.get('whisper_language', 'en')
@@ -232,7 +222,7 @@ def run_inference_and_evaluation(
             f"Dataset length mismatch: {len(test_dataset)} vs {len(manifest_records)} manifest records"
         )

-        rtf_metrics_list, generated_paths = runner.run_inference_on_dataset(
+        rtf_metrics_list, _ = runner.run_inference_on_dataset(
             dataset=test_dataset,
             output_dir=repeat_audio_dir,
             manifest_records=manifest_records,
@@ -377,10 +367,10 @@ def create_argument_parser() -> argparse.ArgumentParser:
     # Dataset and output arguments
     data_group = parser.add_argument_group('Dataset and Output')
     data_group.add_argument(
-        '--datasets',
+        '--datasets_json_path',
         type=str,
         default=None,
-        help=f'Comma-separated dataset names (default: {EVALUATION_DATASETS})',
+        help='Path to dataset configuration JSON file (will process all datasets in the file)',
     )
     data_group.add_argument(
         '--out_dir',
@@ -487,11 +477,10 @@ def main():
     parser = create_argument_parser()
     args = parser.parse_args()

-    # Set default datasets if not provided
-    if args.datasets is None:
-        args.datasets = EVALUATION_DATASETS
+    dataset_meta_info = load_evalset_config(args.datasets_json_path)
+    datasets = list(dataset_meta_info.keys())

-    datasets = args.datasets.split(",")
+    logging.info(f"Loaded {len(datasets)} datasets: {', '.join(datasets)}")

     # Determine mode and validate
     has_checkpoint_mode = (
@@ -559,7 +548,7 @@ def main():
         model_config=model_config,
         inference_config=inference_config,
         eval_config=eval_config,
-        datasets=datasets,
+        dataset_meta_info=dataset_meta_info,
         out_dir=args.out_dir,
         num_repeats=args.num_repeats,
         confidence_level=args.confidence_level,
@@ -584,7 +573,7 @@ def main():
         model_config=model_config,
         inference_config=inference_config,
         eval_config=eval_config,
-        datasets=datasets,
+        dataset_meta_info=dataset_meta_info,
         out_dir=args.out_dir,
         num_repeats=args.num_repeats,
         confidence_level=args.confidence_level,
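The diff above replaces the hard-coded `EVALUATION_DATASETS` string with a user-supplied `--datasets_json_path` file, whose top-level keys become the dataset names and whose values the script reads as metadata. The sketch below illustrates the shape implied by the diff (`manifest_path` required, `whisper_language` optional, defaulting to `'en'`); the example config values and this simplified `load_evalset_config` are assumptions for illustration, not the actual NeMo helper.

```python
import json

# Hypothetical contents of an evalset_config.json, inferred from the keys the
# script reads; real configs may carry additional per-dataset fields.
EXAMPLE_CONFIG = {
    "libritts_test_clean": {
        "manifest_path": "/data/libritts/test_clean_manifest.json",
        "whisper_language": "en",
    },
}


def load_evalset_config(json_path: str) -> dict:
    """Sketch of the loader: return a dict mapping dataset name -> metadata."""
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

With this shape, `list(dataset_meta_info.keys())` in the diff yields the dataset list, and `meta.get('whisper_language', 'en')` falls back to English when the field is absent.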

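The `num_repeats` and `confidence_level` parameters in the diff feed the `compute_mean_with_confidence_interval` changes mentioned in the commit list: inference is repeated several times and a confidence interval is reported over the per-repeat metrics. A minimal sketch of that style of calculation follows; the function name, the use of a normal-approximation critical value (a Student-t value would be wider for few repeats), and the return shape are all assumptions, not the NeMo implementation.

```python
import math
import statistics
from typing import List, Tuple


def mean_with_confidence_interval(
    values: List[float], confidence_level: float = 0.95
) -> Tuple[float, float]:
    """Return (mean, half_width) of a two-sided CI over repeated measurements."""
    n = len(values)
    mean = statistics.fmean(values)
    if n < 2:
        # A single repeat gives no spread estimate; report a degenerate interval.
        return mean, 0.0
    stderr = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value from the standard normal, e.g. ~1.96 at 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return mean, z * stderr
```

For example, per-repeat CER means of `[0.021, 0.019, 0.020]` would be reported as their mean plus or minus `z` times the standard error across the three repeats.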