Removes use of torchaudio and moves transforms inside of NeMo #15211

blisc · 2025-12-19T20:17:43Z

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Removes use of torchaudio.transforms and moves transforms inside of NeMo.
NOTE: we will still use torchaudio in nemo/collections/audio/metrics/squim.py and nemo/collections/tts/models/magpietts_preference_optimization.py

Collection: audio, asr, tts

Changelog

Move frequently used torchaudio transforms into NeMo

PR Type:

New Feature
Bugfix
Documentation

Signed-off-by: Jason <[email protected]>

Signed-off-by: blisc <[email protected]>

nithinraok

Can we update or remove torchaudio references from the following files as well:

nemo/collections/tts/models/magpietts_preference_optimization.py
docker/Dockerfile.speech
scripts/installers/install_torchaudio_latest.sh
docs/source/speechlm2/intro.rst
nemo/collections/audio/metrics/squim.py

In the following files torchaudio is installed but not used. We can update them too.

tutorials/00_NeMo_Primer.ipynb
tutorials/01_NeMo_Models.ipynb
tutorials/asr/Online_Offline_Microphone_VAD_Demo.ipynb
tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb
tutorials/asr/Streaming_Multitalker_ASR.ipynb
tutorials/audio/speech_enhancement/BNR_Speech_enhancement_with_NeMo.ipynb
tutorials/audio/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb
tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb
tutorials/speaker_tasks/End_to_End_Diarization_Inference.ipynb
tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb
tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb
tutorials/speaker_tasks/Streaming_End_to_End_Diarization_Inference.ipynb

nithinraok · 2026-01-03T18:45:06Z

nemo/collections/asr/modules/audio_preprocessing.py

        nb_max_freq (int) : Frequency above which all frequencies will be masked for narrowband augmentation.
            Defaults to 4000
-        use_torchaudio: Whether to use the `torchaudio` implementation.
+        use_torchaudio: Whether to use the FilterbankFeatures or FilterbankFeaturesTA class


Can we remove this option altogether?

nithinraok · 2026-01-03T18:47:02Z

nemo/collections/asr/modules/audio_preprocessing.py

-            featurizer_class = FilterbankFeatures
-        else:
-            featurizer_class = FilterbankFeaturesTA
+        featurizer_class = FilterbankFeaturesTA if use_torchaudio else FilterbankFeatures


I see from features.py that FilterbankFeaturesTA doesn't use torchaudio. I think we can remove this condition and default to FilterbankFeatures.
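
A minimal sketch of what dropping the condition could look like, assuming the `use_torchaudio` argument is kept only for backward compatibility. `build_featurizer` and the simplified `FilterbankFeatures` stand-in are hypothetical and are not the NeMo implementation; the real preprocessor takes many more arguments.

```python
import warnings


class FilterbankFeatures:
    """Simplified stand-in for the featurizer in features.py (hypothetical)."""

    def __init__(self, sample_rate=16000, nfilt=64):
        self.sample_rate = sample_rate
        self.nfilt = nfilt


def build_featurizer(use_torchaudio=None, **kwargs):
    """Always build FilterbankFeatures; warn if the legacy flag is still passed."""
    if use_torchaudio is not None:
        # The flag no longer selects a class, so it is accepted and ignored.
        warnings.warn(
            "`use_torchaudio` is deprecated and ignored; FilterbankFeatures is always used.",
            DeprecationWarning,
            stacklevel=2,
        )
    return FilterbankFeatures(**kwargs)
```

Keeping the argument for a deprecation period would let existing configs that still set `use_torchaudio` load without errors, at the cost of one extra warning.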

nithinraok · 2026-01-03T18:49:13Z

nemo/collections/audio/models/__init__.py

 # See the License for the specific language governing permissions and
 # limitations under the License.
-
-from nemo.collections.audio.models.audio_to_audio import AudioToAudioModel


Why remove these?

Causes a circular dependency if this stays in the init
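
For context, one alternative to deleting the re-export (not what this PR does) is a lazy re-export through a module-level `__getattr__` (PEP 562), which defers the submodule import until the name is first requested. The sketch below is only an illustration: `demo_pkg` and its submodule are hypothetical in-memory stand-ins so that the example runs on its own.

```python
import importlib
import sys
import types

# Hypothetical in-memory package "demo_pkg" with a submodule
# "demo_pkg.audio_to_audio", standing in for a real package layout.
submodule = types.ModuleType("demo_pkg.audio_to_audio")
submodule.AudioToAudioModel = type("AudioToAudioModel", (), {})

package = types.ModuleType("demo_pkg")
package.__path__ = []  # an empty __path__ marks the module as a package


def _lazy_getattr(name):
    """Module-level __getattr__ (PEP 562): import the submodule on access."""
    if name == "AudioToAudioModel":
        module = importlib.import_module("demo_pkg.audio_to_audio")
        return getattr(module, name)
    raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")


package.__getattr__ = _lazy_getattr
sys.modules["demo_pkg"] = package
sys.modules["demo_pkg.audio_to_audio"] = submodule

# The package itself holds no eager reference to the class;
# this import goes through the lazy __getattr__ path.
from demo_pkg import AudioToAudioModel

print(AudioToAudioModel.__name__)
```

Because the package body never imports the submodule at import time, a module that the submodule depends on can import the package without re-entering a half-initialized `__init__`.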

blisc · 2026-01-06T16:22:03Z

Can we update or remove torchaudio references from following files as well:

nemo/collections/tts/models/magpietts_preference_optimization.py

docker/Dockerfile.speech

scripts/installers/install_torchaudio_latest.sh

docs/source/speechlm2/intro.rst

nemo/collections/audio/metrics/squim.py

We do not have a replacement for SQUIM-MOS. I think we still need to keep the import check there. But we can remove it from the Dockerfile and tutorials.
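
A rough sketch of the kind of import check meant here, assuming torchaudio becomes an optional dependency that is only needed for the SQUIM metric. The helper names `is_module_available` and `require_module` are hypothetical and are not existing NeMo utilities.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_module(name: str, feature: str) -> None:
    """Raise a descriptive error if an optional dependency is missing."""
    if not is_module_available(name):
        raise ModuleNotFoundError(
            f"{feature} requires the optional package `{name}`, which is not installed."
        )


# Evaluated once at import time; code paths that need torchaudio can check this flag.
HAVE_TORCHAUDIO = is_module_available("torchaudio")
```

A metric class could then call `require_module("torchaudio", "SQUIM")` in its constructor, so that the rest of the collection imports cleanly when torchaudio is absent.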

pzelasko

LGTM after addressing Nithin's concerns

…o tts_2512_removetorchaudio

nithinraok · 2026-01-08T19:43:38Z

docker/Dockerfile.speech

@chtruong814 could you help review the docker file changes.

nithinraok · 2026-01-08T19:48:32Z

@blisc I think updates to docs/source/speechlm2/intro.rst are missing

blisc · 2026-01-08T20:51:26Z

@blisc I think updates to docs/source/speechlm2/intro.rst are missing

I don't know if there is an equivalent to torchaudio.load(), so I'm deferring that change. @PiotrDabkowski Do you have a recommendation on how to update those docs?

nithinraok · 2026-01-09T02:28:08Z

That failing CI test is okay to skip as it's flaky. @chtruong814 FYI

remove use of torchaudio.transforms; SQUIM todo

9a46c09

Signed-off-by: Jason <[email protected]>

blisc requested a review from pzelasko December 19, 2025 20:17

github-actions bot added TTS ASR audio labels Dec 19, 2025

blisc requested a review from nithinraok December 19, 2025 20:17

blisc added the Run CICD label Dec 19, 2025

Apply isort and black reformatting

f84393d

Signed-off-by: blisc <[email protected]>