diff --git a/README.md b/README.md index d67b574c4d..d9a159fa9c 100644 --- a/README.md +++ b/README.md @@ -27,9 +27,6 @@ processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch operations which makes it easy to use and feel like a natural extension. -- [Support audio I/O (Load files, Save files)](http://pytorch.org/audio/main/) - - Load a variety of audio formats, such as `wav`, `mp3`, `ogg`, `flac`, `opus`, `sphere`, into a torch Tensor using SoX - - [Kaldi (ark/scp)](http://pytorch.org/audio/main/kaldi_io.html) - [Dataloaders for common audio datasets](http://pytorch.org/audio/main/datasets.html) - Audio and speech processing functions - [forced_align](https://pytorch.org/audio/main/generated/torchaudio.functional.forced_align.html) @@ -70,7 +67,7 @@ If you find this package useful, please cite as: ```bibtex @misc{hwang2023torchaudio, - title={TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch}, + title={TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch}, author={Jeff Hwang and Moto Hira and Caroline Chen and Xiaohui Zhang and Zhaoheng Ni and Guangzhi Sun and Pingchuan Ma and Ruizhe Huang and Vineel Pratap and Yuekai Zhang and Anurag Kumar and Chin-Yun Yu and Chuang Zhu and Chunxi Liu and Jacob Kahn and Mirco Ravanelli and Peng Sun and Shinji Watanabe and Yangyang Shi and Yumeng Tao and Robin Scheibler and Samuele Cornell and Sean Kim and Stavros Petridis}, year={2023}, eprint={2310.17864}, diff --git a/docs/source/compliance.kaldi.rst b/docs/source/compliance.kaldi.rst deleted file mode 100644 index 2a54f6c61d..0000000000 --- a/docs/source/compliance.kaldi.rst +++ /dev/null @@ -1,20 +0,0 @@ -.. py:module:: torchaudio.compliance.kaldi - -torchaudio.compliance.kaldi -=========================== - -.. 
currentmodule:: torchaudio.compliance.kaldi - -The useful processing operations of kaldi_ can be performed with torchaudio. -Various functions with identical parameters are given so that torchaudio can -produce similar outputs. - -.. _kaldi: https://github.com/kaldi-asr/kaldi - -.. autosummary:: - :toctree: generated - :nosignatures: - - spectrogram - fbank - mfcc diff --git a/docs/source/index.rst b/docs/source/index.rst index 449720096a..193c3bb0fa 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -4,16 +4,16 @@ Torchaudio Documentation .. image:: _static/img/logo.png Torchaudio is a library for audio and signal processing with PyTorch. -It provides I/O, signal and data processing functions, datasets, +It provides signal and data processing functions, datasets, model implementations and application components. .. note:: Starting with version 2.8, we are refactoring TorchAudio to transition it into a maintenance phase. As a result: - - Some APIs are deprecated in 2.8 and will be removed in 2.9. + - Some APIs were deprecated in 2.8 and removed as of 2.9. - The decoding and encoding capabilities of PyTorch for both audio and video - are being consolidated into TorchCodec. + have been consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. @@ -67,7 +67,6 @@ model implementations and application components. models models.decoder pipelines - utils .. toctree:: :maxdepth: 1 diff --git a/docs/source/torchaudio.rst b/docs/source/torchaudio.rst index 819a019130..3c0a18a8f9 100644 --- a/docs/source/torchaudio.rst +++ b/docs/source/torchaudio.rst @@ -4,12 +4,11 @@ torchaudio .. currentmodule:: torchaudio .. warning:: - Starting with version 2.8, we are refactoring TorchAudio to transition it - into a maintenance phase. As a result: + Starting with version 2.9, we have transitioned TorchAudio into a maintenance phase. As a result: - - Most APIs listed below are deprecated in 2.8 and will be removed in 2.9. 
+ - APIs deprecated in version 2.8 have been removed in 2.9. - The decoding and encoding capabilities of PyTorch for both audio and video - are being consolidated into TorchCodec. For convenience, we provide + have been consolidated into TorchCodec. For convenience, we provide :func:`~torchaudio.load_with_torchcodec` as a replacement for :func:`~torchaudio.load` and :func:`~torchaudio.save_with_torchcodec` as a replacement for :func:`~torchaudio.save`, but we recommend that you port @@ -28,81 +27,7 @@ it easy to handle audio data. :nosignatures: :template: autosummary/io.rst - info load load_with_torchcodec save save_with_torchcodec - -.. _backend: - -Backend and Dispatcher ----------------------- - -Decoding and encoding media is highly elaborated process. Therefore, TorchAudio -relies on third party libraries to perform these operations. These third party -libraries are called ``backend``, and currently TorchAudio integrates the -following libraries. - -Please refer to `Installation <./installation.html>`__ for how to enable backends. - -Conventionally, TorchAudio has had its I/O backend set globally at runtime -based on availability. However, this approach does not allow applications to -use different backends, and it is not well-suited for large codebases. - -For these reasons, in v2.0, we introduced a dispatcher, a new mechanism to allow -users to choose a backend for each function call. - -When dispatcher mode is enabled, all the I/O functions accept extra keyward argument -``backend``, which specifies the desired backend. If the specified -backend is not available, the function call will fail. - -If a backend is not explicitly chosen, the functions will select a backend to use given order of precedence and library availability. - -The following table summarizes the backends. - -.. 
list-table:: - :header-rows: 1 - :widths: 8 12 25 60 - - * - Priority - - Backend - - Supported OS - - Note - * - 1 - - FFmpeg - - Linux, macOS, Windows - - Note - - This backend Supports various protocols, such as HTTPS and MP4, and file-like objects. - * - 3 - - SoundFile - - Linux, macOS, Windows - - Please refer to `the official document `__ for the supported codecs. - - This backend supports file-like objects. - -.. _dispatcher_migration: - -Dispatcher Migration -~~~~~~~~~~~~~~~~~~~~ - -We are migrating the I/O functions to use the dispatcher mechanism, and this -incurs multiple changes, some of which involve backward-compatibility-breaking -changes, and require users to change their function call. - -The (planned) changes are as follows. For up-to-date information, -please refer to https://github.com/pytorch/audio/issues/2950 - -* In 2.0, audio I/O backend dispatcher was introduced. - Users can opt-in to using dispatcher by setting the environment variable - ``TORCHAUDIO_USE_BACKEND_DISPATCHER=1``. -* In 2.1, the disptcher became the default mechanism for I/O. -* In 2.2, the legacy global backend mechanism is removed. - Utility functions :py:func:`get_audio_backend` and :py:func:`set_audio_backend` - became no-op. - -Furthermore, we removed file-like object support from libsox backend, as this -is better supported by FFmpeg backend and makes the build process simpler. -Therefore, beginning with 2.1, FFmpeg and Soundfile are the sole backends that support -file-like objects. diff --git a/docs/source/utils.rst b/docs/source/utils.rst deleted file mode 100644 index 1fe0d72c10..0000000000 --- a/docs/source/utils.rst +++ /dev/null @@ -1,23 +0,0 @@ -.. py:module:: torchaudio.utils - -torchaudio.utils -================ - -``torchaudio.utils`` module contains utility functions to configure the global state of third party libraries. - -.. warning:: - Starting with version 2.8, we are refactoring TorchAudio to transition it - into a maintenance phase. 
As a result: - - ``sox_utils`` are deprecated in 2.8 and will be removed in 2.9. - - The decoding and encoding capabilities of PyTorch for both audio and video - are being consolidated into TorchCodec. - Please see https://github.com/pytorch/audio/issues/3902 for more information. - -.. currentmodule:: torchaudio.utils - -.. autosummary:: - :toctree: generated - :nosignatures: - :template: autosummary/utils.rst - - sox_utils diff --git a/packaging/torchaudio/meta.yaml b/packaging/torchaudio/meta.yaml index 63121b6e87..555e214ee8 100644 --- a/packaging/torchaudio/meta.yaml +++ b/packaging/torchaudio/meta.yaml @@ -50,9 +50,7 @@ build: test: imports: - torchaudio - - torchaudio.io - torchaudio.datasets - - torchaudio.sox_effects - torchaudio.transforms source_files: diff --git a/test/torchaudio_unittest/README.md b/test/torchaudio_unittest/README.md index 2ae1104847..c6f56b4245 100644 --- a/test/torchaudio_unittest/README.md +++ b/test/torchaudio_unittest/README.md @@ -15,10 +15,8 @@ Some useful pytest commands: pytest test --collect-only # Run all the test suites pytest test -# Run tests on sox_effects module -pytest test/torchaudio_unittest/sox_effect # use -k to apply filter -pytest test/torchaudio_unittest/sox_io_backend -k load # only runs tests where their names contain load +pytest test/torchaudio_unittest/test_load_save_torchcodec.py -k load # only runs tests where their names contain load # Some other useful options; # Stop on the first failure -x # Run failure fast --ff @@ -61,8 +59,6 @@ The following test modules are defined for corresponding `torchaudio` module/fun - [`torchaudio.functional`](./functional) - [`torchaudio.transforms`](./transforms/transforms_test.py) - [`torchaudio.compliance.kaldi`](./compliance_kaldi_test.py) -- [`torchaudio.kaldi_io`](./kaldi_io_test.py) -- [`torchaudio.sox_effects`](./sox_effect) - [`torchaudio.backend`](./backend) ### Test modules that do not fall into the above categories @@ -73,6 +69,9 @@ The following test modules 
are defined for corresponding `torchaudio` module/fun - [assets](./assets): Contain sample audio files. - [assets/kaldi](./assets/kaldi): Contains Kaldi format matrix files used in [./test_compliance_kaldi.py](./test_compliance_kaldi.py). - [compliance](./compliance): Scripts used to generate above Kaldi matrix files. +- [assets/kaldi_expected_results](./assets/kaldi_expected_results): Contains outputs from Kaldi to compare against torchaudio functionality in [./compliance/kaldi](./compliance/kaldi). +- [assets/librosa_expected_results](./assets/librosa_expected_results): Contains outputs from librosa to compare against torchaudio functionality. +- [assets/sox_expected_results](./assets/sox_expected_results): Contains outputs from SoX to compare against torchaudio functionality. ### Waveforms for Testing Purposes diff --git a/test/torchaudio_unittest/assets/sox_effect_test_args.jsonl b/test/torchaudio_unittest/assets/sox_effect_test_args.jsonl deleted file mode 100644 index 2a223df635..0000000000 --- a/test/torchaudio_unittest/assets/sox_effect_test_args.jsonl +++ /dev/null @@ -1,88 +0,0 @@ -{"effects": [["allpass", "300", "10"]]} -{"effects": [["band", "300", "10"]]} -{"effects": [["bandpass", "300", "10"]]} -{"effects": [["bandreject", "300", "10"]]} -{"effects": [["bass", "-10"]]} -{"effects": [["bend", ".35,180,.25", ".15,740,.53", "0,-520,.3"]]} -{"effects": [["biquad", "0.4", "0.2", "0.9", "0.7", "0.2", "0.6"]]} -{"effects": [["chorus", "0.7", "0.9", "55", "0.4", "0.25", "2", "-t"]]} -{"effects": [["chorus", "0.6", "0.9", "50", "0.4", "0.25", "2", "-t", "60", "0.32", "0.4", "1.3", "-s"]]} -{"effects": [["chorus", "0.5", "0.9", "50", "0.4", "0.25", "2", "-t", "60", "0.32", "0.4", "2.3", "-t", "40", "0.3", "0.3", "1.3", "-s"]]} -{"effects": [["channels", "1"]]} -{"effects": [["channels", "2"]]} -{"effects": [["channels", "3"]]} -{"effects": [["compand", "0.3,1", "6:-70,-60,-20", "-5", "-90", "0.2"]]} -{"effects": [["compand", ".1,.2", "-inf,-50.1,-inf,-50,-50", 
"0", "-90", ".1"]]} -{"effects": [["compand", ".1,.1", "-45.1,-45,-inf,0,-inf", "45", "-90", ".1"]]} -{"effects": [["contrast", "0"]]} -{"effects": [["contrast", "25"]]} -{"effects": [["contrast", "50"]]} -{"effects": [["contrast", "75"]]} -{"effects": [["contrast", "100"]]} -{"effects": [["dcshift", "1.0"]]} -{"effects": [["dcshift", "-1.0"]]} -{"effects": [["deemph"]], "input_sample_rate": 44100} -{"effects": [["delay", "1.5", "+1"]]} -{"effects": [["dither", "-s"]]} -{"effects": [["dither", "-S"]]} -{"effects": [["divide"]]} -{"effects": [["downsample", "2"]], "input_sample_rate": 8000, "output_sample_rate": 4000} -{"effects": [["earwax"]], "input_sample_rate": 44100} -{"effects": [["echo", "0.8", "0.88", "60", "0.4"]]} -{"effects": [["echo", "0.8", "0.88", "6", "0.4"]]} -{"effects": [["echo", "0.8", "0.9", "1000", "0.3"]]} -{"effects": [["echo", "0.8", "0.9", "1000", "0.3", "1800", "0.25"]]} -{"effects": [["echos", "0.8", "0.7", "700", "0.25", "700", "0.3"]]} -{"effects": [["echos", "0.8", "0.7", "700", "0.25", "900", "0.3"]]} -{"effects": [["echos", "0.8", "0.7", "40", "0.25", "63", "0.3"]]} -{"effects": [["equalizer", "300", "10", "5"]]} -{"effects": [["fade", "q", "3"]]} -{"effects": [["fade", "h", "3"]]} -{"effects": [["fade", "t", "3"]]} -{"effects": [["fade", "l", "3"]]} -{"effects": [["fade", "p", "3"]]} -{"effects": [["fir", "0.0195", "-0.082", "0.234", "0.891", "-0.145", "0.043"]]} -{"effects": [["fir", "/sox_effect_test_fir_coeffs.txt"]]} -{"effects": [["flanger"]]} -{"effects": [["gain", "-n"]]} -{"effects": [["gain", "-n", "-3"]]} -{"effects": [["gain", "-l", "-6"]]} -{"effects": [["highpass", "-1", "300"]]} -{"effects": [["highpass", "-2", "300"]]} -{"effects": [["hilbert"]]} -{"effects": [["loudness"]]} -{"effects": [["lowpass", "-1", "300"]]} -{"effects": [["lowpass", "-2", "300"]]} -{"effects": [["mcompand", "0.005,0.1 -47,-40,-34,-34,-17,-33", "100", "0.003,0.05 -47,-40,-34,-34,-17,-33", "400", "0.000625,0.0125 -47,-40,-34,-34,-15,-33", "1600", 
"0.0001,0.025 -47,-40,-34,-34,-31,-31,-0,-30", "6400", "0,0.025 -38,-31,-28,-28,-0,-25"]], "input_sample_rate": 44100} -{"effects": [["norm"]]} -{"effects": [["oops"]]} -{"effects": [["overdrive"]]} -{"effects": [["pad"]]} -{"effects": [["phaser"]]} -{"effects": [["pitch", "6.48"], ["rate", "8030"]], "output_sample_rate": 8030} -{"effects": [["pitch", "-6.50"], ["rate", "7970"]], "output_sample_rate": 7970} -{"effects": [["rate", "4567"]], "output_sample_rate": 4567} -{"effects": [["remix", "6", "7", "8", "0"]], "num_channels": 8} -{"effects": [["remix", "1-3,7", "3"]], "num_channels": 8} -{"effects": [["repeat"]]} -{"effects": [["reverb"]]} -{"effects": [["reverse"]]} -{"effects": [["riaa"]], "input_sample_rate": 44100} -{"effects": [["silence", "0"]]} -{"effects": [["sinc", "3k"]]} -{"effects": [["speed", "1.3"]], "input_sample_rate": 4000, "output_sample_rate": 5200} -{"effects": [["speed", "0.7"]], "input_sample_rate": 4000, "output_sample_rate": 2800} -{"effects": [["stat"]]} -{"effects": [["stats"]]} -{"effects": [["stretch"]]} -{"effects": [["swap"]]} -{"effects": [["synth"]]} -{"effects": [["tempo", "0.9"]]} -{"effects": [["tempo", "1.1"]]} -{"effects": [["treble", "3"]]} -{"effects": [["tremolo", "300", "40"]]} -{"effects": [["tremolo", "300", "50"]]} -{"effects": [["trim", "0", "0.1"]]} -{"effects": [["upsample", "2"]], "input_sample_rate": 8000, "output_sample_rate": 16000} -{"effects": [["vad"]]} -{"effects": [["vol", "3"]]}