Commit f965477

Polina Kazakova and mariosasko authored
Use soundfile for mp3 decoding instead of torchaudio (#5573)
* use soundfile for mp3 decoding instead of torchaudio
* fix some tests
* remove torch and torchaudio from library's requirements
* refactor audio decoding, decode everything with soundfile
* remove torchaudio latest test ci stage, remove libsndfile and sox binaries installation
* remove checks for libsndfile in tests since it's bundled in python library
* remove instructions about installing via package manager since it's misleading
* pin soundfile version to the latest
* update documentation
* fix setup
* Update docs/source/installation.md

  Co-authored-by: Mario Šaško <[email protected]>
* refactor decoding: move all the code under the main decode_example func
* get audio format with os.path instead of string split
* add module config variables for opus and mp3 support
* apply steven's suggestion to installation docs
* wrap torch.from_numpy in a func to avoid torch.from_numpy pickling error
* Apply suggestions from code review

  Co-authored-by: Mario Šaško <[email protected]>
* fix code style
* import xsplitext

---------

Co-authored-by: Mario Šaško <[email protected]>
1 parent 88fa043 commit f965477

9 files changed, +75 -397 lines changed

.github/workflows/ci.yml

Lines changed: 0 additions & 30 deletions
@@ -40,11 +40,6 @@ jobs:
     continue-on-error: ${{ matrix.test == 'integration' }}
     runs-on: ${{ matrix.os }}
     steps:
-      - name: Install OS dependencies
-        if: ${{ matrix.os == 'ubuntu-latest' }}
-        run: |
-          sudo apt-get -y update
-          sudo apt-get -y install libsndfile1 sox
       - uses: actions/checkout@v3
         with:
           fetch-depth: 0
@@ -72,16 +67,6 @@ jobs:
       - name: Test with pytest
         run: |
           python -m pytest -rfExX -m ${{ matrix.test }} -n 2 --dist loadfile -sv ./tests/
-      - name: Install dependencies to test torchaudio>=0.12 on Ubuntu
-        if: ${{ matrix.os == 'ubuntu-latest' }}
-        run: |
-          pip uninstall -y torchaudio torch
-          pip install "torchaudio>=0.12"
-          sudo apt-get -y install ffmpeg
-      - name: Test torchaudio>=0.12 on Ubuntu
-        if: ${{ matrix.os == 'ubuntu-latest' }}
-        run: |
-          python -m pytest -rfExX -m torchaudio_latest -n 2 --dist loadfile -sv ./tests/features/test_audio.py

   test_py310:
     needs: check_code_quality
@@ -93,11 +78,6 @@ jobs:
     continue-on-error: false
     runs-on: ${{ matrix.os }}
     steps:
-      - name: Install OS dependencies
-        if: ${{ matrix.os == 'ubuntu-latest' }}
-        run: |
-          sudo apt-get -y update
-          sudo apt-get -y install libsndfile1 sox
       - uses: actions/checkout@v3
         with:
           fetch-depth: 0
@@ -112,13 +92,3 @@ jobs:
       - name: Test with pytest
         run: |
           python -m pytest -rfExX -m ${{ matrix.test }} -n 2 --dist loadfile -sv ./tests/
-      - name: Install dependencies to test torchaudio>=0.12 on Ubuntu
-        if: ${{ matrix.os == 'ubuntu-latest' }}
-        run: |
-          pip uninstall -y torchaudio torch
-          pip install "torchaudio>=0.12"
-          sudo apt-get -y install ffmpeg
-      - name: Test torchaudio>=0.12 on Ubuntu
-        if: ${{ matrix.os == 'ubuntu-latest' }}
-        run: |
-          python -m pytest -rfExX -m torchaudio_latest -n 2 --dist loadfile -sv ./tests/features/test_audio.py

docs/source/audio_load.mdx

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Load audio data

 You can load an audio dataset using the [`Audio`] feature that automatically decodes and resamples the audio files when you access the examples.
-Audio decoding is based on `librosa` in general, and `torchaudio` for MP3.
+Audio decoding is based on the [`soundfile`](https://github.com/bastibe/python-soundfile) python package, which uses the [`libsndfile`](https://github.com/libsndfile/libsndfile) C library under the hood.

 ## Installation

docs/source/installation.md

Lines changed: 3 additions & 19 deletions
@@ -67,31 +67,15 @@ pip install datasets[audio]

 <Tip warning={true}>

-On Linux, non-Python dependency on `libsndfile` package must be installed manually, using your distribution package manager, for example:
+To decode mp3 files, you need to have at least version 1.1.0 of the `libsndfile` system library. Usually, it's bundled with the python [`soundfile`](https://github.com/bastibe/python-soundfile) package, which is installed as an extra audio dependency for 🤗 Datasets.
+For Linux, the required version of `libsndfile` is bundled with `soundfile` starting from version 0.12.0. You can run the following command to determine which version of `libsndfile` is being used by `soundfile`:

 ```bash
-sudo apt-get install libsndfile1
+python -c "import soundfile; print(soundfile.__libsndfile_version__)"
 ```

 </Tip>

-To support loading audio datasets containing MP3 files, users should also install [torchaudio](https://pytorch.org/audio/stable/index.html) to handle the audio data with high performance:
-
-```bash
-pip install 'torchaudio<0.12.0'
-```
-
-<Tip warning={true}>
-
-torchaudio's `sox_io` [backend](https://pytorch.org/audio/stable/backend.html#) supports decoding MP3 files. Unfortunately, the `sox_io` backend is only available on Linux/macOS and isn't supported by Windows.
-
-You need to install it using your distribution package manager, for example:
-
-```bash
-sudo apt-get install sox
-```
-
-</Tip>

 ## Vision

setup.py

Lines changed: 2 additions & 2 deletions
@@ -142,6 +142,7 @@
 ]

 AUDIO_REQUIRE = [
+    "soundfile>=0.12.1",
     "librosa",
 ]

@@ -176,8 +177,7 @@
     "tensorflow-macos; sys_platform == 'darwin' and platform_machine == 'arm64'",
     "tiktoken;python_version>='3.8'",
     "torch",
-    "torchaudio<0.12.0",
-    "soundfile",
+    "soundfile>=0.12.1",
     "transformers",
     "zstandard",
 ]

src/datasets/config.py

Lines changed: 6 additions & 1 deletion
@@ -130,7 +130,12 @@

 # Optional tools for feature decoding
 PIL_AVAILABLE = importlib.util.find_spec("PIL") is not None
-
+IS_OPUS_SUPPORTED = importlib.util.find_spec("soundfile") is not None and version.parse(
+    importlib.import_module("soundfile").__libsndfile_version__
+) >= version.parse("1.0.31")
+IS_MP3_SUPPORTED = importlib.util.find_spec("soundfile") is not None and version.parse(
+    importlib.import_module("soundfile").__libsndfile_version__
+) >= version.parse("1.1.0")

 # Optional compression tools
 RARFILE_AVAILABLE = importlib.util.find_spec("rarfile") is not None
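
The two flags in this hunk gate decoding on the `libsndfile` version that `soundfile` reports. As a rough standalone sketch of that gating (not the library's code), the snippet below uses hypothetical names (`MIN_LIBSNDFILE`, `parse_version`, `is_format_supported`) and a simplified tuple comparison as an assumed stand-in for `packaging.version.parse`:

```python
import itertools

# Minimum libsndfile versions, copied from the thresholds in the hunk above.
MIN_LIBSNDFILE = {"opus": (1, 0, 31), "mp3": (1, 1, 0)}


def parse_version(text):
    """Hypothetical helper: '1.0.31' -> (1, 0, 31), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.2' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_format_supported(audio_format, libsndfile_version):
    """Return True when the given libsndfile version can decode `audio_format`."""
    minimum = MIN_LIBSNDFILE.get(audio_format)
    if minimum is None:
        return True  # this sketch only gates opus and mp3
    return parse_version(libsndfile_version) >= minimum
```

With `soundfile` installed, the second argument would presumably be `soundfile.__libsndfile_version__`, the same attribute the flags read.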

src/datasets/features/audio.py

Lines changed: 41 additions & 142 deletions
@@ -1,15 +1,13 @@
 import os
-import warnings
 from dataclasses import dataclass, field
 from io import BytesIO
 from typing import TYPE_CHECKING, Any, ClassVar, Dict, Optional, Union

 import numpy as np
 import pyarrow as pa
-from packaging import version

 from .. import config
-from ..download.streaming_download_manager import xopen
+from ..download.streaming_download_manager import xopen, xsplitext
 from ..table import array_cast
 from ..utils.py_utils import no_op_if_value_is_null, string_to_dict

@@ -150,20 +148,47 @@ def decode_example(
         path, file = (value["path"], BytesIO(value["bytes"])) if value["bytes"] is not None else (value["path"], None)
         if path is None and file is None:
             raise ValueError(f"An audio sample should have one of 'path' or 'bytes' but both are None in {value}.")
-        elif path is not None and path.endswith("mp3"):
-            array, sampling_rate = self._decode_mp3(file if file else path)
-        elif path is not None and path.endswith("opus"):
-            if file:
-                array, sampling_rate = self._decode_non_mp3_file_like(file, "opus")
-            else:
-                array, sampling_rate = self._decode_non_mp3_path_like(
-                    path, "opus", token_per_repo_id=token_per_repo_id
-                )
+
+        try:
+            import librosa
+            import soundfile as sf
+        except ImportError as err:
+            raise ImportError("To support decoding audio files, please install 'librosa' and 'soundfile'.") from err
+
+        audio_format = xsplitext(path)[1][1:].lower() if path is not None else None
+        if not config.IS_OPUS_SUPPORTED and audio_format == "opus":
+            raise RuntimeError(
+                "Decoding 'opus' files requires system library 'libsndfile'>=1.0.31, "
+                'You can try to update `soundfile` python library: `pip install "soundfile>=0.12.1"`. '
+            )
+        elif not config.IS_MP3_SUPPORTED and audio_format == "mp3":
+            raise RuntimeError(
+                "Decoding 'mp3' files requires system library 'libsndfile'>=1.1.0, "
+                'You can try to update `soundfile` python library: `pip install "soundfile>=0.12.1"`. '
+            )
+
+        if file is None:
+            token_per_repo_id = token_per_repo_id or {}
+            source_url = path.split("::")[-1]
+            try:
+                repo_id = string_to_dict(source_url, config.HUB_DATASETS_URL)["repo_id"]
+                use_auth_token = token_per_repo_id[repo_id]
+            except (ValueError, KeyError):
+                use_auth_token = None
+
+            with xopen(path, "rb", use_auth_token=use_auth_token) as f:
+                array, sampling_rate = sf.read(f)
+
         else:
-            if file:
-                array, sampling_rate = self._decode_non_mp3_file_like(file)
-            else:
-                array, sampling_rate = self._decode_non_mp3_path_like(path, token_per_repo_id=token_per_repo_id)
+            array, sampling_rate = sf.read(file)
+
+        array = array.T
+        if self.mono:
+            array = librosa.to_mono(array)
+        if self.sampling_rate and self.sampling_rate != sampling_rate:
+            array = librosa.resample(array, orig_sr=sampling_rate, target_sr=self.sampling_rate)
+            sampling_rate = self.sampling_rate
+
         return {"path": path, "array": array, "sampling_rate": sampling_rate}

     def flatten(self) -> Union["FeatureType", Dict[str, "FeatureType"]]:
@@ -242,129 +267,3 @@ def path_to_bytes(path):
         )
         storage = pa.StructArray.from_arrays([bytes_array, path_array], ["bytes", "path"], mask=bytes_array.is_null())
         return array_cast(storage, self.pa_type)
-
-    def _decode_non_mp3_path_like(
-        self, path, format=None, token_per_repo_id: Optional[Dict[str, Union[str, bool, None]]] = None
-    ):
-        try:
-            import librosa
-        except ImportError as err:
-            raise ImportError("To support decoding audio files, please install 'librosa'.") from err
-
-        token_per_repo_id = token_per_repo_id or {}
-        if format == "opus":
-            import soundfile
-
-            if version.parse(soundfile.__libsndfile_version__) < version.parse("1.0.30"):
-                raise RuntimeError(
-                    "Decoding .opus files requires 'libsndfile'>=1.0.30, "
-                    + "it can be installed via conda: `conda install -c conda-forge libsndfile>=1.0.30`"
-                )
-        source_url = path.split("::")[-1]
-        try:
-            repo_id = string_to_dict(source_url, config.HUB_DATASETS_URL)["repo_id"]
-            use_auth_token = token_per_repo_id[repo_id]
-        except (ValueError, KeyError):
-            use_auth_token = None
-
-        with xopen(path, "rb", use_auth_token=use_auth_token) as f:
-            array, sampling_rate = librosa.load(f, sr=self.sampling_rate, mono=self.mono)
-        return array, sampling_rate
-
-    def _decode_non_mp3_file_like(self, file, format=None):
-        try:
-            import librosa
-            import soundfile as sf
-        except ImportError as err:
-            raise ImportError("To support decoding audio files, please install 'librosa' and 'soundfile'.") from err
-
-        if format == "opus":
-            if version.parse(sf.__libsndfile_version__) < version.parse("1.0.30"):
-                raise RuntimeError(
-                    "Decoding .opus files requires 'libsndfile'>=1.0.30, "
-                    + 'it can be installed via conda: `conda install -c conda-forge "libsndfile>=1.0.30"`'
-                )
-        array, sampling_rate = sf.read(file)
-        array = array.T
-        if self.mono:
-            array = librosa.to_mono(array)
-        if self.sampling_rate and self.sampling_rate != sampling_rate:
-            array = librosa.resample(array, orig_sr=sampling_rate, target_sr=self.sampling_rate)
-            sampling_rate = self.sampling_rate
-        return array, sampling_rate
-
-    def _decode_mp3(self, path_or_file):
-        try:
-            import torchaudio
-        except ImportError as err:
-            raise ImportError("To support decoding 'mp3' audio files, please install 'torchaudio'.") from err
-        if version.parse(torchaudio.__version__) < version.parse("0.12.0"):
-            try:
-                torchaudio.set_audio_backend("sox_io")
-            except RuntimeError as err:
-                raise ImportError("To support decoding 'mp3' audio files, please install 'sox'.") from err
-            array, sampling_rate = self._decode_mp3_torchaudio(path_or_file)
-        else:
-            try: # try torchaudio anyway because sometimes it works (depending on the os and os packages installed)
-                array, sampling_rate = self._decode_mp3_torchaudio(path_or_file)
-            except RuntimeError:
-                global _ffmpeg_warned
-                if not _ffmpeg_warned:
-                    warnings.warn(
-                        "\nTo support 'mp3' decoding with `torchaudio>=0.12.0`, make sure you have `ffmpeg` system package with at least version 4 installed. "
-                        "Alternatively, you can downgrade `torchaudio`:\n\n"
-                        "\tpip install \"torchaudio<0.12\".\n\nOtherwise 'mp3' files will be decoded with `librosa`."
-                    )
-                    _ffmpeg_warned = True
-                try:
-                    # flake8: noqa
-                    import librosa
-                except ImportError as err:
-                    raise ImportError(
-                        "\nTo support 'mp3' decoding with `torchaudio>=0.12.0`, make sure you have `ffmpeg` system package with at least version 4 installed. "
-                        "\tpip install \"torchaudio<0.12\".\n\nTo decode 'mp3' files without `torchaudio`, please install `librosa`:\n\n"
-                        "\tpip install librosa\n\nNote that decoding might be extremely slow in that case."
-                    ) from err
-                # try to decode with librosa for torchaudio>=0.12.0 as a workaround
-                global _librosa_warned
-                if not _librosa_warned:
-                    warnings.warn("Decoding mp3 with `librosa` instead of `torchaudio`, decoding might be slow.")
-                    _librosa_warned = True
-                try:
-                    array, sampling_rate = self._decode_mp3_librosa(path_or_file)
-                except RuntimeError as err:
-                    raise RuntimeError(
-                        "Decoding of 'mp3' failed, probably because of streaming mode "
-                        "(`librosa` cannot decode 'mp3' file-like objects, only path-like)."
-                    ) from err
-
-        return array, sampling_rate
-
-    def _decode_mp3_torchaudio(self, path_or_file):
-        import torchaudio
-        import torchaudio.transforms as T
-
-        array, sampling_rate = torchaudio.load(path_or_file, format="mp3")
-        if self.sampling_rate and self.sampling_rate != sampling_rate:
-            if not hasattr(self, "_resampler") or self._resampler.orig_freq != sampling_rate:
-                self._resampler = T.Resample(sampling_rate, self.sampling_rate)
-            array = self._resampler(array)
-            sampling_rate = self.sampling_rate
-        array = array.numpy()
-        if self.mono:
-            array = array.mean(axis=0)
-        return array, sampling_rate
-
-    def _decode_mp3_librosa(self, path_or_file):
-        import librosa
-
-        global _audioread_warned
-
-        with warnings.catch_warnings():
-            if _audioread_warned:
-                warnings.filterwarnings("ignore", "pysoundfile failed.+?", UserWarning, module=librosa.__name__)
-            else:
-                _audioread_warned = True
-            array, sampling_rate = librosa.load(path_or_file, mono=self.mono, sr=self.sampling_rate)
-
-        return array, sampling_rate
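
In the `decode_example` hunk of this file, the audio format is now read from the file extension instead of being matched with `path.endswith(...)`. The difference can be illustrated with a small sketch; `get_audio_format` is a hypothetical name, and `os.path.splitext` is assumed here as a local-path stand-in for the library's `xsplitext`:

```python
import os


def get_audio_format(path):
    """Return the lowercased extension without the dot, or None when there is no path."""
    if path is None:
        return None
    # Assumption: os.path.splitext stands in for xsplitext on plain local paths.
    return os.path.splitext(path)[1][1:].lower()


print(get_audio_format("clips/sample.MP3"))  # mp3
print(get_audio_format("temp3"))             # empty string: no extension
print("temp3".endswith("mp3"))               # True: the old suffix check would misfire here
```

Lowercasing the extension also means an uppercase `.MP3` reaches the same version check as `.mp3`.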

src/datasets/utils/py_utils.py

Lines changed: 5 additions & 1 deletion
@@ -640,9 +640,13 @@ def _save_regex(pickler, obj):

                     @pklregister(obj_type)
                     def _save_tensor(pickler, obj):
+                        # `torch.from_numpy` is not picklable in `torch>=1.11.0`
+                        def _create_tensor(np_array):
+                            return torch.from_numpy(np_array)
+
                         dill_log(pickler, f"To: {obj}")
                         args = (obj.detach().cpu().numpy(),)
-                        pickler.save_reduce(torch.from_numpy, args, obj=obj)
+                        pickler.save_reduce(_create_tensor, args, obj=obj)
                         dill_log(pickler, "# To")
                         return
648652
