Description
Describe the bug
I am trying to download the librispeech_asr clean dataset, but the download fails with an FSTimeoutError after about 61% of the data has been downloaded.
Steps to reproduce the bug
import datasets
datasets.load_dataset("librispeech_asr", "clean")
The output is as follows:
Downloading data: 61%|███████████████ | 3.92G/6.39G [05:00<03:06, 13.2MB/s]
Traceback (most recent call last):
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/fsspec/implementations/http.py", line 262, in _get_file
chunk = await r.content.read(chunk_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 393, in read
await self._wait("read")
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 311, in _wait
with self._timer:
^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/aiohttp/helpers.py", line 713, in __exit__
raise asyncio.TimeoutError from None
TimeoutError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/load_dataset.py", line 3, in <module>
datasets.load_dataset("librispeech_asr", "clean")
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/load.py", line 2096, in load_dataset
builder_instance.download_and_prepare(
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/builder.py", line 924, in download_and_prepare
self._download_and_prepare(
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/builder.py", line 1647, in _download_and_prepare
super()._download_and_prepare(
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/builder.py", line 977, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/.cache/huggingface/modules/datasets_modules/datasets/librispeech_asr/2712a8f82f0d20807a56faadcd08734f9bdd24c850bb118ba21ff33ebff0432f/librispeech_asr.py", line 115, in _split_generators
archive_path = dl_manager.download(_DL_URLS[self.config.name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/download/download_manager.py", line 159, in download
downloaded_path_or_paths = map_nested(
^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 512, in map_nested
_single_map_nested((function, obj, batched, batch_size, types, None, True, None))
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 380, in _single_map_nested
return [mapped_item for batch in iter_batched(data_struct, batch_size) for mapped_item in function(batch)]
^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/download/download_manager.py", line 216, in _download_batched
self._download_single(url_or_filename, download_config=download_config)
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/download/download_manager.py", line 225, in _download_single
out = cached_path(url_or_filename, download_config=download_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 205, in cached_path
output_path = get_from_cache(
^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 415, in get_from_cache
fsspec_get(url, temp_file, storage_options=storage_options, desc=download_desc, disable_tqdm=disable_tqdm)
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 334, in fsspec_get
fs.get_file(path, temp_file.name, callback=callback)
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 101, in sync
raise FSTimeoutError from return_result
fsspec.exceptions.FSTimeoutError
Downloading data: 61%|███████████████ | 3.92G/6.39G [05:00<03:09, 13.0MB/s]
Expected behavior
The download should complete without timing out.
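A possible workaround, not verified against this exact setup: the traceback ends inside aiohttp's timer and the progress bar stops at 05:00, which matches aiohttp's default total client timeout of 300 seconds. The sketch below assumes that `load_dataset` forwards `storage_options` to fsspec's HTTP filesystem, and that the filesystem passes `client_kwargs` on to `aiohttp.ClientSession`. `TIMEOUT_SECONDS` and `load_with_longer_timeout` are names made up for illustration.

```python
# Hypothetical workaround sketch: raise aiohttp's total timeout for the download.
TIMEOUT_SECONDS = 3600  # assumed value; aiohttp's default total timeout is 300 s


def load_with_longer_timeout(timeout_seconds: int = TIMEOUT_SECONDS):
    """Load librispeech_asr "clean" with a longer HTTP timeout."""
    # Imported lazily so that defining the function does not need the packages.
    import aiohttp
    import datasets

    # Assumption: these options reach fsspec's HTTP filesystem, which hands
    # client_kwargs to aiohttp.ClientSession.
    storage_options = {
        "client_kwargs": {"timeout": aiohttp.ClientTimeout(total=timeout_seconds)}
    }
    return datasets.load_dataset(
        "librispeech_asr", "clean", storage_options=storage_options
    )


# Usage (starts the 6.39G download, so it is left commented out):
# dataset = load_with_longer_timeout()
```

One hour is an arbitrary choice here; at the roughly 13 MB/s shown in the progress bar, the full 6.39G would need a little over eight minutes, so any total well above that should be enough if the timeout is indeed the cause.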
Environment info
Python version 3.12.6
Dependencies:
dependencies = [
"accelerate>=0.34.2",
"datasets[audio]>=3.0.0",
"ipython>=8.18.1",
"librosa>=0.10.2.post1",
"torch>=2.4.1",
"torchaudio>=2.4.1",
"transformers>=4.44.2",
]
macOS 14.6.1 (23G93)
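The dependency list only gives lower bounds, so it does not say which versions were actually installed. A small sketch like this one could be run in the same virtual environment to record the versions of the three packages that appear in the traceback. `PACKAGES` and `installed_versions` are illustrative names, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages that show up in the traceback above.
PACKAGES = ["datasets", "fsspec", "aiohttp"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one "name: version" line per package.
for name, installed in installed_versions().items():
    print(f"{name}: {installed or 'not installed'}")
```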