Skip to content

fsspec.exceptions.FSTimeoutError when downloading datasetΒ #7164

@timonmerk

Description

@timonmerk

Describe the bug

I am trying to download the librispeech_asr clean dataset, which results in a FSTimeoutError exception after downloading around 61% of the data.

Steps to reproduce the bug

import datasets
datasets.load_dataset("librispeech_asr", "clean")

The output is as follows:

Downloading data: 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 3.92G/6.39G [05:00<03:06, 13.2MB/s]Traceback (most recent call last):
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/fsspec/implementations/http.py", line 262, in _get_file
chunk = await r.content.read(chunk_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 393, in read
await self._wait("read")
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/aiohttp/streams.py", line 311, in _wait
with self._timer:
^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/aiohttp/helpers.py", line 713, in exit
raise asyncio.TimeoutError from None
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/load_dataset.py", line 3, in
datasets.load_dataset("librispeech_asr", "clean")
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/load.py", line 2096, in load_dataset
builder_instance.download_and_prepare(
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/builder.py", line 924, in download_and_prepare
self._download_and_prepare(
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/builder.py", line 1647, in _download_and_prepare
super()._download_and_prepare(
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/builder.py", line 977, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/.cache/huggingface/modules/datasets_modules/datasets/librispeech_asr/2712a8f82f0d20807a56faadcd08734f9bdd24c850bb118ba21ff33ebff0432f/librispeech_asr.py", line 115, in _split_generators
archive_path = dl_manager.download(_DL_URLS[self.config.name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/download/download_manager.py", line 159, in download
downloaded_path_or_paths = map_nested(
^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 512, in map_nested
_single_map_nested((function, obj, batched, batch_size, types, None, True, None))
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/py_utils.py", line 380, in _single_map_nested
return [mapped_item for batch in iter_batched(data_struct, batch_size) for mapped_item in function(batch)]
^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/download/download_manager.py", line 216, in _download_batched
self._download_single(url_or_filename, download_config=download_config)
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/download/download_manager.py", line 225, in _download_single
out = cached_path(url_or_filename, download_config=download_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 205, in cached_path
output_path = get_from_cache(
^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 415, in get_from_cache
fsspec_get(url, temp_file, storage_options=storage_options, desc=download_desc, disable_tqdm=disable_tqdm)
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 334, in fsspec_get
fs.get_file(path, temp_file.name, callback=callback)
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Timon/Documents/iEEG_deeplearning/wav2vec_pretrain/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 101, in sync
raise FSTimeoutError from return_result
fsspec.exceptions.FSTimeoutError
Downloading data: 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 3.92G/6.39G [05:00<03:09, 13.0MB/s]

Expected behavior

Complete the download

Environment info

Python version 3.12.6

Dependencies:

dependencies = [
"accelerate>=0.34.2",
"datasets[audio]>=3.0.0",
"ipython>=8.18.1",
"librosa>=0.10.2.post1",
"torch>=2.4.1",
"torchaudio>=2.4.1",
"transformers>=4.44.2",
]

MacOS 14.6.1 (23G93)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions