Commit a6109be

yarikoptic and claude committed
bf: pass aiohttp timeouts to fsspec in RemoteReadableAsset.open()
The dandi-archive CLI integration tests started hanging for 6 hours on November 24, 2025 (dandi/dandi-archive#1762). Investigation of tinuous CI logs showed:

- Nov 21 (last success): test_nwb2asset_remote_asset XFAIL'd in 0.4s
- Nov 24 (first hang): same test hung for 6 hours until GitHub killed it
- ALL test runs on Nov 24-25 hung, across unrelated PRs

The code flow that hangs is:

  nwb2asset() → get_metadata() → _get_pynwb_metadata()
  → open_readable() → RemoteReadableAsset.open()
  → fsspec.open(url).open() → aiohttp HTTP read from minio in Docker
  → fsspec sync() blocks in threading.Event.wait()

The key environmental change between Nov 21 and Nov 24 was dandi-cli PR #1744 updating dandischema from <0.12.0 to ~=0.12.0. With dandischema 0.11.x, the test hit a quick model validation mismatch (completing as XFAIL in 0.4s before reaching the fsspec read). With dandischema 0.12.0 (vendor-configurable models, schema 0.7.0), that mismatch no longer occurs, so the test now proceeds to the actual fsspec HTTP read, which hangs.

The hang itself is a known interaction between h5py, fsspec, and GC:

- h5py holds a global lock while reading from Python file objects
- fsspec's sync() runs async aiohttp coroutines on a background thread and blocks the calling thread in threading.Event.wait()
- Without socket-level timeouts, aiohttp blocks forever on stalled connections (aio-libs/aiohttp#11740)
- GC running during this window can deadlock with h5py's lock (h5py/h5py#2019)

The fix: pass explicit ClientTimeout to aiohttp via fsspec's client_kwargs so that stalled connections raise TimeoutError instead of blocking indefinitely.

Additionally, the dandi-archive CI never had a pytest --timeout because dandi-cli's tox.ini [pytest] addopts (--timeout=300) are not read when pytest runs from the dandi-archive rootdir via `pytest --pyargs dandi`.
References:

- fsspec/filesystem_spec#1666
- h5py/h5py#2019
- aio-libs/aiohttp#11740
- #1762
- #1450

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 166324a commit a6109be

1 file changed (+18 -1 lines)


dandi/misctypes.py

Lines changed: 18 additions & 1 deletion
@@ -345,10 +345,27 @@ def open(self) -> IO[bytes]:
         # Optional dependency:
         import fsspec
 
+        from aiohttp import ClientTimeout
+
         # We need to call open() on the return value of fsspec.open() because
         # otherwise the filehandle will only be opened when used to enter a
         # context manager.
-        return cast(IO[bytes], fsspec.open(self.url, mode="rb").open())
+        #
+        # Pass explicit timeouts to aiohttp to prevent indefinite hangs in
+        # fsspec's sync() wrapper. Without these, a stalled connection to S3
+        # (or minio in tests) causes fsspec's background IO thread to block
+        # forever, which in turn blocks the calling thread in
+        # threading.Event.wait() — see https://github.com/fsspec/filesystem_spec/issues/1666
+        return cast(
+            IO[bytes],
+            fsspec.open(
+                self.url,
+                mode="rb",
+                client_kwargs={
+                    "timeout": ClientTimeout(total=120, sock_read=60, sock_connect=30)
+                },
+            ).open(),
+        )
 
     def get_size(self) -> int:
         return self.size
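
The blocking pattern this change guards against can be illustrated with a small stdlib-only sketch. It is not fsspec's actual implementation: `start_background_loop`, `sync`, and `stalled_read` are hypothetical names invented here, and `asyncio.wait_for` stands in for the socket-level timeout that aiohttp's `ClientTimeout` would provide. The point is only that a caller parked in `threading.Event.wait()` is released once the coroutine on the background loop has its own timeout.

```python
import asyncio
import threading


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Start an event loop on a daemon thread (hypothetical helper)."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


def sync(loop: asyncio.AbstractEventLoop, coro):
    """Run `coro` on the background loop and block the caller until it ends.

    Simplified stand-in for a sync-over-async wrapper; the real fsspec
    code differs in detail.
    """
    done = threading.Event()
    outcome = {}

    async def runner():
        try:
            outcome["value"] = await coro
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc
        finally:
            done.set()

    asyncio.run_coroutine_threadsafe(runner(), loop)
    done.wait()  # no timeout here: the caller waits as long as the coroutine runs
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def stalled_read(read_timeout=None):
    """Simulate a read on a connection that never delivers any data."""
    never_set = asyncio.Event()
    # With read_timeout=None this waits forever, so sync() would never return.
    await asyncio.wait_for(never_set.wait(), timeout=read_timeout)
    return b""


if __name__ == "__main__":
    loop = start_background_loop()
    try:
        sync(loop, stalled_read(read_timeout=0.1))
    except asyncio.TimeoutError:
        print("stalled read raised a timeout instead of hanging")
```

Calling `sync(loop, stalled_read())` without a timeout would block the calling thread indefinitely, which mirrors the hang described in the commit message. In this sketch the timeout lives inside the coroutine rather than on `done.wait()`, loosely analogous to how the commit passes the timeout down to aiohttp rather than changing fsspec's wrapper.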
