_isdir() does not put entry in the dircache #702

@aabmass

Thanks for the awesome project! I'm running into an issue where gcsfs makes unexpected GET requests before each upload when I write to a directory within a GCS bucket using put(). I believe the issue can be reproduced by just calling GCSFileSystem.put("somefile.txt", "gs://bucket/subdir/somefile.txt"), where the target sits in a subdirectory of the bucket. The debug logs look like this:

2025-09-23 04:40:25,909 - gcsfs - DEBUG - _call -- GET: b/{}/o/{}, ('my-fake-bucket', 'v2/9bc1e04f-7b17-4da6-a32f-3d460e1e275c_outputs.json'), None
2025-09-23 04:40:26,067 - gcsfs - DEBUG - _call -- GET: b/{}/o, ('my-fake-bucket',), None
2025-09-23 04:40:26,213 - gcsfs - DEBUG - _call -- GET: b/{}/o/{}, ('my-fake-bucket', 'v2/9bc1e04f-7b17-4da6-a32f-3d460e1e275c_outputs.json'), None
2025-09-23 04:40:26,431 - gcsfs - DEBUG - _call -- POST: https://storage.googleapis.com/upload/storage/v1/b/my-fake-bucket/o, (), {'Content-Type': 'multipart/related; boundary="==0=="'}
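
To quantify the extra requests, log lines like these can be tallied by HTTP method. This is only a minimal sketch, assuming the log format shown above; `count_methods` and `CALL_RE` are hypothetical names for illustration and are not part of gcsfs.

```python
import re
from collections import Counter

# Matches the "_call -- METHOD:" part of a gcsfs DEBUG log line.
# The pattern is an assumption based on the log excerpt in this issue.
CALL_RE = re.compile(r"_call -- (?P<method>[A-Z]+):")


def count_methods(log_lines):
    """Count how often each HTTP method appears in gcsfs debug log lines."""
    counts = Counter()
    for line in log_lines:
        match = CALL_RE.search(line)
        if match:
            counts[match.group("method")] += 1
    return counts


# The four log lines from this issue, copied verbatim.
LOG = """\
2025-09-23 04:40:25,909 - gcsfs - DEBUG - _call -- GET: b/{}/o/{}, ('my-fake-bucket', 'v2/9bc1e04f-7b17-4da6-a32f-3d460e1e275c_outputs.json'), None
2025-09-23 04:40:26,067 - gcsfs - DEBUG - _call -- GET: b/{}/o, ('my-fake-bucket',), None
2025-09-23 04:40:26,213 - gcsfs - DEBUG - _call -- GET: b/{}/o/{}, ('my-fake-bucket', 'v2/9bc1e04f-7b17-4da6-a32f-3d460e1e275c_outputs.json'), None
2025-09-23 04:40:26,431 - gcsfs - DEBUG - _call -- POST: https://storage.googleapis.com/upload/storage/v1/b/my-fake-bucket/o, (), {'Content-Type': 'multipart/related; boundary="==0=="'}
""".splitlines()

counts = count_methods(LOG)
print(counts["GET"], counts["POST"])  # prints: 3 1
```

For the excerpt above, that is three GET requests for a single upload POST.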

The call chain is _put() -> _isdir() -> _info(), and _info() calls GCSFileSystem._ls_from_cache() several times, each time with a cache miss. I did some debugging, and dircache stays empty even though I'm doing this in a loop. My actual code uses simplecache and looks something like this:

import json
import fsspec

for name in files:
  with fsspec.open(f"simplecache::gs://my-fake-bucket/v2/{name}", "w+") as f:
    json.dump(some_obj, f)

(which ends up calling GCSFileSystem.put() under the hood)
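
One way to confirm the empty cache is to list what the filesystem's dircache holds after each upload. The following is a rough sketch, assuming the filesystem object exposes the `dircache` mapping that fsspec filesystems carry; `cached_dirs` is a hypothetical helper, and the stand-in object is there only so the snippet runs without GCS credentials.

```python
from types import SimpleNamespace


def cached_dirs(fs):
    """Return the sorted directory paths currently held in fs.dircache."""
    # Fall back to an empty mapping if the object has no dircache attribute.
    return sorted(getattr(fs, "dircache", {}))


# Stand-in for a filesystem whose listing cache was never populated.
# With gcsfs, `fs` would be the filesystem instance used for the uploads.
fake_fs = SimpleNamespace(dircache={})
print(cached_dirs(fake_fs))  # prints: []
```

Calling such a helper after each put() in the loop would show whether any listing is ever cached; the behavior described above suggests it keeps returning an empty list.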
