Description
Thanks for the awesome project! I'm running into an issue where gcsfs makes many unexpected GET requests when I try to upload to a directory within a GCS bucket using put(). I believe the issue can be reproduced by just calling GCSFileSystem.put("somefile.txt", "gs://bucket/subdir/somefile.txt") with a destination inside a subdirectory of a bucket. The network logs look like this:
2025-09-23 04:40:25,909 - gcsfs - DEBUG - _call -- GET: b/{}/o/{}, ('my-fake-bucket', 'v2/9bc1e04f-7b17-4da6-a32f-3d460e1e275c_outputs.json'), None
2025-09-23 04:40:26,067 - gcsfs - DEBUG - _call -- GET: b/{}/o, ('my-fake-bucket',), None
2025-09-23 04:40:26,213 - gcsfs - DEBUG - _call -- GET: b/{}/o/{}, ('my-fake-bucket', 'v2/9bc1e04f-7b17-4da6-a32f-3d460e1e275c_outputs.json'), None
2025-09-23 04:40:26,431 - gcsfs - DEBUG - _call -- POST: https://storage.googleapis.com/upload/storage/v1/b/my-fake-bucket/o, (), {'Content-Type': 'multipart/related; boundary="==0=="'}
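To put a number on the overhead per upload, the log lines above can be tallied by HTTP method. This is a minimal sketch that assumes the `_call -- METHOD:` format shown in the logs; `count_methods` and `CALL_RE` are hypothetical helper names, not part of gcsfs.

```python
import re
from collections import Counter

# Matches the HTTP method in lines such as "... _call -- GET: b/{}/o/{}, ..."
CALL_RE = re.compile(r"_call -- (?P<method>[A-Z]+):")


def count_methods(log_text):
    """Count HTTP methods that appear in gcsfs debug log text."""
    return Counter(m.group("method") for m in CALL_RE.finditer(log_text))


# The four log lines from this report, for a single put() call.
sample = """\
2025-09-23 04:40:25,909 - gcsfs - DEBUG - _call -- GET: b/{}/o/{}, ('my-fake-bucket', 'v2/9bc1e04f-7b17-4da6-a32f-3d460e1e275c_outputs.json'), None
2025-09-23 04:40:26,067 - gcsfs - DEBUG - _call -- GET: b/{}/o, ('my-fake-bucket',), None
2025-09-23 04:40:26,213 - gcsfs - DEBUG - _call -- GET: b/{}/o/{}, ('my-fake-bucket', 'v2/9bc1e04f-7b17-4da6-a32f-3d460e1e275c_outputs.json'), None
2025-09-23 04:40:26,431 - gcsfs - DEBUG - _call -- POST: https://storage.googleapis.com/upload/storage/v1/b/my-fake-bucket/o, (), {'Content-Type': 'multipart/related; boundary="==0=="'}
"""

counts = count_methods(sample)
print(counts["GET"], "GET requests,", counts["POST"], "POST request")
```

For the excerpt above this gives three GET requests for one POST, so each upload costs three extra round trips.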
The call chain appears to be _put() -> _isdir() -> _info(), which calls GCSFileSystem._ls_from_cache() several times (1, 2) with cache misses. I did some debugging, and dircache remains empty even though I'm doing this in a loop. My actual code uses simplecache and looks something like this:
import json
import fsspec

for name in files:
    with fsspec.open(f"simplecache::gs://my-fake-bucket/v2/{name}", "w+") as f:
        json.dump(some_obj, f)
(which is calling GCSFileSystem.put() under the hood here)
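For anyone trying to reproduce this, here is a minimal sketch of how the debug output can be captured. It assumes the logger is named "gcsfs" (as the log lines suggest), that default Google credentials are available, and that the bucket and file names are placeholders; `enable_gcsfs_debug_logging` and `upload_once` are hypothetical helper names.

```python
import logging


def enable_gcsfs_debug_logging():
    """Send gcsfs debug records to stderr in the format seen in this report."""
    logger = logging.getLogger("gcsfs")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


def upload_once(local_path, remote_path):
    """Upload a single file with put() so its requests show up in the log."""
    # Imported lazily so the logging helper can be used without gcsfs installed.
    import gcsfs

    fs = gcsfs.GCSFileSystem()  # assumption: default credentials are configured
    fs.put(local_path, remote_path)
```

Calling enable_gcsfs_debug_logging() and then upload_once("somefile.txt", "gs://my-fake-bucket/v2/somefile.txt") in a loop should show whether the GET requests repeat on every iteration.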