SimpleCacheFileSystem._cache grows with every unique chained urlpath fsspec.open call #1930

@ap--

Hi @martindurant

When working with chained urlpaths, I noticed that the SimpleCacheFileSystem._cache dict keeps growing: every unique path creates and caches a new filesystem instance.
The root cause is that the "fo" item in storage_options makes fsspec.utils.tokenize return a token that depends on the path being accessed:

$ cat fsspec-simplecache-tokenize.py
import fsspec
from fsspec.utils import tokenize

fs0 = fsspec.open('simplecache::memory:///foo').fs
fs1 = fsspec.open('simplecache::memory:///bar').fs

print("fs0.storage_options", fs0.storage_options)
print("fs1.storage_options", fs1.storage_options)
print("id(fs0)", id(fs0))
print("id(fs1)", id(fs1))
print("fs0._fs_token_", fs0._fs_token_)
print("fs1._fs_token_", fs1._fs_token_)
$ python fsspec-simplecache-tokenize.py
fs0.storage_options {'target_options': {}, 'target_protocol': 'memory', 'fo': '/foo'}
fs1.storage_options {'target_options': {}, 'target_protocol': 'memory', 'fo': '/bar'}
id(fs0) 124024007322416
id(fs1) 124024006983344
fs0._fs_token_ e838936f5e20160de437f995501e2d03
fs1._fs_token_ 995b48f48e0829c5b85cc01f83d2099a

Should the "fo" item be popped from the kwargs passed to fsspec.utils.tokenize (including any nested target_options)?
I wonder whether this should apply in general or only to filesystems with passthrough behavior, like simplecache.
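
To make the idea concrete, here is a rough sketch of what dropping "fo" before tokenizing could look like. It is only an illustration under assumptions: strip_fo and toy_token are hypothetical names, not fsspec functions, and toy_token is a stand-in hash, not the algorithm used by fsspec.utils.tokenize. The option dicts are copied from the output above.

```python
import hashlib
import json


def strip_fo(options):
    """Return a copy of ``options`` without "fo", recursing into nested "target_options".

    Hypothetical helper for illustration; not part of fsspec.
    """
    cleaned = {}
    for key, value in options.items():
        if key == "fo":
            continue  # the path must not influence the token
        if key == "target_options" and isinstance(value, dict):
            value = strip_fo(value)  # chained protocols nest their options here
        cleaned[key] = value
    return cleaned


def toy_token(options):
    """Stand-in token: hash of the sorted JSON form (NOT fsspec's tokenize)."""
    payload = json.dumps(options, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


# storage_options as printed by the script above
opts_foo = {"target_options": {}, "target_protocol": "memory", "fo": "/foo"}
opts_bar = {"target_options": {}, "target_protocol": "memory", "fo": "/bar"}

# With "fo" included, the tokens differ per path
print(toy_token(opts_foo) == toy_token(opts_bar))
# With "fo" stripped, both paths map to the same token
print(toy_token(strip_fo(opts_foo)) == toy_token(strip_fo(opts_bar)))
```

The helper returns a new dict instead of mutating its input, so the original storage_options (including "fo") would still be available to the filesystem constructor. Whether fsspec should do this for all filesystems or only for passthrough ones is the open question above.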

Thanks,
Andreas
