Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 5 additions & 2 deletions fsspec/implementations/reference.py
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,11 @@ def __init__(
if self.engine == "pyarrow" and find_spec("pyarrow") is None:
raise ImportError("engine choice `pyarrow` is not installed.")

# apply `lru_cache` decorator manually per instance
# This way `self` reference is not held on class level
self.listdir = lru_cache()(self.listdir)
self._key_to_record = lru_cache(maxsize=4096)(self._key_to_record)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, unfortunately, is not correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate, please? What is incorrect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from functools import lru_cache


class A:
    def __init__(self, value):
        self.volatile_value = value
        self.calc = lru_cache()(self.calc)

    def calc(self):
        return self.volatile_value

    def calc2(self):
        @lru_cache
        def _calc():
            return self.volatile_value

        return _calc()


instance = A('a')
print(instance.calc())
print(instance.calc2())
instance.volatile_value = 'b'
print(instance.calc())
print(instance.calc2())

This prints: a a a b.
The expected output should be a a b b.
This cache self.calc = lru_cache()(self.calc) does not account for the change of relevant class members.
This thus would only work if all class members are immutable.
But we cannot guarantee this with python.
And this may lead to mysterious bugs that are hard to debug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, could you please also remove this line as well so that ruff can pick up issues like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:O Oh no! You're totally right, self and its attributes are not considered for the cache key in this solution.

Thank you for catching that!

Copy link
Contributor Author

@honzaflash honzaflash Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So really the only viable way is to cache a standalone free function that resides at the module level.

Options that I see:

  1. cached module level function that is then called from the method
  2. cached staticmethod/classmethod that is then called from the actual method
  3. a cached function defined in the constructor or setup just like open_refs
  4. assume instances are immutable and accept current fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the function/method is not static - it depends on the specific instance, but those details are not going to change. So, it would need to be func(instance, args), which is exactly what we have.

Copy link
Contributor Author

@honzaflash honzaflash Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, see the earlier comment above with a snippet using a @staticmethod and then a regular method that just wraps it, passing in the necessary parameters from self. That is what I meant by a static method.
It's not that different from a module level function - really only a matter of namespace I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also to add to the "immutable assumption", the class does inherit from collections.abc.MutableMapping - could this be changed to just collections.abc.Mapping (does not have __setitem__)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It is a mutable mapping, you can write to keys. These two functions do not depend on those keys, though.


def __getattr__(self, item):
if item in ("_items", "record_size", "zmetadata"):
self.setup()
Expand Down Expand Up @@ -219,7 +224,6 @@ def create(root, storage_options=None, fs=None, record_size=10000, **kwargs):
fs.pipe("/".join([root, ".zmetadata"]), json.dumps(met).encode())
return LazyReferenceMapper(root, fs, **kwargs)

@lru_cache
def listdir(self):
"""List top-level directories"""
dirs = (p.rsplit("/", 1)[0] for p in self.zmetadata if not p.startswith(".z"))
Expand Down Expand Up @@ -331,7 +335,6 @@ def _load_one_key(self, key):
# URL, offset, size
return selection[:3]

@lru_cache(4096)
def _key_to_record(self, key):
"""Details needed to construct a reference for one key"""
field, chunk = key.rsplit("/", 1)
Expand Down
Loading