Skip to content

Conversation

@vodik
Copy link

@vodik vodik commented Dec 9, 2024

The problem with using a WeakRefDictionary is it's possible for distinct object instances to collide in the cache depending on their implementation of __eq__/__hash__. In particular, it prevents it from being used at all on Django models.

@vodik vodik requested a review from rhettinger as a code owner December 9, 2024 02:09
@ghost
Copy link

ghost commented Dec 9, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@vodik
Copy link
Author

vodik commented Dec 9, 2024

I'm not an expert at weakref's, but is this even the right thing to use here? Wouldn't the closure we're caching hold a reference to obj alive?

        def _method(*args, **kwargs):
            if not args:
                raise TypeError(f'{funcname} requires at least '
                                '1 positional argument')
            return dispatch(args[0].__class__).__get__(obj, cls)(*args, **kwargs)

Wouldn't this hold a strong reference to obj and thus prevent either the original WeakKeyDictionary or my changed approach from cleaning up obj?

@vodik vodik changed the title gh-85160: Improve cacheing in singledispatchmethod gh-127750: Improve cacheing in singledispatchmethod Dec 9, 2024
@vodik vodik force-pushed the fix-issue-12345 branch 2 times, most recently from 2e08bd9 to a81b83e Compare December 9, 2024 02:26
@vodik
Copy link
Author

vodik commented Dec 9, 2024

Yes! Tracing through the logic manually, I can confirm that the weakref's callback does't fire unless I break the strong reference in _method.

So there's a bit of a space leak here (I assume Python's garbage collector might be able to figure it out though?)

I just put up a different that tried to solve this problem as well, but deviates further from the original. If we settle on an approach I'll figure out how to write tests...

'1 positional argument')
# Make sure to use the weakref here to prevent storing
# a strong reference to obj in the cache
return dispatch(args[0].__class__).__get__(obj_ref(), cls)(*args, **kwargs)
Copy link
Author

@vodik vodik Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this introduce a data race though? If I assigned the method and then deleted the underlying obj? I'll have to try it.

Is a weakref even appropriate for this cache then? Or should I throw a ReferenceError?

-                return dispatch(args[0].__class__).__get__(obj_ref(), cls)(*args, **kwargs)
+                obj = obj_ref()
+                if obj is None:
+                    raise ReferenceError
+
+                return dispatch(args[0].__class__).__get__(obj, cls)(*args, **kwargs)

Or is this whole approach maybe not correct enough?

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, provide a regression test.

@eendebakpt
Copy link
Contributor

If the main issue is that different objects are colliding in the cache (due to equal __hash__ and __eq__ being equal), perhaps the only solution is to remove the cache altogether (e.g. revert #107148). Using the id in the cache can also cause collisions (for objects with non-overlapping life times https://docs.python.org/3/library/functions.html#id)

@AlexWaygood
Copy link
Member

Unless we can find a solution that is both robust and has less complexity than this patch, I'm inclined to agree with @eendebakpt — I think a clean revert would be better. It's sad, but it may be the best we can do :(

@vodik
Copy link
Author

vodik commented Dec 9, 2024

I'm coming to the same conclusion. This patch now makes this code break:

foo = Foo()
dispatcher = foo.mysingledismatchmethod
del foo
dispatcher(...)  # fails

What if we revert #107148 and I instead contribute a test case specifically for this situation? That'll help if someone wants to tackle this again. I personally wasn't worried about the performance of this feature.

Either way I'll give it another attempt tonight. Maybe there's another approach that can work.

@AlexWaygood
Copy link
Member

Possibly one way to make this work would be to use the id for the key but use weakref.finalize to register a callback for objects whose ids are stored in the cache such that the objects are evicted from the cache when they're garbage collected

@AlexWaygood
Copy link
Member

But yes, whether we think of a clever solution or just revert the feature, we should clearly add a regression test.

@AlexWaygood AlexWaygood changed the title gh-127750: Improve cacheing in singledispatchmethod gh-127750: Improve caching in singledispatchmethod Dec 9, 2024
@vodik
Copy link
Author

vodik commented Dec 9, 2024

@AlexWaygood Weakref aside, just to be clear, while working on this patch I found another problem with the caching - that the closure put in the cache also holds a reference to the same obj we key the cache with. So it's circular and won't cleanup. There's a memory leak here too.

I can't think of a way to break that without breaking the semantics of the language. And I think that's the deeper problem that requires this to be reverted rather than fixed. I don't think there is a fix for this.

@eendebakpt
Copy link
Contributor

One more try: what if we change the cache to a WeakValueDictionary, cache based on the id of the object and modify the definition of _method to

        def _method(*args, **kwargs):
            if not args:
                raise TypeError(f'{funcname} requires at least '
                                '1 positional argument')
            return dispatch(args[0].__class__).__get__(obj, cls)(*args, **kwargs)
        
        _method.__isabstractmethod__ = self.__isabstractmethod__
        _method.register = self.register
        update_wrapper(_method, self.func)
        _method._self_ref = (obj, _method) # holds a strong reference to the value _method in the WeakValueDictionary
  • The caching based on id avoids the collisions in the issue report
  • As long as the user has references to the object obj, the tuple _self_ref ensures there is a strong reference to the dictionary value, so the value is kept in the WeakValueDictionary
  • Once the only references to obj are internal in the _method, the garbage collector kicks in and removes the object
  • Since _obj and _method are collected at the same time, the id(obj) is a safe key to use.

It seems to work locally for me, but definitely needs a second look. I did not measure performance yet.

@rhettinger rhettinger removed their request for review December 10, 2024 05:31
@ZeroIntensity ZeroIntensity added the needs backport to 3.13 bugs and security fixes label Dec 11, 2024
Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we discuss approaches, I'd like to get a test down to make sure things are working.

@eendebakpt
Copy link
Contributor

Before we discuss approaches, I'd like to get a test down to make sure things are working.

In #127839 I added tests for the two issues observed here. They are in commits 6c6e8b8 and ef3f8c9

@vodik
Copy link
Author

vodik commented Dec 12, 2024

It seems reasonable to me too.

@vodik
Copy link
Author

vodik commented Dec 12, 2024

#127839 is a better approach

@vodik vodik closed this Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants