Pickle exception handling could state object path #130621

@albertz

Feature or enhancement

Proposal:

Consider the case where you get an exception during unpickling. It could come from anywhere, for instance from the __setstate__ of a custom object. For example, we got this crash:

...
  File "/u/dorian.koch/setups/2024-10-11--denoising-lm/recipe/returnn/returnn/util/multi_proc_non_daemonic_spawn.py", line 156, in NonDaemonicSpawnProcess._reconstruct_with_pre_init_func
    line: reconstruct_func, reconstruct_args, reconstruct_state = pickle.load(buffer)
    locals:
      reconstruct_func = <not found>
      reconstruct_args = <not found>
      reconstruct_state = <not found>
      pickle = <global> <module 'pickle' from '/work/tools/users/zeyer/linuxbrew/opt/python@3.11/lib/python3.11/pickle.py'>
      pickle.load = <global> <built-in function load>
      buffer = <local> <_io.BytesIO object at 0x74bbaa61e610>
  File "/work/tools/users/zeyer/linuxbrew/opt/python@3.11/lib/python3.11/multiprocessing/synchronize.py", line 110, in SemLock.__setstate__
    line: self._semlock = _multiprocessing.SemLock._rebuild(*state)
    locals:
      self = <local> <Lock(owner=unknown)>
      self._semlock = <local> !AttributeError: 'Lock' object has no attribute '_semlock'
      _multiprocessing = <global> <module '_multiprocessing' from '/work/tools/users/zeyer/linuxbrew/opt/python@3.11/lib/python3.11/lib-dynload/_multiprocessing.cpython-311-x86_64-linux-gnu.so'>
      _multiprocessing.SemLock = <global> <class '_multiprocessing.SemLock'>
      _multiprocessing.SemLock._rebuild = <global> <built-in method _rebuild of type object at 0x74bbb60322c0>
      state = <local> (132092164476928, 1, 1, '/mp-2wkdacg_')
FileNotFoundError: [Errno 2] No such file or directory

So SemLock.__setstate__ fails here for some reason, maybe a race condition. But when I saw this crash, my first thought was: where do we actually have a SemLock inside the pickled object?

So this is what I would like: if an exception occurs during unpickling, pickle could show the object path, i.e. the chain of objects under construction that led to the failing object. (If there are multiple references to the object, just show the first.)

I'm not sure exactly how this should be done. It means some overhead: for every single object that pickle creates, we would need to store the creating parent object plus the name, index, or other key under which it is stored. So maybe this feature should be optional. It would be fine for me to run unpickling without it first and, if I get an exception, run unpickling again with a debug flag enabled. Maybe it's also fine if this is only available in the pure Python implementation.
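One way to prototype such an opt-in debug mode without touching the C implementation is sketched below. It assumes CPython's private pure-Python unpickler `pickle._Unpickler` and its internal `dispatch`, `stack`, and `metastack` attributes, none of which are public API, so this may break between versions. `PathReportingUnpickler`, `Fragile`, `_describe`, and the `construction_stack` attribute are hypothetical names used only for illustration. When the BUILD opcode (the step that calls __setstate__) fails, the sketch records what is still pending on the unpickler stack, which approximates the path to the failing object.

```python
import io
import pickle


class Fragile:
    """Hypothetical stand-in for SemLock: its __setstate__ always fails."""

    def __getstate__(self):
        return {"name": "demo"}

    def __setstate__(self, state):
        raise FileNotFoundError(2, "No such file or directory")


def _describe(obj):
    # Show pending keys/indices literally, everything else by type name.
    if isinstance(obj, (str, int)):
        return repr(obj)
    return type(obj).__qualname__


class PathReportingUnpickler(pickle._Unpickler):
    """Debug-only sketch; relies on private internals of Lib/pickle.py."""

    # Copy the opcode table so the base class stays untouched.
    dispatch = dict(pickle._Unpickler.dispatch)

    def load_build(self):
        try:
            super().load_build()
        except Exception as exc:
            # Containers under construction and their pending keys are still
            # on the (meta)stack; the failing instance is the last entry.
            pending = [x for frame in self.metastack for x in frame]
            pending += self.stack
            exc.construction_stack = [_describe(x) for x in pending]
            raise

    dispatch[pickle.BUILD[0]] = load_build


data = pickle.dumps({"workers": [Fragile()]})
try:
    PathReportingUnpickler(io.BytesIO(data)).load()
except FileNotFoundError as exc:
    report = exc.construction_stack
    print(" -> ".join(report))
```

For this input the report should name the enclosing dict, the pending key 'workers', the enclosing list, and finally the failing instance. No per-object bookkeeping is needed, at the cost of running the slower pure-Python unpickler, which fits the "run again with a debug flag" workflow.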

Maybe I can already do something like this by checking the local self in the stack frame where the exception occurred and then using gc.get_referrers to get back to the root?
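A rough sketch of that idea follows. `Fragile` and `failing_object` are hypothetical names; the only library calls used are `pickle.dumps`, `pickle.loads`, and `gc.get_referrers`. Getting hold of the failing object from the traceback works, but as far as I understand the pickle format, a container only receives an item after that item has been fully built, so at the time __setstate__ fails the referrers probably do not include the parent list or dict, and this is unlikely to lead back to the root on its own.

```python
import gc
import pickle


class Fragile:
    """Hypothetical stand-in for SemLock: its __setstate__ always fails."""

    def __getstate__(self):
        return {"name": "demo"}

    def __setstate__(self, state):
        raise FileNotFoundError(2, "No such file or directory")


def failing_object(exc):
    """Return the `self` local of the innermost traceback frame, or None."""
    tb = exc.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    return tb.tb_frame.f_locals.get("self")


data = pickle.dumps({"workers": [Fragile()]})
try:
    pickle.loads(data)
except FileNotFoundError as exc:
    obj = failing_object(exc)
    # Inspect who currently refers to the half-built object. The result is
    # implementation-dependent, so it is only printed, not relied upon.
    referrer_types = sorted({type(r).__name__ for r in gc.get_referrers(obj)})
    print(type(obj).__name__, referrer_types)
```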

Links to previous discussion of this feature:

https://discuss.python.org/t/pickle-exception-handling-could-state-object-path/82395

Labels: stdlib (Standard Library Python modules in the Lib/ directory), type-feature (A feature request or enhancement)
