conv2d on linux does not work with parallel_save #43

@mdales

Using conv2d from PyTorch with parallel_save hangs. This first showed up in CI for PR #42. That PR did not cause it: I had fixed the tests, and the "parallel" test case for conv2d had been calling save rather than parallel_save, so correcting the test exposed a bug that was already present in the original implementation.

The tests pass on macOS but fail on Linux. If we interrupt the hung run we see the following error:

This process (pid=366448) is multi-threaded, use of fork() may lead to deadlocks in the child.

This is consistent with the hang that we see. My assumption is that PyTorch is using its own parallelism under the hood, and it's a known issue that forking a process which is already running multiple threads can deadlock the child.

I'll revert the "fix" to the tests in PR #42, and this issue then tracks finding a proper solution. Some options:

  • Change the spawn mode for multiprocessing?
  • Use PyTorch multiprocessing?
  • Detect if conv2d is used in an expression, and if so revert to non-parallel save/sum?
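
As a rough illustration of the first option, here is a minimal sketch using only the standard library. It assumes the parallel code path can be handed an explicit multiprocessing context. The helper names `fork_safe_context` and `parallel_map` are hypothetical and are not part of this project or of PyTorch. macOS has defaulted to the "spawn" start method since Python 3.8, while Linux has historically defaulted to "fork", which may be why the tests only hang on Linux.

```python
import multiprocessing as mp


def fork_safe_context(method="spawn"):
    """Return a multiprocessing context that does not rely on fork().

    "spawn" starts each worker as a fresh interpreter, so no thread or
    lock state is inherited from a (possibly multi-threaded) parent.
    """
    return mp.get_context(method)


def parallel_map(func, items, workers=2, method="spawn"):
    """Hypothetical helper: map func over items using spawned workers.

    func must be defined at module top level so that it can be pickled
    and sent to the worker processes.
    """
    ctx = fork_safe_context(method)
    with ctx.Pool(processes=workers) as pool:
        return pool.map(func, items)
```

With "spawn" the workers re-import the main module, so any call to `parallel_map` from a script needs to sit under an `if __name__ == "__main__":` guard. Worker start-up is also slower than with "fork", and nothing is shared implicitly with the parent, so whether this fits depends on how parallel_save passes data to its workers.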
