Use spawn for multiprocessing start method (#3284)
Summary:
Pull Request resolved: #3284
CUDA context initialization is not fork-safe. If a CUDA context is created in a parent process and the process is then forked (using `os.fork()`), the child process may encounter errors or undefined behavior when using CUDA. This is because the CUDA driver and runtime are not designed to be safely duplicated via `fork()`. The recommended alternatives are the `spawn` and `forkserver` start methods.
Of the two, `forkserver` needs to be used carefully: `multiprocessing.set_start_method('forkserver')` should be called at the very start of the program, and the parent process must also avoid initializing the CUDA context. When upgrading APS to CUDA 12.8, we encountered a test failure. The test initializes the CUDA context before starting two child processes, and I suspect that caused it to hang ([post](https://fb.workplace.com/groups/319878845696681/posts/1494595861558301)).
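A minimal sketch of the `forkserver` pattern described above, assuming plain CPython `multiprocessing` with no CUDA involved. The `main` function and the `abs` workload are illustrative placeholders, not code from the failing test. The key points are that `set_start_method` is called once, under the `__main__` guard, before anything else runs in the parent. Note that `forkserver` is only available on POSIX platforms.

```python
import multiprocessing as mp


def main():
    # Worker processes are created only after the start method has been fixed.
    with mp.Pool(processes=2) as pool:
        # abs is a builtin, so workers can unpickle it by reference.
        return pool.map(abs, [-1, -2, -3])


if __name__ == "__main__":
    # Call set_start_method exactly once, at the very start of the program,
    # before any code (e.g. a hypothetical GPU-count check) initializes CUDA
    # in the parent. Calling it a second time raises RuntimeError.
    mp.set_start_method("forkserver")
    print(main())  # [1, 2, 3]
```

This is exactly the ordering constraint that is hard to satisfy in a test whose decorator touches the GPU before the test body runs.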
It's hard to avoid initializing the CUDA context early in this test, because the test method's decorator checks the GPU count ([code](https://fburl.com/code/27naz2eg)). Between the `spawn` and `forkserver` start methods, `spawn` is less efficient but more robust, so let's switch to it to avoid potential undefined behavior with CUDA 12.8 and multiprocessing.
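A minimal sketch of switching to `spawn`, assuming plain CPython `multiprocessing` (the helper name `sqrt_in_spawned_workers` and its `math.sqrt` workload are hypothetical, not taken from the actual change). It uses `multiprocessing.get_context("spawn")`, which scopes the start method to one pool instead of changing the process-wide default, so it still works when something earlier in the program has already touched CUDA.

```python
import math
import multiprocessing as mp


def sqrt_in_spawned_workers(values, processes=2):
    """Compute math.sqrt for each value in a pool of spawned workers.

    get_context("spawn") applies only to this pool and leaves the global
    start method untouched. Each spawned child starts a fresh Python
    interpreter, so it does not inherit CUDA (or other driver) state
    from the parent.
    """
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        # math.sqrt is picklable by reference, so the children can import it.
        return pool.map(math.sqrt, values)


if __name__ == "__main__":
    # The __main__ guard is required with spawn: each child re-imports the
    # main module, and starting processes at import time in a child is
    # rejected by multiprocessing with a RuntimeError.
    print(sqrt_in_spawned_workers([1.0, 4.0, 9.0]))  # [1.0, 2.0, 3.0]
```

The trade-off matches the one stated above: every worker pays the cost of starting a new interpreter and re-importing modules, in exchange for never sharing a forked copy of the parent's CUDA state.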
Reviewed By: adamomainz, weifengpy
Differential Revision: D80305233
fbshipit-source-id: 228b09d7a40bfa8b4d7ee3c3d926db5c631fffcb