Conversation
Pull request overview
Fixes a parent-process SIGSEGV seen when using accelerate launch on Intel XPU by avoiding Python/C++ teardown that triggers oneCCL static destructor order issues.
Changes:
- Add an XPU-only success-path early termination in `multi_gpu_launcher()` after `torch.distributed.run` completes successfully.
```python
else:
    raise
else:
    if is_xpu_available():
```
os._exit() bypasses normal shutdown (finally blocks, atexit handlers, logging/IO flushing). To reduce the chance of losing buffered output, consider explicitly flushing stdout/stderr (and/or logging handlers) immediately before the forced exit, and add a brief inline comment explaining that this is intentionally avoiding oneCCL static destructor teardown on XPU.
```diff
 if is_xpu_available():
+    # Intentionally bypass normal Python shutdown on XPU to avoid oneCCL
+    # static destructor teardown; flush stdio and logging before forced exit.
+    sys.stdout.flush()
+    sys.stderr.flush()
+    logging.shutdown()
```
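The data-loss risk that motivates the flushing suggestion can be demonstrated with a small standalone script (illustrative only, not accelerate code): output sitting in a block-buffered stdout pipe is silently dropped by `os._exit()` unless it is flushed first.

```python
import os
import subprocess
import sys
import textwrap

# Child process: writes to stdout (block-buffered when piped), then hard-exits.
# os._exit() skips the interpreter's normal stdio flushing on shutdown.
child = textwrap.dedent("""
    import os, sys
    sys.stdout.write("buffered line")  # no newline -> stays in the buffer
    if os.environ.get("DO_FLUSH") == "1":
        sys.stdout.flush()  # the explicit flush the review suggests
    os._exit(0)  # bypasses finally blocks, atexit handlers, and stdio flushing
""")

def run(do_flush):
    env = dict(os.environ, DO_FLUSH="1" if do_flush else "0")
    result = subprocess.run([sys.executable, "-c", child],
                            capture_output=True, env=env)
    return result.stdout.decode()

print(repr(run(do_flush=True)))   # flushed output survives the hard exit
print(repr(run(do_flush=False)))  # unflushed output is lost
```

Running this shows the flushed variant preserves the buffered text while the unflushed variant loses it, which is why flushing immediately before the forced exit matters.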
```python
import os as _os
_os._exit(0)
```
Calling os._exit(0) here will unconditionally terminate the entire Python process whenever multi_gpu_launcher() completes successfully on an XPU system. That’s risky when launch_command()/multi_gpu_launcher() are invoked programmatically (e.g., from a larger Python process or test runner), since it can prematurely end the host process and skip remaining work/tests. Consider gating this behavior behind an explicit CLI-only signal (e.g., an env var set by the accelerate entrypoint, or a dedicated argument/flag) so library/test usage can return normally while the CLI path still avoids the oneCCL teardown crash.
```diff
-import os as _os
-_os._exit(0)
+# On XPU systems, a hard process exit is used to avoid a known oneCCL teardown crash.
+# This can be disabled for library/test usage by setting the environment variable
+# ACCELERATE_DISABLE_XPU_PROCESS_EXIT to a truthy value (e.g. "1", "true", "yes").
+disable_exit = os.environ.get("ACCELERATE_DISABLE_XPU_PROCESS_EXIT", "").lower()
+if disable_exit not in {"1", "true", "yes"}:
+    import os as _os
+    _os._exit(0)
```
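The gating logic the reviewer proposes can be sketched as a small helper. Note that `ACCELERATE_DISABLE_XPU_PROCESS_EXIT` is the reviewer's suggested name, not an existing accelerate environment variable:

```python
import os

def xpu_hard_exit_disabled(environ=os.environ):
    """Return True when the forced os._exit(0) on XPU should be skipped.

    Hypothetical gate from the review suggestion: a truthy value lets
    library/test callers return normally instead of hard-exiting.
    """
    value = environ.get("ACCELERATE_DISABLE_XPU_PROCESS_EXIT", "").lower()
    return value in {"1", "true", "yes"}

print(xpu_hard_exit_disabled({"ACCELERATE_DISABLE_XPU_PROCESS_EXIT": "TRUE"}))  # True
print(xpu_hard_exit_disabled({}))                                               # False
print(xpu_hard_exit_disabled({"ACCELERATE_DISABLE_XPU_PROCESS_EXIT": "0"}))     # False
```

Lower-casing before the membership test means "TRUE" and "Yes" also count as truthy, while unset or "0" leaves the CLI hard-exit path in place.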
What does this PR do?
This PR fixes SIGSEGV (exit code 139) on XPU.
When using accelerate launch on Intel XPU, the parent process crashes with SIGSEGV (exit code 139) after all worker processes complete successfully.

The root cause is a C++ static destruction order fiasco in libccl.so (Intel oneCCL). The parent process loads libccl.so as a side effect of import torch (line 26 of launch.py), which registers ~30+ file-scope std::map objects, a logger singleton, and other global C++ objects with non-trivial destructors. The parent never initializes or uses oneCCL — it only spawns workers via torch.distributed.run and waits for them. However, when the parent's Python interpreter shuts down, it triggers C++ static destructors across all loaded shared libraries, and the interdependencies among oneCCL's global objects cause a use-after-free crash. This does not occur on CUDA because libnccl.so's global objects have simpler destruction semantics.

The fix calls os._exit(0) in the success path of multi_gpu_launcher() when XPU is available, which immediately terminates the parent process without running Python finalizers or C++ static destructors. This is safe because the parent has no remaining work after workers exit, and os._exit() is the standard pattern for avoiding teardown-order crashes in launcher processes (e.g., multiprocessing forkserver uses the same approach).