This repository was archived by the owner on Sep 30, 2022. It is now read-only.
v2.x: PSM/PSM2: Disable signal handler hijacking by default #1222
Per discussion on open-mpi/ompi#1767 (and some
subsequent phone calls and off-issue email discussions), the PSM
library is hijacking signal handlers by default. Specifically: unless
the environment variable
IPATH_NO_BACKTRACE=1 (for PSM / Intel TrueScale) is set, the library constructor for this library will
hijack various signal handlers for the purpose of invoking its own
error reporting mechanisms.
This may be a bit surprising, but is not a problem, per se. The
real problem is that older versions of at least the PSM library do not
unregister these signal handlers upon being unloaded from memory.
Hence, a segv can actually result in a double segv (i.e., the original
segv and then another segv when the now-non-existent signal handler is
invoked).
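The double segv comes down to how a library manages a handler it installs at load time. The sketch below is a hedged illustration of the well-behaved pattern, not PSM source code. The constructor-side call saves the previous action with sigaction() when it installs its own handler, and the destructor-side call restores that action before the code is unmapped. The names lib_backtrace_handler, lib_hijack_segv, and lib_restore_segv are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical library state: the SIGSEGV action in place before we replaced it. */
static struct sigaction saved_segv_action;
static int segv_hijacked = 0;

/* Hypothetical error reporter the library wants to run on SIGSEGV. */
static void lib_backtrace_handler(int signo)
{
    static const char msg[] = "library backtrace handler invoked\n";
    ssize_t n;
    (void)signo;
    /* Only async-signal-safe calls (write, _exit) are used here. */
    n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    _exit(1);
}

/* Constructor side: install our handler and remember the previous one. */
int lib_hijack_segv(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = lib_backtrace_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &saved_segv_action) != 0) {
        return -1;
    }
    segv_hijacked = 1;
    return 0;
}

/* Destructor side: put the previous action back so that a later SIGSEGV
 * is never delivered to a handler whose code has been unmapped. */
int lib_restore_segv(void)
{
    if (!segv_hijacked) {
        return 0;
    }
    if (sigaction(SIGSEGV, &saved_segv_action, NULL) != 0) {
        return -1;
    }
    segv_hijacked = 0;
    return 0;
}
```

A library that skips the restore step leaves the process's SIGSEGV disposition pointing at an address that is no longer mapped once the library is unloaded. The next segv then faults again inside the "handler" itself.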
This PSM signal hijacking subverts Open MPI's own signal reporting
mechanism, which may be a bit surprising for some users (particularly
those who do not have Intel TrueScale). As such, we disable it by
default so that Open MPI's own error-reporting mechanisms are used.
Additionally, there is a typo in the library destructor for the PSM2
library that may cause problems in the unloading of its signal
handlers. This problem can be avoided by setting
HFI_NO_BACKTRACE=1 (for PSM2 / Intel OmniPath).
This is further compounded by the fact that the PSM / PSM2 libraries
can be loaded by the OFI MTL and the usNIC BTL (because they are
loaded by libfabric), even when there is no Intel networking hardware
present. Having the PSM/PSM2 libraries behave this way when no Intel
hardware is present is clearly undesirable (and is likely to be fixed
in future releases of the PSM/PSM2 libraries).
This commit sets the following two environment variables to disable
this behavior from the PSM/PSM2 libraries (if they are not already
set):
- IPATH_NO_BACKTRACE=1 (PSM / Intel TrueScale)
- HFI_NO_BACKTRACE=1 (PSM2 / Intel OmniPath)
If the user has set these variables before invoking Open MPI, we will
not override their values (i.e., their preferences will be honored).
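The "set only if not already set" behavior maps directly onto POSIX setenv() with its overwrite argument set to 0. The following is a minimal sketch assuming plain POSIX calls. It is not the code from this commit, and the helper name disable_psm_signal_hijacking is hypothetical. The helper has to run before the PSM/PSM2 library is loaded, because the library constructor consults these variables at load time.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* The two variables named in this PR; setting either to 1 asks the
 * corresponding library not to install its backtrace signal handlers. */
static const char *const psm_backtrace_vars[] = {
    "IPATH_NO_BACKTRACE", /* PSM  / Intel TrueScale */
    "HFI_NO_BACKTRACE",   /* PSM2 / Intel OmniPath  */
};

/* Hypothetical helper: call early, before PSM/PSM2 can be loaded
 * (directly or indirectly via libfabric). overwrite = 0 means any value
 * the user already exported is left untouched. */
int disable_psm_signal_hijacking(void)
{
    size_t i;
    for (i = 0; i < sizeof psm_backtrace_vars / sizeof psm_backtrace_vars[0]; ++i) {
        if (setenv(psm_backtrace_vars[i], "1", 0) != 0) {
            return -1;
        }
    }
    return 0;
}
```

For example, a user who exported HFI_NO_BACKTRACE=0 keeps that value, while an unset IPATH_NO_BACKTRACE becomes 1. Because setenv() is not thread-safe, a call like this belongs early in startup, before other threads exist.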
Signed-off-by: Jeff Squyres
(cherry picked from commit open-mpi/ompi@5071602)
Reviewed by @rhc54 @matcabral
@hppritcha When CI finishes, good to go.