This repository was archived by the owner on Sep 30, 2022. It is now read-only.
v2.x: PSM/PSM2: Disable signal handler hijacking by default #1224
Merged
hppritcha
merged 2 commits into
open-mpi:v2.x
from
rhc54:pr/v2.0.0/disable-psm-psm2-signal-hijacking
Jun 15, 2016
Per discussion on open-mpi/ompi#1767 (and some subsequent phone calls and off-issue email discussions), the PSM library hijacks signal handlers by default. Specifically: unless the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel TrueScale) is set, the library constructor will take over various signal handlers in order to invoke its own error-reporting mechanisms.

This may be a bit *surprising*, but is not a *problem*, per se. The real problem is that older versions of at least the PSM library do not unregister these signal handlers when the library is unloaded from memory. Hence, a segv can actually result in a double segv (i.e., the original segv and then another segv when the now-non-existent signal handler is invoked).

This PSM signal hijacking subverts Open MPI's own signal reporting mechanism, which may be a bit surprising for some users (particularly those who do not have Intel TrueScale). As such, we disable it by default so that Open MPI's own error-reporting mechanisms are used.

Additionally, there is a typo in the library destructor for the PSM2 library that may cause problems when its signal handlers are unloaded. This problem can be avoided by setting `HFI_NO_BACKTRACE=1` (for PSM2 / Intel OmniPath).

This is further compounded by the fact that the PSM / PSM2 libraries can be loaded by the OFI MTL and the usNIC BTL (because they are loaded by libfabric), even when there is no Intel networking hardware present. Having the PSM/PSM2 libraries behave this way when no Intel hardware is present is clearly undesirable (and is likely to be fixed in future releases of the PSM/PSM2 libraries).

This commit sets the following two environment variables to disable this behavior from the PSM/PSM2 libraries (if they are not already set):

* `IPATH_NO_BACKTRACE=1`
* `HFI_NO_BACKTRACE=1`

If the user has set these variables before invoking Open MPI, we will not override their values (i.e., their preferences will be honored).

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit open-mpi/ompi@5071602)
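
For readers who want to see the "set only if not already set" behavior in isolation, here is a minimal sketch using plain POSIX `setenv()` with its overwrite argument set to 0. It is not the code from this commit: the helper names `set_default_env` and `disable_psm_backtrace_defaults` are hypothetical, and Open MPI's actual implementation may use its own environment utilities instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose setenv() under strict ISO C modes */
#endif
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not an Open MPI function): set NAME=VALUE only when
 * NAME is absent from the environment, so a user's own setting is kept. */
int set_default_env(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    /* overwrite == 0: POSIX setenv() leaves an existing value untouched. */
    return setenv(name, value, 0);
}

/* Hypothetical wrapper: apply both defaults named in the commit message.
 * Returns 0 on success, -1 if either setenv() call fails. */
int disable_psm_backtrace_defaults(void)
{
    if (set_default_env("IPATH_NO_BACKTRACE", "1") != 0) {
        return -1;
    }
    if (set_default_env("HFI_NO_BACKTRACE", "1") != 0) {
        return -1;
    }
    return 0;
}
```

For the defaults to have any effect, a call like this would have to run before the PSM/PSM2 library constructors execute, i.e. before those libraries are loaded. A user who exports, say, `HFI_NO_BACKTRACE=0` beforehand keeps that value.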
Reviewed by @rhc54 @matcabral
@hppritcha When CI finishes, good to go.
Test FAILed.
@jladd-mlnx any idea what the failure is here? I ask because this change should have zero effect on Mellanox hardware. It passed Jenkins on the other branches. I don't think there were other PRs on v2.0 that failed, were there?
bot:retest
Test PASSed.