Enable the PMIx event notification capability #1767
| @ggouaillardet @bosilca I could use some help here, folks. Everything seems to be working just fine with the event notification code, with the exception of MPI_Abort. The application procs that don't call "abort" are segfaulting, but I cannot get a complete core. I've tried a variety of tricks, but nothing I do seems to help get a core, and so I am having trouble identifying the source of the failure. I'd appreciate any insight you can provide on where the fault is occurring. | 
| @rhc54 I will give it a try | 
| @ggouaillardet Just gave it a try with --disable-dlopen, and as you suspected it hides the bug - no failures. | 
| I get a segfault in  in  if configure'd with  I'll make a quick patch for that and see how things evolve ... | 
| @ggouaillardet I think what is happening is that an event is still defined and firing after the component holding the callback function is unloaded. I thought I had some ideas on where it might be, but they proved incorrect. Still searching, so holler if you gain any insights | 
| holler holler holler after the trivial patch to make  I do not know how the MPI tasks are supposed to be killed (suicide in PMIx? external kill by orted?) but what happens is the tasks do receive  and guess who is trapping  as a workaround, you can either  this  
 @jsquyres any thoughts? (a consistent behavior might be desired in  | 
| @ggouaillardet Hmm.  I thought the  | 
| I quickly checked that, and the fix is only available from OFED 3.18-2rc1. I run on an up-to-date CentOS 7 box, and this fix might not even land before RHEL8 ... I'd rather put it this way | 
| That's a fair point. @matcabral @yburette Is there a way to work around this in Open MPI if the user hasn't upgraded to OFED >= v3.18-2rc1? | 
| @ggouaillardet Wow - we definitely need some way of solving this more generally. I've wasted days of my time chasing this ghost again, and it keeps biting us. Putting this out there in the wild? We'll have users going nuts chasing false segfaults. | 
| I will inform RedHat tomorrow, they might be willing to backport the fix. | 
| One more thing ... | 
| @ggouaillardet I don't think so - I think the core gets truncated because the lib causing the segfault is no longer loaded in memory. One possible workaround: if I could detect that we have the older OFED, and/or that PSM was considered and declined, then I could skip the SIGTERM and go directly to SIGKILL, thus avoiding the problem. Does anyone have an idea on whether or not that is possible? | 
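A minimal sketch of the SIGTERM-skipping idea, assuming the launcher already knows (for example from a configure-time check) whether the old PSM library is in play. `terminate_child()` and its `psm_busted` and `grace_ms` parameters are hypothetical and are not ORTE/orted APIs; the code uses only POSIX `kill()`, `waitpid()` and `nanosleep()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h> /* fork() and pause(), used by callers that spawn the child */

/*
 * Hypothetical sketch: terminate and reap a child process.
 * If psm_busted is nonzero, SIGTERM is skipped so a dangling PSM handler
 * can never run, and the child gets SIGKILL, which cannot be caught.
 * Otherwise SIGTERM is sent first and the child gets up to grace_ms
 * milliseconds to exit before we escalate to SIGKILL.
 * Returns the child's wait status, or -1 on error.
 */
static int terminate_child(pid_t pid, int psm_busted, int grace_ms)
{
    int status = 0;

    if (!psm_busted) {
        if (kill(pid, SIGTERM) != 0) {
            return -1;
        }
        const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
        for (int waited = 0; waited < grace_ms; waited += 10) {
            pid_t r = waitpid(pid, &status, WNOHANG);
            if (r == pid) {
                return status;            /* child exited on SIGTERM */
            }
            if (r < 0) {
                return -1;
            }
            nanosleep(&tick, NULL);
        }
    }

    /* Escalate to (or go straight to) SIGKILL. */
    if (kill(pid, SIGKILL) != 0 && errno != ESRCH) {
        return -1;
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return status;
}
```

For example, forking a child that blocks in `pause()` and calling `terminate_child(pid, 1, 100)` yields a status for which `WTERMSIG(status) == SIGKILL`, without SIGTERM ever being delivered.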
| FWIW: at least on my system, the  | 
| @rhc54 It should be possible for the PSM/PSM2 MTLs to detect the older version; it would be great if they could react accordingly (e.g., putenv something that tells their library not to intercept SIGTERM...?). | 
| In your environment, libinfinipath.so might be pulled in by libfabric. For the time being, the easiest option would be to any thoughts? | 
| I'll bug the folks over here - I think @jsquyres proposal makes the most sense, and hopefully is doable without great pain. | 
| Ah, just remembered - @matcabral is on vacation this week. Will bring this to his attention when he returns. Thanks guys! | 
| Keep in mind the PSM signal handlers are set in the library constructor. | 
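To make the load-time ordering concrete, here is a sketch of a library that hooks SIGTERM from a constructor. `demo_lib_init()`, `demo_lib_fini()` and `demo_backtrace_handler()` are hypothetical stand-ins, not PSM code; only the `IPATH_NO_BACKTRACE` opt-out name comes from this thread, and `__attribute__((constructor))` / `__attribute__((destructor))` are GCC/Clang extensions. The destructor shows the restore step that, per this discussion, older PSM releases omit.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a vendor library's backtrace handler. */
static void demo_backtrace_handler(int sig)
{
    /* A real library would dump a backtrace here; this stand-in just exits. */
    _exit(128 + sig);
}

static struct sigaction demo_saved_term; /* disposition before we hooked SIGTERM */
static int demo_hooked = 0;

/*
 * Constructors run when the object is loaded: at program start for an
 * executable, or inside dlopen() for a plugin, before any of the plugin's
 * own functions can be called. The opt-out variable is read here, so it
 * must already be in the environment at load time.
 */
static __attribute__((constructor)) void demo_lib_init(void)
{
    if (getenv("IPATH_NO_BACKTRACE") != NULL) {
        return;                            /* user opted out: hook nothing */
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_backtrace_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, &demo_saved_term) == 0) {
        demo_hooked = 1;
    }
}

/*
 * Restore the previous disposition on unload. Without this step, after
 * dlclose() SIGTERM still points at the handler's now-unmapped address.
 */
static __attribute__((destructor)) void demo_lib_fini(void)
{
    if (demo_hooked) {
        sigaction(SIGTERM, &demo_saved_term, NULL);
        demo_hooked = 0;
    }
}
```

Because the check happens in the constructor, setting `IPATH_NO_BACKTRACE` from inside the component after `dlopen()` has returned is already too late.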
| Crumb - that's right. Doing a putenv in the component won't solve the problem, will it? Sounds like it has to be in the environ prior to opening the component so they don't register that errhandler in the constructor. Which means that the configure logic has to pick up the situation, but the putenv has to go in MPI_Init (or somewhere before any libfabric-based component is opened). Sigh. | 
| Yep, the component is linked with the PSM library, so the environment variable must be set before the component is dlopen'ed | 
| On second thought ... the signal handler is set in the constructor, i.e., when the component is dlopen'ed. Strictly speaking, the component can overwrite the signal handler (but not with a function defined in the component) if the PSM lib is busted. | 
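A sketch of that override, assuming it runs in a component linked against the busted library: it swaps any installed handler function for `SIG_DFL`, which is not code inside any shared object and therefore stays valid after `dlclose()`. `reset_handler_to_default()` and `stand_in_psm_handler()` are hypothetical names. It cannot tell a PSM handler apart from one the application installed, so on its own it is a blunt tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/*
 * Hypothetical helper: if `sig` currently has a handler *function*
 * installed, replace it with SIG_DFL. Returns 1 if a handler was
 * replaced, 0 if the disposition was already SIG_DFL or SIG_IGN,
 * and -1 on error.
 */
static int reset_handler_to_default(int sig)
{
    struct sigaction cur;
    if (sigaction(sig, NULL, &cur) != 0) {
        return -1;
    }
    /* With SA_SIGINFO the handler lives in sa_sigaction, so it is a function. */
    int is_function = (cur.sa_flags & SA_SIGINFO) ||
                      (cur.sa_handler != SIG_DFL && cur.sa_handler != SIG_IGN);
    if (!is_function) {
        return 0;
    }
    struct sigaction dfl;
    memset(&dfl, 0, sizeof(dfl));
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    return sigaction(sig, &dfl, NULL) == 0 ? 1 : -1;
}

/* Stand-in for the library's handler, used only to exercise the helper. */
static void stand_in_psm_handler(int sig)
{
    (void)sig;
}
```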
| good point - but that would mean coordinating between the various components that dlopen either libfabric or the psm/psm2 libraries directly since we cannot know which one(s) might be touched (e.g., user could ignore some via mca param). Could become a rather invasive solution, so I imagine just detecting it at configure and pushing a global envar is the least-painful solution | 
| Hey guys, I'm catching up with this thread. @rhc54 I was on vacation the first days of the week (traveling far away south), now I'm just in a different time zone, GMT -3. So, back to the signal hijacking. For libinfinipath (PSM gen1) the issue was solved in the library itself, and the patch for OMPI should check the version when opening and reject the old one. I don't have the versions here, but will open a bug to solve this. | 
| @matcabral Thanks for checking in! The problem is that we cannot let the component even be loaded, and so checking the version and rejecting the old one is too late. What we need to do is detect the old version during configure, and then set that envar during MPI_Init so that the library does the right thing when loaded (even if it subsequently declines and unloads itself). | 
| A few points for this discussion... 
 | 
| FYI | 
| FWIW, here is a simple patch that sets
diff --git a/config/ompi_check_psm.m4 b/config/ompi_check_psm.m4
index 44a5834..e63923b 100644
--- a/config/ompi_check_psm.m4
+++ b/config/ompi_check_psm.m4
@@ -12,7 +12,7 @@ dnl Copyright (c) 2004-2006 The Regents of the University of California.
 dnl                         All rights reserved.
 dnl Copyright (c) 2006      QLogic Corp. All rights reserved.
 dnl Copyright (c) 2009-2016 Cisco Systems, Inc.  All rights reserved.
-dnl Copyright (c) 2015      Research Organization for Information Science
+dnl Copyright (c) 2015-2016 Research Organization for Information Science
 dnl                         and Technology (RIST). All rights reserved.
 dnl Copyright (c) 2016      Los Alamos National Security, LLC. All rights
 dnl                         reserved.
@@ -44,6 +44,7 @@ AC_DEFUN([OMPI_CHECK_PSM],[
    ompi_check_psm_$1_save_CPPFLAGS="$CPPFLAGS"
    ompi_check_psm_$1_save_LDFLAGS="$LDFLAGS"
    ompi_check_psm_$1_save_LIBS="$LIBS"
+   ompi_check_psm_$1_busted=0
    AS_IF([test "$with_psm" != "no"],
               [AS_IF([test ! -z "$with_psm" && test "$with_psm" != "yes"],
@@ -77,9 +78,24 @@ AC_DEFUN([OMPI_CHECK_PSM],[
                [AC_MSG_WARN([glob.h not found.  Can not build component.])
                ompi_check_psm_happy="no"])])
+       AS_IF([test "$ompi_check_psm_happy" = "yes"],
+              [AC_COMPILE_IFELSE([AC_LANG_SOURCE([
+#include <psm.h>
+
+#if PSM_VERNO < 0x0110
+#error busted PSM library
+#endif
+])],
+                                 [],
+                                 [ompi_check_psm_$1_busted=1])])
+
    OPAL_SUMMARY_ADD([[Transports]],[[Intel TrueScale (PSM)]],[$1],[$ompi_check_psm_happy])
     fi
+    AC_DEFINE_UNQUOTED([OMPI_PSM_BUSTED],
+                       [$ompi_check_psm_$1_busted],
+                       [Whether libinfinipath.so fails to unset its signal handler in the destructor])
+
     AS_IF([test "$ompi_check_psm_happy" = "yes"],
           [$1_LDFLAGS="[$]$1_LDFLAGS $ompi_check_psm_LDFLAGS"
       $1_CPPFLAGS="[$]$1_CPPFLAGS $ompi_check_psm_CPPFLAGS"
diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
index 5616992..6b3b3ca 100644
--- a/ompi/runtime/ompi_mpi_init.c
+++ b/ompi/runtime/ompi_mpi_init.c
@@ -489,6 +489,12 @@ int ompi_mpi_init(int argc, char **argv, int requested, int *provided)
         putenv(av);
     }
+#if OMPI_PSM_BUSTED
+    if (NULL == getenv("IPATH_NO_BACKTRACE")) {
+        putenv("IPATH_NO_BACKTRACE=1");
+    }
+#endif
+
     /* open the rte framework */
     if (OMPI_SUCCESS != (ret = mca_base_framework_open(&ompi_rte_base_framework, 0))) {
         error = "ompi_rte_base_open() failed"; | 
| @rhc54 How many other cases in the past have required analyzing signal handlers inside OMPI? It seems to me that this is an isolated case that is trying to work around a bug in an old version of a specific library. Unless there are other uses, it could be enough to just add a few lines with clear comments on why they are there. | 
| @rhc54 Oy, I forgot about libfabric. Good point. Hmm. Perhaps we need a little infrastructure to register signal handlers that should definitely be removed after dlclose? E.g., MTL PSM (and OFI MTL?) can register the PSM signal handlers. When the MCA finally dlcloses those MTLs, it can ensure that those signal handlers are not set (i.e., if they are set, reset them to SIG_DFL). | 
| @matcabral I'm unaware of any other instances. But the "infrastructure" I'm referring to could be quite small / easy. Perhaps something like:
// I'm typing this off the top of my head -- types are made up
void mca_register_handlers_to_clear(mca_component_t *component, sig_handler_fn_t signal_handler);
These 2 pieces of info should be good enough for the MCA subsystem to check and see if those signal handlers are still present upon dlclose. ...actually, I think I've lost track in the thread here: are we talking about ensuring that those signal handlers are gone upon dlclose, or ensuring that they're gone during MTL PSM and OFI component open? | 
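One possible shape for the small infrastructure described above, assuming a fixed-size table in the MCA base and an opaque owner pointer per component. `register_handler_to_clear()`, `clear_handlers_for()` and `sig_handler_fn_t` are hypothetical, like the made-up types in the comment. The key design choice is that after `dlclose()` a handler is reset only if it is still the registered one, so a handler the application installed in the meantime survives.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

typedef void (*sig_handler_fn_t)(int);   /* hypothetical, mirrors the comment */

#define MAX_TRACKED_HANDLERS 16

struct tracked_handler {
    const void *owner;          /* opaque key, e.g. the component's struct */
    int sig;
    sig_handler_fn_t handler;
};

static struct tracked_handler tracked[MAX_TRACKED_HANDLERS];
static int n_tracked = 0;

/* Remember that `handler`, installed for `sig`, belongs to `owner`'s library. */
static int register_handler_to_clear(const void *owner, int sig, sig_handler_fn_t handler)
{
    if (n_tracked == MAX_TRACKED_HANDLERS) {
        return -1;
    }
    tracked[n_tracked].owner = owner;
    tracked[n_tracked].sig = sig;
    tracked[n_tracked].handler = handler;
    n_tracked++;
    return 0;
}

/*
 * Call right after dlclose()ing `owner`. A tracked handler that is still
 * installed would point at unmapped code, so reset it to SIG_DFL; anything
 * installed since is left alone. Only the pointer value is compared; the
 * stale handler is never called. Assumes handlers were installed without
 * SA_SIGINFO. Returns how many dispositions were reset.
 */
static int clear_handlers_for(const void *owner)
{
    int reset = 0;
    int i = 0;
    while (i < n_tracked) {
        if (tracked[i].owner != owner) {
            i++;
            continue;
        }
        struct sigaction cur;
        if (sigaction(tracked[i].sig, NULL, &cur) == 0 &&
            !(cur.sa_flags & SA_SIGINFO) &&
            cur.sa_handler == tracked[i].handler) {
            struct sigaction dfl;
            memset(&dfl, 0, sizeof(dfl));
            dfl.sa_handler = SIG_DFL;
            sigemptyset(&dfl.sa_mask);
            if (sigaction(tracked[i].sig, &dfl, NULL) == 0) {
                reset++;
            }
        }
        tracked[i] = tracked[--n_tracked];   /* drop entry; re-check slot i */
    }
    return reset;
}
```

A component would call `register_handler_to_clear()` after opening, passing the handler it observed via `sigaction()`, and the MCA base would call `clear_handlers_for()` right after the matching `dlclose()`.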
| I do think we need to avoid overreacting and creating too much work here. However, my point was just that adding protection specifically in the PSM/MTL component isn't sufficient. At least on my machine, the problem is coming in thru libfabric, so we need a solution that covers all impacted components. | 
| Just to help @jsquyres back on track - we are dealing with the problem where PSM hijacks the signal handlers upon dlopen, and doesn't deregister them when closed, thus leaving them pointing to invalid memory. The only time this surfaces is when someone hits those procs with a SIGTERM (or one of the other hijacked signals). It would never be seen during "normal" execution. | 
| @rhc54 Right, but where did we land: did we want to always / unconditionally un-register the PSM handlers (even during component init)? Or are we solely concerned with making sure they're deregistered when the library is unloaded? | 
| @jsquyres Hmmm...that's a good point. Technically, we avoid the segfault with the latter. However, leaving those handlers registered when the PSM library remains loaded means that the user will get unexpected behavior - i.e., instead of their handler being called, the PSM handler will execute. I gather the PSM handler creates a file and stuffs a backtrace into it, which means we litter the filesystem with files that the user (a) doesn't know about and (b) we cannot clean up. So I'd vote that we always deregister them, but I'm not hard on that opinion | 
| IMHO, the problem with major severity is the one to address: leaving the handlers pointing to stale addresses after dlclose in the "old" PSM lib, which was addressed in the newer lib version. | 
| (Only reading this today.) As far as the OFI MTL is concerned, I think that we can catch this at the libfabric level. The PSM provider would make sure to de-register the handlers upon dlclose. Am I missing something or would this be enough? | 
| @matcabral After thinking about your response a bit, I have to disagree. I just realized that your PSM library can even affect my usNIC BTL (because it uses libfabric). If this only affected your setups, I would agree that Intel as the owning vendor/organization can do whatever you want. But I definitely do not want usNIC customers to have to adhere to the PSM library putting stack traces in Intel-specific places on the filesystem. Specifically: I want usNIC customers to see the Open MPI default behavior. Please make that possible without requiring my customers to have to set Intel-specific environment variables. | 
| @yburette I'm sorry, but that is not enough. It still forces PSM-specific behavior when the usNIC BTL is used (because libfabric has not yet been dlclosed). | 
| @jsquyres what if we (Open MPI) take the problem the other way around:
if (NULL == getenv("IPATH_NO_BACKTRACE")) {
   putenv("IPATH_NO_BACKTRACE=1");
}
in  that would do the trick for both  | 
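A standalone sketch of the same idea, extended to both variables named later in this thread (`IPATH_NO_BACKTRACE` for PSM, `HFI_NO_BACKTRACE` for PSM2). It uses POSIX `setenv()` with `overwrite` set to 0, which leaves any value the user already exported untouched and, unlike `putenv()`, copies the string. `disable_psm_signal_hijacking()` is a hypothetical name, not necessarily what the eventual PR does; it would need to run before any component linked against libpsm or libfabric is `dlopen()`ed, e.g. early in MPI_Init as in the patch above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/*
 * Hypothetical helper: turn off PSM/PSM2 signal hijacking unless the user
 * already chose a value. setenv(name, value, 0) does nothing when `name`
 * is already set, so e.g. an exported IPATH_NO_BACKTRACE=0 is honored.
 * Must run before any libpsm/libfabric-based component is dlopen()ed,
 * because the libraries act on these variables in their constructors.
 * Returns 0 on success, -1 if setenv() fails.
 */
static int disable_psm_signal_hijacking(void)
{
    static const char *const vars[] = { "IPATH_NO_BACKTRACE", "HFI_NO_BACKTRACE" };

    for (size_t i = 0; i < sizeof(vars) / sizeof(vars[0]); i++) {
        if (setenv(vars[i], "1", 0) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Calling it more than once is harmless, since after the first call both variables are already set.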
| that's the solution we were considering (Jeff and I discussed on the phone), but I'm talking to folks internally here to ensure we have both a short and long-term answer | 
| @ggouaillardet Yeah, per @rhc54's comment, I think it's going to come down to this. It's a horrid abstraction break (i.e., putting vendor/transport-specific code in the core), but we might be out of options here. 😧 (I forgot about  I'll file a PR in the immediate future for setting both  | 
Per discussion on open-mpi#1767 (and some subsequent phone calls and off-issue email discussions), the PSM and PSM2 libraries are hijacking signal handlers by default. Specifically: unless the environment variables `IPATH_NO_BACKTRACE=1` (for PSM / Intel TrueScale) and `HFI_NO_BACKTRACE=1` (for PSM2 / Intel OmniPath) are set, the library constructors for these two libraries will hijack various signal handlers for the purpose of invoking their own error reporting mechanisms.

This may be a bit *surprising*, but is not a *problem*, per se. The real problem is that older versions of at least the PSM library do not unregister these signal handlers upon being unloaded from memory. Hence, a segv can actually result in a double segv (i.e., the original segv and then another segv when the now-non-existent signal handler is invoked).

This is further compounded by the fact that the PSM / PSM2 libraries can be loaded by the OFI MTL and the usNIC BTL (because they are loaded by libfabric), even when there is no Intel networking hardware present. Having the PSM libraries behave this way when no Intel hardware is present is clearly undesirable (and is likely to be fixed in future releases of the PSM/PSM2 libraries).

Finally, this signal hijacking subverts Open MPI's own signal reporting mechanism, which may be a bit surprising for some users (particularly those who do not have Intel TrueScale/OmniPath hardware). As such, we disable it by default so that Open MPI's own error-reporting mechanisms are used.

This commit will set the following two environment variables to disable the signal hijacking from the PSM/PSM2 libraries (if they are not already set):

* IPATH_NO_BACKTRACE=1
* HFI_NO_BACKTRACE=1

If the user has set these variables before invoking Open MPI, we will not override their values (i.e., their preferences will be honored).

Signed-off-by: Jeff Squyres <[email protected]>
| @matcabral @rhc54 @yburette @ggouaillardet I just filed a master PR for this: #1781. Please review ASAP; I'd like to merge this to master and PR over to v1.10.3 and v2.0.0 so that we can make v2.0.0rc3 today. Thanks! | 
Per discussion on open-mpi#1767 (and some subsequent phone calls and off-issue email discussions), the PSM library is hijacking signal handlers by default. Specifically: unless the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel TrueScale) is set, the library constructor for this library will hijack various signal handlers for the purpose of invoking its own error reporting mechanisms.

This may be a bit *surprising*, but is not a *problem*, per se. The real problem is that older versions of at least the PSM library do not unregister these signal handlers upon being unloaded from memory. Hence, a segv can actually result in a double segv (i.e., the original segv and then another segv when the now-non-existent signal handler is invoked).

This PSM signal hijacking subverts Open MPI's own signal reporting mechanism, which may be a bit surprising for some users (particularly those who do not have Intel TrueScale). As such, we disable it by default so that Open MPI's own error-reporting mechanisms are used.

Additionally, there is a typo in the library destructor for the PSM2 library that may cause problems in the unloading of its signal handlers. This problem can be avoided by setting `HFI_NO_BACKTRACE=1` (for PSM2 / Intel OmniPath).

This is further compounded by the fact that the PSM / PSM2 libraries can be loaded by the OFI MTL and the usNIC BTL (because they are loaded by libfabric), even when there is no Intel networking hardware present. Having the PSM/PSM2 libraries behave this way when no Intel hardware is present is clearly undesirable (and is likely to be fixed in future releases of the PSM/PSM2 libraries).

This commit sets the following two environment variables to disable this behavior from the PSM/PSM2 libraries (if they are not already set):

* IPATH_NO_BACKTRACE=1
* HFI_NO_BACKTRACE=1

If the user has set these variables before invoking Open MPI, we will not override their values (i.e., their preferences will be honored).

Signed-off-by: Jeff Squyres <[email protected]>
Per discussion on open-mpi/ompi#1767 (and some subsequent phone calls and off-issue email discussions), the PSM library is hijacking signal handlers by default. Specifically: unless the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel TrueScale) is set, the library constructor for this library will hijack various signal handlers for the purpose of invoking its own error reporting mechanisms.

This may be a bit *surprising*, but is not a *problem*, per se. The real problem is that older versions of at least the PSM library do not unregister these signal handlers upon being unloaded from memory. Hence, a segv can actually result in a double segv (i.e., the original segv and then another segv when the now-non-existent signal handler is invoked).

This PSM signal hijacking subverts Open MPI's own signal reporting mechanism, which may be a bit surprising for some users (particularly those who do not have Intel TrueScale). As such, we disable it by default so that Open MPI's own error-reporting mechanisms are used.

Additionally, there is a typo in the library destructor for the PSM2 library that may cause problems in the unloading of its signal handlers. This problem can be avoided by setting `HFI_NO_BACKTRACE=1` (for PSM2 / Intel OmniPath).

This is further compounded by the fact that the PSM / PSM2 libraries can be loaded by the OFI MTL and the usNIC BTL (because they are loaded by libfabric), even when there is no Intel networking hardware present. Having the PSM/PSM2 libraries behave this way when no Intel hardware is present is clearly undesirable (and is likely to be fixed in future releases of the PSM/PSM2 libraries).

This commit sets the following two environment variables to disable this behavior from the PSM/PSM2 libraries (if they are not already set):

* IPATH_NO_BACKTRACE=1
* HFI_NO_BACKTRACE=1

If the user has set these variables before invoking Open MPI, we will not override their values (i.e., their preferences will be honored).

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit open-mpi/ompi@5071602)
…ror notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fall back to the prior behavior - i.e., the debugger will be released via RML and all notifications will go strictly to the default error handler.
Add PMIx 2.0
Remove PMIx 1.1.4
Cleanup copying of component
Add missing file
Touchup a typo in the Makefile.am
Update the pmix ext114 component
Minor cleanups and resync to master
Update to latest PMIx 2.x
Update to the PMIx event notification branch latest changes
| bot:retest | 
| Just a heads-up: I am rerunning the tests on this in preparation for commit. So anybody who has concerns - please speak up now. | 
Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fall back to the prior behavior - i.e., the debugger will be released via RML and all notifications will go strictly to the default error handler.
Add PMIx 2.0
Remove PMIx 1.1.4
Cleanup copying of component