6 changes: 5 additions & 1 deletion ompi/mpi/java/c/mpi_MPI.c
@@ -12,7 +12,7 @@
* All rights reserved.
* Copyright (c) 2015 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2015 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2015-2016 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2015 Intel, Inc. All rights reserved.
* Copyright (c) 2015 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
@@ -131,6 +131,10 @@ OBJ_CLASS_INSTANCE(ompi_java_buffer_t,
*/
jint JNI_OnLoad(JavaVM *vm, void *reserved)
{
// Ensure that PSM signal hijacking is disabled *before* loading
// the library (see comment in the function for more detail).
opal_init_psm();

libmpi = dlopen("libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);

#if defined(HAVE_DL_INFO) && defined(HAVE_LIBGEN_H)
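JNI_OnLoad() calls opal_init_psm() before dlopen() because a shared library's constructors run while it is being loaded, so any environment variable they consult must already be set at that point. The sketch below is a hypothetical, self-contained illustration of that ordering and of the "only if not already set" rule: disable_backtrace_hooks() mirrors opal_init_psm() but uses POSIX setenv() with overwrite set to 0 instead of opal_setenv(), and fake_constructor_would_hijack() stands in for a PSM/PSM2 constructor. How the real libraries interpret the variable's value is an assumption here (unset or "0" is treated as "install handlers").

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of opal_init_psm(): export each variable only if
 * it is not already set. POSIX setenv() with overwrite == 0 leaves an
 * existing value (i.e., a user preference) untouched.
 * Returns 0 on success, -1 if setenv() fails. */
static int disable_backtrace_hooks(void)
{
    if (setenv("IPATH_NO_BACKTRACE", "1", 0) != 0) return -1;
    if (setenv("HFI_NO_BACKTRACE", "1", 0) != 0) return -1;
    return 0;
}

/* Hypothetical stand-in for a transport library constructor. A real
 * constructor runs inside dlopen(), so it only sees variables that are
 * already in the environment at that moment.
 * ASSUMPTION: unset or "0" means "install signal handlers".
 * Returns 1 if it would install its own handlers, 0 otherwise. */
static int fake_constructor_would_hijack(void)
{
    const char *v = getenv("HFI_NO_BACKTRACE");
    return v == NULL || strcmp(v, "0") == 0;
}
```

Calling disable_backtrace_hooks() before the simulated constructor makes it skip handler installation, while a value the user exported beforehand (for example HFI_NO_BACKTRACE=0) is left alone.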
9 changes: 8 additions & 1 deletion opal/runtime/opal.h
@@ -10,7 +10,7 @@
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
* Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved.
* Copyright (c) 2010-2012 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2010-2016 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2014 Intel, Inc. All rights reserved.
* $COPYRIGHT$
*
@@ -76,6 +76,13 @@ OPAL_DECLSPEC int opal_finalize(void);
*/
OPAL_DECLSPEC int opal_init_util(int* pargc, char*** pargv);

/**
* Disable PSM/PSM2 signal hijacking.
*
* See comment in the function for more detail.
*/
OPAL_DECLSPEC int opal_init_psm(void);

/**
* Finalize the OPAL layer, excluding the MCA system.
*
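The declaration above refers to signal "hijacking". As background only (this is not PSM or PSM2 code), a library that installs its own handler in a constructor can avoid leaving it behind by saving the previous disposition with sigaction() and restoring it when it is unloaded. The handler and helper names below are hypothetical, and SIGSEGV is an assumed example signal.

```c
#define _POSIX_C_SOURCE 200112L
#include <signal.h>
#include <string.h>

typedef void (*handler_fn)(int);

/* Hypothetical crash handler a transport library might install. */
static void lib_backtrace_handler(int sig) { (void)sig; }

/* Disposition that was in place before the library touched SIGSEGV. */
static struct sigaction lib_saved_segv;

/* Constructor-time step: remember the old disposition, install ours. */
static int lib_install(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = lib_backtrace_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, &lib_saved_segv);
}

/* Destructor-time step: restore the saved disposition. Skipping this
 * leaves the library's handler registered after the library is gone. */
static int lib_uninstall(void)
{
    return sigaction(SIGSEGV, &lib_saved_segv, NULL);
}

/* Read the current SIGSEGV handler without modifying it. */
static handler_fn segv_handler_now(void)
{
    struct sigaction cur;
    if (sigaction(SIGSEGV, NULL, &cur) != 0) return SIG_ERR;
    return cur.sa_handler;
}
```

A library that performs the install step but never the uninstall step is exhibiting the behavior that the comment in opal_init_psm() describes and works around.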
34 changes: 33 additions & 1 deletion opal/runtime/opal_init.c
@@ -10,7 +10,7 @@
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
* Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2007-2016 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved.
* Copyright (c) 2009 Oak Ridge National Labs. All rights reserved.
* Copyright (c) 2010-2015 Los Alamos National Security, LLC.
@@ -265,6 +265,34 @@ opal_err2str(int errnum, const char **errmsg)
}


int opal_init_psm(void)
{
/* Very early in the init sequence -- before *ANY* MCA components
are opened -- we need to disable some behavior from the PSM and
PSM2 libraries (by default): at least some old versions of
these libraries hijack signal handlers during their library
constructors and then do not un-hijack them when the libraries
are unloaded.

It is a bit of an abstraction break that we have to put
vendor/transport-specific code in the OPAL core, but we're
out of options, unfortunately.

NOTE: We only disable this behavior if the corresponding
environment variables are not already set (i.e., if the
user/environment has indicated a preference for this behavior,
we won't override it). */
if (NULL == getenv("IPATH_NO_BACKTRACE")) {
opal_setenv("IPATH_NO_BACKTRACE", "1", true, &environ);
}
if (NULL == getenv("HFI_NO_BACKTRACE")) {
opal_setenv("HFI_NO_BACKTRACE", "1", true, &environ);

Review comment on the line above:

This might be a bit late; however, I believe this is unnecessary. Even though the PSM2 library calls into the fini() code and runs sigaction(), the resulting calls use a static struct that is initialized to all zeros (I double-checked that all the structs are in the .bss section). Calling sigaction() with this parameter set results in a simple query of the signal handlers, with no modification. So this workaround doesn't really fix anything, other than saving some CPU cycles during dlclose(). I only mention it because near the end of the month this will be patched, and HFI_NO_BACKTRACE will then no longer exist inside libpsm2, so this will be wasted code.

Reply from a contributor:

We opted for the "extra" protection solely because we couldn't be certain of the behavior, and the old libraries are going to be around on systems for a very long time (or at least so history says). So even though it technically isn't required, it doesn't hurt and may save us from a little user angst.

}

return OPAL_SUCCESS;
}


int
opal_init_util(int* pargc, char*** pargv)
{
@@ -328,6 +356,10 @@ opal_init_util(int* pargc, char*** pargv)
goto return_error;
}

// Disable PSM signal hijacking (see comment in function for more
// details)
opal_init_psm();

/* Setup the parameter system */
if (OPAL_SUCCESS != (ret = mca_base_var_init())) {
error = "mca_base_var_init";
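The review thread on opal_init_psm() turns on what a given sigaction() call actually does. As a generic POSIX illustration (not libpsm2's fini() code), passing NULL as the new-action argument only reports the current disposition through the old-action pointer, whereas a non-NULL new-action pointer installs whatever that struct describes. The handler name and the choice of SIGUSR1 below are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <signal.h>
#include <string.h>

typedef void (*handler_fn)(int);

/* Hypothetical application handler that should survive untouched. */
static void app_handler(int sig) { (void)sig; }

/* Install app_handler for SIGUSR1 (a non-NULL new action: modifies). */
static int install_app_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = app_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Query-only call: with a NULL new-action pointer, sigaction() just
 * fills in *old with the current disposition and changes nothing. */
static handler_fn query_handler(int sig)
{
    struct sigaction old;
    if (sigaction(sig, NULL, &old) != 0) return SIG_ERR;
    return old.sa_handler;
}
```

Repeated query_handler() calls leave the installed handler in place, which is the kind of side-effect-free call the reviewer describes; whether a particular library's call is really of this form has to be checked against that library's source.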