Commit 7d5fbcf

opal: Fix opal_initialized reference counter
Before this change, the reference counters `opal_util_initialized` and `opal_initialized` were incremented at the beginning of the `opal_init_util` and `opal_init` functions respectively. In other words, they were incremented before initialization had completed. This causes the following program to abort with SIGFPE if `--enable-timing` is passed to `configure`.

```c
// link with -lm
#define _GNU_SOURCE   // feenableexcept is a GNU extension
#include <fenv.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    // raise SIGFPE on division-by-zero
    feenableexcept(FE_DIVBYZERO);
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
```

The sequence that leads to the SIGFPE is:

1. `MPI_Init` calls `opal_init` through `ompi_rte_init`.
2. `opal_init` changes the value of `opal_initialized` to 1.
3. `opal_init` calls `opal_init_util`.
4. `opal_init_util` calls `opal_timing_ts_func` through `OPAL_TIMING_ENV_INIT`, and `opal_timing_ts_func` returns `get_ts_cycle` instead of `get_ts_gettimeofday` because `opal_initialized` is already 1. (This is the problem.)
5. `opal_init_util` calls `get_ts_cycle` through `OPAL_TIMING_ENV_INIT`.
6. `get_ts_cycle` evaluates `opal_timer_base_get_cycles() / opal_timer_base_get_freq()`, which raises SIGFPE (division-by-zero) because the OPAL TIMER framework is not initialized yet and `opal_timer_base_get_freq` returns 0.

This commit moves the increments of `opal_util_initialized` and `opal_initialized` to the end of the `opal_init_util` and `opal_init` functions respectively.

Signed-off-by: Tsubasa Yanagibashi <[email protected]>
1 parent f496f25 commit 7d5fbcf

1 file changed, with 11 additions and 4 deletions.

opal/runtime/opal_init.c

Lines changed: 11 additions & 4 deletions
@@ -24,6 +24,7 @@
  * All rights reserved.
  * Copyright (c) 2018-2019 Triad National Security, LLC. All rights
  * reserved.
+ * Copyright (c) 2020 FUJITSU LIMITED. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -470,10 +471,11 @@ opal_init_util(int* pargc, char*** pargv)
     char *error = NULL;
     OPAL_TIMING_ENV_INIT(otmng);
 
-    if( ++opal_util_initialized != 1 ) {
-        if( opal_util_initialized < 1 ) {
+    if( opal_util_initialized != 0 ) {
+        if( opal_util_initialized < 0 ) {
             return OPAL_ERROR;
         }
+        ++opal_util_initialized;
         return OPAL_SUCCESS;
     }
 
@@ -615,6 +617,8 @@ opal_init_util(int* pargc, char*** pargv)
 
     OPAL_TIMING_ENV_NEXT(otmng, "opal_if_init");
 
+    ++opal_util_initialized;
+
     return OPAL_SUCCESS;
 }
 
@@ -635,10 +639,11 @@ opal_init(int* pargc, char*** pargv)
 {
     int ret;
 
-    if( ++opal_initialized != 1 ) {
-        if( opal_initialized < 1 ) {
+    if( opal_initialized != 0 ) {
+        if( opal_initialized < 0 ) {
             return OPAL_ERROR;
         }
+        ++opal_initialized;
         return OPAL_SUCCESS;
     }
 
@@ -688,5 +693,7 @@ opal_init(int* pargc, char*** pargv)
         return opal_init_error ("opal_reachable_base_select", ret);
     }
 
+    ++opal_initialized;
+
     return OPAL_SUCCESS;
 }
