You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Before this change, the reference counters `opal_util_initialized`
and `opal_initialized` were incremented at the beginning of the
`opal_init_util` and the `opal_init` functions respectively.
In other words, they were incremented before fully initialized.
This causes the following program to abort by SIGFPE if
`--enable-timing` is enabled on `configure`.
```c
// need -lm option on link
int main(int argc, char *argv[])
{
// raise SIGFPE on division-by-zero
feenableexcept(FE_DIVBYZERO);
MPI_Init(&argc, &argv);
MPI_Finalize();
return 0;
}
```
The logic of the SIGFPE is:
1. `MPI_Init` calls `opal_init` through `ompi_rte_init`.
2. `opal_init` changes the value of `opal_initialized` to 1.
3. `opal_init` calls `opal_init_util`.
4. `opal_init_util` calls `opal_timing_ts_func` through
`OPAL_TIMING_ENV_INIT`, and `opal_timing_ts_func` returns
`get_ts_cycle` instead of `get_ts_gettimeofday` because
`opal_initialized` to 1.
(This is the problem)
5. `opal_init_util` calls `get_ts_cycle` through
`OPAL_TIMING_ENV_INIT`.
6. `get_ts_cycle` executes
`opal_timer_base_get_cycles()) / opal_timer_base_get_freq()`
and it raises SIGFPE (division-by-zero) because the OPAL TIMER
framework is not initialized yet and `opal_timer_base_get_freq`
returns 0.
This commit changes the increment timing of `opal_util_initialized`
and `opal_initialized` to the end of `opal_init_util` and the
`opal_init` functions respectively.
Signed-off-by: Tsubasa Yanagibashi <[email protected]>
0 commit comments