Description
I've just set up a new machine with Debian 13 Trixie, which includes OpenMPI 5.0.7-1 and gfortran 14.2.0. Hardware is an AMD Ryzen 7840U.
I'm developing the NASA GISS ModelE GCM, and the line call MPI_INIT(rc) is causing a floating-point exception. The backtrace is below; frame #18, at model/MPI_Support/dist_grid_mod.F90:277, is the call MPI_INIT(rc) line.
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x1555550232ba in ???
#1 0x155555022375 in ???
#2 0x155554d59def in ???
#3 0x155554408b43 in ???
#4 0x1555543c1892 in ???
#5 0x15555439cec4 in ???
#6 0x1555554fcc61 in ???
#7 0x1555542bd03b in ???
#8 0x1555542af328 in ???
#9 0x155553850789 in ???
#10 0x155553851163 in ???
#11 0x15555385b2d9 in ???
#12 0x15555469741b in ???
#13 0x15555469b429 in ???
#14 0x15555469c187 in ???
#15 0x15555469371f in ???
#16 0x1555546c449e in ???
#17 0x15555542bbc9 in ???
#18 0x5555564fb94d in __dist_grid_mod_MOD_init_app
at model/MPI_Support/dist_grid_mod.F90:277
#19 0x5555557cd185 in initializemodele
at model/MODELE.f:588
#20 0x5555557ca97c in giss_modele_
at model/MODELE.f:234
#21 0x5555557c4971 in modele_maindriver_
at model/MODELE_DRV.f:27
#22 0x555555560b46 in MAIN__
at model/main.F90:2
#23 0x555555560b96 in main
at model/main.F90:3
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 500573 on node fw13 exited on
signal 8 (Floating point exception).
--------------------------------------------------------------------------
[Thread 0x1555542ff6c0 (LWP 500572) exited]
[Thread 0x1555545006c0 (LWP 500571) exited]
[Inferior 1 (process 500568) exited with code 0210]
(gdb)
If I turn off FPE checking around that line, the model runs:
+ use ieee_exceptions, only: ieee_divide_by_zero, ieee_invalid, ieee_overflow, ieee_set_halting_mode
...
+ ! Disable FPE trapping before MPI_Init
+ call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
+ call ieee_set_halting_mode(ieee_invalid, .false.)
+ call ieee_set_halting_mode(ieee_overflow, .false.)
+
call MPI_INIT(rc)
call setCommunicator(MPI_COMM_WORLD)
call MPI_COMM_SIZE(COMMUNICATOR, NPES_WORLD, rc)
call MPI_COMM_RANK(COMMUNICATOR, rank, rc)
+
+ ! Re-enable FPE trapping
+ call ieee_set_halting_mode(ieee_divide_by_zero, .true.)
+ call ieee_set_halting_mode(ieee_invalid, .true.)
+ call ieee_set_halting_mode(ieee_overflow, .true.)
This new system has new hardware, a new OS, and updated gfortran and OpenMPI. In an attempt to isolate the issue, I've done the following:
- Tested on this new hardware with old dev environment in Docker. Debian 12 Bookworm, gfortran 12, OpenMPI 4.something. No issue -> Not hardware.
- Tested with gfortran-12 on this OS, installed with apt install gfortran-12 and adjusting the Makefile. I assume this uses the same system-installed OpenMPI 5.x. Issue exists -> Not gfortran.
- Disabling FPE checking just for the MPI_INIT line above suggests this may be related to the new OpenMPI.
I'm sorry, but I am unable to create an MWE. If someone did want to test this, though, I could help set up a dev environment. The latest GCM code is the last link at https://simplex.giss.nasa.gov/snapshots/ and the system can run in Docker (see https://github.com/nasa-giss/docker).