forked from HPSCTerrSys/TSMP2_workflow-engine
FYI @kvrigor
Note: icon works for the intel_psmpi build.
- Using grid 0070.
1. Paths to the "bin" and "simulation_run" folders
gnu_openmpi
/p/scratch/cslts/gonzalez5/TSMP2/BUILDS/TSMP2/bin/JURECADC_ICON_2025_gnu_openmpi
/p/scratch/cslts/gonzalez5/TSMP2/tsmp2_eclm-parflow_tests/TSMP2_WFE_simexp_ideal_scal/run/sim_pft13-sid02-sv06_0070_icon_20150701_gnu_openmpi
gnu_psmpi
/p/scratch/cslts/gonzalez5/TSMP2/BUILDS/TSMP2/bin/JURECADC_ICON_2025_gnu_psmpi
/p/scratch/cslts/gonzalez5/TSMP2/tsmp2_eclm-parflow_tests/TSMP2_WFE_simexp_ideal_scal/run/sim_pft13-sid02-sv06_0070_icon_20150701_gnu_psmpi
2. ERRORS
gnu_openmpi
Running icon with the gnu_openmpi build produces the following errors:
adding new var_list ext_data_atm_td_D01
corrupted double-linked list
corrupted double-linked list
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
mo_ext_data_state:construct_ext_data: Construction of data structure for external data finished
mo_ext_data_init:init_ext_data: Running with analytical topography
corrupted double-linked list
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
mo_ext_data_init:init_ext_data: read_ext_data_atm completed
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
corrupted double-linked list
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
corrupted double-linked list
corrupted double-linked list
(mo_nh_testcases) init_nh_testtopo:: running Convective Boundary Layer Experiment
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
corrupted double-linked list
corrupted double-linked list
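The output above interleaves messages from many MPI ranks, so it is hard to see which failure modes dominate. A minimal sketch for tallying them, assuming the job output has been saved to a text file (the script and file names in the usage line are hypothetical):

```python
import re
import sys
from collections import Counter

# Failure signatures that appear in the gnu_openmpi output above.
PATTERNS = {
    "glibc heap corruption": re.compile(r"corrupted double-linked list"),
    "SIGABRT": re.compile(r"Program received signal SIGABRT"),
    "SIGSEGV": re.compile(r"Program received signal SIGSEGV"),
}


def tally(lines):
    """Count how many log lines match each failure signature."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path to the saved job log is passed as the first argument.
    with open(sys.argv[1]) as f:
        for label, n in tally(f).most_common():
            print(f"{label}: {n}")
```

Usage (hypothetical names): python tally_errors.py icon_gnu_openmpi.log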
gnu_psmpi
Running icon with the gnu_psmpi build produces the following errors:
adding new var_list ext_data_atm_td_D01
[jrc0715:930019:0:930019] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
mo_ext_data_state:construct_ext_data: Construction of data structure for external data finished
mo_ext_data_init:init_ext_data: Running with analytical topography
mo_ext_data_init:init_ext_data: read_ext_data_atm completed
[jrc0715:930099:0:930099] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
(mo_nh_testcases) init_nh_testtopo:: running Convective Boundary Layer Experiment
[jrc0715:930015:0:930015] Caught signal 7 (Bus error: Sent by the kernel)
[jrc0715:930086:0:930086] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
[jrc0715:929988:0:929988] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
malloc(): unaligned tcache chunk detected
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
double free or corruption (out)
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
==== backtrace (tid: 930015) ====
0 0x000000000003ebf0 __GI___sigaction() :0
1 0x00000000011a5dc3 __mo_hash_table_MOD_hashtable_destruct() ???:0
2 0x0000000000573e69 __mo_key_value_store_MOD_key_value_store_destruct() ???:0
3 0x000000000067243c __mo_dictionary_MOD_dict_finalize() ???:0
4 0x00000000004df782 __mo_ext_data_init_MOD_init_ext_data() ???:0
5 0x0000000000433c37 __mo_atmo_model_MOD_construct_atmo_model() ???:0
6 0x0000000000434729 __mo_atmo_model_MOD_atmo_model() ???:0
7 0x000000000040c448 MAIN__() icon.f90:0
8 0x000000000040be1d main() ???:0
9 0x00000000000295d0 __libc_start_call_main() ???:0
10 0x0000000000029680 __libc_start_main_alias_2() :0
11 0x000000000040be75 _start() ???:0
=================================
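The UCX backtrace above is the only one with resolved symbol names. gfortran encodes a module procedure as __<module>_MOD_<procedure>, so the frames can be turned into a readable call chain. A minimal sketch; the helper names demangle and call_chain are illustrative and not part of ICON or TSMP2:

```python
import re

# gfortran mangles a procedure contained in a module as __<module>_MOD_<procedure>.
MOD_SYMBOL = re.compile(r"__(\w+?)_MOD_(\w+)")
# One resolved frame: "<index> 0x<address> <symbol>() <file>:<line>".
FRAME = re.compile(r"^\s*(\d+)\s+0x[0-9a-fA-F]+\s+(\S+?)\(\)")

# Frames copied from the UCX backtrace (tid 930015) above.
UCX_BACKTRACE = """\
0 0x000000000003ebf0 __GI___sigaction() :0
1 0x00000000011a5dc3 __mo_hash_table_MOD_hashtable_destruct() ???:0
2 0x0000000000573e69 __mo_key_value_store_MOD_key_value_store_destruct() ???:0
3 0x000000000067243c __mo_dictionary_MOD_dict_finalize() ???:0
4 0x00000000004df782 __mo_ext_data_init_MOD_init_ext_data() ???:0
5 0x0000000000433c37 __mo_atmo_model_MOD_construct_atmo_model() ???:0
6 0x0000000000434729 __mo_atmo_model_MOD_atmo_model() ???:0
7 0x000000000040c448 MAIN__() icon.f90:0
8 0x000000000040be1d main() ???:0
9 0x00000000000295d0 __libc_start_call_main() ???:0
10 0x0000000000029680 __libc_start_main_alias_2() :0
11 0x000000000040be75 _start() ???:0
"""


def demangle(symbol: str) -> str:
    """Turn __module_MOD_proc into module::proc; leave other symbols unchanged."""
    m = MOD_SYMBOL.fullmatch(symbol)
    return f"{m.group(1)}::{m.group(2)}" if m else symbol


def call_chain(backtrace: str) -> list[str]:
    """Return the demangled frame symbols, innermost frame first."""
    frames = []
    for line in backtrace.splitlines():
        m = FRAME.match(line)
        if m:
            frames.append(demangle(m.group(2)))
    return frames


if __name__ == "__main__":
    # Print outermost frame first, indenting one level per call.
    for depth, name in enumerate(reversed(call_chain(UCX_BACKTRACE))):
        print("  " * depth + name)
```

Read outermost first, the chain goes MAIN__ → mo_atmo_model::construct_atmo_model → mo_ext_data_init::init_ext_data → mo_dictionary::dict_finalize → mo_key_value_store::key_value_store_destruct → mo_hash_table::hashtable_destruct, so this fault is raised while a dictionary is being finalized inside init_ext_data.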
Program received signal SIGBUS: Access to an undefined portion of a memory object.
Backtrace for this error:
#0 0x14f32de2bbef in ???
#1 0x14f32de78edc in ???
#2 0x14f32de2bb45 in ???
#3 0x14f32de15832 in ???
#4 0x14f32de16171 in ???
#5 0x14f32de82f86 in ???
#6 0x14f32de86ffb in ???
#7 0x14f2bddd187e in pscom_req_create
at /dev/shm/swmanage/jurecadc/pscom/5-default/GCCcore-13.3.0/pscom-5.8.0-1/lib/pscom/pscom_req.c:152
#8 0x14f2bddc90d8 in pscom_request_create
at /dev/shm/swmanage/jurecadc/pscom/5-default/GCCcore-13.3.0/pscom-5.8.0-1/lib/pscom/pscom_io.c:1602
#9 0x14f2c02d457b in ???
#10 0x14f2c02ca2ce in ???
#11 0x14f2c017f404 in ???
#12 0x14f32e080a2b in ???
#13 0x423a53 in ???
#14 0xa12017 in ???
#15 0x55c6a7 in ???
#16 0x490d63 in ???
#17 0x7b001c in ???
#18 0x53b9c2 in ???
#19 0x433dbc in ???
#20 0x434728 in ???
#21 0x40c447 in ???
#22 0x40be1c in ???
#23 0x14f32de165cf in ???
#24 0x14f32de1667f in ???
#25 0x40be74 in ???
#26 0xffffffffffffffff in ???
#0 0x1535da2c7bef in ???
#1 0x1535da314edc in ???
#2 0x1535da2c7b45 in ???
#3 0x1535da2b1832 in ???
#4 0x1535da2b2171 in ???
#5 0x1535da31ef86 in ???
#6 0x1535da320c6f in ???
#7 0x1535da3232c4 in ???
#8 0x15356739d16e in ???
#9 0x15356739eea5 in ???
#10 0x1535673a0b70 in ???
#11 0x1535673722e6 in ???
#12 0x153567372430 in ???
#13 0x153567488759 in find_address_in_section
at debug/debug.c:338
#14 0x153567345bab in ???
#15 0x1535674893ca in get_line_info
at debug/debug.c:370
#16 0x1535674893ca in ucs_debug_backtrace_create
at debug/debug.c:401
#17 0x1535674898c4 in ucs_debug_backtrace_create
at debug/debug.c:390
#18 0x153567489d61 in ucs_debug_show_innermost_source_file
at debug/debug.c:551
#19 0x15356748ae6f in ucs_handle_error
at debug/debug.c:1091
#20 0x15356748b043 in ucs_debug_handle_error_signal
at debug/debug.c:1044
#21 0x15356748b1e9 in ucs_error_signal_handler
at debug/debug.c:1066
#22 0x1535da2c7bef in ???
#23 0x4f4d14 in ???
#24 0x4346e5 in ???
#25 0x434728 in ???
#26 0x40c447 in ???
#27 0x40be1c in ???
#28 0x1535da2b25cf in ???
#29 0x1535da2b267f in ???
#30 0x40be74 in ???
#31 0xffffffffffffffff in ???
#0 0x152adb23fbef in ???
#1 0x11a5dc3 in ???
#2 0x573e68 in ???
#3 0x67243b in ???
#4 0x4df781 in ???
#5 0x433c36 in ???
#6 0x434728 in ???
#7 0x40c447 in ???
#8 0x40be1c in ???
#9 0x152adb22a5cf in ???
#10 0x152adb22a67f in ???
#11 0x40be74 in ???
#12 0xffffffffffffffff in ???
srun: error: jrc0715: tasks 0-39,41-73,75,77-127: Terminated
srun: error: jrc0715: task 40: Bus error (core dumped)
srun: error: jrc0715: tasks 74,76: Aborted (core dumped)
srun: Force Terminated StepId=14189510.0
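The srun summary lists task ranks in compressed range form. Expanding the ranges shows that the three lines together cover all 128 tasks (0-127): task 40 dumped core with a bus error, tasks 74 and 76 aborted, and the remaining 125 were reported as Terminated. A minimal sketch; expand_tasks is a hypothetical helper, not a Slurm API:

```python
def expand_tasks(spec: str) -> set[int]:
    """Expand an srun-style task list such as '0-39,41-73,75' into a set of ranks."""
    ranks = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            ranks.update(range(lo, hi + 1))
        else:
            ranks.add(int(part))
    return ranks


# Task lists copied from the srun error lines above.
terminated = expand_tasks("0-39,41-73,75,77-127")
bus_error = expand_tasks("40")
aborted = expand_tasks("74,76")
all_ranks = terminated | bus_error | aborted

if __name__ == "__main__":
    print(f"terminated: {len(terminated)}, bus error: {sorted(bus_error)}, "
          f"aborted: {sorted(aborted)}, total: {len(all_ranks)}")
```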