@bedroge (Contributor) commented Jan 15, 2026

In #2 we saw a node failure when building OpenMPI, and another attempt failed in the test step:

== testing...
== Running pre-test hook...
  >> running shell command:
        make check
        [started at: 2026-01-14 22:06:18]
        [working dir: /tmp/eessibot/easybuild/build/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7]
        [output and state saved to /tmp/eb-w5zwibz0/eb-y8z0smid/run-shell-cmd-output/make-811mi_m2]


ERROR: Shell command failed!
    full command              ->  make check
    exit code                 ->  2
    called from               ->  'test_step' function in /cvmfs/dev.eessi.io/riscv/versions/2025.06-001/software/linux/riscv64/generic/software/EasyBuild/5.2.0/lib/python3.13/site-packages/easybuild/easyblocks/generic/configuremake.py (line 401)
    working directory         ->  /tmp/eessibot/easybuild/build/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7
    output (stdout + stderr)  ->  /tmp/eb-w5zwibz0/eb-y8z0smid/run-shell-cmd-output/make-811mi_m2/out.txt
    interactive shell script  ->  /tmp/eb-w5zwibz0/eb-y8z0smid/run-shell-cmd-output/make-811mi_m2/cmd.sh

== ... (took 4 mins 1 secs)

The EB log showed:

make[3]: Leaving directory '/tmp/eessibot/easybuild/build/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/test/class'
make  check-TESTS
make[3]: Entering directory '/tmp/eessibot/easybuild/build/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/test/class'
make[4]: Entering directory '/tmp/eessibot/easybuild/build/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/test/class'
PASS: ompi_rb_tree
PASS: opal_bitmap
PASS: opal_hash_table
PASS: opal_proc_table
PASS: opal_list
PASS: opal_value_array
PASS: opal_pointer_array
../../config/test-driver: line 119: 1307382 Segmentation fault      (core dumped) "$@" >> "$log_file" 2>&1
FAIL: opal_lifo
PASS: opal_fifo
PASS: opal_cstring
============================================================================
Testsuite summary for Open MPI 5.0.7
============================================================================
# TOTAL: 10
# PASS:  9
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

Let's debug it further in this PR.

@bedroge (Contributor, Author) commented Jan 15, 2026

bot: build repo:dev.eessi.io-riscv-2025.06-001 instance:eessi-bot-riscv for:arch=riscv64/generic

@riscv-eessi-io-bot (bot) commented Jan 15, 2026

New job on instance eessi-bot-riscv for repository dev.eessi.io-riscv-2025.06-001
Building on: generic
Building for: riscv64/generic
Job dir: /home/eessibot/shared/jobs/2026.01/pr_7/261412

date job status comment
Jan 15 08:36:03 UTC 2026 submitted job id 261412 awaits release by job manager
Jan 15 08:36:49 UTC 2026 released job awaits launch by Slurm scheduler
Jan 15 08:37:53 UTC 2026 running job 261412 is running
Jan 15 18:49:03 UTC 2026 finished
🤷 UNKNOWN
  • Job results file _bot_job261412.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Jan 15 18:49:03 UTC 2026 test result
🤷 UNKNOWN
  • Job test file _bot_job261412.test does not exist in job directory, or parsing it failed.

@bedroge (Contributor, Author) commented Jan 15, 2026

The build seems to be stuck; the test step was started 6 hours ago:

eessibot 1227497  937318  0 11:18 ?        00:00:00 make check
eessibot 1227499 1227497  0 11:18 ?        00:00:00 /bin/bash -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target_option-}' specified" >&2; exit 1;; esac; has_opt=no; sane_makeflags=$MAKEFLAGS; if { if test -z '0'; then false; elif test -n 'riscv64-pc-linux-gnu'; then true; elif test -n '4.4.1' && test -n '/tmp/eessibot/easybuild/build/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7'; then true; else false; fi; }; then sane_makeflags=$MFLAGS; else case $MAKEFLAGS in *\\[\ \?]*) bs=\\; sane_makeflags=`printf '%s\n' "$MAKEFLAGS" | sed "s/$bs$bs[$bs $bs?]*//g"`;; esac; fi; skip_next=no; strip_trailopt () { flg=`printf '%s\n' "$flg" | sed "s/$1.*$//"`; }; for flg in $sane_makeflags; do test $skip_next = yes && { skip_next=no; continue; }; case $flg in *=*|--*) continue;; -*I) strip_trailopt 'I'; skip_next=yes;; -*I?*) strip_trailopt 'I';; -*O) strip_trailopt 'O'; skip_next=yes;; -*O?*) strip_trailopt 'O';; -*l) strip_trailopt 'l'; skip_next=yes;; -*l?*) strip_trailopt 'l';; -[dEDm]) skip_next=yes;; -[JT]) skip_next=yes;; esac; case $flg in *$target_option*) has_opt=yes; break;; esac; done; test $has_opt = yes); then \   failcom='fail=yes'; \ else \   failcom='exit 1'; \ fi; \ dot_seen=no; \ target=`echo check-recursive | sed s/-recursive//`; \ case "check-recursive" in \   distclean-* | maintainer-clean-*) list='config contrib 3rd-party opal ompi oshmem test docs' ;; \   *) list='config contrib 3rd-party opal ompi oshmem test docs' ;; \ esac; \ for subdir in $list; do \   echo "Making $target in $subdir"; \   if test "$subdir" = "."; then \     dot_seen=yes; \     local_target="$target-am"; \   else \     local_target="$target"; \   fi; \   (CDPATH="${ZSH_VERSION+.}:" && cd $subdir && make  $local_target) \   || eval $failcom; \ done; \ if test "$dot_seen" = "no"; then \   make  "$target-am" || exit 1; \ fi; test -z "$fail"
eessibot 1228991 1227499  0 11:19 ?        00:00:00 make check
eessibot 1228992 1228991  0 11:19 ?        00:00:00 /bin/bash -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target_option-}' specified" >&2; exit 1;; esac; has_opt=no; sane_makeflags=$MAKEFLAGS; if { if test -z '1'; then false; elif test -n 'riscv64-pc-linux-gnu'; then true; elif test -n '4.4.1' && test -n '/tmp/eessibot/easybuild/build/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/test'; then true; else false; fi; }; then sane_makeflags=$MFLAGS; else case $MAKEFLAGS in *\\[\ \?]*) bs=\\; sane_makeflags=`printf '%s\n' "$MAKEFLAGS" | sed "s/$bs$bs[$bs $bs?]*//g"`;; esac; fi; skip_next=no; strip_trailopt () { flg=`printf '%s\n' "$flg" | sed "s/$1.*$//"`; }; for flg in $sane_makeflags; do test $skip_next = yes && { skip_next=no; continue; }; case $flg in *=*|--*) continue;; -*I) strip_trailopt 'I'; skip_next=yes;; -*I?*) strip_trailopt 'I';; -*O) strip_trailopt 'O'; skip_next=yes;; -*O?*) strip_trailopt 'O';; -*l) strip_trailopt 'l'; skip_next=yes;; -*l?*) strip_trailopt 'l';; -[dEDm]) skip_next=yes;; -[JT]) skip_next=yes;; esac; case $flg in *$target_option*) has_opt=yes; break;; esac; done; test $has_opt = yes); then \   failcom='fail=yes'; \ else \   failcom='exit 1'; \ fi; \ dot_seen=no; \ target=`echo check-recursive | sed s/-recursive//`; \ case "check-recursive" in \   distclean-* | maintainer-clean-*) list='event support asm class threads datatype util mpool monitoring spc' ;; \   *) list='support asm class threads datatype util mpool monitoring spc' ;; \ esac; \ for subdir in $list; do \   echo "Making $target in $subdir"; \   if test "$subdir" = "."; then \     dot_seen=yes; \     local_target="$target-am"; \   else \     local_target="$target"; \   fi; \   (CDPATH="${ZSH_VERSION+.}:" && cd $subdir && make  $local_target) \   || eval $failcom; \ done; \ if test "$dot_seen" = "no"; then \   make  "$target-am" || exit 1; \ fi; test -z "$fail"
eessibot 1230795 1228992  0 11:20 ?        00:00:00 make check
eessibot 1233662 1230795  0 11:21 ?        00:00:00 make check-TESTS
eessibot 1233670 1233662  0 11:21 ?        00:00:00 /bin/bash -c set +e; bases='ompi_rb_tree.log opal_bitmap.log opal_hash_table.log opal_proc_table.log opal_list.log opal_value_array.log opal_pointer_array.log opal_lifo.log opal_fifo.log opal_cstring.log'; bases=`for i in $bases; do echo $i; done | sed 's/\.log$//'`; bases=`echo $bases`; \ log_list=`for i in $bases; do echo $i.log; done`; \ log_list=`echo $log_list`; \ make  test-suite.log TEST_LOGS="$log_list"; \ exit $?;
eessibot 1233678 1233670  0 11:21 ?        00:00:00 make test-suite.log TEST_LOGS=ompi_rb_tree.log opal_bitmap.log opal_hash_table.log opal_proc_table.log opal_list.log opal_value_array.log opal_pointer_array.log opal_lifo.log opal_fifo.log opal_cstring.log
eessibot 1234112 1233678  0 11:22 ?        00:00:00 /bin/bash ../../config/test-driver --test-name opal_lifo --log-file opal_lifo.log --trs-file opal_lifo.trs --color-tests no --enable-hard-errors yes --expect-failure no -- ./opal_lifo
eessibot 1234118 1234112 99 11:22 ?        05:30:15 /tmp/eessibot/easybuild/build/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/test/class/.libs/lt-opal_lifo

top shows that it's still running at 100% CPU:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1234118 eessibot  20   0   44008   2580   1812 R 100.0   0.0 336:08.36 /tmp/eessibot/easybuild/build/OpenMPI/5.0.7/GCC-14.2.0/openmpi-5.0.7/test/class/.libs/lt-opal_lifo

but strace doesn't show anything (no system calls are being made).

@bedroge (Contributor, Author) commented Jan 15, 2026

bot: build repo:dev.eessi.io-riscv-2025.06-001 instance:eessi-bot-riscv for:arch=riscv64/generic

@riscv-eessi-io-bot (bot) commented Jan 15, 2026

New job on instance eessi-bot-riscv for repository dev.eessi.io-riscv-2025.06-001
Building on: generic
Building for: riscv64/generic
Job dir: /home/eessibot/shared/jobs/2026.01/pr_7/261558

date job status comment
Jan 15 15:57:50 UTC 2026 submitted job id 261558 awaits release by job manager
Jan 15 15:58:26 UTC 2026 released job awaits launch by Slurm scheduler
Jan 15 15:59:33 UTC 2026 running job 261558 is running
Jan 15 17:47:58 UTC 2026 finished
😢 FAILURE
✅ job output file slurm-261558.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-riscv64-generic-riscv-1768499167.tar.zst
size: 6 MiB (6952425 bytes)
entries: 67
modules under 2025.06-001/software/linux/riscv64/generic/modules/all
UCC/1.3.0-GCCcore-14.2.0.lua
software under 2025.06-001/software/linux/riscv64/generic/software
UCC/1.3.0-GCCcore-14.2.0
reprod directories under 2025.06-001/software/linux/riscv64/generic/reprod
no reprod directories in tarball
other under 2025.06-001/software/linux/riscv64/generic
no other files in tarball
Jan 15 17:47:58 UTC 2026 test result
🤷 UNKNOWN
  • Job test file _bot_job261558.test does not exist in job directory, or parsing it failed.

@bedroge (Contributor, Author) commented Jan 15, 2026

I cancelled the first job; the second one failed with the same error as before:

../../config/test-driver: line 119: 291975 Segmentation fault      (core dumped) "$@" >> "$log_file" 2>&1
FAIL: opal_lifo

@bedroge (Contributor, Author) commented Jan 16, 2026

Reconfigured the bot to use the arriesgado-hirsute partition (a manual build worked in that partition); trying again...

bot: build repo:dev.eessi.io-riscv-2025.06-001 instance:eessi-bot-riscv for:arch=riscv64/generic

@riscv-eessi-io-bot (bot) commented Jan 16, 2026

New job on instance eessi-bot-riscv for repository dev.eessi.io-riscv-2025.06-001
Building on: generic
Building for: riscv64/generic
Job dir: /home/eessibot/shared/jobs/2026.01/pr_7/261769

date job status comment
Jan 16 21:50:20 UTC 2026 submitted job id 261769 awaits release by job manager
Jan 16 21:50:33 UTC 2026 released job awaits launch by Slurm scheduler
Jan 16 21:51:36 UTC 2026 running job 261769 is running
