Errors in running genesis in HPC #3

@mersalas

I am using the HPC facility of DOST ASTI (https://asti.dost.gov.ph/coare/wiki/Main/using-coare/hpc/basic-hpc/). The COARE team installed GENESIS v2.1.4, in both GPU and CPU versions, in a Singularity container. I submitted the following job script:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --qos=batch_default
#SBATCH --nodes=4
#SBATCH --ntasks=32
#SBATCH --nodelist=saliksik-cpu-07
#SBATCH --job-name="3md_0-100"
#SBATCH --output="3md_0-100.out"
##SBATCH --requeue
##SBATCH --ntasks-per-node=8

module purge
module load anaconda/3-2023.07-2
module load openmpi/4.1.2

#MAIN
export OMP_NUM_THREADS=4
mpirun -np 8 -npernode 4 singularity exec genesis.sif spdyn 3md_0-100.inp > 3md_0-100.out

But I got this error:

Program received signal SIGILL: Illegal instruction.
Backtrace for this error:
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
#0 0x2ac383bfd960 in ???
#1 0x2ac383bfcac5 in ???
#2 0x2ac38414751f in ???
#3 0x5633100a741a in ???
#4 0x56330ff16df8 in ???
#5 0x56331008b542 in ???
#6 0x5633100a2903 in ???
#7 0x56330fbdc1ae in ???
#8 0x2ac38412ed8f in ???
#9 0x2ac38412ee3f in ???
#10 0x56330fbdc1d4 in ???
#11 0xffffffffffffffff in ???
mpirun noticed that process rank 1 with PID 0 on node saliksik-cpu-07 exited on signal 4 (Illegal instruction).
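
SIGILL generally means a process tried to execute an instruction the CPU does not support. One possible cause, which is an assumption and not confirmed for this container, is that the spdyn binary inside genesis.sif was built with SIMD extensions that saliksik-cpu-07 does not provide. A minimal sketch for listing the extensions a node advertises is shown below; `simd_flags` is a hypothetical helper name, and the set of extensions it checks is only an illustrative selection.

```shell
#!/bin/bash
# Hypothetical helper (not part of GENESIS or Slurm): given a
# /proc/cpuinfo-style "flags" line, print the SIMD extensions it lists.
simd_flags() {
    local flags_line="$1" found="" ext
    # Illustrative selection of x86 extensions; adjust as needed.
    for ext in sse4_2 avx avx2 fma avx512f; do
        # Pad with spaces so only whole words match ("avx" must not match "avx2").
        case " $flags_line " in
            *" $ext "*) found="$found $ext" ;;
        esac
    done
    echo "${found# }"
}

# Run this on the compute node where the job failed, for example from
# inside the same sbatch script, so the flags describe that node's CPU.
if [ -r /proc/cpuinfo ]; then
    echo "$(hostname): $(simd_flags "$(grep -m1 '^flags' /proc/cpuinfo)")"
fi
```

Comparing this output with the output from the machine where the container was built would show whether the two CPUs differ in the extensions they support.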
