Nexus doesn't detect run failure #5833

@ye-luo

Describe the bug
Run lab2_qmc_basics/oxygen_dimer without the QE binaries on the PATH,
with the machine changed to ws8.

$ python O_dimer.py
...
  starting runs:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
  elapsed time 0.0 s  memory 105.95 MB 
    Entering ./scale_1.0 0 
      writing input files  0 dft 
    Entering ./scale_1.0 0 
      sending required files  0 dft 
      submitting job  0 dft 
    Entering ./scale_1.0 0 
      Executing:  
        export OMP_NUM_THREADS=1
        mpirun -np 4 pw.x -input dft.in 

  elapsed time 3.0 s  memory 105.98 MB 
  elapsed time 6.1 s  memory 105.99 MB 
  elapsed time 9.1 s  memory 105.99 MB 

pw.x is not running and no error is caught. Nexus just sits idle.

Then add the QE binaries to the PATH with export PATH=my_qe_installation/bin:$PATH and rerun.

$ python O_dimer.py
...
  starting runs:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
  elapsed time 0.0 s  memory 105.98 MB 
  elapsed time 3.0 s  memory 105.98 MB 
  elapsed time 6.1 s  memory 105.99 MB 
  elapsed time 9.1 s  memory 105.99 MB 
  elapsed time 12.1 s  memory 105.99 MB 

pw.x still doesn't run and no error is caught. Nexus just sits idle.

$ rm scale_1.0
$ python O_dimer.py
...
  starting runs:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
  elapsed time 0.0 s  memory 106.28 MB 
    Entering ./scale_1.0 0 
      writing input files  0 dft 
    Entering ./scale_1.0 0 
      sending required files  0 dft 
      submitting job  0 dft 
    Entering ./scale_1.0 0 
      Executing:  
        export OMP_NUM_THREADS=1
        mpirun -np 4 pw.x -input dft.in 

  elapsed time 3.0 s  memory 577.72 MB 
  elapsed time 6.1 s  memory 2247.41 MB 
  elapsed time 9.1 s  memory 2919.17 MB 
  elapsed time 12.2 s  memory 3127.30 MB 

Finally it runs. Next is the qmcpack run:

    Entering ./scale_1.0 2 
      writing input files  2 opt 
    Entering ./scale_1.0 2 
      sending required files  2 opt 
      submitting job  2 opt 
    Entering ./scale_1.0 2 
      Executing:  
        export OMP_NUM_THREADS=1
        mpirun -np 4 qmcpack opt.in.xml 

  elapsed time 134.7 s  memory 106.50 MB 
  elapsed time 137.7 s  memory 106.51 MB 
  elapsed time 140.8 s  memory 106.51 MB 
  elapsed time 143.8 s  memory 106.51 MB 

Again Nexus sits idle, this time due to the missing qmcpack executable on the PATH.

Expected behavior
Nexus should error out properly when mpirun fails. It is probably also worth checking that the executables exist.
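
To illustrate the two checks requested above, here is a minimal sketch in plain Python. It is not Nexus code and does not reflect how Nexus launches jobs internally; the helper names `find_missing_executables` and `run_checked` are hypothetical, and only the standard-library calls `shutil.which` and `subprocess.run` are used.

```python
import shutil
import subprocess


def find_missing_executables(names):
    """Return the names that cannot be resolved on the current PATH.

    Hypothetical helper for illustration; not part of Nexus.
    """
    return [name for name in names if shutil.which(name) is None]


def run_checked(command, **kwargs):
    """Run a command; raise RuntimeError if it cannot start or exits non-zero.

    Hypothetical helper for illustration; not part of Nexus.
    """
    try:
        result = subprocess.run(command, capture_output=True, text=True, **kwargs)
    except FileNotFoundError as exc:
        # The launcher itself (command[0]) is not on the PATH.
        raise RuntimeError(f"executable not found: {command[0]}") from exc
    if result.returncode != 0:
        # The launcher started but reported failure, e.g. because the
        # program it was asked to start could not be found.
        raise RuntimeError(
            f"command {command!r} exited with code {result.returncode}:\n"
            f"{result.stderr}"
        )
    return result
```

In the scenario reported here, something like `find_missing_executables(["mpirun", "pw.x", "qmcpack"])` before submitting would list the binaries that are not on the PATH, and wrapping the launch in a return-code check like `run_checked` should surface a failed `mpirun` instead of leaving the workflow polling indefinitely.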

System:

  • laptop ws8
