Potential problem handling array jobs #23

@cbutakoff

I have an array job limited to 2 jobs at a time:

   2145_[4-190%2]   compute   EP_108      opc PD       0:00     10 (JobArrayTaskLimit)
   2145_3   compute   EP_108      opc  R      48:42     10 compute-hpc-node-[100,373,397,421,425,429,455,457,813,896]
   2145_2   compute   EP_108      opc  R    4:13:06     10 compute-hpc-node-[69,237,245,272,347,553,724,817,931,993]

However, Slurm (or OCI) still tries to provision a third cluster. Provisioning fails because no nodes are available, but it just keeps retrying. E.g.:
[Screenshot: Selection_022]
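
For reference, a limit like the `%2` in the squeue output above is normally set through the `--array` option of the batch script. Below is a minimal sketch of such a submission, not the actual script, which is not shown here. The job name, partition, and node count are taken from the squeue output; the array start index is an assumption, and the `echo` is a placeholder workload.

```shell
#!/bin/bash
# Hypothetical reproduction script; the real submission script is not part of this report.
# Job name and partition as they appear in the squeue output.
#SBATCH --job-name=EP_108
#SBATCH --partition=compute
# Each array task occupies 10 nodes.
#SBATCH --nodes=10
# Assumed index range; the "%2" suffix caps simultaneously running tasks at 2.
#SBATCH --array=2-190%2

# Placeholder workload: report which array task this is.
# Slurm sets SLURM_ARRAY_TASK_ID for array tasks; fall back to "none" outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-none}"
echo "array task ${task_id}"
```

With this throttle, the remaining tasks stay pending with reason `JobArrayTaskLimit` until a running task finishes, so no third set of 10 nodes should be needed while two tasks are running.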
