Retrieving the status of a submitted batch job returns NEW unless the user waits a few seconds #399

@Andrew-S-Rosen

I submitted a Slurm job "manually" and recorded its ID. With this native ID, I used the following code to get the job state.

from psij import Job, JobExecutor

job_executor = JobExecutor.get_instance("slurm")

job = Job()

job_executor.attach(job, "123456") # placeholder native ID obtained from `sbatch`
print(job.status.state)

When I do this within the first ~2-3 seconds after the job was submitted, I get back NEW even though the job is already marked Q in the queue (and it must have been submitted, because otherwise I wouldn't have the native ID). If I add a 4-second sleep, it returns QUEUED every time, as expected, but I'm worried that this is not a general solution: if the filesystem is slow, the required delay might be longer.

Here is a complete demonstration that works on Perlmutter:

from psij import Job, JobAttributes, JobExecutor, JobSpec, ResourceSpecV1
import time

job_executor = JobExecutor.get_instance("slurm")
job = Job(
    JobSpec(
        name="test",
        executable="/bin/date",
        resources=ResourceSpecV1(node_count=1),
        attributes=JobAttributes(
            project_name="MyAccountName",
            custom_attributes={"slurm.constraint": "cpu", "slurm.qos": "debug"},
        ),
        launcher="single",
    )
)
job_executor.submit(job)
native_id = job.native_id
print(native_id) #---> prints the Slurm ID correctly


job_executor = JobExecutor.get_instance("slurm")
job = Job()
job_executor.attach(job, native_id)
print(job.status.state) #---> prints NEW
time.sleep(4)
print(job.status.state) #---> prints ACTIVE

Is this the expected behavior, or is this something to be addressed? I, naturally, get the same behavior when using a separate Python process that uses PSI/J to submit the Slurm job.
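
In the meantime, the fixed `time.sleep(4)` in the demonstration above could be replaced by polling until the state is no longer NEW, with a timeout as a safety net. This is only a minimal sketch of that workaround: `wait_until_changed` is a hypothetical helper (not part of PSI/J), and the 30-second timeout and 0.5-second polling interval are arbitrary assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_changed(
    get_value: Callable[[], T],
    initial: T,
    timeout: float = 30.0,   # assumed upper bound, in seconds
    interval: float = 0.5,   # assumed delay between polls, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll get_value() until it differs from `initial`, then return it.

    Hypothetical helper, not part of PSI/J. Raises TimeoutError if the
    value still equals `initial` once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = get_value()
        if value != initial:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"value was still {initial!r} after {timeout} s")
        sleep(interval)
```

With the attached job from the demonstration, this might be called as `wait_until_changed(lambda: job.status.state, JobState.NEW)`, with `JobState` imported from `psij`. It only papers over the delay and does not address why `attach` initially reports NEW.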


Sidenote: This feature of retrieving the job state should also be added to the documentation somewhere.
