Description
I tried submitting a Slurm job "manually" (with `sbatch`) and capturing its ID. Using this native ID, I ran the following code to get the job state.
```python
from psij import Job, JobExecutor

job_executor = JobExecutor.get_instance("slurm")
job = Job()
job_executor.attach(job, "123456")  # placeholder native ID obtained from `sbatch`
print(job.status.state)
```
When I do this within the first ~2-3 seconds after the job was submitted, I get back NEW even though the job is marked Q in the queue (and it is definitely submitted, otherwise I wouldn't have the native ID). If I add a 4-second sleep, it returns QUEUED every time as expected, but I'm worried this isn't a general solution: on a slow filesystem the required delay could be longer.
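A minimal sketch of a more defensive version of that workaround, polling until the state leaves NEW instead of sleeping a fixed 4 seconds (the helper name `wait_past_new`, the retry count, and the 0.5 s interval are arbitrary choices for illustration):
```python
import time

from psij import Job, JobExecutor, JobState


def wait_past_new(native_id: str, max_tries: int = 20, delay: float = 0.5) -> JobState:
    """Attach to an existing Slurm job and poll until its reported state is no longer NEW."""
    executor = JobExecutor.get_instance("slurm")
    job = Job()
    executor.attach(job, native_id)
    state = job.status.state
    for _ in range(max_tries):
        if state != JobState.NEW:
            break
        time.sleep(delay)
        state = job.status.state
    return state
```
This would presumably return QUEUED or ACTIVE instead of NEW, but it still just papers over the delay rather than explaining it.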
Here is a complete demonstration that works on Perlmutter:
```python
import time

from psij import Job, JobAttributes, JobExecutor, JobSpec, ResourceSpecV1

job_executor = JobExecutor.get_instance("slurm")
job = Job(
    JobSpec(
        name="test",
        executable="/bin/date",
        resources=ResourceSpecV1(node_count=1),
        attributes=JobAttributes(
            project_name="MyAccountName",
            custom_attributes={"slurm.constraint": "cpu", "slurm.qos": "debug"},
        ),
        launcher="single",
    )
)
job_executor.submit(job)

native_id = job.native_id
print(native_id)  # ---> prints the Slurm ID correctly

# Attach to the same job from a fresh executor and job object.
job_executor = JobExecutor.get_instance("slurm")
job = Job()
job_executor.attach(job, native_id)
print(job.status.state)  # ---> prints NEW

time.sleep(4)
print(job.status.state)  # ---> prints ACTIVE
```
Is this the expected behavior, or is it something that should be addressed? Naturally, I get the same behavior when the job is submitted by a separate Python process that uses PSI/J.
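For reference, the separate-process case looks like this; the submit side is identical to the demonstration above, and handing the native ID over via a `native_id.txt` file is just an illustrative choice:
```python
# check_state.py -- run as a separate process after the job was submitted elsewhere with PSI/J
from pathlib import Path

from psij import Job, JobExecutor

# The submitting process is assumed to have written job.native_id to this file.
native_id = Path("native_id.txt").read_text().strip()

job_executor = JobExecutor.get_instance("slurm")
job = Job()
job_executor.attach(job, native_id)
print(job.status.state)  # ---> still prints NEW if run within the first few seconds
```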
Sidenote: this ability to attach to an existing job and retrieve its state should also be documented somewhere.