
Read exit code from file for successful Google Batch jobs to avoid intermediate states#6848

Open
thalassemia wants to merge 1 commit into nextflow-io:master from thalassemia:no-intermediate



thalassemia commented Feb 19, 2026

When `google.batch.maxSpotAttempts` is set to a value greater than 0, Google Batch itself retries jobs whose VMs fail with exit code 50001 (spot preemption). During these retries, the job remains in the RUNNING state. Once the job finishes, Batch marks it as SUCCEEDED, which triggers the code path modified in this PR.

Although the `getExitCode` function is supposed to read all task exit codes and pick the most recent one, I frequently found that, for jobs Batch retried internally after a preemption, it returns the 50001 exit code instead of the final exit code. Unless 50001 is also handled in Nextflow, this causes the workflow to fail, which defeats the purpose of letting Batch handle preemption. In every such case, the `.exitcode` file contains the correct final exit code of 0 (the job did succeed in the end). To handle this, I propose always reading the exit code from `.exitcode` for successful jobs, since it appears to be more reliable than the Batch API when a preemption event has occurred.
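The proposed selection logic can be illustrated with a minimal Java sketch. This is not the PR's actual diff (Nextflow is written in Groovy), and the names `ExitCodeResolver`, `JobState`, `readExitFile`, and `resolveExitCode` are hypothetical. It assumes the `.exitcode` file holds a single integer, and it assumes a fallback to the Batch-reported code when the job did not succeed or the file cannot be read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: not Nextflow's real GoogleBatchTaskHandler API.
public class ExitCodeResolver {

    // Hypothetical stand-in for the Batch job states relevant here.
    public enum JobState { RUNNING, SUCCEEDED, FAILED }

    // Parses the integer stored in a task's `.exitcode` file.
    // Returns null if the file is missing, empty, or not a number.
    public static Integer readExitFile(Path exitFile) {
        try {
            String text = Files.readString(exitFile).trim();
            return text.isEmpty() ? null : Integer.valueOf(text);
        } catch (IOException | NumberFormatException e) {
            return null;
        }
    }

    // For SUCCEEDED jobs, prefer the `.exitcode` file over the API value.
    // Assumed fallback: use the Batch-reported code otherwise, or when the
    // file cannot be read.
    public static Integer resolveExitCode(JobState state, Integer apiExitCode, Path exitFile) {
        if (state == JobState.SUCCEEDED) {
            Integer fromFile = readExitFile(exitFile);
            if (fromFile != null) {
                return fromFile;
            }
        }
        return apiExitCode;
    }

    public static void main(String[] args) throws IOException {
        Path exitFile = Files.createTempFile("task", ".exitcode");
        Files.writeString(exitFile, "0\n");
        // The Batch API still reports the preempted attempt's 50001.
        System.out.println(resolveExitCode(JobState.SUCCEEDED, 50001, exitFile)); // prints 0
        System.out.println(resolveExitCode(JobState.FAILED, 50001, exitFile));    // prints 50001
        Files.delete(exitFile);
    }
}
```

Keeping the API value as a fallback means a missing or unreadable `.exitcode` file degrades to the current behaviour rather than hiding the job's status.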

Example messages in `.nextflow.log`:

```
$ cat .nextflow.log | grep "sim_gen_12 (variant=1/seed=8/generation=12/agent_id=000000000000)"
Feb-19 13:14:01.659 [Task submitter] INFO  nextflow.Session - [3b/4416b0] Submitted process > sim_gen_12 (variant=1/seed=8/generation=12/agent_id=000000000000)
Feb-19 13:33:15.584 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `sim_gen_12 (variant=1/seed=8/generation=12/agent_id=000000000000)` - terminated job=nf-a4266646-1771506841059; task=0; state=SUCCEEDED
Feb-19 13:33:15.659 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 3935; name: sim_gen_12 (variant=1/seed=8/generation=12/agent_id=000000000000); status: COMPLETED; exit: 50001; error: -; workDir: gs://vecoli-us-south/test-t2d-south/nextflow/nextflow_workdirs/3b/4416b09a42ab1f4c6d16c0e966e7fe]
  task: name=sim_gen_12 (variant=1/seed=8/generation=12/agent_id=000000000000); work-dir=gs://vecoli-us-south/test-t2d-south/nextflow/nextflow_workdirs/3b/4416b09a42ab1f4c6d16c0e966e7fe
  error [nextflow.exception.ProcessFailedException]: Process `sim_gen_12 (variant=1/seed=8/generation=12/agent_id=000000000000)` terminated with an error exit status (50001)
Feb-19 13:33:16.152 [TaskFinalizer-5] INFO  nextflow.processor.TaskProcessor - [3b/4416b0] NOTE: Process `sim_gen_12 (variant=1/seed=8/generation=12/agent_id=000000000000)` terminated with an error exit status (50001) -- Error is ignored
Feb-19 13:33:16.536 [TaskFinalizer-5] DEBUG nextflow.Session - Setting fail-on-ignore flag due to ignored task 'sim_gen_12 (variant=1/seed=8/generation=12/agent_id=000000000000)'
```


Signed-off-by: Sean Cheah <cheah_sean@yahoo.com>

netlify bot commented Feb 19, 2026

Deploy Preview for nextflow-docs-staging canceled.

🔨 Latest commit: b4b379d
🔍 Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/699719cc97d9e80008f09c79

thalassemia changed the title "Read exit code from file for successful jobs to avoid intermediate states" to "Read exit code from file for successful Google Batch jobs to avoid intermediate states" Feb 19, 2026
pditommaso force-pushed the master branch 2 times, most recently from d9fa5cd to d752bc2 February 28, 2026 13:10