@nefrathenrici nefrathenrici commented Aug 9, 2025

Purpose

Adds automatic restarts for PBS model runs that are terminated by the PBS job scheduler, for example when a run times out.

To-do

Content

  • The main code changes are in pbs_trap_block, which sets up a signal handler that catches SIGTERM and uses qrerun to requeue the job. This block is added to the bash script generated by generate_pbs_script.
  • Specifies a tmpdir on Derecho to avoid problems when writing large checkpoint files to the user's home directory.
  • This PR also includes the changes from "Add default status if PBS job status fails" (#217); they may be easier to bundle together.
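
For reviewers unfamiliar with the pattern, here is a minimal sketch of the kind of SIGTERM trap described in the first bullet. It is not the contents of pbs_trap_block. The names requeue_on_term, REQUEUE_CMD, MODEL_PID, and run_model are hypothetical and used only for illustration; qrerun and PBS_JOBID are the standard PBS command and environment variable.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the script generated by generate_pbs_script.
# requeue_on_term, REQUEUE_CMD, and MODEL_PID are hypothetical names.

# Command used to requeue the job. Defaults to PBS's qrerun, but can be
# overridden (e.g. REQUEUE_CMD=echo) for a dry run outside the scheduler.
REQUEUE_CMD="${REQUEUE_CMD:-qrerun}"

requeue_on_term() {
    echo "Caught SIGTERM; requeueing job ${PBS_JOBID:-unknown}"
    # If the model was started in the background, forward the signal
    # and wait for it to exit before requeueing.
    if [ -n "${MODEL_PID:-}" ]; then
        kill -TERM "$MODEL_PID" 2>/dev/null
        wait "$MODEL_PID" 2>/dev/null
    fi
    "$REQUEUE_CMD" "${PBS_JOBID:-unknown}"
}

# Run the handler when the scheduler sends SIGTERM.
trap requeue_on_term TERM

# In a real job script the model would run in the background, because bash
# only runs a trap handler after the current foreground command finishes:
#   run_model &          # run_model is a placeholder for the model command
#   MODEL_PID=$!
#   wait "$MODEL_PID"
```

Running the model in the background and waiting on it lets the shell act on SIGTERM right away, rather than only after the model process has exited on its own.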

  • I have read and checked the items on the review checklist.

@nefrathenrici nefrathenrici changed the title Ne/restart Add restarts for PBS model run timeouts Aug 28, 2025