WRF jobs aren't always killed when a seg fault is encountered #218

@fossell

Description

When wrf.exe encounters a seg fault, the job doesn't necessarily fail; it can just hang. Since no wall clock time was set, the job will sit running indefinitely. This occurred with a CFL/segfault error when testing the Pacific NW domain as part of the new config GUI testing. We need more robust error/status checking or monitoring to catch these instances.

Temporary fail-safe solution: Added a hard-coded wall clock time of 12 hours so the job won't run indefinitely.
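One possible shape for the more robust checking described above, as a minimal Python sketch rather than the project's actual implementation. It combines a wall clock limit with a scan of the captured output for crash text. The function names (`find_fatal_marker`, `run_with_wall_clock`) and the marker strings are hypothetical; the exact messages wrf.exe writes on a seg fault depend on the compiler and MPI stack and would need to be confirmed against real logs.

```python
import subprocess
import sys

# Hypothetical marker strings (an assumption): the text that actually
# appears in WRF output on a crash depends on the compiler and MPI stack.
FATAL_MARKERS = ("segmentation fault", "sigsegv")


def find_fatal_marker(log_text, markers=FATAL_MARKERS):
    """Return the first marker found in log_text (case-insensitive), else None."""
    lowered = log_text.lower()
    for marker in markers:
        if marker in lowered:
            return marker
    return None


def run_with_wall_clock(cmd, wall_clock_seconds):
    """Run cmd with a wall clock limit and classify the outcome.

    Returns a (status, marker) tuple where status is "ok", "failed",
    or "timeout".
    """
    try:
        # subprocess.run kills the child process when the timeout expires,
        # then raises TimeoutExpired.
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=wall_clock_seconds
        )
    except subprocess.TimeoutExpired:
        return "timeout", None

    marker = find_fatal_marker(result.stdout + result.stderr)
    # Treat either a nonzero exit code or crash text in the output as failure.
    if result.returncode != 0 or marker is not None:
        return "failed", marker
    return "ok", None


if __name__ == "__main__":
    # Stand-in command for illustration; a real call would launch wrf.exe.
    status, marker = run_with_wall_clock([sys.executable, "-c", "print('done')"], 60)
    print(status, marker)
```

Note that this only terminates the direct child process. A job launched through an MPI launcher or a batch scheduler would likely also need the scheduler's own wall clock limit, as in the temporary fix above, to clean up every rank.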

Expected Behavior

If wrf.exe encounters a seg fault, need to

Environment

Describe your runtime environment:
1. Machine: (e.g. HPC name, Linux Workstation, Mac Laptop)
2. OS: (e.g. RedHat Linux, MacOS)
3. Software version number(s)

To Reproduce

Describe the steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

Relevant Deadlines

List relevant project deadlines here or state NONE.

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority

Projects and Milestone

  • Select first Project for support of the current release
  • Select second Project for development toward the next official release
  • Select Milestone as the next bugfix version

Bugfix Checklist

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of main_<Version>.
    Branch name: bugfix_<Issue Number>/main_<Version>_<Description>
  • Fix the bug and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into main_<Version>.
    Pull request: bugfix <Issue Number> main_<Version> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Development issue
    Select: Project for support of the current release
    Select: Milestone as the next bugfix version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Complete the steps above to fix the bug on the develop branch.
    Branch name: bugfix_<Issue Number>/develop_<Description>
    Pull request: bugfix <Issue Number> develop <Description>
    Select: Reviewer(s) and Development issue
    Select: Project for the next official release
    Select: Milestone as the next official version
  • Close this issue.

Metadata

Assignees

No one assigned

    Labels

    alert: NEED MORE DEFINITION (Not yet actionable, additional definition required)
    type: bug (Fix something that is not working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
