Stop eval_runner.py on job failure by default#452

Merged
cvolk merged 1 commit into main from cvolk/fix/stop_evaluation_on_error
Mar 2, 2026

cvolk (Collaborator) commented Feb 27, 2026

Summary

Change the default behavior of eval_runner.py to stop immediately when a job fails, instead of silently continuing with the remaining jobs. This avoids spinning up full simulation environments for subsequent jobs that would all fail for the same reason (e.g. a missing model path, or memory/CUDA errors).

Adds a --continue_on_error CLI flag to opt into the previous behavior of running all remaining jobs even after a failure.
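
For readers who want a picture of the behavior described above, here is a minimal sketch of a fail-fast job loop with an opt-in flag. Only the `--continue_on_error` flag name comes from this PR. The helper names `build_parser` and `run_jobs`, the `(name, callable)` job representation, and the returned list of failed job names are assumptions made for illustration; this is not the actual eval_runner.py code.

```python
import argparse
from typing import Callable, List, Sequence, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI parser; only the flag name is taken from the PR."""
    parser = argparse.ArgumentParser(description="Sketch of a fail-fast job runner")
    parser.add_argument(
        "--continue_on_error",
        action="store_true",  # defaults to False, i.e. stop on first failure
        help="Run all remaining jobs even after one of them fails.",
    )
    return parser


def run_jobs(
    jobs: Sequence[Tuple[str, Callable[[], None]]],
    continue_on_error: bool = False,
) -> List[str]:
    """Run jobs in order and return the names of the jobs that failed.

    Assumption: a job signals failure by raising an exception.
    """
    failed: List[str] = []
    for name, job in jobs:
        try:
            job()
        except Exception as exc:
            print(f"Job '{name}' failed: {exc}")
            failed.append(name)
            if not continue_on_error:
                # Default: stop before starting any further jobs.
                print("Stopping; pass --continue_on_error to keep going.")
                break
    return failed
```

In a script, the parsed flag would be passed straight through, for example `run_jobs(jobs, continue_on_error=args.continue_on_error)`, so that omitting the flag yields the new stop-on-failure default.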

cvolk changed the title from "Stop eval runner on job failure by default" to "Stop eval_runner.py on job failure by default" on Feb 27, 2026
cvolk merged commit 6dc1146 into main on Mar 2, 2026
5 checks passed