* Make slurm job canceling more robust, by killing all remaining jobs after a short wait time
* [Debug] Log times for scancel handling
* Fix signal handling for cluster executors, which stopped working if multiple executors had been instantiated in the same process.
* Stop the file wait thread before canceling the jobs. Otherwise the job state is checked after canceling, which logs lots of errors for jobs that were canceled before running, because they don't show up in the slurm accounting
* Cleanup
* Properly mock (and restore) env variables
* Update changelog
* Add f-string
* Exempt completing jobs when checking whether cancellation worked fast enough
* Test whether monkeypatching works
* Add time logs to find out where job cancellation time is spent
* Fix setting sigterm_wait_in_s to 0 during tests
* Improve slurm cancellation test to assert that original sigint handler was called
* Format
* Fix typing
* Fix nonlocal variable access
* Add test for signal handling regression when multiple executors are instantiated
* Garbage collect after first executor ran to provoke regression
* Apply some PR feedback
* Delete executor1 in test to provoke bug
* Add pytest-timeout to avoid hanging tests and wait for futures in test
* Cleanup and fix hanging tests
* Add comment
* Remove pytest-timeout dependency again
* Restore uv.lock
* When shutting down a cluster executor with wait set to False, treat it as if the executor was killed
* Decrease SIGTERM_WAIT_IN_S for new test and assert that shutdown hooks are cleaned up
* Format
* Remove dask executor from cluster tools
* Linting
* Update changelog
* Actually deregister shutdown hook and use with statements to ensure executor shutdown
* Also update webknossos uv.lock
* Add pytest-timeout
* Unify the two variables tracking executor shutdown
* Fix kubernetes dependency
* Assert that no jobs run before the tests
* Add debug logging
* Fix signal handling test
* Revert "Update changelog"
This reverts commit 31c472a.
* Revert "Remove dask executor from cluster tools"
This reverts commit 9866368.
* Revert "Linting"
This reverts commit 4ee81c0.
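The first commit above describes canceling jobs, waiting briefly, and then killing whatever is still left, with a later commit exempting completing jobs from that check. The following is only a minimal sketch of that pattern, not the code from this PR: `cancel_jobs_robustly`, `cancel`, `get_states`, and `ACTIVE_STATES` are hypothetical names, and the scheduler calls are injected as plain callables so that the example runs without a Slurm installation.

```python
import time
from typing import Callable, Dict, Iterable, List

# Slurm job state names. In this sketch, jobs in any other state
# (e.g. COMPLETING or CANCELLED) no longer count as needing cancellation.
ACTIVE_STATES = {"PENDING", "RUNNING"}


def cancel_jobs_robustly(
    job_ids: Iterable[str],
    cancel: Callable[[List[str]], None],  # hypothetical wrapper around the cancel command
    get_states: Callable[[List[str]], Dict[str, str]],  # hypothetical job state lookup
    wait_in_s: float = 5.0,
    poll_interval_in_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Cancel jobs, wait up to wait_in_s, then cancel the stragglers again.

    Returns the ids that were still active when the wait time ran out.
    """

    def still_active(ids: List[str]) -> List[str]:
        states = get_states(ids)
        # Jobs missing from the lookup are treated as gone.
        return [job_id for job_id in ids if states.get(job_id) in ACTIVE_STATES]

    all_ids = list(job_ids)
    cancel(all_ids)

    deadline = time.monotonic() + wait_in_s
    remaining = still_active(all_ids)
    while remaining and time.monotonic() < deadline:
        sleep(poll_interval_in_s)
        remaining = still_active(remaining)

    # Second pass: explicitly cancel whatever survived the first attempt.
    if remaining:
        cancel(remaining)
    return remaining
```

Passing `sleep` and the two scheduler callables as parameters keeps the waiting logic testable without touching a real cluster.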
cluster_tools/Changelog.md (2 lines changed: 2 additions & 0 deletions)
@@ -16,6 +16,8 @@ For upgrade instructions, please check the respective *Breaking Changes* section
 ### Changed
 
 ### Fixed
+- Fixed that sometimes not all slurm jobs were canceled when an executor was killed. [#1317](https://github.com/scalableminds/webknossos-libs/pull/1317)
+- Fixed that when multiple cluster executors were instantiated in the same process, the original SIGINT handler sometimes was no longer called, leading to the main application not shutting down correctly after a SIGINT signal. [#1317](https://github.com/scalableminds/webknossos-libs/pull/1317)
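The second changelog entry concerns keeping the original SIGINT handler reachable when several executors register shutdown hooks in one process. One way to get that behavior is sketched below, assuming a single shared handler per signal that runs all registered hooks and then delegates to the handler that was installed before it. `ChainedSignalHooks` and its methods are hypothetical names for illustration and are not the cluster_tools API.

```python
import signal
from types import FrameType
from typing import Any, Callable, Dict, Optional


class ChainedSignalHooks:
    """Hypothetical helper: run shutdown hooks, then the original handler."""

    def __init__(self, signum: int) -> None:
        self._signum = signum
        self._hooks: Dict[int, Callable[[], None]] = {}
        self._next_id = 0
        self._original: Any = None
        self._installed = False

    def register(self, hook: Callable[[], None]) -> int:
        # Install the shared handler only once, so the handler remembered
        # as "original" is never one of our own.
        if not self._installed:
            self._original = signal.getsignal(self._signum)
            signal.signal(self._signum, self._handle)
            self._installed = True
        hook_id = self._next_id
        self._next_id += 1
        self._hooks[hook_id] = hook
        return hook_id

    def deregister(self, hook_id: int) -> None:
        self._hooks.pop(hook_id, None)
        # Restore the original handler once the last hook is gone.
        if not self._hooks and self._installed:
            original = self._original if self._original is not None else signal.SIG_DFL
            signal.signal(self._signum, original)
            self._installed = False

    def _handle(self, signum: int, frame: Optional[FrameType]) -> None:
        for hook in list(self._hooks.values()):
            hook()
        # SIG_DFL and SIG_IGN are not callable, so they are skipped here.
        if callable(self._original):
            self._original(signum, frame)
```

Because only one handler is ever installed per signal, deregistering one executor's hook cannot accidentally restore a stale handler that belonged to another executor. Note that `signal.signal` may only be called from the main thread.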