2 files changed: +21 -0 lines changed
@@ -364,6 +364,7 @@ int shell_log_init (flux_shell_t *shell, const char *progname)
     logger.level = FLUX_SHELL_NOTICE;
     logger.fp_level = FLUX_SHELL_NOTICE;
     logger.active = 0;
+    logger.exception_logged = 0;
     logger.fp = stderr;
     logger.rank = -1;
     if (progname && !(logger.prog = strdup (progname)))
  *
  * SIGINT - forward to all local tasks
  * SIGTERM - forward
+ * SIGALRM - forward
  *
+ * Notes:
+ *
+ * By setting up the signal watchers during "shell.init", there is the
+ * potential for inconsistent exit codes if a signal is received before all
+ * tasks have started.  For example, this could be seen with something
+ * like:
+ *
+ *   jobid=`flux submit -n1000 foo.sh`
+ *   flux job raise --type=foo --severity=0 $jobid
+ *
+ * i.e. raise sends SIGTERM to the job/shell immediately after it starts,
+ * but due to the large task count of 1000, the signal is received
+ * before all tasks are set up.  Some tasks could receive SIGTERM while
+ * others (those not yet created) do not.
+ *
+ * Note that the shell should always return an error, but the error
+ * may not be consistent.  This situation is extremely rare and only
+ * seen in testing situations such as the above, so we elect not to
+ * fix this race.
  */
 #define FLUX_SHELL_PLUGIN_NAME "signals"
 
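The race described in the new comment can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions: it does not use the flux shell API or real signals, and the names `struct task`, `forward_signal`, `run_race`, and `MAX_TASKS` are hypothetical. It models a shell that starts tasks one at a time and forwards a signal only to the tasks that exist when the signal arrives.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_TASKS 1000  /* hypothetical cap, matching -n1000 in the comment */

/* Hypothetical stand-in for a shell task; not a flux data structure. */
struct task {
    bool started;   /* has the shell created this task yet? */
    bool signaled;  /* did the forwarded signal reach it? */
};

/* Forward a signal to every task that exists right now.
 * Tasks that have not been started cannot receive it.
 * Returns the number of tasks the signal was delivered to. */
int forward_signal (struct task *tasks, int ntasks)
{
    int delivered = 0;
    for (int i = 0; i < ntasks; i++) {
        if (tasks[i].started) {
            tasks[i].signaled = true;
            delivered++;
        }
    }
    return delivered;
}

/* Start ntasks tasks in order, with the signal arriving once `arrival`
 * tasks have been started.  Returns the number of tasks that never saw
 * the signal, or -1 if the arguments are out of range. */
int run_race (int ntasks, int arrival)
{
    static struct task tasks[MAX_TASKS];
    int missed = 0;

    if (ntasks < 0 || ntasks > MAX_TASKS || arrival < 0)
        return -1;
    memset (tasks, 0, sizeof (tasks));

    for (int i = 0; i < ntasks; i++) {
        if (i == arrival)
            forward_signal (tasks, ntasks);  /* signal arrives mid-startup */
        tasks[i].started = true;
    }
    if (arrival >= ntasks)
        forward_signal (tasks, ntasks);      /* signal arrives after startup */

    for (int i = 0; i < ntasks; i++) {
        if (!tasks[i].signaled)
            missed++;
    }
    return missed;
}
```

In this model, `run_race (1000, 250)` leaves 750 tasks unsignaled, while `run_race (1000, 1000)` leaves none. The first case corresponds to the inconsistent outcome the comment accepts as a rare, test-only situation.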