Commit bd60cc5

shell: document potential signal race
Problem: If a trapped signal (SIGTERM, SIGINT, SIGALRM) is sent before all shell tasks have been launched, the shell may exit with inconsistent exit codes. This is rare and the tradeoff is accepted, but it is not documented anywhere.

Solution: Add comments describing this possibility in shell/signals.c.
1 parent 19964cd commit bd60cc5

1 file changed: +19 -0 lines changed

src/shell/signals.c

Lines changed: 19 additions & 0 deletions
@@ -17,6 +17,25 @@
  * SIGTERM - forward
  * SIGALRM - forward
  *
+ * Notes:
+ *
+ * By setting up the signal watchers during "shell.init", there is the
+ * potential for inconsistent exit codes if a signal is received before all
+ * tasks have started. For example, this could be seen with something
+ * like:
+ *
+ *   jobid=`flux submit -n1000 foo.sh`
+ *   flux job raise --type=foo --severity=0 $jobid
+ *
+ * i.e. raise sends SIGTERM to job/shell immediately after starting,
+ * but due to the large task count of 1000, the signal is received
+ * before tasks are all set up. Some tasks could receive SIGTERM while
+ * some (those not yet created) do not.
+ *
+ * Note that the shell should always return an error, but the error
+ * may not be consistent. This situation is extremely rare and only
+ * seen in testing situations such as the above. So we elect to not
+ * fix this race.
  */
 #define FLUX_SHELL_PLUGIN_NAME "signals"
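
The race described in the added comment can be reproduced outside of Flux with plain POSIX calls. The following is a minimal sketch under stated assumptions, not the shell's implementation: `start_task()`, `count_signaled_tasks()`, and `MAX_TASKS` are hypothetical names, `fork(2)` stands in for task launch, and a loop over `kill(2)` stands in for the "forward" behavior. It shows that a signal forwarded partway through launch reaches only the tasks that already exist, so the per-task exit statuses differ.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_TASKS 64 /* hypothetical cap, only to keep the sketch simple */

/* Hypothetical stand-in for launching one task. A task started before the
 * signal blocks until it is signaled; a task started afterwards simply
 * exits 0, standing in for work that completes normally. */
static pid_t start_task(int wait_for_signal)
{
    pid_t pid = fork();
    if (pid == 0) {
        if (wait_for_signal) {
            signal(SIGTERM, SIG_DFL); /* make sure SIGTERM terminates us */
            for (;;)
                pause();
        }
        _exit(0);
    }
    return pid; /* child pid in the parent, or -1 if fork() failed */
}

/* Launch ntasks tasks, but forward SIGTERM once only nstarted of them
 * exist. Returns how many tasks were terminated by SIGTERM, or -1 on
 * bad arguments or fork() failure. */
int count_signaled_tasks(int ntasks, int nstarted)
{
    pid_t pids[MAX_TASKS];
    int launched = 0, killed = 0, failed = 0;

    if (ntasks < 0 || ntasks > MAX_TASKS || nstarted < 0 || nstarted > ntasks)
        return -1;

    /* Tasks that exist before the signal arrives. */
    while (!failed && launched < nstarted) {
        if ((pids[launched] = start_task(1)) < 0)
            failed = 1;
        else
            launched++;
    }

    /* The "forward" step: only tasks launched so far can be signaled. */
    for (int i = 0; i < launched; i++)
        kill(pids[i], SIGTERM);

    /* Tasks created after forwarding never see the signal. */
    while (!failed && launched < ntasks) {
        if ((pids[launched] = start_task(0)) < 0)
            failed = 1;
        else
            launched++;
    }

    /* Reap every task and count those that died from SIGTERM. */
    for (int i = 0; i < launched; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i]
            && WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            killed++;
    }
    return failed ? -1 : killed;
}
```

Assuming every `fork()` succeeds, `count_signaled_tasks(8, 3)` is expected to return 3: three tasks die from SIGTERM while the five launched afterwards exit 0. That mix of statuses is the inconsistency the comment accepts rather than fixes.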
