Commit fc5b7ab
job-manager: make some replay errors non-fatal
Problem: if a few jobs get messed up in the KVS due to an
improper shutdown, recovery is a tedious process involving
starting flux in --recovery mode, fixing one job, and starting
again.
When a job cannot be replayed from the KVS and the reason is
that the directory is incomplete, log the failure at LOG_ERR
level but let replay continue so that the flux restart
ultimately succeeds.
If a job has more serious problems like incorrect content in
the eventlog, treat that as a fatal error as before. This
avoids breaking the 'valid' tests that check backwards
compatibility with older kvs dumps, which might use an older
eventlog format.
Update t2219-job-manager-restart.t to expect warnings rather
than failure when such jobs are encountered during replay.
Fixes #51471
Parent d8f2f10
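The fatal versus non-fatal split described in the commit message can be sketched as follows. This is a minimal Python illustration, not the job-manager's C code: the exception classes, `load_job`, `replay`, the dict standing in for the KVS, and the choice of `eventlog` and `jobspec` as required keys are all assumptions made for the example.

```python
import logging

log = logging.getLogger("replay-sketch")


class IncompleteJobError(Exception):
    """Hypothetical: a required key is missing from the job directory."""


class InvalidEventlogError(Exception):
    """Hypothetical: the eventlog is present but its content is wrong."""


def load_job(jobid, entry):
    """Build a job record from a dict standing in for a KVS job directory."""
    # Assumed set of required keys, for illustration only.
    for key in ("eventlog", "jobspec"):
        if key not in entry:
            raise IncompleteJobError(f"job {jobid}: missing {key}")
    events = entry["eventlog"]
    # Illustrative content check: the log must begin with a submit event.
    if not events or events[0] != "submit":
        raise InvalidEventlogError(f"job {jobid}: unexpected eventlog content")
    return {"id": jobid, "events": list(events)}


def replay(kvs):
    """Replay every job; skip incomplete ones, fail on invalid eventlogs."""
    jobs, skipped = [], []
    for jobid, entry in sorted(kvs.items()):
        try:
            jobs.append(load_job(jobid, entry))
        except IncompleteJobError as exc:
            # Non-fatal: log at error level and keep replaying.
            log.error("replay warning: %s", exc)
            skipped.append(jobid)
        # InvalidEventlogError is deliberately not caught, so it aborts
        # the whole replay, matching the "fatal as before" behavior.
    return jobs, skipped
```

With this shape, a restart over a store containing one incomplete job still returns the remaining jobs plus a list of skipped IDs, while a malformed eventlog still stops the restart.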
File tree
4 files changed, +45 -4 lines changed
- src/modules/job-manager
- t
- job-manager/dumps/warn
File renamed without changes.
File renamed without changes.