Commit dcaec77
Maintain a stable order of children context, resolves a non-determinism around cancels (#1183)
After some tough-to-identify determinism issues in what appeared to be correct user workflows, and some investigations by both us and them:
This PR resolves a non-deterministic behavior involving child context cancellation propagation, in particular when unblocking selects based on those contexts (possibly transitively, e.g. via activity futures).
As this was previously non-deterministic behavior, both the previous and new code _could_ cause determinism failures after upgrading... but the random execution order previously stood a good chance of failing a few times and then automatically resolving itself. Unfortunately that is not maintained here - failures are likely to be permanent.
Resolving this is... probably not feasible currently. We do not record client-library versions in workflow history, so we cannot maintain backwards compatibility accurately in scenarios like this. We almost certainly _should_ record this on decisions, at least when it changes - we could randomly cancel entries in the list when replaying old decisions, and allow the random behavior to eventually choose a stable execution on a host somewhere.
In any case, for all future workflows this makes behavior deterministic, and should resolve the issue for good.
---
A full repro can be seen with:
1. Create multiple cancellable child contexts off a single cancellable parent context, populating its child-context map.
2. Base some behavior off each child context, and wait on all of it through a single selector. Any one-shot logic works, but activities are pretty easy and occur a lot in practice (e.g. waiting on N activities, and being able to cancel many at once).
3. Block on the selector.
4. Cancel the parent context. This will:
   1. Cancel the parent context
   2. Propagate that to a _random_ child context
   3. Which will synchronously resolve the future(s) attached to the child context
   4. Which will synchronously trigger any pending callbacks
   5. One of which is a "first call wins" closure which the selector uses to choose which branch to execute
Maintaining the child contexts in _an_ order resolves this, as it ensures the same child is canceled first (then second, etc.) each time. Any order should work.
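The repro above can be approximated without the workflow library. The following is a minimal sketch, not the client's actual code: `child`, `firstWins`, `cancelUnordered`, and `cancelOrdered` are hypothetical names, and the map-backed version only assumes that children were previously iterated via a Go map, whose iteration order is deliberately randomized.

```go
package main

import "fmt"

// child is a hypothetical stand-in for a cancellable child context.
type child struct {
	name     string
	onCancel func(name string) // fired synchronously when the child is canceled
}

// firstWins returns a callback that records only the first name it is given,
// mimicking the selector's "first call wins" closure, plus a getter.
func firstWins() (func(string), func() string) {
	winner, decided := "", false
	record := func(name string) {
		if !decided {
			winner, decided = name, true
		}
	}
	return record, func() string { return winner }
}

// cancelUnordered propagates cancellation by ranging over a map.
// Go randomizes map iteration order, so the winner can differ between calls.
func cancelUnordered(names []string) string {
	record, winner := firstWins()
	children := make(map[*child]struct{})
	for _, n := range names {
		children[&child{name: n, onCancel: record}] = struct{}{}
	}
	for c := range children {
		c.onCancel(c.name)
	}
	return winner()
}

// cancelOrdered propagates cancellation in registration order,
// so the first registered child always wins.
func cancelOrdered(names []string) string {
	record, winner := firstWins()
	var children []*child
	for _, n := range names {
		children = append(children, &child{name: n, onCancel: record})
	}
	for _, c := range children {
		c.onCancel(c.name)
	}
	return winner()
}

func main() {
	names := []string{"a", "b", "c", "d"}
	seen := map[string]bool{}
	for i := 0; i < 100; i++ {
		seen[cancelUnordered(names)] = true
	}
	fmt.Println("distinct winners, map-backed:", len(seen))
	fmt.Println("winner, slice-backed:", cancelOrdered(names))
}
```

The map-backed count varies from run to run, which is the replay hazard described above; the slice-backed winner does not.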
For clearer semantics, I chose to implement it as a compacting FIFO list (as children can remove themselves if they are cancelled independently). This is not noticeably costly (maintenance in a large list will be dwarfed by any side effects of canceling) and it makes it very easy to define and hopefully maintain, as it _must not_ be changed.
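As a rough illustration of a compacting FIFO list, here is a sketch under stated assumptions: `orderedSet` and its methods are hypothetical names, children are represented by plain strings, and the real implementation in this commit may differ in structure and detail.

```go
package main

import "fmt"

// orderedSet is a hypothetical insertion-ordered collection of children.
type orderedSet struct {
	items []string
}

// add appends a child to the end of the list unless it is already present.
func (s *orderedSet) add(name string) {
	for _, it := range s.items {
		if it == name {
			return
		}
	}
	s.items = append(s.items, name)
}

// remove deletes a child and compacts the list, preserving the relative
// order of the remaining children.
func (s *orderedSet) remove(name string) {
	for i, it := range s.items {
		if it == name {
			s.items = append(s.items[:i], s.items[i+1:]...)
			return
		}
	}
}

// cancelAll visits children in FIFO order. The list is detached first so
// that callbacks which call remove cannot disturb the iteration.
func (s *orderedSet) cancelAll(cancel func(string)) {
	items := s.items
	s.items = nil
	for _, it := range items {
		cancel(it)
	}
}

func main() {
	s := &orderedSet{}
	s.add("a")
	s.add("b")
	s.add("c")
	s.remove("b") // "b" was canceled independently and removed itself
	s.add("d")

	var order []string
	s.cancelAll(func(name string) { order = append(order, name) })
	fmt.Println(order) // prints [a c d]
}
```

The linear scans in `add` and `remove` are the maintenance cost the text refers to; a larger design could pair the slice with an index map, at the price of more state to keep consistent.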
---
This order decision _will not_ be a defined semantic of workflows, however. Cancellation of multiple futures / selector branches _should_ be treated as unordered, and implementing exactly the same behavior in other languages may not be efficient.
In a future implementation it may be worth making selectors choose from _any_ available branch pseudo-randomly, e.g. by run-ID, for the same reason Go explicitly randomizes these behaviors: it prevents accidentally depending on implementation details, by exposing logical flaws sooner.

1 parent aa89bb7, commit dcaec77
2 files changed: +176 -28 lines changed