Commit fdaed29
fix(sdk): fix race condition causing termination to happen too early (#418)
*Issue #, if available:*
*Description of changes:*
Fixes a bug where concurrent operations that can each trigger termination
caused the invocation to terminate early. As a result, invocations exited
too early with a PENDING status.
The root cause is that, in a tight loop, termination can be scheduled
and executed before the checkpoint updates receive their responses.
Checkpoint calls are asynchronous, so if a response takes some time to
arrive, termination may happen before it does.
We previously checked the termination conditions only before scheduling.
Now the conditions are checked both when scheduling and when actually
terminating.
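As a rough illustration of the change described above, here is a minimal sketch of the "check when scheduling and again when terminating" pattern. It is not the SDK's actual code: `TerminationScheduler`, `checkpointStarted`, `checkpointFinished`, `runDeferred`, and the `recheck` flag are all hypothetical names, and the event loop is replaced by a simple task queue so the ordering is deterministic.

```typescript
// Hypothetical sketch; none of these names come from the SDK.
type Task = () => void;

class TerminationScheduler {
  private inFlightCheckpoints = 0;
  private queue: Task[] = [];
  terminated = false;

  // Called when a checkpoint request is sent.
  checkpointStarted(): void {
    this.inFlightCheckpoints++;
  }

  // Called when the checkpoint response arrives.
  checkpointFinished(): void {
    this.inFlightCheckpoints--;
  }

  // Assumed termination condition: no checkpoint response is outstanding.
  private canTerminate(): boolean {
    return this.inFlightCheckpoints === 0;
  }

  // recheck=false models the old behavior (check only when scheduling);
  // recheck=true models the fix (check again when actually terminating).
  scheduleTermination(recheck: boolean): void {
    if (!this.canTerminate()) return;
    this.queue.push(() => {
      if (recheck && !this.canTerminate()) return;
      this.terminated = true;
    });
  }

  // Stand-in for the event loop: runs every deferred task once.
  runDeferred(): void {
    const tasks = this.queue;
    this.queue = [];
    for (const task of tasks) task();
  }
}

// Reproduces the ordering from the description: termination is scheduled,
// a concurrent operation sends a checkpoint, then the deferred callback runs
// while the checkpoint response is still outstanding.
function simulate(recheck: boolean): boolean {
  const scheduler = new TerminationScheduler();
  scheduler.scheduleTermination(recheck);
  scheduler.checkpointStarted();
  scheduler.runDeferred();
  return scheduler.terminated;
}
```

In this sketch, `simulate(false)` terminates while a checkpoint is still in flight, which corresponds to the early exit described above, whereas `simulate(true)` skips termination until the conditions hold at execution time.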
I've added a test that simulates this with a checkpoint delay. The test
fails before the fix with `Cannot return PENDING status with no pending
operations.`, but passes now.
By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
1 parent ec96c42 commit fdaed29
File tree
12 files changed, +10333 -96 lines changed
- packages
- aws-durable-execution-sdk-js-examples
- src/examples
- map
- failure-threshold-exceeded-count
- failure-threshold-exceeded-percentage
- high-concurrency-invoke
- non-durable
- parallel
- failure-threshold-exceeded-percentage
- wait
- aws-durable-execution-sdk-js/src/utils/checkpoint
Lines changed: 1 addition & 1 deletion