There are currently two bugs in `task_any`.
Resolves #536
First Bug
Certain orchestrations will throw the following error:
This error is thrown in `DurableOrchestrationContext.py`, in `_add_to_open_tasks`, when certain criteria are met:
This can happen if:
Example orchestrator - assume that the result of sample_suborchestrator is just the input task_id:
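The example code itself is not reproduced here; a hypothetical orchestrator of the shape being described might look like the sketch below. The `FakeTask`/`FakeContext` stubs and all activity names are illustrative stand-ins for the real SDK, not the actual Durable Functions runtime:

```python
import itertools


class FakeTask:
    """Stand-in for an AtomicTask; real tasks come from the runtime."""
    _ids = itertools.count(1)

    def __init__(self, name):
        self.id = next(FakeTask._ids)
        self.name = name


class FakeContext:
    """Minimal stand-in for DurableOrchestrationContext (illustration only)."""

    def call_sub_orchestrator(self, name, input_=None):
        return FakeTask(name)

    def call_activity(self, name, input_=None):
        return FakeTask(name)

    def task_any(self, tasks):
        # A WhenAnyTask is modeled here as just the tuple of its children.
        return tuple(tasks)


def orchestrator(context):
    # Race a sub-orchestrator against another task and wait for a winner.
    sub = context.call_sub_orchestrator("sample_suborchestrator", "task-1")
    other = context.call_activity("some_activity")
    winner = yield context.task_any([sub, other])

    # 'sub' has already completed at this point. Racing it against a
    # brand-new task reproduces the bug: the second WhenAnyTask returns
    # immediately, and the new AtomicTask is registered in the open task
    # list but never scheduled.
    fresh = context.call_activity("another_activity", winner)
    yield context.task_any([sub, fresh])
```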
If these circumstances are met, the second WhenAnyTask immediately returns the result from the completed subtask without scheduling the new AtomicTask. However, the new AtomicTask will already have been registered in the orchestration context's open task list, and the next time this task is yielded back, the context tries to register it again, hits some old ReplaySchema.V1 logic, calls `.append()` on the previously registered AtomicTask, and throws.

This PR adds a check before the `.append()` call to make sure that the value in the open task list is not already referencing the incoming task. I have tested that, with this logic, the orchestrations complete as expected.
Second Bug
In this same case, when a WhenAnyTask is created from a task that has already completed and a new AtomicTask that has not yet been scheduled, there is an unexpected interaction. The TaskOrchestrationExecutor registers all of the new WhenAnyTask's subtasks internally, then checks whether any of them has already received a result. If one has, the Executor assumes the WhenAnyTask was already scheduled and replays it with the completed task as its result. As a consequence, the new AtomicTask is never scheduled with the WebJobs extension, so it will never execute. We can call this the "limbo" task.
This is fine until the user creates and schedules another AtomicTask. When the result of this final task is received, the WebJobs extension and the Python extension are indexed differently because of the limbo task, and the Python extension assigns the final task's result to the limbo task. It then keeps waiting for a result for the final task, but as far as WebJobs is concerned that result has already been delivered, so the orchestration becomes stuck in the "Running" state.
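The index drift can be pictured with a toy model. This is entirely illustrative; neither extension actually tracks tasks as plain strings like this:

```python
# Python-side bookkeeping counts the limbo task; WebJobs never scheduled it.
local_pending = ["limbo_task", "final_task"]
remote_scheduled = ["final_task"]

# WebJobs delivers result #0 (belonging to final_task), but the Python side
# matches results to pending tasks by position, and its slot #0 is the
# limbo task.
result_owner = remote_scheduled[0]
assigned_to = local_pending[0]

assert result_owner == "final_task"
assert assigned_to == "limbo_task"  # the wrong task receives the result
# final_task then waits forever, and the orchestration stays "Running".
```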
This PR also addresses this problem by ensuring that all Tasks are scheduled before resuming replay.
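In spirit, the fix walks the WhenAnyTask's children and schedules any that have not yet been scheduled before checking for completed results. A simplified sketch, not the actual executor code:

```python
class Child:
    """Toy child task of a WhenAnyTask."""

    def __init__(self, task_id, scheduled=False):
        self.id = task_id
        self.scheduled = scheduled


def resolve_when_any(subtasks, schedule, completed_ids):
    """Toy version of the fixed behavior: schedule first, then replay.

    subtasks:      child tasks of the WhenAnyTask
    schedule:      callback that sends a task to the (fake) backend
    completed_ids: IDs whose results have already arrived
    """
    # Fix from this PR (paraphrased): make sure every child is actually
    # scheduled, so none is left in "limbo" ...
    for task in subtasks:
        if not task.scheduled:
            schedule(task)
            task.scheduled = True
    # ... and only then replay with a completed child, if there is one.
    for task in subtasks:
        if task.id in completed_ids:
            return task
    return None
```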