-
Notifications
You must be signed in to change notification settings - Fork 321
Fixing the Exception Case for AzureStorageOrchestrationService.CompleteTaskOrchestrationWorkItemAsync #1235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing the Exception Case for AzureStorageOrchestrationService.CompleteTaskOrchestrationWorkItemAsync #1235
Conversation
|
Update: seems this feedback has been incorporated, ty. non-blocking tip (I'm just an OSS contributor now) - this seems like an involved failure case. I recommend adding in the PR description a short example of how this error case is reached, in the form of a timeline. For example:
It might make it easier to determine that the PR is doing what's expected. Though others caught up with the context might not need this level of detail. |
src/DurableTask.AzureStorage/AzureStorageOrchestrationService.cs
Outdated
Show resolved
Hide resolved
src/DurableTask.AzureStorage/AzureStorageOrchestrationService.cs
Outdated
Show resolved
Hide resolved
src/DurableTask.AzureStorage/Tracking/AzureTableTrackingStore.cs
Outdated
Show resolved
Hide resolved
| // Precondition failure is expected to be handled internally and logged as a warning. | ||
| // The orchestration dispatcher will handle this exception by abandoning the work item | ||
| throw new SessionAbortedException("Aborting execution due to failed precondition.", rfe); | ||
| throw new SessionAbortedException("Aborting execution due to conflicting completion of the work item.", dtse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - if this is possibly customer facing, it's useful to add a little hint here that the error is benign. Consider:
| throw new SessionAbortedException("Aborting execution due to conflicting completion of the work item.", dtse); | |
| throw new SessionAbortedException("Aborting execution because the work item was already completed. This can occur when another worker took over processing, and is benign when it occurs rarely", dtse); | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SessionAbortedExceptions do not surface, though someone else from the team correct me if I'm wrong, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't insist on this one, it's my personal style to augment these nonetheless but it really doesn't matter. Feel free to dismiss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than my comment changes, and the "partitionCount" log that should be generalized, the behavior here looks right to me.
Great job with the test, I personally don't think I ever had to write a test with such low level control over the orchestration processing state machine, so it's cool to see that in action.
As always - I'm just an OSS contributor, so please wait for someone actively in the team to approve before moving forward. Just wanted to signal that behavior-wise this seems correct, and that test seems complete.
Update: I cleared my approval just to avoid accidentally confusing the team into thinking the PR is already approved by a team member.
Co-authored-by: David Justo <[email protected]>
Co-authored-by: David Justo <[email protected]>
AzureStorageOrchestrationService.CompleteTaskOrchestrationWorkItemAsyncis not correctly catching the exception thrown in the case of a conflict when calling theITrackingStore.UpdateStateAsyncmethod. TheInstanceStoreBackedTrackingStoreimplementation actually does not seem to throw any exceptions (it does an "unsafe" upsert, not using the passed in etag at all), while theAzureTableTrackingStoreimplementation wraps the currently-caughtRequestFailedExceptionexception in aDurableTaskStorageException, which is not being correctly caught. This PR adds this exception check to the method, and removes the no long relevant check for aRequestFailedExceptionexception.It also adds an additional check for the case of a
HttpStatusCode.Conflictbeing returned - if this is the first work item for the orchestration, then there is no history and etag stored, and we follow this logic with action typeTableTransactionActionType.Add. If the work item has already been completed, then the history already exists, and so the storage update fails with aHttpStatusCode.Conflictindicating that the table entity already exists. If this is not the first work item, then there is an etag stored, and the action type is insteadTableTransactionActionType.UpdateMerge. In this case, if the work item has already been completed and a new etag stored, then the update fails with aHttpStatusCode.PreconditionFailed.An example scenario that was previously not being caught, but is now correctly caught:
AzureStorageOrchestrationService.CompleteTaskOrchestrationWorkItemAsync.AzureStorageOrchestrationService.CompleteTaskOrchestrationWorkItemAsync. If this is the first work item for the orchestration, then aDurableTaskStorageExceptionwith status codeConflictis thrown. If it is not the first work item, then the status code isPreconditionFailed. In either case, both of these exceptions are now caught and aSessionAbortedExceptionis thrown, which is the intended behavior.