Skip to content

Conversation

@nytian
Copy link
Collaborator

@nytian nytian commented May 1, 2025

This PR fixes several issues on partition manager v3:

  • The drain reason was always logged as "Shutdown," even when caused by lease loss. This is fixed by passing a CloseReason reason argument to CheckDrainTask.

  • When draining completes and the lease is released, the worker incorrectly resumes listening to the queue. This is fixed by adding a !ReleasedPartition check to the isRenewingToDrainQueue logic.

@nytian nytian requested review from bachuv and cgillum May 1, 2025 18:48

this.LogHelper(partition, claimedLease, stoleLease, renewedLease, drainedLease, releasedLease, previousOwner);
}
catch (DurableTaskStorageException ex) when (ex.HttpStatusCode == (int)HttpStatusCode.PreconditionFailed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does increasing the scope of the try/catch change any behavior? It seems the only place that would throw DurableTaskStorageException is the call to ReplaceEntityAsync on line 414, so it doesn't look like this change does anything. Or is there some other code path I'm not seeing, like something in OnTableLeaseAcquiredAsync which can throw?

The PR description mentioned logging, but again I'm not sure I understand how the current logging behavior is changed. Either way, if an exception is thrown, we rethrow and exit from this method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I see your concern—you’re right that this change increases the scope of the try/catch. I’ll think about how to improve it. The reason I made this change is because I believe OnTableLeaseAcquiredAsync cannot throw a DurableTaskStorageException with a PreconditionFailed status. but let me think if there is a better and clearer way

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad -- please forget my earlier message, I didn't see we already throw the exception. I will change it back

@nytian nytian changed the title Fix Partition Manager v3 Logging Issues and Prevent Queue Listening During Partition Release Fix DrainAsync Reason at Logging and Prevent Queue Listening During Partition Release May 1, 2025
@nytian nytian merged commit 6d09d03 into main May 2, 2025
44 checks passed
@nytian nytian deleted the nytian/pmv3-fix branch May 2, 2025 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants