44 changes: 26 additions & 18 deletions src/DurableTask.AzureStorage/Partitioning/TablePartitionManager.cs
@@ -412,6 +412,24 @@ public async Task<ReadTableReponse> ReadAndWriteTableAsync(bool isShuttingDown,
try
{
await this.partitionTable.ReplaceEntityAsync(partition, etag, forcefulShutdownToken);

+ // Ensure worker is listening to the control queue iff either:
+ // 1) worker just claimed the lease,
+ // 2) worker was already the owner in the partitions table and is not actively draining the queue.
+ // Note that during draining, we renew the lease but do not want to listen to new messages.
+ // Otherwise, we'll never finish draining our in-memory messages.
+ // When the drain completes, the worker may decide to release the lease. In that moment,
+ // IsDrainingPartition can still be true but renewedLease is false; without checking
+ // !releasedLease, the worker could incorrectly resume listening just before releasing the lease.
+ bool isRenewingToDrainQueue = renewedLease && response.IsDrainingPartition && !releasedLease;
+ if (claimedLease || !isRenewingToDrainQueue)
+ {
+ // Notify the orchestration session manager that we acquired a lease for one of the partitions.
+ // This will cause it to start reading control queue messages for that partition.
+ await this.service.OnTableLeaseAcquiredAsync(partition);
+ }
+
+ this.LogHelper(partition, claimedLease, stoleLease, renewedLease, drainedLease, releasedLease, previousOwner);
}
catch (DurableTaskStorageException ex) when (ex.HttpStatusCode == (int)HttpStatusCode.PreconditionFailed)
Member

Does increasing the scope of the try/catch change any behavior? It seems the only place that would throw DurableTaskStorageException is the call to ReplaceEntityAsync on line 414, so it doesn't look like this change does anything. Or is there some other code path I'm not seeing, like something in OnTableLeaseAcquiredAsync which can throw?

The PR description mentioned logging, but again I'm not sure I understand how the current logging behavior is changed. Either way, if an exception is thrown, we rethrow and exit from this method.

Collaborator Author

Thanks! I see your concern; you're right that this change widens the scope of the try/catch. I made the change because I believe OnTableLeaseAcquiredAsync cannot throw a DurableTaskStorageException with a PreconditionFailed status. Let me think about whether there is a better and clearer way to do this.

Collaborator Author

My bad, please disregard my earlier message. I didn't notice that we already rethrow the exception. I will change it back.
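
To make the question about try/catch scope concrete, here is a minimal sketch of how a filtered `catch` behaves when more calls are moved inside the `try`. Everything in it is a hypothetical stand-in: `StorageExceptionSketch` models only the `HttpStatusCode` property of `DurableTaskStorageException`, and the `replace`, `notify`, and `onEtagMismatch` delegates stand in for `ReplaceEntityAsync`, `OnTableLeaseAcquiredAsync`, and the ETag-mismatch log call. It is an illustration under those assumptions, not the library's code.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

// Hypothetical stand-in for DurableTaskStorageException (assumption:
// only the HttpStatusCode property matters for the filter).
public sealed class StorageExceptionSketch : Exception
{
    public StorageExceptionSketch(int httpStatusCode)
    {
        this.HttpStatusCode = httpStatusCode;
    }

    public int HttpStatusCode { get; }
}

public static class TryScopeSketch
{
    // Runs both calls inside one try block, mirroring the widened scope.
    public static async Task RunAsync(Func<Task> replace, Func<Task> notify, Action onEtagMismatch)
    {
        try
        {
            await replace();
            await notify();
        }
        catch (StorageExceptionSketch ex) when (ex.HttpStatusCode == (int)HttpStatusCode.PreconditionFailed)
        {
            // Only exceptions of this type AND status reach this block.
            onEtagMismatch();
            throw; // rethrow, as the catch block in the diff does
        }
    }
}
```

In this sketch, an exception from `notify` that does not match the filter skips the handler and propagates unchanged, and a matching one is handled and rethrown. Either way the exception leaves the method, so widening the `try` only changes which failures trigger the handler's side effect.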

{
@@ -423,19 +441,6 @@ public async Task<ReadTableReponse> ReadAndWriteTableAsync(bool isShuttingDown,
$"Failed to update table entry due to an Etag mismatch. Failed ETag value: '{etag}'.");
throw;
}

- // Ensure worker is listening to the control queue iff either:
- // 1) worker just claimed the lease,
- // 2) worker was already the owner in the partitions table and is not actively draining the queue. Note that during draining, we renew the lease but do not want to listen to new messages. Otherwise, we'll never finish draining our in-memory messages.
- bool isRenewingToDrainQueue = renewedLease & response.IsDrainingPartition;
- if (claimedLease || !isRenewingToDrainQueue)
- {
- // Notify the orchestration session manager that we acquired a lease for one of the partitions.
- // This will cause it to start reading control queue messages for that partition.
- await this.service.OnTableLeaseAcquiredAsync(partition);
- }
-
- this.LogHelper(partition, claimedLease, stoleLease, renewedLease, drainedLease, releasedLease, previousOwner);
}
}

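When reviewing the change to `isRenewingToDrainQueue`, it can help to compare the removed and added expressions over every flag combination. The sketch below is a standalone illustration: the class and method names are hypothetical, and `isDraining` stands in for `response.IsDrainingPartition`.

```csharp
using System;

public static class ListenDecisionSketch
{
    // Decision as written in the removed lines.
    public static bool ShouldListenOld(bool claimedLease, bool renewedLease, bool isDraining)
    {
        bool isRenewingToDrainQueue = renewedLease & isDraining;
        return claimedLease || !isRenewingToDrainQueue;
    }

    // Decision as written in the added lines.
    public static bool ShouldListenNew(bool claimedLease, bool renewedLease, bool isDraining, bool releasedLease)
    {
        bool isRenewingToDrainQueue = renewedLease && isDraining && !releasedLease;
        return claimedLease || !isRenewingToDrainQueue;
    }

    // Counts the flag combinations (out of 16) where the two versions disagree.
    public static int CountDisagreements()
    {
        bool[] values = { false, true };
        int count = 0;
        foreach (bool claimed in values)
        foreach (bool renewed in values)
        foreach (bool draining in values)
        foreach (bool released in values)
        {
            bool before = ShouldListenOld(claimed, renewed, draining);
            bool after = ShouldListenNew(claimed, renewed, draining, released);
            if (before != after)
            {
                count++;
            }
        }

        return count;
    }
}
```

The two versions disagree in exactly one combination: `claimedLease` false with `renewedLease`, `isDraining`, and `releasedLease` all true. There the old expression skips the notification and the new one sends it. The diff does not show whether that combination can occur at runtime, so the sketch describes only the boolean logic.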
@@ -505,7 +510,8 @@ void RenewOrReleaseMyLease(
partition,
ref releasedLease,
ref renewedLease,
- ref drainedLease);
+ ref drainedLease,
+ CloseReason.LeaseLost);
}
}

@@ -583,7 +589,8 @@ void TryDrainAndReleaseAllPartitions(
partition,
ref releasedLease,
ref renewedLease,
- ref drainedLease);
+ ref drainedLease,
+ CloseReason.Shutdown);

if (releasedLease)
{
@@ -661,7 +668,7 @@ await this.partitionTable.ReplaceEntityAsync(
partition,
etag,
forceShutdownToken);

this.settings.Logger.LeaseStealingSucceeded(
this.storageAccountName,
this.settings.TaskHubName,
@@ -815,7 +822,8 @@ void CheckDrainTask(
TablePartitionLease partition,
ref bool releasedLease,
ref bool renewedLease,
- ref bool drainedLease)
+ ref bool drainedLease,
+ CloseReason reason)
{
// Check if drain process has started.
if (this.backgroundDrainTasks.TryGetValue(partition.RowKey!, out Task? drainTask))
@@ -844,7 +852,7 @@ void CheckDrainTask(
}
else// If drain task hasn't been started yet, start it and keep renewing the lease to prevent it from expiring.
{
- this.DrainPartition(partition, CloseReason.Shutdown);
+ this.DrainPartition(partition, reason);
this.RenewLease(partition);
renewedLease = true;
drainedLease = true;
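
The last three hunks thread a close reason from the callers into `CheckDrainTask` instead of hard-coding `CloseReason.Shutdown`: `RenewOrReleaseMyLease` passes `CloseReason.LeaseLost` and `TryDrainAndReleaseAllPartitions` passes `CloseReason.Shutdown`. A minimal sketch of that pattern follows. The enum, class, and dictionary here are hypothetical simplifications (the real method tracks `Task` objects in `backgroundDrainTasks` and updates the lease flags), and the sketch assumes a drain that has already started is not restarted with a different reason.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the CloseReason values used in the diff.
public enum CloseReasonSketch
{
    Shutdown,
    LeaseLost,
}

public sealed class DrainSketch
{
    // Records the reason each partition's drain was started with.
    readonly Dictionary<string, CloseReasonSketch> startedDrains = new Dictionary<string, CloseReasonSketch>();

    // Starts a drain with the caller-supplied reason if none is recorded yet,
    // and returns the reason the drain is running with.
    public CloseReasonSketch CheckDrain(string partitionKey, CloseReasonSketch reason)
    {
        if (this.startedDrains.TryGetValue(partitionKey, out CloseReasonSketch existing))
        {
            return existing; // drain already started; keep its original reason
        }

        this.startedDrains[partitionKey] = reason;
        return reason;
    }
}
```

Passing the reason as a parameter keeps the drain logic in one place while letting each caller state why the partition is being closed.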