
Add partial purge timeout support in gRPC proto mapping #3397

Closed
YunchuWang wants to merge 2 commits into dev from wangbill/agents

@YunchuWang (Member)

Summary

Map the new timeout proto field in PurgeInstanceFilter to DTFx's PurgeInstanceFilter.Timeout, and forward PurgeResult.IsComplete back to the proto response. This enables the isolated worker SDK's partial purge feature to work end-to-end through the gRPC boundary.

Motivation

The isolated worker SDK communicates with the Functions host via gRPC. When purging large numbers of instances, the gRPC call times out before the host-side purge completes, causing:

  1. OperationCanceledException on the host side
  2. RpcException(StatusCode.Internal) on the worker side
  3. A deleted count of 0 reported, even though rows were actually deleted

With this change, the SDK can pass a timeout through the proto, and the host maps it to DTFx's PurgeInstanceFilter.Timeout. The purge stops once the timeout elapses, returns an accurate count of the instances deleted so far, and sets isComplete = false so the caller knows to call purge again.
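On the caller side, this contract turns into a simple loop: keep issuing time-bounded purge calls until the response reports completion, summing the per-call deleted counts. The sketch below is illustrative only; IPurgeClient, PurgeBatch, PartialPurge and FakePurgeClient are hypothetical stand-ins, not the isolated worker SDK's actual API. It also assumes that a missing IsComplete value (for example from a host without this change) should be treated as "done" so the loop cannot spin forever.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical result of one time-bounded purge call (not a real SDK type).
public sealed record PurgeBatch(int DeletedInstanceCount, bool? IsComplete);

// Hypothetical client abstraction over the gRPC purge call; the timeout
// argument stands in for the proto PurgeInstanceFilter.timeout field.
public interface IPurgeClient
{
    Task<PurgeBatch> PurgeAsync(TimeSpan timeout);
}

public static class PartialPurge
{
    // Repeats the time-bounded purge until the host reports completion and
    // returns the total number of deleted instances across all calls.
    public static async Task<long> PurgeAllAsync(IPurgeClient client, TimeSpan perCallTimeout)
    {
        long total = 0;
        while (true)
        {
            PurgeBatch batch = await client.PurgeAsync(perCallTimeout);
            total += batch.DeletedInstanceCount;

            // Assumption: a null IsComplete means the host did not report
            // partial progress, so stop rather than loop forever.
            if (batch.IsComplete ?? true)
            {
                return total;
            }
        }
    }
}

// Fake client for demonstration: deletes at most perCall instances per call
// and ignores the timeout value.
public sealed class FakePurgeClient : IPurgeClient
{
    private readonly int perCall;
    private int remaining;

    public FakePurgeClient(int remaining, int perCall)
    {
        this.remaining = remaining;
        this.perCall = perCall;
    }

    public int Calls { get; private set; }

    public Task<PurgeBatch> PurgeAsync(TimeSpan timeout)
    {
        Calls++;
        int deleted = Math.Min(remaining, perCall);
        remaining -= deleted;
        return Task.FromResult(new PurgeBatch(deleted, remaining == 0));
    }
}
```

With the fake client, purging 1,000 instances at 300 per call takes four calls (300 + 300 + 300 + 100), and the loop returns the accurate total instead of failing on a single long-running call.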

Changes

  • Proto: Add google.protobuf.Duration timeout = 4 to PurgeInstanceFilter message
  • ProtobufUtils.ToPurgeInstanceFilter: Map proto timeout → DTFx PurgeInstanceFilter.Timeout
  • ProtobufUtils.CreatePurgeInstancesResponse: Map DTFx PurgeResult.IsComplete → proto isComplete (previously not set)
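A google.protobuf.Duration carries whole seconds plus a nanosecond remainder, while TimeSpan counts 100 ns ticks. The dependency-free sketch below shows what the timeout mapping amounts to under that assumption; ProtoDuration and TimeoutMapping.ToTimeoutOrNull are hypothetical stand-ins for the generated proto type and for the Duration.ToTimeSpan() call used in ToPurgeInstanceFilter, and the range validation the real library performs is omitted.

```csharp
using System;

#nullable enable

// Hypothetical stand-in for the generated google.protobuf.Duration type:
// whole seconds plus a nanosecond remainder.
public sealed record ProtoDuration(long Seconds, int Nanos);

public static class TimeoutMapping
{
    private const long NanosPerTick = 100; // one TimeSpan tick is 100 ns

    // Returns null when the client omitted the field (old clients), so the
    // purge stays unbounded; otherwise converts seconds + nanos to ticks,
    // truncating any sub-tick remainder.
    public static TimeSpan? ToTimeoutOrNull(ProtoDuration? duration)
    {
        if (duration is null)
        {
            return null;
        }

        long ticks = checked(duration.Seconds * TimeSpan.TicksPerSecond + duration.Nanos / NanosPerTick);
        return TimeSpan.FromTicks(ticks);
    }
}
```

Returning null for an absent field is what keeps the change additive: a request without timeout produces a filter with no Timeout, exactly as before.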

Bug Fix

CreatePurgeInstancesResponse was not setting the isComplete field on the proto response, even though the field existed in the proto definition. This PR fixes that: PurgeResult.IsComplete is now forwarded to the response.
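The fix follows a "copy only when reported" pattern: the diff's use of result.IsComplete.HasValue indicates DTFx reports completion as a nullable bool, so the response field is left untouched when no value is available. A minimal sketch of that pattern, with hypothetical BackendResult and WireResponse types standing in for PurgeResult and the generated P.PurgeInstancesResponse (the wire field is modeled as nullable here purely for illustration):

```csharp
using System;

// Hypothetical stand-in for DTFx's PurgeResult.
public sealed record BackendResult(int DeletedInstanceCount, bool? IsComplete);

// Hypothetical stand-in for the generated response message; IsComplete is
// nullable so "not reported" stays distinguishable from false.
public sealed class WireResponse
{
    public int DeletedInstanceCount { get; set; }
    public bool? IsComplete { get; set; }
}

public static class ResponseMapping
{
    public static WireResponse Create(BackendResult result)
    {
        var response = new WireResponse { DeletedInstanceCount = result.DeletedInstanceCount };

        // Only copy IsComplete when the backend actually reported it;
        // before the fix, this step was missing entirely.
        if (result.IsComplete.HasValue)
        {
            response.IsComplete = result.IsComplete.Value;
        }

        return response;
    }
}
```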

Breaking Changes

None. The timeout field is an additive proto change: old clients that don't send timeout get unchanged behavior. The isComplete fix is also non-breaking, since the field was always present in the proto, just not populated.

Dependencies

Benchmark Results (500K instances, EP1, isolated worker)

| Metric | Without timeout | With timeout (25 s) |
|---|---|---|
| Reported deleted | 17,402 (3.5%) | 499,560 (99.9%) |
| Errors | 41 (95% failure) | 0 |
| Purge rate | 12.3 inst/s | 318.1 inst/s |

- Add timeout field (4) to PurgeInstanceFilter proto message
- Map proto timeout to DTFx PurgeInstanceFilter.Timeout in ToPurgeInstanceFilter
- Forward PurgeResult.IsComplete to proto PurgeInstancesResponse.isComplete
- Enables isolated worker SDK to use time-bounded partial purge
Copilot AI (Contributor) left a comment

Pull request overview

This PR aims to enable end-to-end “partial purge” over the host/worker gRPC boundary by extending the purge proto contract and mapping it into DTFx (PurgeInstanceFilter.Timeout) and back (PurgeResult.IsComplete → proto response).

Changes:

  • Add google.protobuf.Duration timeout to the PurgeInstanceFilter proto message.
  • Map proto timeout into DurableTask.Core.PurgeInstanceFilter.Timeout in ProtobufUtils.ToPurgeInstanceFilter.
  • Populate PurgeInstancesResponse.isComplete from PurgeResult.IsComplete in ProtobufUtils.CreatePurgeInstancesResponse.
  • Introduce new GitHub Actions workflows and agent instruction docs for automated PR verification and daily code review.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| src/WebJobs.Extensions.DurableTask/ProtobufUtils.cs | Adds proto↔DTFx mapping for purge timeout + isComplete. |
| src/WebJobs.Extensions.DurableTask/Grpc/Protos/orchestrator_service.proto | Extends purge filter proto with a timeout duration field. |
| .github/workflows/pr-verification.yaml | Adds scheduled/manual workflow to run an automated PR verification agent. |
| .github/workflows/daily-code-review.yaml | Adds scheduled/manual workflow to run an automated daily code review agent. |
| .github/copilot-instructions.md | Adds repository context/instructions for AI assistants. |
| .github/agents/pr-verification.agent.md | Adds detailed agent runbook for PR verification. |
| .github/agents/issue-triage.agent.md | Adds detailed agent runbook for issue triage. |
| .github/agents/daily-code-review.agent.md | Adds detailed agent runbook for daily autonomous review/fix PR creation. |

Comment on lines 543 to +579
internal static PurgeInstanceFilter ToPurgeInstanceFilter(P.PurgeInstancesRequest request)
{
    // Empty lists are not allowed by the underlying code that takes in a PurgeInstanceFilter. However, some
    // clients (like Java) may use empty lists by default instead of nulls.
    // Long story short: we must make sure to only copy over the list if it's non-empty.
    IEnumerable<OrchestrationStatus>? statusFilter = null;
    if (request.PurgeInstanceFilter.RuntimeStatus != null && request.PurgeInstanceFilter.RuntimeStatus.Count > 0)
    {
        statusFilter = request.PurgeInstanceFilter.RuntimeStatus?.Select(status => (OrchestrationStatus)status).ToList();
    }

    // This ternary condition is necessary because the protobuf spec __insists__ that CreatedTimeFrom may never be null,
    // but nonetheless if you pass null in function code, the value will be null here
    var filter = new PurgeInstanceFilter(
        request.PurgeInstanceFilter.CreatedTimeFrom == null ? DateTime.MinValue : request.PurgeInstanceFilter.CreatedTimeFrom.ToDateTime(),
        request.PurgeInstanceFilter.CreatedTimeTo?.ToDateTime(),
        statusFilter);

    // Timeout is optional: old clients never send it, so only map it when present.
    if (request.PurgeInstanceFilter.Timeout != null)
    {
        filter.Timeout = request.PurgeInstanceFilter.Timeout.ToTimeSpan();
    }

    return filter;
}

internal static P.PurgeInstancesResponse CreatePurgeInstancesResponse(PurgeResult result)
{
    var response = new P.PurgeInstancesResponse
    {
        DeletedInstanceCount = result.DeletedInstanceCount,
    };

    // Forward IsComplete only when DTFx reports it (previously never set).
    if (result.IsComplete.HasValue)
    {
        response.IsComplete = result.IsComplete.Value;
    }

    return response;
}
Comment on lines +32 to +44
env:
  DOTNET_VER: "8.0.x"

steps:
  - name: 📥 Checkout code
    uses: actions/checkout@v4
    with:
      fetch-depth: 0

  - name: ⚙️ Setup .NET
    uses: actions/setup-dotnet@v4
    with:
      dotnet-version: ${{ env.DOTNET_VER }}
Comment on lines +20 to +33
env:
  DOTNET_VER: "8.0.x"

steps:
  - name: 📥 Checkout code (full history for better analysis)
    uses: actions/checkout@v4
    with:
      fetch-depth: 0

  - name: ⚙️ Setup .NET
    uses: actions/setup-dotnet@v4
    with:
      dotnet-version: ${{ env.DOTNET_VER }}

Comment on lines +18 to +24
  contents: write
  issues: write
  pull-requests: write

jobs:
  verify-prs:
    runs-on: ubuntu-latest
Comment on lines +1 to +16
name: 🔎 PR Verification Agent

# Security: This workflow has write permissions to contents, issues, and PRs, so
# it must NOT use the `pull_request` trigger (which checks out untrusted PR code
# and could exfiltrate the job token). Instead, it runs on schedule/manual
# dispatch only. The agent fetches each PR's branch itself before building and
# verifying. The contents:write permission is needed to push verification test
# code to verification/pr-<N> branches.
on:
  # Run periodically to pick up PRs labeled pending-verification
  schedule:
    - cron: "0 */6 * * *" # Every 6 hours

  # Allow manual trigger for testing
  workflow_dispatch:

@YunchuWang (Member, Author)

Closing; reopening with a clean branch (wangbill/partial-purge-timeout) that removes the unrelated agent pipeline changes.
