Add partial purge timeout support in gRPC proto mapping#3397
Closed
YunchuWang wants to merge 2 commits intodevfrom
Closed
Add partial purge timeout support in gRPC proto mapping#3397YunchuWang wants to merge 2 commits intodevfrom
YunchuWang wants to merge 2 commits intodevfrom
Conversation
- Add timeout field (4) to PurgeInstanceFilter proto message - Map proto timeout to DTFx PurgeInstanceFilter.Timeout in ToPurgeInstanceFilter - Forward PurgeResult.IsComplete to proto PurgeInstancesResponse.isComplete - Enables isolated worker SDK to use time-bounded partial purge
Contributor
There was a problem hiding this comment.
Pull request overview
This PR aims to enable end-to-end “partial purge” over the host/worker gRPC boundary by extending the purge proto contract and mapping it into DTFx (PurgeInstanceFilter.Timeout) and back (PurgeResult.IsComplete → proto response).
Changes:
- Add
google.protobuf.Duration timeoutto thePurgeInstanceFilterproto message. - Map proto
timeoutintoDurableTask.Core.PurgeInstanceFilter.TimeoutinProtobufUtils.ToPurgeInstanceFilter. - Populate
PurgeInstancesResponse.isCompletefromPurgeResult.IsCompleteinProtobufUtils.CreatePurgeInstancesResponse. - Introduce new GitHub Actions workflows + agent instruction docs for automated PR verification and daily code review.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/WebJobs.Extensions.DurableTask/ProtobufUtils.cs | Adds proto↔DTFx mapping for purge timeout + isComplete. |
| src/WebJobs.Extensions.DurableTask/Grpc/Protos/orchestrator_service.proto | Extends purge filter proto with a timeout duration field. |
| .github/workflows/pr-verification.yaml | Adds scheduled/manual workflow to run an automated PR verification agent. |
| .github/workflows/daily-code-review.yaml | Adds scheduled/manual workflow to run an automated daily code review agent. |
| .github/copilot-instructions.md | Adds repository context/instructions for AI assistants. |
| .github/agents/pr-verification.agent.md | Adds detailed agent runbook for PR verification. |
| .github/agents/issue-triage.agent.md | Adds detailed agent runbook for issue triage. |
| .github/agents/daily-code-review.agent.md | Adds detailed agent runbook for daily autonomous review/fix PR creation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
|
|
||
| if (result.IsComplete.HasValue) | ||
| { | ||
| response.IsComplete = result.IsComplete.Value; |
Comment on lines
543
to
+579
| internal static PurgeInstanceFilter ToPurgeInstanceFilter(P.PurgeInstancesRequest request) | ||
| { | ||
| // Empty lists are not allowed by the underlying code that takes in a PurgeInstanceFilter. However, some | ||
| // clients (like Java) may use empty lists by default instead of nulls. | ||
| // Long story short: we must make sure to only copy over the list if it's non-empty. | ||
| IEnumerable<OrchestrationStatus>? statusFilter = null; | ||
| if (request.PurgeInstanceFilter.RuntimeStatus != null && request.PurgeInstanceFilter.RuntimeStatus.Count > 0) | ||
| { | ||
| statusFilter = request.PurgeInstanceFilter.RuntimeStatus?.Select(status => (OrchestrationStatus)status).ToList(); | ||
| } | ||
|
|
||
| // This ternary condition is necessary because the protobuf spec __insists__ that CreatedTimeFrom may never be null, | ||
| // but nonetheless if you pass null in function code, the value will be null here | ||
| return new PurgeInstanceFilter( | ||
| var filter = new PurgeInstanceFilter( | ||
| request.PurgeInstanceFilter.CreatedTimeFrom == null ? DateTime.MinValue : request.PurgeInstanceFilter.CreatedTimeFrom.ToDateTime(), | ||
| request.PurgeInstanceFilter.CreatedTimeTo?.ToDateTime(), | ||
| statusFilter); | ||
|
|
||
| if (request.PurgeInstanceFilter.Timeout != null) | ||
| { | ||
| filter.Timeout = request.PurgeInstanceFilter.Timeout.ToTimeSpan(); | ||
| } | ||
|
|
||
| return filter; | ||
| } | ||
|
|
||
| internal static P.PurgeInstancesResponse CreatePurgeInstancesResponse(PurgeResult result) | ||
| { | ||
| return new P.PurgeInstancesResponse | ||
| var response = new P.PurgeInstancesResponse | ||
| { | ||
| DeletedInstanceCount = result.DeletedInstanceCount, | ||
| }; | ||
|
|
||
| if (result.IsComplete.HasValue) | ||
| { | ||
| response.IsComplete = result.IsComplete.Value; | ||
| } |
Comment on lines
+32
to
+44
| env: | ||
| DOTNET_VER: "8.0.x" | ||
|
|
||
| steps: | ||
| - name: 📥 Checkout code | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| fetch-depth: 0 | ||
|
|
||
| - name: ⚙️ Setup .NET | ||
| uses: actions/setup-dotnet@v4 | ||
| with: | ||
| dotnet-version: ${{ env.DOTNET_VER }} |
Comment on lines
+20
to
+33
| env: | ||
| DOTNET_VER: "8.0.x" | ||
|
|
||
| steps: | ||
| - name: 📥 Checkout code (full history for better analysis) | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| fetch-depth: 0 | ||
|
|
||
| - name: ⚙️ Setup .NET | ||
| uses: actions/setup-dotnet@v4 | ||
| with: | ||
| dotnet-version: ${{ env.DOTNET_VER }} | ||
|
|
Comment on lines
+18
to
+24
| contents: write | ||
| issues: write | ||
| pull-requests: write | ||
|
|
||
| jobs: | ||
| verify-prs: | ||
| runs-on: ubuntu-latest |
Comment on lines
+1
to
+16
| name: 🔎 PR Verification Agent | ||
|
|
||
| # Security: This workflow has write permissions to contents, issues, and PRs, so | ||
| # it must NOT use the `pull_request` trigger (which checks out untrusted PR code | ||
| # and could exfiltrate the job token). Instead, it runs on schedule/manual | ||
| # dispatch only. The agent fetches each PR's branch itself before building and | ||
| # verifying. The contents:write permission is needed to push verification test | ||
| # code to verification/pr-<N> branches. | ||
| on: | ||
| # Run periodically to pick up PRs labeled pending-verification | ||
| schedule: | ||
| - cron: "0 */6 * * *" # Every 6 hours | ||
|
|
||
| # Allow manual trigger for testing | ||
| workflow_dispatch: | ||
|
|
Member
Author
|
Closing reopening with clean branch (wangbill/partial-purge-timeout) that removes unrelated agent pipeline changes. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Map the new
timeoutproto field inPurgeInstanceFilterto DTFx'sPurgeInstanceFilter.Timeout, and forwardPurgeResult.IsCompleteback to the proto response. This enables the isolated worker SDK's partial purge feature to work end-to-end through the gRPC boundary.Motivation
The isolated worker SDK communicates with the Functions host via gRPC. When purging large numbers of instances, the gRPC call times out before the host-side purge completes, causing:
OperationCanceledExceptionon the host sideRpcException(StatusCode.Internal)on the worker sideWith this change, the SDK can pass a
timeoutthrough the proto, and the host maps it to DTFx'sPurgeInstanceFilter.Timeout. The purge stops within the timeout, returns accurate counts, and setsisComplete = falseso the caller knows to loop.Changes
google.protobuf.Duration timeout = 4toPurgeInstanceFiltermessageProtobufUtils.ToPurgeInstanceFilter: Map prototimeout→ DTFxPurgeInstanceFilter.TimeoutProtobufUtils.CreatePurgeInstancesResponse: Map DTFxPurgeResult.IsComplete→ protoisComplete(was previously not being set)Bug Fix
CreatePurgeInstancesResponsewas not setting theisCompletefield on the proto response, even though the field existed in the proto definition. This PR fixes that —PurgeResult.IsCompleteis now forwarded to the response.Breaking Changes
None. Additive proto field. Old clients that don't send
timeoutget unchanged behavior. TheisCompletefield fix is also non-breaking (was always present in proto, just not populated).Dependencies
Benchmark Results (500K instances, EP1, isolated worker)