Add another platform exception type to OutOfProcMiddleware #3383

Merged
sophiatev merged 10 commits into dev from stevosyan/fix-2939
Mar 16, 2026

Conversation

@sophiatev
Collaborator

@sophiatev sophiatev commented Mar 13, 2026

Summary

What changed?

This PR adds another exception type (FunctionTimeoutAbortException) to our check for platform-level exceptions when invoking orchestrations, entities, and notably Activities.

Why is this change needed?

This exception is thrown when a worker restarts after another Function has reached its timeout. Currently, because result.Succeeded is false in this case, we fail the Activity even though the Activity itself did not exceed the Function timeout. This PR instead throws a SessionAbortedException so that the Activity (or orchestration, entity, etc.) is retried.
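
For illustration only, the behavior change can be sketched roughly as below. This is not the code in OutOfProcMiddleware.cs: the two exception classes are local stand-ins for the real types so the snippet compiles without any packages, and InvocationResult, its Error property, ActivityDispatchSketch, and Complete are hypothetical names made up for the sketch.

```csharp
#nullable enable
using System;

// Local stand-in for the host's FunctionTimeoutAbortException (assumption: the
// real type comes from the Microsoft.Azure.WebJobs package).
public sealed class FunctionTimeoutAbortException : Exception
{
    public FunctionTimeoutAbortException(string message) : base(message) { }
}

// Local stand-in for the SessionAbortedException that signals "retry this work item".
public sealed class SessionAbortedException : Exception
{
    public SessionAbortedException(string message, Exception inner) : base(message, inner) { }
}

// Hypothetical, simplified shape of a function invocation result.
public sealed record InvocationResult(bool Succeeded, Exception? Error);

public static class ActivityDispatchSketch
{
    // Returns "completed" on success and "failed" for ordinary failures.
    // A FunctionTimeoutAbortException is treated as a platform-level abort and
    // is rethrown as SessionAbortedException so the work item is retried.
    public static string Complete(InvocationResult result)
    {
        if (result.Succeeded)
        {
            return "completed";
        }

        if (result.Error is FunctionTimeoutAbortException)
        {
            throw new SessionAbortedException(
                "Invocation aborted because the worker restarted; retrying.",
                result.Error);
        }

        return "failed";
    }
}

public static class Program
{
    public static void Main()
    {
        // An ordinary failure is still reported as a failure.
        var userFailure = new InvocationResult(false, new InvalidOperationException("bug in activity"));
        Console.WriteLine(ActivityDispatchSketch.Complete(userFailure));

        // A platform-level abort surfaces as SessionAbortedException instead.
        var aborted = new InvocationResult(false, new FunctionTimeoutAbortException("another function timed out"));
        try
        {
            ActivityDispatchSketch.Complete(aborted);
        }
        catch (SessionAbortedException e)
        {
            Console.WriteLine($"retry requested: {e.InnerException?.GetType().Name}");
        }
    }
}
```

The point of the sketch is only the ordering: the platform-level check runs before the generic "not succeeded means failed" branch, so an Activity that never hit its own timeout is not marked as failed.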

Issues / work items


Project checklist

  • Documentation changes are not required
    • Otherwise: Documentation PR is ready to merge and referenced in pending_docs.md
  • Release notes are not required for the next release
    • Otherwise: Notes added to release_notes.md
  • Backport is not required
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • All required tests have been added/updated (unit tests, E2E tests)
  • No extra work is required to be leveraged by OutOfProc SDKs
    • Otherwise: Work tracked here: #issue_or_pr_in_each_sdk
  • No change to the version of the WebJobs.Extensions.DurableTask package
    • Otherwise: Major/minor updates are reflected in /src/Worker.Extensions.DurableTask/AssemblyInfo.cs
  • No EventIds were added to EventSource logs
  • This change should be added to the v2.x branch
    • Otherwise: This change applies exclusively to WebJobs.Extensions.DurableTask v3.x and will be retained only in the dev and main branches
  • Breaking change?
    • If yes:
      • Impact:
      • Migration guidance:

AI-assisted code disclosure (required)

Was an AI tool used? (select one)

  • No
  • Yes, AI helped write parts of this PR (e.g., GitHub Copilot)
  • Yes, an AI agent generated most of this PR

If AI was used:

  • Tool(s):
  • AI-assisted areas/files:
  • What you changed after AI output:

AI verification (required if AI was used):

  • I understand the code and can explain it
  • I verified referenced APIs/types exist and are correct
  • I reviewed edge cases/failure paths (timeouts, retries, cancellation, exceptions)
  • I reviewed concurrency/async behavior
  • I checked for unintended breaking or behavior changes

Testing

Automated tests

  • Result: Passed / Failed (link logs if failed)

Manual validation (only if runtime/behavior changed)

  • Environment (OS, .NET version, components):
  • Steps + observed results:
    1.
    2.
    3.
  • Evidence (optional):

Notes for reviewers

  • N/A

Copilot AI review requested due to automatic review settings March 13, 2026 18:45
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes issue #2939 where activities (and orchestrations/entities) were being incorrectly marked as failed when a FunctionTimeoutAbortException was thrown due to another function's timeout causing a worker restart. The fix ensures this exception type is treated as a platform-level exception, triggering a SessionAbortedException for durable retry instead of a permanent failure.

Changes:

  • Added FunctionTimeoutAbortException to the IsPlatformLevelException check (used by the orchestrator path) and added explicit checks in the entity and activity paths
  • Added dedicated unit tests for entity and activity paths with FunctionTimeoutAbortException
  • Added FunctionTimeoutAbortException to the PlatformLevelExceptions test data for orchestrator tests, plus minor code style cleanups (collection expressions, primary constructors, explicit types)

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/WebJobs.Extensions.DurableTask/OutOfProcMiddleware.cs | Adds FunctionTimeoutAbortException handling in entity/activity dispatch paths and to IsPlatformLevelException |
| test/FunctionsV2/OutOfProcMiddlewareTests.cs | Adds tests for entity/activity FunctionTimeoutAbortException handling, adds SetupEntityTest/SetupActivityTest helpers, and minor style cleanups |

Copilot AI review requested due to automatic review settings March 13, 2026 20:07
Contributor

Copilot AI left a comment

Pull request overview

This PR adds handling for FunctionTimeoutAbortException in the OutOfProcMiddleware to fix an issue where activities (and entities) would be permanently failed when another function on the same worker hit a timeout, causing a worker restart. The fix ensures these work items are retried (via SessionAbortedException) instead of being marked as failed.

Changes:

  • Added FunctionTimeoutAbortException checks in the entity and activity execution paths of OutOfProcMiddleware, and added it to the IsPlatformLevelException method (used by the orchestrator path).
  • Bumped Microsoft.Azure.WebJobs package version from 3.0.39 to 3.0.45 to gain access to the FunctionTimeoutAbortException type.
  • Added unit tests for the new exception handling in entity and activity paths, plus added the exception to the existing orchestrator platform-level exception test data.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/WebJobs.Extensions.DurableTask/OutOfProcMiddleware.cs | Added FunctionTimeoutAbortException handling in entity/activity paths and IsPlatformLevelException |
| test/FunctionsV2/OutOfProcMiddlewareTests.cs | Added tests for new exception handling; modernized syntax (collection expressions, explicit types) |
| Directory.Packages.props | Bumped Microsoft.Azure.WebJobs from 3.0.39 to 3.0.45 |

Copilot AI review requested due to automatic review settings March 16, 2026 17:16
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes an issue (#2939) where activities (and entities) would permanently fail instead of being retried when FunctionTimeoutAbortException is thrown — an exception that occurs when another function on the same worker exceeds its timeout, causing a worker restart. The fix adds FunctionTimeoutAbortException to the set of recognized platform-level exceptions, ensuring the affected work items are retried via SessionAbortedException.

Changes:

  • Added FunctionTimeoutAbortException checks to the entity and activity execution paths in OutOfProcMiddleware, and to the IsPlatformLevelException helper (which covers orchestrators).
  • Bumped Microsoft.Azure.WebJobs from 3.0.39 to 3.0.45 to gain access to the FunctionTimeoutAbortException type.
  • Added unit tests for the new exception handling in entity and activity paths, and added FunctionTimeoutAbortException to the existing orchestrator platform-level exception test data.
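
As a rough illustration of what these tests check, the sketch below runs the same two cases against each kind of work item. It is not the code in OutOfProcMiddlewareTests.cs and does not use the test framework: the exception classes are local stand-ins, and WorkItemKind, MiddlewareSketch, HandleFailure, and PlatformExceptionChecks are hypothetical names. It also collapses the three dispatch paths into one method, whereas the PR uses the IsPlatformLevelException helper for orchestrators and explicit checks for entities and activities.

```csharp
using System;
using System.Collections.Generic;

// Local stand-ins for the real exception types (assumptions for this sketch).
public sealed class FunctionTimeoutAbortException : Exception
{
    public FunctionTimeoutAbortException(string message) : base(message) { }
}

public sealed class SessionAbortedException : Exception
{
    public SessionAbortedException(string message, Exception inner) : base(message, inner) { }
}

public enum WorkItemKind { Orchestration, Entity, Activity }

public static class MiddlewareSketch
{
    // Simplified: the only platform-level exception modeled here is the new one.
    public static bool IsPlatformLevelException(Exception e) => e is FunctionTimeoutAbortException;

    // Hypothetical single entry point standing in for the three dispatch paths.
    // Throws SessionAbortedException for platform-level failures, otherwise returns.
    public static void HandleFailure(WorkItemKind kind, Exception e)
    {
        if (IsPlatformLevelException(e))
        {
            throw new SessionAbortedException($"{kind} aborted by the platform; retrying.", e);
        }
    }
}

public static class PlatformExceptionChecks
{
    // Returns a list of failed expectations; an empty list means every check passed.
    public static List<string> Run()
    {
        var failures = new List<string>();
        foreach (WorkItemKind kind in Enum.GetValues(typeof(WorkItemKind)))
        {
            try
            {
                MiddlewareSketch.HandleFailure(kind, new FunctionTimeoutAbortException("timeout elsewhere"));
                failures.Add($"{kind}: platform-level exception was not retried");
            }
            catch (SessionAbortedException)
            {
                // Expected: the work item is handed back for retry.
            }

            try
            {
                MiddlewareSketch.HandleFailure(kind, new InvalidOperationException("user code bug"));
            }
            catch (SessionAbortedException)
            {
                failures.Add($"{kind}: ordinary exception was treated as platform-level");
            }
        }
        return failures;
    }
}

public static class Program
{
    public static void Main()
    {
        List<string> failures = PlatformExceptionChecks.Run();
        Console.WriteLine(failures.Count == 0 ? "all checks passed" : string.Join("; ", failures));
    }
}
```

Including a negative case (an ordinary exception that must not be retried) alongside the positive one guards against the check becoming too broad.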

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/WebJobs.Extensions.DurableTask/OutOfProcMiddleware.cs | Adds FunctionTimeoutAbortException handling in entity/activity paths and to IsPlatformLevelException |
| Directory.Packages.props | Bumps Microsoft.Azure.WebJobs to 3.0.45 for the new exception type |
| test/FunctionsV2/OutOfProcMiddlewareTests.cs | Adds tests for entity/activity paths, adds FunctionTimeoutAbortException to orchestrator test data, plus minor code modernization |

@sophiatev sophiatev merged commit 60319b3 into dev Mar 16, 2026
28 checks passed

Development

Successfully merging this pull request may close these issues.

Activities fail with timeouts when the .NET worker process crashes

3 participants