Closed
Labels: Dapr-Agents-1.0, P2, documentation, enhancement
Description
Extract and isolate resiliency logic (retry/backoff, error handling, etc.) from the linked issue: dapr/dapr-agents#167
Evaluate where resiliency should be applied in our DurableAgent and orchestrator workflows, and define a smart resiliency policy (i.e. one that decides whether to retry based on the error type, rather than applying blanket backoffs).
This applies to:
- DurableAgent workflow activities
- Orchestrator workflow activities
- External calls, especially LLM provider integrations
- Other potential areas where resiliency might be required
We need a "smart" resiliency policy that:
- Applies retry/backoff logic for transient failures (e.g. network timeouts, intermittent service outages)
- Does not retry for non-transient failures (e.g. invalid credentials, “out of credits”, malformed request)
- Logs or surfaces the classification of each error, so it’s clear when resiliency kicked in vs. when we aborted due to a non-recoverable error
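The three requirements above could be sketched roughly as follows. This is a minimal illustration only, not the dapr-agents implementation: the names `ErrorClass`, `TRANSIENT_ERRORS`, `classify`, and `call_with_resiliency` are hypothetical, and treating only Python's built-in `TimeoutError` and `ConnectionError` as transient is an assumption that a real policy would need to extend per provider.

```python
import logging
import time
from enum import Enum

logger = logging.getLogger("resiliency")


class ErrorClass(Enum):
    TRANSIENT = "transient"
    NON_TRANSIENT = "non_transient"


# Assumption: only these built-in exception types count as transient.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def classify(exc: BaseException) -> ErrorClass:
    """Classify an exception as transient (retryable) or not."""
    if isinstance(exc, TRANSIENT_ERRORS):
        return ErrorClass.TRANSIENT
    return ErrorClass.NON_TRANSIENT


def call_with_resiliency(fn, *, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on transient errors only."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            kind = classify(exc)
            if kind is ErrorClass.NON_TRANSIENT:
                # Surface the classification so it is clear why we aborted.
                logger.error("aborting: %s classified as %s",
                             type(exc).__name__, kind.value)
                raise
            if attempt == max_attempts:
                logger.error("giving up after %d attempts: %s", attempt, exc)
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("retrying after %s (%s), attempt %d/%d, sleeping %.2fs",
                           type(exc).__name__, kind.value, attempt, max_attempts, delay)
            sleep(delay)
```

With this shape, a call that fails with a `ValueError` (for example, a malformed request) is raised immediately without any sleep, while a `TimeoutError` is retried up to `max_attempts` times. The `sleep` parameter is injectable so the backoff schedule can be checked in tests without waiting.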
Acceptance Criteria
- Identify and specify where resiliency should be added in the DurableAgent workflow activities.
- Identify and specify where resiliency should be added in the orchestrator workflow activities.
- Define and introduce the concept of smart resiliency for external calls such as LLM providers, i.e. classify errors and decide when to retry vs. abort.
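For HTTP-based LLM providers, one possible starting point for the retry-vs-abort decision is the response status code. The sketch below is an assumption-laden illustration: `Decision`, `decide`, and the two status sets are hypothetical names, and the grouping of codes (for example, treating 402 as "out of credits") is a guess that varies by provider. A status code alone can be ambiguous, so a real policy may also need to inspect the provider's error body.

```python
from dataclasses import dataclass

# Assumption: status codes commonly associated with transient conditions.
RETRYABLE_STATUS = frozenset({408, 429, 500, 502, 503, 504})
# Assumption: status codes that retrying is unlikely to fix
# (bad request, invalid credentials, payment required, and so on).
NON_RETRYABLE_STATUS = frozenset({400, 401, 402, 403, 404, 422})


@dataclass(frozen=True)
class Decision:
    retry: bool
    reason: str  # human-readable classification, suitable for logging


def decide(status_code: int) -> Decision:
    """Decide whether a failed provider call should be retried."""
    if status_code in RETRYABLE_STATUS:
        return Decision(True, f"HTTP {status_code} treated as transient")
    if status_code in NON_RETRYABLE_STATUS:
        return Decision(False, f"HTTP {status_code} treated as non-recoverable")
    # Unlisted codes: retry other 5xx responses, abort on everything else.
    return Decision(500 <= status_code < 600,
                    f"HTTP {status_code} unclassified; defaulted by status family")
```

Here `decide(503).retry` is true and `decide(401).retry` is false; the `reason` string is what would be logged to make the classification visible.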
Status: Just Shipped