add retries and timeouts#51

Merged
derekmisler merged 1 commit into docker:main from derekmisler:add-fallback-models-retries-and-timeouts
Feb 23, 2026

Conversation

@derekmisler derekmisler commented Feb 23, 2026

Summary

Adds retry logic with exponential backoff and timeout handling to improve reliability of agent execution. Updates cagent version to v1.23.6 across all workflows and documentation. Enhances PR review workflow with verification that reviews are actually posted and fallback error handling for token expiry scenarios.

Changes

  • action.yml: Implemented retry loop with exponential backoff (max 2 retries by default), added max-retries and retry-delay inputs, improved timeout handling using PIPESTATUS to distinguish timeout (124) from other failures
  • review-pr/action.yml: Added 20-minute timeout to prevent GitHub App token expiry, implemented review verification step to confirm bot review was posted, added fallback comments when review fails or isn't posted, improved reaction handling to use github.token (6h lifetime) instead of potentially expired App token
  • Version bumps: Updated cagent from v1.23.4 to v1.23.6 in all workflows, actions, and documentation
  • README.md: Updated retry-delay description to clarify exponential backoff behavior
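The retry-and-timeout mechanics described in the first bullet can be sketched roughly as follows. This is an illustrative reconstruction, not the action's actual code: `flaky_command`, the `STATE_FILE` counter, and the short demo delay are stand-ins, and the real action reads its inputs from `max-retries` and `retry-delay`.

```shell
#!/usr/bin/env bash
set -uo pipefail

MAX_RETRIES="${MAX_RETRIES:-2}"
RETRY_DELAY="${RETRY_DELAY:-1}"   # seconds; kept short for the demo
STATE_FILE="$(mktemp)"

# Stand-in for the real agent invocation: fails twice, then succeeds,
# purely so this demo exercises the retry path.
flaky_command() {
  n=$(( $(cat "$STATE_FILE" 2>/dev/null || echo 0) + 1 ))
  echo "$n" > "$STATE_FILE"
  echo "attempt $n"
  [ "$n" -ge 3 ]
}
export -f flaky_command
export STATE_FILE

attempt=0
while :; do
  # `timeout` exits 124 when the command times out. Read the command's
  # exit code from PIPESTATUS[0]: plain $? would report `tee` instead.
  timeout 30 bash -c flaky_command 2>&1 | tee run.log
  exit_code="${PIPESTATUS[0]}"

  [ "$exit_code" -eq 0 ] && break
  [ "$exit_code" -eq 124 ] && echo "command timed out" >&2

  if [ "$attempt" -ge "$MAX_RETRIES" ]; then
    echo "failed after $((attempt + 1)) attempts" >&2
    exit "$exit_code"
  fi
  # Exponential backoff: RETRY_DELAY, 2*RETRY_DELAY, 4*RETRY_DELAY, ...
  sleep $(( RETRY_DELAY * (2 ** attempt) ))
  attempt=$((attempt + 1))
done
echo "succeeded on attempt $((attempt + 1))"
```

With the defaults above, the demo fails twice, sleeps 1s then 2s, and succeeds on the third attempt.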

Breaking Changes

None

How to Test

  • Trigger a workflow that uses the action and verify it completes successfully
  • Test retry behavior by simulating transient failures (should retry up to 2 times with exponential backoff)
  • Test PR review workflow and verify that reviews are posted and fallback comments appear when reviews fail

Closes: https://github.com/docker/gordon/issues/171

@derekmisler derekmisler changed the title add fallback models, retries, and timeouts add retries and timeouts Feb 23, 2026
@derekmisler
Contributor Author

/describe


docker-agent bot commented Feb 23, 2026

✅ PR description has been generated and updated!

@derekmisler derekmisler force-pushed the add-fallback-models-retries-and-timeouts branch 3 times, most recently from 2d66210 to bf8b8fc Compare February 23, 2026 20:21
@derekmisler derekmisler marked this pull request as ready for review February 23, 2026 20:22
@derekmisler derekmisler requested a review from a team as a code owner February 23, 2026 20:22
docker-agent bot previously requested changes Feb 23, 2026

@docker-agent docker-agent bot left a comment


Critical Issues Found

This PR adds important retry and timeout functionality, but contains critical bugs in exit code handling that will prevent the retry logic from working correctly.

Summary

  • 2 HIGH severity issues: Exit codes are captured incorrectly in both timeout and non-timeout cases, causing all failures to be reported as success
  • 2 MEDIUM severity issues: Timestamp comparisons could miss reviews posted in the same second

The retry logic implementation is solid, but the exit code capture bugs will completely break its functionality.
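The exit-code bug class the bot is describing is easy to reproduce: after a pipeline such as `cmd | tee log`, `$?` reflects the last stage (`tee`), not the command whose failure matters. A minimal demonstration of the pitfall (illustrative, not the PR's actual diff):

```shell
# A failing command piped through a succeeding `tee`:
false | tee /dev/null
wrong="$?"                 # 0 -- tee's status; the failure is masked

# Same pipeline, but reading the first stage's status from PIPESTATUS.
# PIPESTATUS must be captured immediately after the pipeline runs.
false | tee /dev/null
right="${PIPESTATUS[0]}"   # 1 -- the real exit code of `false`

echo "wrong=$wrong right=$right"
```

This is why a retry loop keyed off `$?` would treat every failed agent run as a success and never retry.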

@derekmisler derekmisler force-pushed the add-fallback-models-retries-and-timeouts branch from bf8b8fc to b27edd7 Compare February 23, 2026 20:56
@derekmisler
Contributor Author

/review


@docker-agent docker-agent bot left a comment


Review Summary

This PR successfully adds retry logic with exponential backoff and review verification. However, I've identified 3 medium-severity issues in the error handling and verification logic that should be addressed:

  1. Clock skew vulnerability in review verification - The timestamp comparison could miss legitimate reviews if clocks are not perfectly synchronized
  2. API error masking - API failures are silently treated as "no reviews found", making it impossible to distinguish between legitimate zero-review results and API errors
  3. Silent fallback failures - If the fallback comment posting fails, users receive no notification

These issues don't prevent the core retry functionality from working, but they could cause confusion when verification or fallback mechanisms fail.
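A common mitigation for issue 1, sketched here as an assumption about how the verification could work (the workflow's actual comparison is not quoted in this thread): compare ISO-8601 UTC timestamps lexicographically, and pad the run-start timestamp with a skew buffer so a review posted in the same second, or with a slightly-behind API clock, still counts.

```shell
# review_is_fresh START REVIEW -> success if REVIEW was submitted at or
# after START. ISO-8601 UTC timestamps sort correctly as plain strings,
# so [[ ]] lexicographic comparison is sufficient.
review_is_fresh() {
  [[ "$2" > "$1" || "$2" == "$1" ]]
}

# Buffer precomputed for the demo: 60s before the hypothetical run start
# of 2026-02-23T21:00:00Z.
BUFFERED_START="2026-02-23T20:59:00Z"

# In the real workflow this timestamp would come from the GitHub API.
if review_is_fresh "$BUFFERED_START" "2026-02-23T20:59:30Z"; then
  REVIEW_VERIFIED=true
else
  REVIEW_VERIFIED=false
fi
echo "REVIEW_VERIFIED=$REVIEW_VERIFIED"
```

The 60-second buffer is an assumed value; any buffer larger than the expected clock skew works, at the cost of occasionally accepting a slightly stale review.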

Signed-off-by: Derek Misler <derek.misler@docker.com>
@derekmisler derekmisler force-pushed the add-fallback-models-retries-and-timeouts branch from b27edd7 to 3224dd6 Compare February 23, 2026 21:16
@derekmisler
Contributor Author

/review

@derekmisler
Contributor Author

/describe


docker-agent bot commented Feb 23, 2026

✅ PR description has been generated and updated!


@docker-agent docker-agent bot left a comment


Review Summary

I've identified one issue in the newly added review verification logic that should be addressed to improve robustness. The retry logic and timeout handling look good overall.

The main concern is that the timestamp generation lacks validation, which could lead to incorrect verification results if both date command variants fail (though this is extremely unlikely on standard runners).
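The validation being asked for might look like the following sketch. The workflow's actual pair of `date` invocations (GNU vs. BSD variants) is not quoted in this thread, so a single portable form stands in here; the point is failing loudly instead of silently comparing reviews against an empty timestamp.

```shell
# Generate the run-start timestamp, then validate its shape before it is
# ever used in a comparison.
TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

if ! [[ "$TS" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$ ]]; then
  echo "could not produce a valid UTC timestamp: '$TS'" >&2
  exit 1
fi
echo "run started at $TS"
```

As the bot notes, both variants failing is extremely unlikely on standard runners, but the check turns a confusing verification result into an immediate, attributable error.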

fi

- name: Save reviewer memory
if: always()


Is `always()` necessary for some failure cases, or am I missing something? Feels odd, but then again this is GHA after all, so...

Contributor Author


Yeah, if the previous step failed, GHA would fail the whole thing and we would not save memory. But even if the run fails, I still want to save whatever memory accumulated during the run before it failed.

if [ "$REVIEW_VERIFIED" == "false" ]; then
# Agent succeeded but review wasn't posted (likely token expiry)
gh api "repos/${{ github.repository }}/issues/comments/${{ steps.resolve-context.outputs.comment-id }}/reactions" \
-X POST -f content='confused' || true


content='confused' 😂

@derekmisler derekmisler merged commit 509e721 into docker:main Feb 23, 2026
17 checks passed