Better retries when a sub-agent fails #75

Merged
derekmisler merged 1 commit into docker:main from derekmisler:better-retries
Mar 6, 2026



@derekmisler derekmisler commented Mar 6, 2026

Summary

When a sub-agent fails (e.g., due to API overload), the root agent can recover and still post a review. Previously, however, the pipeline treated any non-zero exit code as a hard failure: it posted a redundant "Review Failed" comment and could retry the entire pipeline, causing duplicate reviews. This PR disables pipeline-level retries and adds exit-code handling that distinguishes a true failure from a partial success.

Changes

  • review-pr/action.yml — disable pipeline retries: Sets max-retries: "0" on the review step with an explanatory comment, since the root agent already recovers internally when sub-agents fail and retrying the pipeline produces duplicate reviews.
  • review-pr/action.yml — smarter failure detection: Exposes verbose-log-file from the review step output and, on a non-zero exit code, checks whether a pull request review was actually posted (by grepping for pullrequestreview-[0-9]+ in the log). If a review was found, the pipeline reports ⚠️ Review completed with warnings instead of ❌ Review failed and skips posting the fallback failure comment.
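
The failure-detection rule described above can be sketched roughly as a shell function. This is a minimal illustration, not the code from `review-pr/action.yml`: the function name `classify_review` and its two arguments (the review step's exit code and the path to the verbose log) are assumptions made for the example. Only the `pullrequestreview-[0-9]+` pattern comes from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: classify_review is an illustrative name, not taken
# from action.yml.
#
# Prints "success", "warning", or "failure" for one review run.
#   $1: exit code of the review step
#   $2: path to the verbose log file
classify_review() {
  local exit_code="$1" log_file="$2"

  if [ "$exit_code" -eq 0 ]; then
    echo "success"
  elif [ -f "$log_file" ] && grep -Eq 'pullrequestreview-[0-9]+' "$log_file"; then
    # Non-zero exit, but the log contains a review ID: a review was posted.
    echo "warning"
  else
    # Non-zero exit and no evidence of a posted review (or no log at all).
    echo "failure"
  fi
}
```

Under these assumptions, `warning` would correspond to ⚠️ Review completed with warnings (no fallback comment), and `failure` to ❌ Review failed (fallback comment posted). A missing log file is treated as a failure so that the check never reports a partial success without evidence.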

How to Test

  • Simulate a sub-agent failure (e.g., trigger an API overload mid-review) and confirm the pipeline exits with a warning status rather than posting a "Review Failed" comment when the root agent successfully posted a review.
  • Confirm that a genuine total failure (no review posted, non-zero exit) still posts the ❌ PR Review Failed comment to the PR as before.
  • Verify no duplicate reviews appear on a PR when a sub-agent fails and the root agent recovers.

Signed-off-by: Derek Misler <derek.misler@docker.com>
@derekmisler derekmisler changed the title from "Better retries when a sub-agent failes" to "Better retries when a sub-agent fails" Mar 6, 2026
@derekmisler (Contributor, Author) commented

/describe

@derekmisler derekmisler marked this pull request as ready for review March 6, 2026 17:02
@derekmisler derekmisler requested a review from a team as a code owner March 6, 2026 17:02

docker-agent bot commented Mar 6, 2026

✅ PR description has been generated and updated!


@docker-agent docker-agent bot left a comment


Review Summary

Assessment: 🟢 APPROVE

This PR adds intelligent error recovery to handle cases where a sub-agent fails but the root agent successfully posts a review anyway. The changes are well-reasoned:

Key improvements:

  • ✅ Disables retries (max-retries: "0") at the pipeline level since the review agent has internal recovery logic
  • ✅ Adds detection for reviews posted despite non-zero exit codes by checking for pullrequestreview-[0-9]+ in verbose logs
  • ✅ Treats this scenario as a "partial success" with a warning rather than a full failure
  • ✅ Only posts error comments when no review was actually posted

Review notes:

  • The grep pattern pullrequestreview-[0-9]+ is specific enough to avoid false positives in practice — it matches GitHub's review ID format which only appears when the API successfully returns a review object
  • The error handling logic correctly distinguishes between "failed with recovery" and "failed without recovery" scenarios
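
To make the note on pattern specificity concrete, here is a small sketch of what `pullrequestreview-[0-9]+` does and does not match. The helper name `has_review_id` and the sample strings are made up for illustration, and the sketch assumes the pattern is evaluated as an extended regular expression (`grep -E`), since `+` is not a repetition operator in POSIX basic regular expressions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: has_review_id is an illustrative name. It succeeds
# when its argument contains a review ID such as "pullrequestreview-123".
has_review_id() {
  printf '%s\n' "$1" | grep -Eq 'pullrequestreview-[0-9]+'
}

# Made-up sample log lines.
has_review_id 'https://github.com/OWNER/REPO/pull/1#pullrequestreview-1234567890' \
  && echo "match: review URL fragment with numeric ID"
has_review_id 'pullrequestreview-' \
  || echo "no match: prefix without digits"
has_review_id 'Review failed: API overloaded' \
  || echo "no match: unrelated error text"
```

The match is purely textual, so any log line containing that string counts as evidence of a posted review, whatever produced it.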

No bugs found in the changed code. This improves the robustness of the review workflow.

@derekmisler derekmisler merged commit 68c6c67 into docker:main Mar 6, 2026
12 checks passed

2 participants