
Conversation

@creatorrr (Contributor) commented Sep 17, 2025

User description

Summary

  • import tenacity's stop_after_attempt utility for the worker entrypoint retry decorator
  • cap retries at five attempts (~100 seconds) with an AIDEV note explaining the resiliency versus fail-fast balance

Testing

  • not run (not requested)

https://chatgpt.com/codex/tasks/task_b_68cb0221656483219bea456acd286bfa


PR Type

Enhancement


Description

  • Add retry limit to worker main function

  • Import stop_after_attempt from tenacity library

  • Cap retries at 5 attempts with explanatory comment


Diagram Walkthrough

flowchart LR
  A["Import stop_after_attempt"] --> B["Add stop policy to retry decorator"]
  B --> C["Limit retries to 5 attempts"]
  C --> D["Add AIDEV comment explaining policy"]

File Walkthrough

Relevant files
Enhancement
__main__.py
Add retry limit to worker retry decorator                               

src/agents-api/agents_api/worker/__main__.py

  • Import stop_after_attempt from tenacity library
  • Add stop=stop_after_attempt(5) to retry decorator
  • Add AIDEV comment explaining retry policy rationale
+4/-1     


Important

Adds a retry stop policy to the worker's main function in __main__.py, limiting retries to five attempts using stop_after_attempt.

  • Behavior:
    • Adds stop_after_attempt(5) to the @retry decorator in __main__.py to limit retries to five attempts (~100 seconds total).
    • Includes an AIDEV note explaining the balance between resiliency and fail-fast behavior.
  • Imports:
    • Imports stop_after_attempt from tenacity in __main__.py.
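Put together, the decorated entrypoint should look roughly like the sketch below. Only the decorator arguments and the AIDEV note come from the diff; the import grouping, the coroutine signature, and the elided body are assumptions for illustration.

import logging

from tenacity import (
    after_log,
    retry,
    retry_if_exception_type,
    stop_after_attempt,  # newly imported in this PR
    wait_fixed,
)

logger = logging.getLogger(__name__)


# AIDEV-NOTE: Allow up to five retries (~100s total) so transient Temporal outages
# can recover while still failing fast enough for orchestration to replace the worker.
@retry(
    wait=wait_fixed(20),
    retry=retry_if_exception_type(RuntimeError),
    stop=stop_after_attempt(5),
    after=after_log(logger, logging.DEBUG),
)
async def main() -> None:
    ...  # connect to Temporal and run the worker (unchanged by this PR)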

This description was created by Ellipsis for 6e9c489. You can customize this summary. It will automatically update as commits are pushed.

@qodo-merge-for-open-source (Contributor)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Retry Policy Fit

Confirm that capping retries at 5 with 20s wait (≈100s total) aligns with operational expectations for worker liveness and orchestration backoff, and that failing after 5 attempts does not prematurely terminate during longer transient outages.

# AIDEV-NOTE: Allow up to five retries (~100s total) so transient Temporal outages
# can recover while still failing fast enough for orchestration to replace the worker.
@retry(
    wait=wait_fixed(20),
    retry=retry_if_exception_type(RuntimeError),
    stop=stop_after_attempt(5),
    after=after_log(logger, logging.DEBUG),
)
Exception Scope

Validate that retrying only on RuntimeError is sufficient; if Temporal client/connectivity errors raise different exception types, they may bypass retries unexpectedly.

@retry(
    wait=wait_fixed(20),
    retry=retry_if_exception_type(RuntimeError),
    stop=stop_after_attempt(5),
    after=after_log(logger, logging.DEBUG),
)
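If the Temporal connection path raises SDK-specific errors rather than RuntimeError, the predicate could be widened along these lines. Treating temporalio.service.RPCError and ConnectionError as retryable is an assumption about the client library, not something established by this diff.

from temporalio.service import RPCError  # assumes the temporalio Python SDK is in use

@retry(
    wait=wait_fixed(20),
    retry=retry_if_exception_type((RuntimeError, RPCError, ConnectionError)),
    stop=stop_after_attempt(5),
    after=after_log(logger, logging.DEBUG),
)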

github-actions bot (Contributor) commented Sep 17, 2025

Claude finished @creatorrr's task.


Changelog Bot Analysis

Todo List:

  • Read diff.patch to understand changes
  • Determine if changes are user-visible
  • Create changelog entry if needed
  • Commit and push changes if applicable

Analysis:
The changes modify the worker retry logic by:

  • Adding stop_after_attempt(5) to limit retries to 5 attempts
  • Adding explanatory comment about the ~100s retry window

Decision: No changelog entry needed. This is an internal infrastructure change that improves worker resilience but doesn't affect user-facing functionality. Users don't directly interact with the worker retry mechanism - this is purely an operational improvement for system stability.

Result: No action taken - exiting without writing, committing, or making changes as per changelog bot rules for non-user-visible changes.


@ellipsis-dev bot (Contributor) left a comment

Important

Looks good to me! 👍

Reviewed everything up to 6e9c489 in 57 seconds.
  • Reviewed 26 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/agents-api/agents_api/worker/__main__.py:11
  • Draft comment:
    Good: Added stop_after_attempt import to limit retries.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. src/agents-api/agents_api/worker/__main__.py:25
  • Draft comment:
    AIDEV-NOTE clarifies retry rationale. Consider removing or adjusting internal notes in production.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. src/agents-api/agents_api/worker/__main__.py:30
  • Draft comment:
    Using stop_after_attempt(5) appropriately limits retries. Confirm that catching only RuntimeError is intended.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_yF95vU8rD09UVuWN

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@qodo-merge-for-open-source (Contributor)

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Suggestion: Expand retry exception types

Expand the retry condition to include more specific transient exceptions, such
as ConnectionError and TimeoutError, in addition to RuntimeError.

src/agents-api/agents_api/worker/__main__.py [27-32]

 @retry(
     wait=wait_fixed(20),
-    retry=retry_if_exception_type(RuntimeError),
+    retry=retry_if_exception_type((RuntimeError, ConnectionError, TimeoutError)),
     stop=stop_after_attempt(5),
     after=after_log(logger, logging.DEBUG),
 )
Suggestion importance[1-10]: 7


Why: The suggestion improves the robustness of the retry logic by proposing to catch more specific and relevant transient exceptions (ConnectionError, TimeoutError), which aligns well with the PR's goal of handling transient Temporal outages.

Impact: Medium

claude bot (Contributor) commented Sep 17, 2025

Code Review for PR #1568: Add stop policy to worker retry loop

Overall Assessment

This is a good improvement that adds necessary bounds to the retry logic in the worker's main function. The change prevents infinite retry loops while maintaining reasonable resilience against transient failures.

Strengths

  1. Good retry strategy: The 5-attempt limit (~100 seconds total with 20-second waits) strikes a reasonable balance between resilience and fail-fast behavior
  2. Clear documentation: The AIDEV-NOTE comment effectively explains the rationale behind the retry policy
  3. Consistent with codebase patterns: The implementation aligns with retry patterns used elsewhere in the codebase (e.g., rec_sum/generate.py also uses stop_after_attempt(5))
  4. Proper import: The stop_after_attempt import is correctly added to the existing tenacity imports

Observations & Suggestions

  1. Consider logging enhancement: when the retry limit is reached, it would help to log a more explicit message. The current after_log shows each retry attempt, but consider adding reraise=True so that, once all retries are exhausted, the original exception is re-raised instead of being wrapped in tenacity's RetryError.

  2. Configuration consideration: For production environments, you might want to make the retry count configurable via environment variables, similar to how other service parameters are configured:

    • WORKER_MAX_RETRIES = int(os.getenv('WORKER_MAX_RETRIES', '5'))
    • Then use: stop=stop_after_attempt(WORKER_MAX_RETRIES)
  3. Error specificity: Currently retrying only on RuntimeError. Consider whether other transient errors (such as ConnectionError or specific Temporal client errors) should also trigger retries; a combined sketch of these three suggestions follows this list.
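Taken together, the three suggestions above might look roughly like the sketch below. It is illustrative only: WORKER_MAX_RETRIES is a hypothetical environment variable, the extra exception types are assumptions about what is genuinely transient for this worker, and the body of main is elided.

import logging
import os

from tenacity import (
    after_log,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_fixed,
)

logger = logging.getLogger(__name__)

# Hypothetical knob: retry count read from the environment, defaulting to 5.
WORKER_MAX_RETRIES = int(os.getenv("WORKER_MAX_RETRIES", "5"))


@retry(
    wait=wait_fixed(20),
    # Assumption: ConnectionError and TimeoutError are also transient here.
    retry=retry_if_exception_type((RuntimeError, ConnectionError, TimeoutError)),
    stop=stop_after_attempt(WORKER_MAX_RETRIES),
    after=after_log(logger, logging.DEBUG),
    # Re-raise the last exception instead of tenacity's RetryError once the
    # retries are exhausted, so the final failure is explicit in the logs.
    reraise=True,
)
async def main() -> None:
    ...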

No Critical Issues Found

  • No security vulnerabilities identified
  • No performance degradation expected
  • Change is backward compatible
  • Dependencies are properly managed (tenacity>=9.0.0 is already in pyproject.toml)

Test Coverage Note

While no direct tests for __main__.py were found (which is common for entry-point files), the retry behavior would be valuable to verify in integration tests, especially to ensure the worker fails properly after exhausting retries; a sketch of such a test follows below.
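For example, the stop policy could be exercised in isolation, without a real Temporal connection, by applying the same decorator parameters to a stub; the sketch below is an illustration, not a test that exists in this repository.

import pytest
from tenacity import RetryError, retry, retry_if_exception_type, stop_after_attempt, wait_fixed

calls = {"count": 0}


@retry(
    wait=wait_fixed(0),  # no real waiting needed in a test
    retry=retry_if_exception_type(RuntimeError),
    stop=stop_after_attempt(5),
)
def always_failing_worker():
    calls["count"] += 1
    raise RuntimeError("simulated Temporal outage")


def test_worker_gives_up_after_five_attempts():
    # Without reraise=True, tenacity wraps the last error in RetryError.
    with pytest.raises(RetryError):
        always_failing_worker()
    assert calls["count"] == 5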

Approval

This change improves the robustness of the worker service by preventing infinite retry loops while maintaining reasonable resilience. The implementation is clean and well-documented.

Recommendation: Ready to merge after considering the optional suggestions above.
