-
Notifications
You must be signed in to change notification settings - Fork 781
Create retry utility and mark certain errors as non-retriable #476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create retry utility and mark certain errors as non-retriable #476
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a retriable flag to CLIError and marks specific authentication and parsing error paths as non-retriable across many CLI commands. Introduces new sync/async retry utilities that consult CLIError.retriable and UnauthenticatedError to decide retry behavior. No functional control-flow changes beyond error metadata and new retry helpers. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant User
participant RetryUtils as Retry Utils
participant CLI as CLI Command / Func
Note over RetryUtils: New retry_with_exponential_backoff / retry_async_with_exponential_backoff
User->>RetryUtils: call retry_with_exponential_backoff(func)
RetryUtils->>CLI: invoke func()
alt func succeeds
CLI-->>RetryUtils: result
RetryUtils-->>User: result
else func raises
CLI-->>RetryUtils: Exception e
Note over RetryUtils: Check retryability\n- if UnauthenticatedError => non-retryable\n- if CLIError => use e.retriable\n- else => retryable
alt retryable
RetryUtils->>RetryUtils: wait (exponential backoff)
RetryUtils->>CLI: invoke func() (retry)
else non-retryable
RetryUtils-->>User: re-raise e
end
end
alt attempts exhausted
RetryUtils-->>User: raise RetryError(original_error, attempts)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This stack of pull requests is managed by Graphite. Learn more about stacking. |
bd0f3de to
ddb1062
Compare
ddb1062 to
79b1e41
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)
44-44: LGTM: unauthenticated path marked non‑retriable.This prevents futile retries on missing credentials and matches the new
CLIError.retriablecontract. Consider centralizing this common message/raise in a small helper to avoid drift across commands.src/mcp_agent/cli/cloud/commands/workflows/describe/main.py (1)
47-47: Auth failures mostly non‑retriable — fix logger/tail inconsistencyMost auth checks (utils.py and workflows/*) already raise CLIError(..., retriable=False); however src/mcp_agent/cli/cloud/commands/logger/tail/main.py raises "Authentication failed. Try running 'mcp-agent login'" without retriable=False (around lines 237 and 242). Change those raises to include retriable=False and optionally add an early effective_api_key check to fail fast and avoid unnecessary network calls.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
src/mcp_agent/cli/cloud/commands/app/status/main.py(2 hunks)src/mcp_agent/cli/cloud/commands/auth/login/main.py(2 hunks)src/mcp_agent/cli/cloud/commands/auth/whoami/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/utils.py(3 hunks)src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/describe/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/list/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/resume/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/runs/main.py(1 hunks)src/mcp_agent/cli/exceptions.py(1 hunks)src/mcp_agent/cli/secrets/processor.py(3 hunks)src/mcp_agent/cli/utils/retry.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- src/mcp_agent/cli/exceptions.py
- src/mcp_agent/cli/cloud/commands/workflows/runs/main.py
- src/mcp_agent/cli/cloud/commands/auth/whoami/main.py
- src/mcp_agent/cli/secrets/processor.py
- src/mcp_agent/cli/cloud/commands/app/status/main.py
- src/mcp_agent/cli/cloud/commands/utils.py
- src/mcp_agent/cli/cloud/commands/workflows/resume/main.py
- src/mcp_agent/cli/cloud/commands/auth/login/main.py
- src/mcp_agent/cli/utils/retry.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
CLIError(4-10)
src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
CLIError(4-10)
src/mcp_agent/cli/cloud/commands/workflows/describe/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
CLIError(4-10)
🔇 Additional comments (1)
src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py (1)
42-42: LGTM: cancel requires auth and is correctly non‑retriable on missing login.Change aligns with retry behavior; no issues spotted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)
72-80: Don’t swallow errors; re-raise as CLIError and preserve retriable flagThe inner except prints and returns success (exit code 0), masking failures and bypassing retry classification. Propagate CLIError as-is and wrap unknown exceptions with retriable=True so upstream retry logic can engage.
- except Exception as e: - print_error( - f"Error listing workflows for server {server_id_or_url}: {str(e)}" - ) + except CLIError: + # Already classified; let it bubble up. + raise + except Exception as e: + # Treat as transient by default; retry utility can decide. + raise CLIError( + f"Error listing workflows for server {server_id_or_url}: {e}", + retriable=True, + ) from e @@ - except Exception as e: - raise CLIError( - f"Error listing workflows for server {server_id_or_url}: {str(e)}" - ) from e + except CLIError: + raise + except Exception as e: + raise CLIError( + f"Error listing workflows for server {server_id_or_url}: {e}", + retriable=True, + ) from e
🧹 Nitpick comments (2)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (2)
34-36: Mark “server not deployed/has no URL” as non-retriableThis is a deterministic validation/state error, not a transient failure. Recommend setting retriable=False to avoid pointless retries.
- raise CLIError( - f"Server '{server_id_or_url}' is not deployed or has no server URL" - ) + raise CLIError( + f"Server '{server_id_or_url}' is not deployed or has no server URL", + retriable=False, + )
39-39: Also mark missing server URL as non-retriableSame reasoning: this won’t succeed by retrying.
- raise CLIError(f"No server URL found for server '{server_id_or_url}'") + raise CLIError( + f"No server URL found for server '{server_id_or_url}'", + retriable=False, + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
src/mcp_agent/cli/cloud/commands/app/status/main.py(2 hunks)src/mcp_agent/cli/cloud/commands/auth/login/main.py(2 hunks)src/mcp_agent/cli/cloud/commands/auth/whoami/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/utils.py(3 hunks)src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/describe/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/list/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/resume/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/runs/main.py(1 hunks)src/mcp_agent/cli/exceptions.py(1 hunks)src/mcp_agent/cli/secrets/processor.py(3 hunks)src/mcp_agent/cli/utils/retry.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
- src/mcp_agent/cli/cloud/commands/workflows/describe/main.py
- src/mcp_agent/cli/secrets/processor.py
- src/mcp_agent/cli/cloud/commands/utils.py
- src/mcp_agent/cli/exceptions.py
- src/mcp_agent/cli/cloud/commands/auth/login/main.py
- src/mcp_agent/cli/cloud/commands/app/status/main.py
- src/mcp_agent/cli/utils/retry.py
- src/mcp_agent/cli/cloud/commands/workflows/runs/main.py
- src/mcp_agent/cli/cloud/commands/auth/whoami/main.py
- src/mcp_agent/cli/cloud/commands/workflows/resume/main.py
- src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
CLIError(4-10)
🔇 Additional comments (1)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)
44-44: Auth error correctly marked non-retriable — LGTMThis aligns with the retry policy and prevents wasted retries when no API key is present.
79b1e41 to
a3ed6b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1)
78-86: Avoid swallowing inner exceptions; let the outer handler show one clean error.Current block prints an error but completes with exit code 0. Re‑raise so the outer except wraps it in a CLIError.
Apply:
- except Exception as e: - print_error( - f"Error listing workflow runs for server {server_id_or_url}: {str(e)}" - ) + except Exception: + # Let the outer handler format a single, clean error message and exit non-zero + raisesrc/mcp_agent/cli/cloud/commands/utils.py (1)
46-51: Prefer Typer’s parameter error for invalid format.This is a CLI option validation case; Typer’s BadParameter renders better UX and exit codes.
Apply:
+import typer @@ - if format not in valid_formats: - raise CLIError( - f"Invalid format '{format}'. Valid options are: {', '.join(valid_formats)}", - retriable=False - ) + if format not in valid_formats: + raise typer.BadParameter( + f"Invalid format '{format}'. Valid options are: {', '.join(valid_formats)}" + )src/mcp_agent/cli/utils/retry.py (4)
68-73: Validate inputs to avoid surprising RuntimeError when max_attempts < 1.Guard early.
Apply:
if retryable_check is None: retryable_check = is_retryable_error + if max_attempts < 1: + raise ValueError("max_attempts must be >= 1") + if initial_delay <= 0: + raise ValueError("initial_delay must be > 0") + if backoff_multiplier < 1.0: + raise ValueError("backoff_multiplier must be >= 1.0") + last_exception = None delay = initial_delayRepeat the same validation in the async variant.
85-86: Add light jitter to backoff to reduce thundering herd.Small randomization helps when many clients retry simultaneously.
Apply (and add
import randomat top):- time.sleep(delay) + # Add jitter: 0.5x–1.5x + time.sleep(delay * (0.5 + random.random())) delay = min(delay * backoff_multiplier, max_delay)Mirror this change in the async function (
await asyncio.sleep(...)).
126-131: Async: add the same input validation as sync.Apply:
if retryable_check is None: retryable_check = is_retryable_error + if max_attempts < 1: + raise ValueError("max_attempts must be >= 1") + if initial_delay <= 0: + raise ValueError("initial_delay must be > 0") + if backoff_multiplier < 1.0: + raise ValueError("backoff_multiplier must be >= 1.0") + last_exception = None delay = initial_delay
143-144: Async: apply jitter as in sync.Apply:
- await asyncio.sleep(delay) + await asyncio.sleep(delay * (0.5 + random.random())) delay = min(delay * backoff_multiplier, max_delay)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
src/mcp_agent/cli/cloud/commands/app/status/main.py(2 hunks)src/mcp_agent/cli/cloud/commands/auth/login/main.py(2 hunks)src/mcp_agent/cli/cloud/commands/auth/whoami/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/utils.py(3 hunks)src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/describe/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/list/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/resume/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/runs/main.py(1 hunks)src/mcp_agent/cli/exceptions.py(1 hunks)src/mcp_agent/cli/secrets/processor.py(3 hunks)src/mcp_agent/cli/utils/retry.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- src/mcp_agent/cli/cloud/commands/workflows/describe/main.py
- src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py
- src/mcp_agent/cli/secrets/processor.py
- src/mcp_agent/cli/cloud/commands/workflows/list/main.py
- src/mcp_agent/cli/cloud/commands/auth/whoami/main.py
- src/mcp_agent/cli/cloud/commands/auth/login/main.py
- src/mcp_agent/cli/cloud/commands/workflows/resume/main.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
CLIError(4-10)
src/mcp_agent/cli/cloud/commands/utils.py (1)
src/mcp_agent/cli/exceptions.py (1)
CLIError(4-10)
src/mcp_agent/cli/utils/retry.py (3)
src/mcp_agent/cli/core/api_client.py (1)
UnauthenticatedError(9-12)src/mcp_agent/cli/exceptions.py (1)
CLIError(4-10)src/mcp_agent/cli/utils/ux.py (1)
print_warning(66-77)
🔇 Additional comments (8)
src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1)
49-51: Correctly marked as non‑retriable.Not having an API key should fail fast; setting retriable=False is correct.
src/mcp_agent/cli/cloud/commands/app/status/main.py (2)
60-62: Fail‑fast on missing auth is correct.retriable=False matches intended behavior.
96-98: Non‑retriable invalid key is correct.Mapping UnauthenticatedError to CLIError(retriable=False) is consistent with the retry policy.
src/mcp_agent/cli/cloud/commands/utils.py (2)
31-33: Good: unauthenticated path is non‑retriable.
100-104: Good: invalid API key mapped to non‑retriable.src/mcp_agent/cli/utils/retry.py (2)
15-18: RetryError looks good.Carries context and a clear message.
30-36: Retryability predicate is sensible and composable.Default rules (no retries for UnauthenticatedError, defer to CLIError.retriable, otherwise retry) align with PR goals.
src/mcp_agent/cli/exceptions.py (1)
7-10: Approve — repo scan clear.Default-True retriable flag is sensible and backward-compatible. Repository scan found no instantiations passing a boolean as the 2nd positional argument and no CLIError/CLIException calls with 3+ positional args; usages are single-positional or use keyword args (examples: src/mcp_agent/cli/cloud/commands/auth/whoami/main.py, src/mcp_agent/cli/secrets/processor.py, src/mcp_agent/cli/cloud/commands/utils.py).
a3ed6b7 to
17040ab
Compare
e4678fd to
25909e8
Compare
17040ab to
5b8c790
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/mcp_agent/cli/utils/retry.py (1)
88-94: Finalization logic looks correct (fix addresses prior bot comment)Break/raise paths now re-raise non‑retriables and wrap retriable failures in RetryError. Good.
Also applies to: 146-152
🧹 Nitpick comments (5)
src/mcp_agent/cli/utils/retry.py (5)
21-36: Also treat cancellations as non‑retriableAvoid retrying on asyncio cancellations to preserve cooperative cancel semantics.
Apply:
def is_retryable_error(error: Exception) -> bool: @@ - if isinstance(error, UnauthenticatedError): + if isinstance(error, UnauthenticatedError): return False + + # Do not retry cancellations + if isinstance(error, asyncio.CancelledError): + return False
39-48: Stronger typing for callables (ParamSpec/TypeVar) and async signatureImproves IDE/type-checker help; keeps return types precise.
Apply:
-import asyncio -import time -from typing import Any, Callable, Optional +import asyncio +import time +from typing import Any, Callable, Optional, TypeVar, ParamSpec, Awaitable @@ -def retry_with_exponential_backoff( - func: Callable, +P = ParamSpec("P") +R = TypeVar("R") + +def retry_with_exponential_backoff( + func: Callable[P, R], @@ - *args, - **kwargs -) -> Any: + *args: P.args, + **kwargs: P.kwargs, +) -> R: @@ -async def retry_async_with_exponential_backoff( - func: Callable, +async def retry_async_with_exponential_backoff( + func: Callable[P, Awaitable[R]], @@ - *args, - **kwargs -) -> Any: + *args: P.args, + **kwargs: P.kwargs, +) -> R:Also applies to: 97-106, 3-5
68-73: Validate retry parameters earlyPrevents silent misconfiguration (e.g., negative delays, zero attempts).
Apply:
if retryable_check is None: retryable_check = is_retryable_error last_exception = None delay = initial_delay + + if max_attempts < 1: + raise ValueError("max_attempts must be >= 1") + if initial_delay < 0: + raise ValueError("initial_delay must be >= 0") + if backoff_multiplier < 1.0: + raise ValueError("backoff_multiplier must be >= 1.0") + if max_delay < 0: + raise ValueError("max_delay must be >= 0") + if max_delay < delay: + delay = max_delay
83-87: Add jitter to backoff to avoid sync retriesReduces herd effects when many clients retry simultaneously.
Apply:
-import asyncio +import asyncio +import random @@ - print_warning(f"Attempt {attempt}/{max_attempts} failed: {e}. Retrying in {delay:.1f}s...") - - time.sleep(delay) + sleep_for = random.uniform(0, delay) + print_warning(f"Attempt {attempt}/{max_attempts} failed: {e}. Retrying in {sleep_for:.1f}s...") + time.sleep(sleep_for) delay = min(delay * backoff_multiplier, max_delay) @@ - print_warning(f"Attempt {attempt}/{max_attempts} failed: {e}. Retrying in {delay:.1f}s...") - - await asyncio.sleep(delay) + sleep_for = random.uniform(0, delay) + print_warning(f"Attempt {attempt}/{max_attempts} failed: {e}. Retrying in {sleep_for:.1f}s...") + await asyncio.sleep(sleep_for) delay = min(delay * backoff_multiplier, max_delay)Also applies to: 141-145, 3-4
64-67: Clarify Raises docstring to match behaviorCurrently RetryError is only raised when max_attempts > 1. Reflect that to avoid confusion.
Apply:
- RetryError: If all attempts fail with a retryable error + RetryError: If attempts > 1 and all attempts fail with a retryable error
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
src/mcp_agent/cli/cloud/commands/app/status/main.py(2 hunks)src/mcp_agent/cli/cloud/commands/auth/login/main.py(2 hunks)src/mcp_agent/cli/cloud/commands/auth/whoami/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/utils.py(3 hunks)src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/describe/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/list/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/resume/main.py(1 hunks)src/mcp_agent/cli/cloud/commands/workflows/runs/main.py(1 hunks)src/mcp_agent/cli/exceptions.py(1 hunks)src/mcp_agent/cli/secrets/processor.py(3 hunks)src/mcp_agent/cli/utils/retry.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
- src/mcp_agent/cli/cloud/commands/workflows/describe/main.py
- src/mcp_agent/cli/cloud/commands/workflows/list/main.py
- src/mcp_agent/cli/exceptions.py
- src/mcp_agent/cli/cloud/commands/auth/whoami/main.py
- src/mcp_agent/cli/cloud/commands/workflows/runs/main.py
- src/mcp_agent/cli/secrets/processor.py
- src/mcp_agent/cli/cloud/commands/auth/login/main.py
- src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py
- src/mcp_agent/cli/cloud/commands/utils.py
- src/mcp_agent/cli/cloud/commands/workflows/resume/main.py
- src/mcp_agent/cli/cloud/commands/app/status/main.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/mcp_agent/cli/utils/retry.py (4)
src/mcp_agent/cli/core/api_client.py (1)
UnauthenticatedError(9-12)src/mcp_agent/cli/exceptions.py (1)
CLIError(4-10)src/mcp_agent/cli/utils/ux.py (1)
print_warning(66-77)src/mcp_agent/logging/logger.py (1)
error(291-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks / test
🔇 Additional comments (1)
src/mcp_agent/cli/utils/retry.py (1)
12-19: Well-scoped RetryError with cause retention — LGTMCarries original error and attempts; preserves cause with "from". Good.
Merge activity
|

TL;DR
Added retry mechanism for CLI commands with proper error handling for non-retriable errors.
What changed?
retriableflag toCLIErrorclass to indicate whether an error can be retriedretriable=Falseto relevant error messagessrc/mcp_agent/cli/utils/retry.py) with:retriableflagRetryErrorclass to provide context about retry failuresHow to test?
Summary by CodeRabbit