Skip to content

Conversation

@jtcorbett
Copy link
Contributor

@jtcorbett jtcorbett commented Sep 15, 2025

TL;DR

Added retry mechanism for CLI commands with proper error handling for non-retriable errors.

What changed?

  • Added a new retriable flag to CLIError class to indicate whether an error can be retried
  • Set authentication and validation errors as non-retriable by adding retriable=False to relevant error messages
  • Created a new retry utility module (src/mcp_agent/cli/utils/retry.py) with:
    • Functions to handle exponential backoff retries for both synchronous and asynchronous operations
    • Logic to determine if errors are retriable based on error type and the new retriable flag
    • A RetryError class to provide context about retry failures

How to test?

  1. Test CLI commands that previously failed with authentication errors to ensure they still fail appropriately
  2. Verify that validation errors (like invalid API keys or formats) fail immediately without retries
  3. Test network-related operations to confirm they properly retry on transient failures
  4. Check that the exponential backoff works as expected with increasing delays between retry attempts

Summary by CodeRabbit

  • New Features
    • Adds automatic retry utilities with exponential backoff for CLI operations to improve reliability.
  • Bug Fixes
    • Marks authentication- and parsing-related failures as non-retriable to avoid futile retries (login, whoami, workflows, secrets, and related commands).
    • Ensures unauthenticated and expired-token errors are handled consistently across commands.
  • Chores
    • Standardizes error metadata to include an explicit retriable flag without changing messages or prompts.

@coderabbitai
Copy link

coderabbitai bot commented Sep 15, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a retriable flag to CLIError and marks specific authentication and parsing error paths as non-retriable across many CLI commands. Introduces new sync/async retry utilities that consult CLIError.retriable and UnauthenticatedError to decide retry behavior. No functional control-flow changes beyond error metadata and new retry helpers.

Changes

Cohort / File(s) Summary
Exceptions API
src/mcp_agent/cli/exceptions.py
Add retriable: bool = True parameter to CLIError.__init__ and store self.retriable. Existing message/exit_code behavior unchanged.
Retry utilities (new)
src/mcp_agent/cli/utils/retry.py
New module: RetryError, is_retryable_error, retry_with_exponential_backoff, retry_async_with_exponential_backoff. Uses CLIError.retriable and UnauthenticatedError to decide retries; prints warnings and applies exponential backoff.
Auth & status commands
src/mcp_agent/cli/cloud/commands/app/status/main.py, src/mcp_agent/cli/cloud/commands/auth/login/main.py, src/mcp_agent/cli/cloud/commands/auth/whoami/main.py, src/mcp_agent/cli/cloud/commands/utils.py
Update CLIError raises for missing/invalid API key or unauthenticated states to include retriable=False. Messages unchanged; control flow unchanged.
Workflows commands
src/mcp_agent/cli/cloud/commands/workflows/.../main.py
src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py, .../describe/main.py, .../list/main.py, .../resume/main.py, .../runs/main.py
Mark unauthenticated error paths as non-retriable by passing retriable=False to CLIError. No other logic changes.
Secrets processing
src/mcp_agent/cli/secrets/processor.py
Add retriable=False to CLIError raises for missing API key and YAML parsing failures; messages unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant RetryUtils as Retry Utils
  participant CLI as CLI Command / Func
  Note over RetryUtils: New retry_with_exponential_backoff / retry_async_with_exponential_backoff

  User->>RetryUtils: call retry_with_exponential_backoff(func)
  RetryUtils->>CLI: invoke func()
  alt func succeeds
    CLI-->>RetryUtils: result
    RetryUtils-->>User: result
  else func raises
    CLI-->>RetryUtils: Exception e
    Note over RetryUtils: Check retryability\n- if UnauthenticatedError => non-retryable\n- if CLIError => use e.retriable\n- else => retryable
    alt retryable
      RetryUtils->>RetryUtils: wait (exponential backoff)
      RetryUtils->>CLI: invoke func() (retry)
    else non-retryable
      RetryUtils-->>User: re-raise e
    end
  end
  alt attempts exhausted
    RetryUtils-->>User: raise RetryError(original_error, attempts)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • rholinshead
  • petersonbill64
  • saqadri

Poem

A rabbit taps keys soft and light,
Marks which errors take a flight.
If keys are lost, we won’t repeat,
Backoff waits with patient feet.
Hops of code — concise and bright. 🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately and concisely summarizes the primary changes in the PR: adding a retry utility and marking certain errors as non-retriable, which matches the PR objectives and the file-level changes (new retry module and retriable=False on CLIError raises). It is a single clear sentence, specific about the main intent, and free of noisy wording, so a reviewer scanning history will understand the primary change.
Docstring Coverage ✅ Passed Docstring coverage is 85.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 09-15-create_retry_utility_and_mark_certain_errors_as_non-retriable

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor Author

jtcorbett commented Sep 15, 2025

@jtcorbett jtcorbett marked this pull request as ready for review September 15, 2025 19:10
@jtcorbett jtcorbett force-pushed the 09-15-create_retry_utility_and_mark_certain_errors_as_non-retriable branch 2 times, most recently from bd0f3de to ddb1062 Compare September 15, 2025 19:20
@jtcorbett jtcorbett force-pushed the 09-15-create_retry_utility_and_mark_certain_errors_as_non-retriable branch from ddb1062 to 79b1e41 Compare September 15, 2025 19:26
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)

44-44: LGTM: unauthenticated path marked non‑retriable.

This prevents futile retries on missing credentials and matches the new CLIError.retriable contract. Consider centralizing this common message/raise in a small helper to avoid drift across commands.

src/mcp_agent/cli/cloud/commands/workflows/describe/main.py (1)

47-47: Auth failures mostly non‑retriable — fix logger/tail inconsistency

Most auth checks (utils.py and workflows/*) already raise CLIError(..., retriable=False); however src/mcp_agent/cli/cloud/commands/logger/tail/main.py raises "Authentication failed. Try running 'mcp-agent login'" without retriable=False (around lines 237 and 242). Change those raises to include retriable=False and optionally add an early effective_api_key check to fail fast and avoid unnecessary network calls.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65929ef and ddb1062.

📒 Files selected for processing (12)
  • src/mcp_agent/cli/cloud/commands/app/status/main.py (2 hunks)
  • src/mcp_agent/cli/cloud/commands/auth/login/main.py (2 hunks)
  • src/mcp_agent/cli/cloud/commands/auth/whoami/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/utils.py (3 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/describe/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/resume/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1 hunks)
  • src/mcp_agent/cli/exceptions.py (1 hunks)
  • src/mcp_agent/cli/secrets/processor.py (3 hunks)
  • src/mcp_agent/cli/utils/retry.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • src/mcp_agent/cli/exceptions.py
  • src/mcp_agent/cli/cloud/commands/workflows/runs/main.py
  • src/mcp_agent/cli/cloud/commands/auth/whoami/main.py
  • src/mcp_agent/cli/secrets/processor.py
  • src/mcp_agent/cli/cloud/commands/app/status/main.py
  • src/mcp_agent/cli/cloud/commands/utils.py
  • src/mcp_agent/cli/cloud/commands/workflows/resume/main.py
  • src/mcp_agent/cli/cloud/commands/auth/login/main.py
  • src/mcp_agent/cli/utils/retry.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
  • CLIError (4-10)
src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
  • CLIError (4-10)
src/mcp_agent/cli/cloud/commands/workflows/describe/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
  • CLIError (4-10)
🔇 Additional comments (1)
src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py (1)

42-42: LGTM: cancel requires auth and is correctly non‑retriable on missing login.

Change aligns with retry behavior; no issues spotted.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)

72-80: Don’t swallow errors; re-raise as CLIError and preserve retriable flag

The inner except prints and returns success (exit code 0), masking failures and bypassing retry classification. Propagate CLIError as-is and wrap unknown exceptions with retriable=True so upstream retry logic can engage.

-            except Exception as e:
-                print_error(
-                    f"Error listing workflows for server {server_id_or_url}: {str(e)}"
-                )
+            except CLIError:
+                # Already classified; let it bubble up.
+                raise
+            except Exception as e:
+                # Treat as transient by default; retry utility can decide.
+                raise CLIError(
+                    f"Error listing workflows for server {server_id_or_url}: {e}",
+                    retriable=True,
+                ) from e
@@
-    except Exception as e:
-        raise CLIError(
-            f"Error listing workflows for server {server_id_or_url}: {str(e)}"
-        ) from e
+    except CLIError:
+        raise
+    except Exception as e:
+        raise CLIError(
+            f"Error listing workflows for server {server_id_or_url}: {e}",
+            retriable=True,
+        ) from e
🧹 Nitpick comments (2)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (2)

34-36: Mark “server not deployed/has no URL” as non-retriable

This is a deterministic validation/state error, not a transient failure. Recommend setting retriable=False to avoid pointless retries.

-            raise CLIError(
-                f"Server '{server_id_or_url}' is not deployed or has no server URL"
-            )
+            raise CLIError(
+                f"Server '{server_id_or_url}' is not deployed or has no server URL",
+                retriable=False,
+            )

39-39: Also mark missing server URL as non-retriable

Same reasoning: this won’t succeed by retrying.

-            raise CLIError(f"No server URL found for server '{server_id_or_url}'")
+            raise CLIError(
+                f"No server URL found for server '{server_id_or_url}'",
+                retriable=False,
+            )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddb1062 and 79b1e41.

📒 Files selected for processing (12)
  • src/mcp_agent/cli/cloud/commands/app/status/main.py (2 hunks)
  • src/mcp_agent/cli/cloud/commands/auth/login/main.py (2 hunks)
  • src/mcp_agent/cli/cloud/commands/auth/whoami/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/utils.py (3 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/describe/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/resume/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1 hunks)
  • src/mcp_agent/cli/exceptions.py (1 hunks)
  • src/mcp_agent/cli/secrets/processor.py (3 hunks)
  • src/mcp_agent/cli/utils/retry.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • src/mcp_agent/cli/cloud/commands/workflows/describe/main.py
  • src/mcp_agent/cli/secrets/processor.py
  • src/mcp_agent/cli/cloud/commands/utils.py
  • src/mcp_agent/cli/exceptions.py
  • src/mcp_agent/cli/cloud/commands/auth/login/main.py
  • src/mcp_agent/cli/cloud/commands/app/status/main.py
  • src/mcp_agent/cli/utils/retry.py
  • src/mcp_agent/cli/cloud/commands/workflows/runs/main.py
  • src/mcp_agent/cli/cloud/commands/auth/whoami/main.py
  • src/mcp_agent/cli/cloud/commands/workflows/resume/main.py
  • src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
  • CLIError (4-10)
🔇 Additional comments (1)
src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1)

44-44: Auth error correctly marked non-retriable — LGTM

This aligns with the retry policy and prevents wasted retries when no API key is present.

@jtcorbett jtcorbett force-pushed the 09-15-create_retry_utility_and_mark_certain_errors_as_non-retriable branch from 79b1e41 to a3ed6b7 Compare September 15, 2025 20:51
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1)

78-86: Avoid swallowing inner exceptions; let the outer handler show one clean error.

Current block prints an error but completes with exit code 0. Re‑raise so the outer except wraps it in a CLIError.

Apply:

-            except Exception as e:
-                print_error(
-                    f"Error listing workflow runs for server {server_id_or_url}: {str(e)}"
-                )
+            except Exception:
+                # Let the outer handler format a single, clean error message and exit non-zero
+                raise
src/mcp_agent/cli/cloud/commands/utils.py (1)

46-51: Prefer Typer’s parameter error for invalid format.

This is a CLI option validation case; Typer’s BadParameter renders better UX and exit codes.

Apply:

+import typer
@@
-    if format not in valid_formats:
-        raise CLIError(
-            f"Invalid format '{format}'. Valid options are: {', '.join(valid_formats)}",
-            retriable=False
-        )
+    if format not in valid_formats:
+        raise typer.BadParameter(
+            f"Invalid format '{format}'. Valid options are: {', '.join(valid_formats)}"
+        )
src/mcp_agent/cli/utils/retry.py (4)

68-73: Validate inputs to avoid surprising RuntimeError when max_attempts < 1.

Guard early.

Apply:

     if retryable_check is None:
         retryable_check = is_retryable_error

+    if max_attempts < 1:
+        raise ValueError("max_attempts must be >= 1")
+    if initial_delay <= 0:
+        raise ValueError("initial_delay must be > 0")
+    if backoff_multiplier < 1.0:
+        raise ValueError("backoff_multiplier must be >= 1.0")
+
     last_exception = None
     delay = initial_delay

Repeat the same validation in the async variant.


85-86: Add light jitter to backoff to reduce thundering herd.

Small randomization helps when many clients retry simultaneously.

Apply (and add import random at top):

-            time.sleep(delay)
+            # Add jitter: 0.5x–1.5x
+            time.sleep(delay * (0.5 + random.random()))
             delay = min(delay * backoff_multiplier, max_delay)

Mirror this change in the async function (await asyncio.sleep(...)).


126-131: Async: add the same input validation as sync.

Apply:

     if retryable_check is None:
         retryable_check = is_retryable_error

+    if max_attempts < 1:
+        raise ValueError("max_attempts must be >= 1")
+    if initial_delay <= 0:
+        raise ValueError("initial_delay must be > 0")
+    if backoff_multiplier < 1.0:
+        raise ValueError("backoff_multiplier must be >= 1.0")
+
     last_exception = None
     delay = initial_delay

143-144: Async: apply jitter as in sync.

Apply:

-            await asyncio.sleep(delay)
+            await asyncio.sleep(delay * (0.5 + random.random()))
             delay = min(delay * backoff_multiplier, max_delay)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79b1e41 and a3ed6b7.

📒 Files selected for processing (12)
  • src/mcp_agent/cli/cloud/commands/app/status/main.py (2 hunks)
  • src/mcp_agent/cli/cloud/commands/auth/login/main.py (2 hunks)
  • src/mcp_agent/cli/cloud/commands/auth/whoami/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/utils.py (3 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/describe/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/resume/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1 hunks)
  • src/mcp_agent/cli/exceptions.py (1 hunks)
  • src/mcp_agent/cli/secrets/processor.py (3 hunks)
  • src/mcp_agent/cli/utils/retry.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • src/mcp_agent/cli/cloud/commands/workflows/describe/main.py
  • src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py
  • src/mcp_agent/cli/secrets/processor.py
  • src/mcp_agent/cli/cloud/commands/workflows/list/main.py
  • src/mcp_agent/cli/cloud/commands/auth/whoami/main.py
  • src/mcp_agent/cli/cloud/commands/auth/login/main.py
  • src/mcp_agent/cli/cloud/commands/workflows/resume/main.py
🧰 Additional context used
🧬 Code graph analysis (3)
src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1)
src/mcp_agent/cli/exceptions.py (1)
  • CLIError (4-10)
src/mcp_agent/cli/cloud/commands/utils.py (1)
src/mcp_agent/cli/exceptions.py (1)
  • CLIError (4-10)
src/mcp_agent/cli/utils/retry.py (3)
src/mcp_agent/cli/core/api_client.py (1)
  • UnauthenticatedError (9-12)
src/mcp_agent/cli/exceptions.py (1)
  • CLIError (4-10)
src/mcp_agent/cli/utils/ux.py (1)
  • print_warning (66-77)
🔇 Additional comments (8)
src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1)

49-51: Correctly marked as non‑retriable.

Not having an API key should fail fast; setting retriable=False is correct.

src/mcp_agent/cli/cloud/commands/app/status/main.py (2)

60-62: Fail‑fast on missing auth is correct.

retriable=False matches intended behavior.


96-98: Non‑retriable invalid key is correct.

Mapping UnauthenticatedError to CLIError(retriable=False) is consistent with the retry policy.

src/mcp_agent/cli/cloud/commands/utils.py (2)

31-33: Good: unauthenticated path is non‑retriable.


100-104: Good: invalid API key mapped to non‑retriable.

src/mcp_agent/cli/utils/retry.py (2)

15-18: RetryError looks good.

Carries context and a clear message.


30-36: Retryability predicate is sensible and composable.

Default rules (no retries for UnauthenticatedError, defer to CLIError.retriable, otherwise retry) align with PR goals.

src/mcp_agent/cli/exceptions.py (1)

7-10: Approve — repo scan clear.

Default-True retriable flag is sensible and backward-compatible. Repository scan found no instantiations passing a boolean as the 2nd positional argument and no CLIError/CLIException calls with 3+ positional args; usages are single-positional or use keyword args (examples: src/mcp_agent/cli/cloud/commands/auth/whoami/main.py, src/mcp_agent/cli/secrets/processor.py, src/mcp_agent/cli/cloud/commands/utils.py).

@jtcorbett jtcorbett changed the base branch from main to graphite-base/476 September 15, 2025 21:50
@jtcorbett jtcorbett force-pushed the 09-15-create_retry_utility_and_mark_certain_errors_as_non-retriable branch from a3ed6b7 to 17040ab Compare September 15, 2025 21:50
@jtcorbett jtcorbett changed the base branch from graphite-base/476 to update-uv September 15, 2025 21:50
@jtcorbett jtcorbett mentioned this pull request Sep 15, 2025
@jtcorbett jtcorbett changed the base branch from update-uv to graphite-base/476 September 15, 2025 21:59
@jtcorbett jtcorbett force-pushed the 09-15-create_retry_utility_and_mark_certain_errors_as_non-retriable branch from 17040ab to 5b8c790 Compare September 15, 2025 22:00
@jtcorbett jtcorbett changed the base branch from graphite-base/476 to main September 15, 2025 22:00
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/mcp_agent/cli/utils/retry.py (1)

88-94: Finalization logic looks correct (fix addresses prior bot comment)

Break/raise paths now re-raise non‑retriables and wrap retriable failures in RetryError. Good.

Also applies to: 146-152

🧹 Nitpick comments (5)
src/mcp_agent/cli/utils/retry.py (5)

21-36: Also treat cancellations as non‑retriable

Avoid retrying on asyncio cancellations to preserve cooperative cancel semantics.

Apply:

 def is_retryable_error(error: Exception) -> bool:
@@
-    if isinstance(error, UnauthenticatedError):
+    if isinstance(error, UnauthenticatedError):
         return False
+
+    # Do not retry cancellations
+    if isinstance(error, asyncio.CancelledError):
+        return False

39-48: Stronger typing for callables (ParamSpec/TypeVar) and async signature

Improves IDE/type-checker help; keeps return types precise.

Apply:

-import asyncio
-import time
-from typing import Any, Callable, Optional
+import asyncio
+import time
+from typing import Any, Callable, Optional, TypeVar, ParamSpec, Awaitable
@@
-def retry_with_exponential_backoff(
-    func: Callable,
+P = ParamSpec("P")
+R = TypeVar("R")
+
+def retry_with_exponential_backoff(
+    func: Callable[P, R],
@@
-    *args,
-    **kwargs
-) -> Any:
+    *args: P.args,
+    **kwargs: P.kwargs,
+) -> R:
@@
-async def retry_async_with_exponential_backoff(
-    func: Callable,
+async def retry_async_with_exponential_backoff(
+    func: Callable[P, Awaitable[R]],
@@
-    *args,
-    **kwargs
-) -> Any:
+    *args: P.args,
+    **kwargs: P.kwargs,
+) -> R:

Also applies to: 97-106, 3-5


68-73: Validate retry parameters early

Prevents silent misconfiguration (e.g., negative delays, zero attempts).

Apply:

     if retryable_check is None:
         retryable_check = is_retryable_error
 
     last_exception = None
     delay = initial_delay
+
+    if max_attempts < 1:
+        raise ValueError("max_attempts must be >= 1")
+    if initial_delay < 0:
+        raise ValueError("initial_delay must be >= 0")
+    if backoff_multiplier < 1.0:
+        raise ValueError("backoff_multiplier must be >= 1.0")
+    if max_delay < 0:
+        raise ValueError("max_delay must be >= 0")
+    if max_delay < delay:
+        delay = max_delay

83-87: Add jitter to backoff to avoid sync retries

Reduces herd effects when many clients retry simultaneously.

Apply:

-import asyncio
+import asyncio
+import random
@@
-            print_warning(f"Attempt {attempt}/{max_attempts} failed: {e}. Retrying in {delay:.1f}s...")
-
-            time.sleep(delay)
+            sleep_for = random.uniform(0, delay)
+            print_warning(f"Attempt {attempt}/{max_attempts} failed: {e}. Retrying in {sleep_for:.1f}s...")
+            time.sleep(sleep_for)
             delay = min(delay * backoff_multiplier, max_delay)
@@
-            print_warning(f"Attempt {attempt}/{max_attempts} failed: {e}. Retrying in {delay:.1f}s...")
-
-            await asyncio.sleep(delay)
+            sleep_for = random.uniform(0, delay)
+            print_warning(f"Attempt {attempt}/{max_attempts} failed: {e}. Retrying in {sleep_for:.1f}s...")
+            await asyncio.sleep(sleep_for)
             delay = min(delay * backoff_multiplier, max_delay)

Also applies to: 141-145, 3-4


64-67: Clarify Raises docstring to match behavior

Currently RetryError is only raised when max_attempts > 1. Reflect that to avoid confusion.

Apply:

-        RetryError: If all attempts fail with a retryable error
+        RetryError: If attempts > 1 and all attempts fail with a retryable error
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3ed6b7 and 5b8c790.

📒 Files selected for processing (12)
  • src/mcp_agent/cli/cloud/commands/app/status/main.py (2 hunks)
  • src/mcp_agent/cli/cloud/commands/auth/login/main.py (2 hunks)
  • src/mcp_agent/cli/cloud/commands/auth/whoami/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/utils.py (3 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/describe/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/list/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/resume/main.py (1 hunks)
  • src/mcp_agent/cli/cloud/commands/workflows/runs/main.py (1 hunks)
  • src/mcp_agent/cli/exceptions.py (1 hunks)
  • src/mcp_agent/cli/secrets/processor.py (3 hunks)
  • src/mcp_agent/cli/utils/retry.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • src/mcp_agent/cli/cloud/commands/workflows/describe/main.py
  • src/mcp_agent/cli/cloud/commands/workflows/list/main.py
  • src/mcp_agent/cli/exceptions.py
  • src/mcp_agent/cli/cloud/commands/auth/whoami/main.py
  • src/mcp_agent/cli/cloud/commands/workflows/runs/main.py
  • src/mcp_agent/cli/secrets/processor.py
  • src/mcp_agent/cli/cloud/commands/auth/login/main.py
  • src/mcp_agent/cli/cloud/commands/workflows/cancel/main.py
  • src/mcp_agent/cli/cloud/commands/utils.py
  • src/mcp_agent/cli/cloud/commands/workflows/resume/main.py
  • src/mcp_agent/cli/cloud/commands/app/status/main.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/mcp_agent/cli/utils/retry.py (4)
src/mcp_agent/cli/core/api_client.py (1)
  • UnauthenticatedError (9-12)
src/mcp_agent/cli/exceptions.py (1)
  • CLIError (4-10)
src/mcp_agent/cli/utils/ux.py (1)
  • print_warning (66-77)
src/mcp_agent/logging/logger.py (1)
  • error (291-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks / test
🔇 Additional comments (1)
src/mcp_agent/cli/utils/retry.py (1)

12-19: Well-scoped RetryError with cause retention — LGTM

Carries original error and attempts; preserves cause with "from". Good.

Copy link
Contributor Author

jtcorbett commented Sep 15, 2025

Merge activity

  • Sep 15, 10:43 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Sep 15, 10:43 PM UTC: @jtcorbett merged this pull request with Graphite.

@jtcorbett jtcorbett merged commit fe2567c into main Sep 15, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants