Differentiate retryable and non-retryable indexing errors #220
base: main
Conversation
📝 Walkthrough

Repo indexing now returns consistent, structured failure responses: HTTP 429, 5xx, other non-200 responses, ValueError, aiohttp.ClientError, and unexpected exceptions are mapped to explicit error_code and retryable flags; error text is truncated and stored in the repo record.

Sequence Diagram(s): omitted — changes are focused on error mapping/handling within a single service function and do not introduce a multi-component sequential flow that requires visualization.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/codegraph/repo_service.py (1)
153-169: Consider handling 429 (rate limiting) as retryable.

The current logic treats all non-5xx failures as non-retryable, but HTTP 429 (Too Many Requests) indicates rate limiting and is inherently retryable after a backoff period. Classifying it as `INVALID_REQUEST` with `retryable: false` could cause permanent failures for transient conditions.

🐛 Suggested fix

```diff
+            elif response.status == 429:
+                error_text = await response.text()
+                error_msg = error_text[:500]
+
+                await self.supabase.table("indexed_repositories").update({
+                    "indexing_status": "failed",
+                    "last_error": error_msg,
+                    "updated_at": datetime.now().isoformat()
+                }).eq("repository_full_name", repo_info['full_name']).eq(
+                    "is_deleted", False
+                ).execute()
+
+                return {
+                    "status": "error",
+                    "error_code": "RATE_LIMITED",
+                    "retryable": True,
+                    "message": "Rate limit exceeded. Please try again later."
+                }
             else:
                 error_text = await response.text()
```
🧹 Nitpick comments (2)
backend/app/services/codegraph/repo_service.py (2)
140-145: Missing `updated_at` field in database update.

The update at line 83 includes `updated_at` when changing status to "pending", but this update (and the one at lines 157-162) omits it when marking as "failed". This inconsistency causes stale timestamps.

♻️ Suggested fix

```diff
                 await self.supabase.table("indexed_repositories").update({
                     "indexing_status": "failed",
-                    "last_error": error_msg
+                    "last_error": error_msg,
+                    "updated_at": datetime.now().isoformat()
                 }).eq("repository_full_name", repo_info['full_name']).eq(
                     "is_deleted", False
                 ).execute()
```

Apply the same fix to lines 157-162.
177-184: Correct retryable classification; minor logging cleanup suggested.

Network errors are correctly marked as retryable. Per static analysis (Ruff TRY401): `logging.exception()` already captures the exception, so including `{e}` in the message is redundant.

♻️ Optional cleanup

```diff
         except aiohttp.ClientError as e:
-            logger.exception(f"Network error indexing {repo_input}: {e}")
+            logger.exception(f"Network error indexing {repo_input}")
             return {
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/services/codegraph/repo_service.py
🧰 Additional context used
🪛 Ruff (0.14.10)
backend/app/services/codegraph/repo_service.py
178-178: Redundant exception object included in logging.exception call
(TRY401)
🔇 Additional comments (2)
backend/app/services/codegraph/repo_service.py (2)
170-176: LGTM! Correct classification — invalid repository format errors are user input issues and appropriately marked as non-retryable.

185-192: LGTM! Defaulting unknown exceptions to non-retryable is a safe, conservative choice that prevents infinite retry loops for persistent failures.
Yeah, you're right. 429 should indeed be treated as a retryable rate-limiting scenario.
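For context, here is a minimal caller-side sketch (not part of this PR) of how an agent could act on the proposed `retryable` flag once 429 is reported as `RATE_LIMITED`; the `index_repository` method name, the `service` handle, and the backoff parameters are illustrative assumptions:

```python
import asyncio


async def index_with_retry(service, repo_input: str, max_attempts: int = 3) -> dict:
    """Retry indexing only when the structured error marks the failure as retryable."""
    delay = 2.0
    result: dict = {}
    for attempt in range(max_attempts):
        # Assumed method name; the PR only defines the shape of the returned dict.
        result = await service.index_repository(repo_input)
        if result.get("status") != "error" or not result.get("retryable"):
            return result  # success, or a permanent (non-retryable) failure
        if attempt < max_attempts - 1:
            await asyncio.sleep(delay)  # simple exponential backoff between attempts
            delay *= 2
    return result
```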
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/codegraph/repo_service.py (1)
117-117: Critical bug: Response body consumed twice.

The ternary expression first calls `await response.text()` (which consumes the response body), then conditionally calls `await response.json()` (which requires the body to still be available). In aiohttp, a response body can only be read once, so this would fail when trying to parse JSON.

🐛 Proposed fix

```diff
-                    data = await response.json() if await response.text() else {}
+                    text = await response.text()
+                    data = await response.json() if text else {}
```

Or better yet, since you're expecting JSON:

```diff
-                    data = await response.json() if await response.text() else {}
+                    try:
+                        data = await response.json()
+                    except (aiohttp.ContentTypeError, ValueError):
+                        data = {}
```
🧹 Nitpick comments (2)
backend/app/services/codegraph/repo_service.py (2)
194-201: LGTM with optional logging improvement.

The ClientError handling correctly marks network failures as retryable and provides appropriate user feedback.

♻️ Optional: Remove redundant exception from log message

The `logger.exception()` method automatically includes exception details, so explicitly including `{e}` in the message is redundant (as flagged by static analysis).

```diff
-            logger.exception(f"Network error indexing {repo_input}: {e}")
+            logger.exception(f"Network error indexing {repo_input}")
```
136-169: Optional: Consider reducing duplication between error handlers.

The 429 and 5xx error handling blocks follow an identical pattern (extract error text, update DB, return structured response). While the current implementation is clear and explicit, you could optionally reduce duplication with a helper method.

♻️ Optional refactoring approach

```python
    async def _handle_indexing_error(
        self, response, repo_full_name: str, error_code: str, retryable: bool, message: str
    ) -> Dict[str, Any]:
        """Helper to handle indexing errors consistently."""
        error_text = await response.text()
        error_msg = error_text[:500]

        await self.supabase.table("indexed_repositories").update({
            "indexing_status": "failed",
            "last_error": error_msg
        }).eq("repository_full_name", repo_full_name).eq(
            "is_deleted", False
        ).execute()

        return {
            "status": "error",
            "error_code": error_code,
            "retryable": retryable,
            "message": message
        }
```

Then use it like:

```python
            elif response.status == 429:
                return await self._handle_indexing_error(
                    response, repo_info['full_name'], "RATE_LIMITED", True,
                    "Rate limit exceeded. Please try again later."
                )
            elif 500 <= response.status < 600:
                return await self._handle_indexing_error(
                    response, repo_info['full_name'], "BACKEND_ERROR", True,
                    "Indexing service encountered an internal error. Please try again later."
                )
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/services/codegraph/repo_service.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/codegraph/repo_service.py (2)
frontend/src/lib/supabaseClient.ts (1): supabase (12-12)

backend/app/agents/devrel/github/github_toolkit.py (1): execute (101-141)
🪛 Ruff (0.14.10)
backend/app/services/codegraph/repo_service.py
195-195: Redundant exception object included in logging.exception call
(TRY401)
🔇 Additional comments (5)
backend/app/services/codegraph/repo_service.py (5)
136-152: LGTM! Proper handling for rate limits.

The 429 handling correctly marks the error as retryable, truncates the error message for storage, and provides a clear user-facing message. The database status update ensures failed attempts are tracked.

153-169: LGTM! Appropriate handling for server errors.

The 5xx handling correctly marks server errors as retryable and provides appropriate feedback. The implementation is consistent with the rate limit handling.

170-186: LGTM! Improved handling for invalid requests.

The error response now correctly extracts and stores the error text, and provides a structured response with `retryable=False`, which is appropriate for client errors (4xx status codes).

187-193: LGTM! Clear handling for invalid repository formats.

The ValueError handling correctly marks parsing failures as non-retryable and includes the specific error message to help users correct their input.

202-209: LGTM! Safe fallback for unexpected errors.

The general exception handler provides a safe default by marking unexpected failures as non-retryable, which is appropriate since these errors would typically require investigation rather than automatic retry.
Good catch — fixed the response body consumption by safely parsing JSON without reading the body twice.
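For reference, a minimal sketch of the parsing pattern described here, assuming an aiohttp response object; the helper name is illustrative and not the exact code in `repo_service.py`:

```python
import aiohttp


async def parse_backend_response(response: aiohttp.ClientResponse) -> dict:
    """Read the response body as JSON at most once, falling back to an empty dict."""
    try:
        return await response.json()
    except (aiohttp.ContentTypeError, ValueError):
        # Covers empty or non-JSON bodies (json.JSONDecodeError is a ValueError subclass).
        return {}
```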
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @backend/app/services/codegraph/repo_service.py:
- Around line 191-196: The ValueError handler that returns the error dict must
also mark the repository DB record out of "pending" so retries aren't blocked;
before returning in the except ValueError block, call the same repository status
update routine used earlier to set status to a terminal state (e.g., "failed" or
"error") using repo_info when available or falling back to repo_input (or
parsing owner/repo from repo_input) so the DB is updated even when parsing
fails, then return the error dict.
🧹 Nitpick comments (3)
backend/app/services/codegraph/repo_service.py (3)
156-172: LGTM: Correct handling of server errors.

Server errors are appropriately marked as retryable. The database update and error response structure are consistent with other error paths.

Minor: Remove trailing whitespace

Line 159 appears to have trailing whitespace.

```diff
-                error_msg = error_text[:500]
-                
+                error_msg = error_text[:500]
+
```
198-198: Optional: Remove redundant exception object from logging.exception.

The `logging.exception` method automatically includes exception details, making `{e}` redundant in the log message.

♻️ Proposed refactor

```diff
-            logger.exception(f"Network error indexing {repo_input}: {e}")
+            logger.exception(f"Network error indexing {repo_input}")
```

Based on static analysis hints.
139-189: Optional: Extract repeated error text extraction.

The pattern of extracting and truncating error text is repeated three times. Consider extracting it to reduce duplication.

♻️ Proposed refactor

Add a helper method near the top of the class:

```python
    async def _extract_error_text(self, response: aiohttp.ClientResponse, max_length: int = 500) -> str:
        """Extract and truncate error text from response."""
        error_text = await response.text()
        return error_text[:max_length]
```

Then use it in the error handling blocks:

```diff
             elif response.status == 429:
-                error_text = await response.text()
-                error_msg = error_text[:500]
+                error_msg = await self._extract_error_text(response)
             elif 500 <= response.status < 600:
-                error_text = await response.text()
-                error_msg = error_text[:500]
+                error_msg = await self._extract_error_text(response)
             else:
-                error_text = await response.text()
-                error_msg = error_text[:500]
+                error_msg = await self._extract_error_text(response)
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/services/codegraph/repo_service.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/codegraph/repo_service.py (2)
frontend/src/lib/supabaseClient.ts (1): supabase (12-12)

backend/app/agents/devrel/github/github_toolkit.py (1): execute (101-141)
🪛 Ruff (0.14.10)
backend/app/services/codegraph/repo_service.py
198-198: Redundant exception object included in logging.exception call
(TRY401)
🔇 Additional comments (3)
backend/app/services/codegraph/repo_service.py (3)
117-120: LGTM: Resilient JSON parsing.

The fallback to an empty dict when JSON parsing fails is appropriate and allows the indexing to complete successfully even if the backend returns malformed JSON.

139-155: LGTM: Appropriate rate limiting handling.

The 429 response is correctly marked as retryable, and the database is properly updated with the failed status and error message.

173-189: LGTM: Appropriate handling of client errors.

Non-retryable client errors (4xx) are correctly distinguished from retryable server errors. The database update and error response are consistent.
```python
                return {
                    "status": "error",
                    "error_code": "INVALID_REPOSITORY",
                    "retryable": False,
                    "message": str(e)
                }
```
Critical: Database status not updated on ValueError.
If a database record was created or updated to "pending" status (lines 80-94) before the ValueError occurs, it will remain in "pending" state permanently. This breaks retry functionality because lines 71-75 will block subsequent attempts with "indexing in progress."
🐛 Proposed fix: Update database before returning
```diff
         except ValueError as e:
+            # Update database if record exists
+            try:
+                await self.supabase.table("indexed_repositories").update({
+                    "indexing_status": "failed",
+                    "last_error": str(e)[:500]
+                }).eq("repository_full_name", repo_info.get('full_name', repo_input)).eq(
+                    "is_deleted", False
+                ).execute()
+            except Exception:
+                logger.exception("Failed to update database after ValueError")
+
             return {
                 "status": "error",
                 "error_code": "INVALID_REPOSITORY",
                 "retryable": False,
                 "message": str(e)
             }
```

Note: repo_info might not be available if parsing failed, so use repo_input as fallback or extract owner/repo before the try-except if possible.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @backend/app/services/codegraph/repo_service.py around lines 191 - 196, The
ValueError handler that returns the error dict must also mark the repository DB
record out of "pending" so retries aren't blocked; before returning in the
except ValueError block, call the same repository status update routine used
earlier to set status to a terminal state (e.g., "failed" or "error") using
repo_info when available or falling back to repo_input (or parsing owner/repo
from repo_input) so the DB is updated even when parsing fails, then return the
error dict.
```python
                return {
                    "status": "error",
                    "error_code": "BACKEND_UNAVAILABLE",
                    "retryable": True,
                    "message": "Indexing service is currently unavailable. Please try again later."
                }
```
Critical: Database status not updated on ClientError.
Similar to the ValueError case, if a database record was created or updated to "pending" status (lines 80-94) before the ClientError occurs, it will remain in "pending" state permanently, blocking retry attempts.
🐛 Proposed fix: Update database before returning
```diff
         except aiohttp.ClientError as e:
             logger.exception(f"Network error indexing {repo_input}: {e}")
+
+            # Update database if record exists
+            try:
+                error_msg = f"Network error: {str(e)}"[:500]
+                await self.supabase.table("indexed_repositories").update({
+                    "indexing_status": "failed",
+                    "last_error": error_msg
+                }).eq("repository_full_name", repo_info.get('full_name', repo_input)).eq(
+                    "is_deleted", False
+                ).execute()
+            except Exception:
+                logger.exception("Failed to update database after ClientError")
+
             return {
                 "status": "error",
                 "error_code": "BACKEND_UNAVAILABLE",
                 "retryable": True,
                 "message": "Indexing service is currently unavailable. Please try again later."
             }
```

Committable suggestion skipped: line range outside the PR's diff.
```python
                return {
                    "status": "error",
                    "error_code": "INDEXING_FAILED",
                    "retryable": False,
                    "message": "Indexing failed due to an unexpected error."
                }
```
Critical: Database status not updated on general Exception.
If a database record was created or updated to "pending" status (lines 80-94) before an unexpected exception occurs, it will remain in "pending" state permanently, blocking retry attempts.
🐛 Proposed fix: Update database before returning
```diff
         except Exception:
             logger.exception(f"Failed to index {repo_input}")
+
+            # Update database if record exists
+            try:
+                await self.supabase.table("indexed_repositories").update({
+                    "indexing_status": "failed",
+                    "last_error": "Indexing failed due to an unexpected error."
+                }).eq("repository_full_name", repo_info.get('full_name', repo_input)).eq(
+                    "is_deleted", False
+                ).execute()
+            except Exception:
+                logger.exception("Failed to update database after unexpected exception")
+
             return {
                 "status": "error",
                 "error_code": "INDEXING_FAILED",
                 "retryable": False,
                 "message": "Indexing failed due to an unexpected error."
             }
```

Committable suggestion skipped: line range outside the PR's diff.
Good catch — this is a valid issue. The pending-state inconsistency on exceptions pre-existed and affects broader state-management logic. Happy to work on that separately.

Hey @Aditya30ag @smokeyScraper During review, a pre-existing critical bug was also surfaced, which I fixed: the response body was being consumed twice (reading `await response.text()` and then `await response.json()` on the same response). CodeRabbit also flagged a broader issue where the DB state may remain "pending" on some exception paths. You can review the changes.

@the-asmit can you please address the CodeRabbit AI suggestions?

Hey @Aditya30ag, thanks for the suggestion — I understand the concern.

The remaining CodeRabbit suggestion regarding updating DB state on all exception paths affects broader indexing state-management logic (e.g. handling of records stuck in "pending"). I'm happy to take that up in a follow-up PR.
📝 Description
Currently, network and backend errors during repository indexing are returned as generic errors, preventing agents from distinguishing retryable failures from permanent ones.
This PR improves error signalling during repository indexing by making retryability explicit, while keeping the existing indexing flow unchanged.
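For illustration, a hedged sketch of the kind of status-to-error mapping this refers to, using the error codes that appear in the review discussion above; the function name and signature are assumptions, not the literal code in `repo_service.py`:

```python
import aiohttp


def classify_indexing_failure(status=None, exc=None) -> dict:
    """Hypothetical sketch: map an HTTP status or exception to an error_code plus retryable flag."""
    if exc is not None:
        if isinstance(exc, ValueError):            # e.g. an unparseable "owner/repo" input
            return {"error_code": "INVALID_REPOSITORY", "retryable": False}
        if isinstance(exc, aiohttp.ClientError):   # network-level failures
            return {"error_code": "BACKEND_UNAVAILABLE", "retryable": True}
        return {"error_code": "INDEXING_FAILED", "retryable": False}   # unexpected errors
    if status == 429:
        return {"error_code": "RATE_LIMITED", "retryable": True}       # rate limiting
    if status is not None and 500 <= status < 600:
        return {"error_code": "BACKEND_ERROR", "retryable": True}      # server errors
    return {"error_code": "INVALID_REQUEST", "retryable": False}       # other non-200 responses
```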
🔧 Changes Made
- `error_code` values for common failure scenarios
- `retryable` flag in error responses to enable agent-side decision-making

📷 Screenshots or Visual Changes (if applicable)
N/A — backend-only changes.