Differentiate retryable and non-retryable indexing errors #220
base: main
Conversation
📝 Walkthrough

Repo indexing now returns consistent, structured failure responses: HTTP 429, 5xx, other non-200 responses, ValueError, aiohttp.ClientError, and unexpected exceptions are mapped to explicit error_code and retryable flags; error text is truncated and stored in the repo record.

Sequence Diagram(s): omitted — changes are focused on error mapping/handling within a single service function and do not introduce a multi-component sequential flow that requires visualization.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/codegraph/repo_service.py (1)
153-169: Consider handling 429 (rate limiting) as retryable.

The current logic treats all non-5xx failures as non-retryable, but HTTP 429 (Too Many Requests) indicates rate limiting and is inherently retryable after a backoff period. Classifying it as `INVALID_REQUEST` with `retryable: false` could cause permanent failures for transient conditions.

🐛 Suggested fix

```diff
+            elif response.status == 429:
+                error_text = await response.text()
+                error_msg = error_text[:500]
+
+                await self.supabase.table("indexed_repositories").update({
+                    "indexing_status": "failed",
+                    "last_error": error_msg,
+                    "updated_at": datetime.now().isoformat()
+                }).eq("repository_full_name", repo_info['full_name']).eq(
+                    "is_deleted", False
+                ).execute()
+
+                return {
+                    "status": "error",
+                    "error_code": "RATE_LIMITED",
+                    "retryable": True,
+                    "message": "Rate limit exceeded. Please try again later."
+                }
             else:
                 error_text = await response.text()
```
🧹 Nitpick comments (2)
backend/app/services/codegraph/repo_service.py (2)
140-145: Missing `updated_at` field in database update.

The update at line 83 includes `updated_at` when changing status to "pending", but this update (and the one at lines 157-162) omits it when marking as "failed". This inconsistency causes stale timestamps.

♻️ Suggested fix

```diff
                 await self.supabase.table("indexed_repositories").update({
                     "indexing_status": "failed",
-                    "last_error": error_msg
+                    "last_error": error_msg,
+                    "updated_at": datetime.now().isoformat()
                 }).eq("repository_full_name", repo_info['full_name']).eq(
                     "is_deleted", False
                 ).execute()
```

Apply the same fix to lines 157-162.
177-184: Correct retryable classification; minor logging cleanup suggested.

Network errors are correctly marked as retryable. Per static analysis (Ruff TRY401): `logging.exception()` already captures the exception, so including `{e}` in the message is redundant.

♻️ Optional cleanup

```diff
         except aiohttp.ClientError as e:
-            logger.exception(f"Network error indexing {repo_input}: {e}")
+            logger.exception(f"Network error indexing {repo_input}")
             return {
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/services/codegraph/repo_service.py
🧰 Additional context used
🪛 Ruff (0.14.10)
backend/app/services/codegraph/repo_service.py
178-178: Redundant exception object included in logging.exception call
(TRY401)
🔇 Additional comments (2)
backend/app/services/codegraph/repo_service.py (2)
170-176: LGTM! Correct classification — invalid repository format errors are user input issues and appropriately marked as non-retryable.

185-192: LGTM! Defaulting unknown exceptions to non-retryable is a safe, conservative choice that prevents infinite retry loops for persistent failures.
Yeah, you're right. 429 should indeed be treated as a retryable rate-limiting scenario.
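For context, here is a minimal caller-side sketch (not part of this PR) of how an agent could act on the proposed `retryable` flag once 429 is reported as `RATE_LIMITED`; the `index_repository` method name, the `service` handle, and the backoff parameters are illustrative assumptions:

```python
import asyncio


async def index_with_retry(service, repo_input: str, max_attempts: int = 3) -> dict:
    """Retry indexing only when the structured error marks the failure as retryable."""
    delay = 2.0
    result: dict = {}
    for attempt in range(max_attempts):
        # Assumed method name; the PR only defines the shape of the returned dict.
        result = await service.index_repository(repo_input)
        if result.get("status") != "error" or not result.get("retryable"):
            return result  # success, or a permanent (non-retryable) failure
        if attempt < max_attempts - 1:
            await asyncio.sleep(delay)  # simple exponential backoff between attempts
            delay *= 2
    return result
```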
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/codegraph/repo_service.py (1)
117-117: Critical bug: Response body consumed twice.

The ternary expression first calls `await response.text()` (which consumes the response body), then conditionally calls `await response.json()` (which requires the body to still be available). In aiohttp, a response body can only be read once, so this would fail when trying to parse JSON.

🐛 Proposed fix

```diff
-                    data = await response.json() if await response.text() else {}
+                    text = await response.text()
+                    data = await response.json() if text else {}
```

Or better yet, since you're expecting JSON:

```diff
-                    data = await response.json() if await response.text() else {}
+                    try:
+                        data = await response.json()
+                    except (aiohttp.ContentTypeError, ValueError):
+                        data = {}
```
🧹 Nitpick comments (2)
backend/app/services/codegraph/repo_service.py (2)
194-201: LGTM with optional logging improvement.

The ClientError handling correctly marks network failures as retryable and provides appropriate user feedback.

♻️ Optional: Remove redundant exception from log message

The `logger.exception()` method automatically includes exception details, so explicitly including `{e}` in the message is redundant (as flagged by static analysis).

```diff
-            logger.exception(f"Network error indexing {repo_input}: {e}")
+            logger.exception(f"Network error indexing {repo_input}")
```
136-169: Optional: Consider reducing duplication between error handlers.

The 429 and 5xx error handling blocks follow an identical pattern (extract error text, update DB, return structured response). While the current implementation is clear and explicit, you could optionally reduce duplication with a helper method.

♻️ Optional refactoring approach

```python
    async def _handle_indexing_error(
        self, response, repo_full_name: str, error_code: str, retryable: bool, message: str
    ) -> Dict[str, Any]:
        """Helper to handle indexing errors consistently."""
        error_text = await response.text()
        error_msg = error_text[:500]

        await self.supabase.table("indexed_repositories").update({
            "indexing_status": "failed",
            "last_error": error_msg
        }).eq("repository_full_name", repo_full_name).eq(
            "is_deleted", False
        ).execute()

        return {
            "status": "error",
            "error_code": error_code,
            "retryable": retryable,
            "message": message
        }
```

Then use it like:

```python
            elif response.status == 429:
                return await self._handle_indexing_error(
                    response, repo_info['full_name'], "RATE_LIMITED", True,
                    "Rate limit exceeded. Please try again later."
                )
            elif 500 <= response.status < 600:
                return await self._handle_indexing_error(
                    response, repo_info['full_name'], "BACKEND_ERROR", True,
                    "Indexing service encountered an internal error. Please try again later."
                )
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/services/codegraph/repo_service.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/codegraph/repo_service.py (2)
frontend/src/lib/supabaseClient.ts (1): supabase (12-12)

backend/app/agents/devrel/github/github_toolkit.py (1): execute (101-141)
🪛 Ruff (0.14.10)
backend/app/services/codegraph/repo_service.py
195-195: Redundant exception object included in logging.exception call
(TRY401)
🔇 Additional comments (5)
backend/app/services/codegraph/repo_service.py (5)
136-152: LGTM! Proper handling for rate limits.

The 429 handling correctly marks the error as retryable, truncates the error message for storage, and provides a clear user-facing message. The database status update ensures failed attempts are tracked.

153-169: LGTM! Appropriate handling for server errors.

The 5xx handling correctly marks server errors as retryable and provides appropriate feedback. The implementation is consistent with the rate limit handling.

170-186: LGTM! Improved handling for invalid requests.

The error response now correctly extracts and stores the error text, and provides a structured response with `retryable=False`, which is appropriate for client errors (4xx status codes).

187-193: LGTM! Clear handling for invalid repository formats.

The ValueError handling correctly marks parsing failures as non-retryable and includes the specific error message to help users correct their input.

202-209: LGTM! Safe fallback for unexpected errors.

The general exception handler provides a safe default by marking unexpected failures as non-retryable, which is appropriate since these errors would typically require investigation rather than automatic retry.
Good catch — fixed the response body consumption by safely parsing JSON without reading the body twice.
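For reference, a minimal sketch of the parsing pattern described here, assuming an aiohttp response object; the helper name is illustrative and not the exact code in `repo_service.py`:

```python
import aiohttp


async def parse_backend_response(response: aiohttp.ClientResponse) -> dict:
    """Read the response body as JSON at most once, falling back to an empty dict."""
    try:
        return await response.json()
    except (aiohttp.ContentTypeError, ValueError):
        # Covers empty or non-JSON bodies (json.JSONDecodeError is a ValueError subclass).
        return {}
```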
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @backend/app/services/codegraph/repo_service.py:
- Around line 191-196: The ValueError handler that returns the error dict must
also mark the repository DB record out of "pending" so retries aren't blocked;
before returning in the except ValueError block, call the same repository status
update routine used earlier to set status to a terminal state (e.g., "failed" or
"error") using repo_info when available or falling back to repo_input (or
parsing owner/repo from repo_input) so the DB is updated even when parsing
fails, then return the error dict.
🧹 Nitpick comments (3)
backend/app/services/codegraph/repo_service.py (3)
156-172: LGTM: Correct handling of server errors.

Server errors are appropriately marked as retryable. The database update and error response structure are consistent with other error paths.

Minor: Remove trailing whitespace

Line 159 appears to have trailing whitespace.

```diff
-                error_msg = error_text[:500]
-                
+                error_msg = error_text[:500]
+
```
198-198: Optional: Remove redundant exception object from logging.exception.

The `logging.exception` method automatically includes exception details, making `{e}` redundant in the log message.

♻️ Proposed refactor

```diff
-            logger.exception(f"Network error indexing {repo_input}: {e}")
+            logger.exception(f"Network error indexing {repo_input}")
```

Based on static analysis hints.
139-189: Optional: Extract repeated error text extraction.

The pattern of extracting and truncating error text is repeated three times. Consider extracting it to reduce duplication.

♻️ Proposed refactor

Add a helper method near the top of the class:

```python
    async def _extract_error_text(self, response: aiohttp.ClientResponse, max_length: int = 500) -> str:
        """Extract and truncate error text from response."""
        error_text = await response.text()
        return error_text[:max_length]
```

Then use it in the error handling blocks:

```diff
             elif response.status == 429:
-                error_text = await response.text()
-                error_msg = error_text[:500]
+                error_msg = await self._extract_error_text(response)
             elif 500 <= response.status < 600:
-                error_text = await response.text()
-                error_msg = error_text[:500]
+                error_msg = await self._extract_error_text(response)
             else:
-                error_text = await response.text()
-                error_msg = error_text[:500]
+                error_msg = await self._extract_error_text(response)
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/services/codegraph/repo_service.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/codegraph/repo_service.py (2)
frontend/src/lib/supabaseClient.ts (1): supabase (12-12)

backend/app/agents/devrel/github/github_toolkit.py (1): execute (101-141)
🪛 Ruff (0.14.10)
backend/app/services/codegraph/repo_service.py
198-198: Redundant exception object included in logging.exception call
(TRY401)
🔇 Additional comments (3)
backend/app/services/codegraph/repo_service.py (3)
117-120: LGTM: Resilient JSON parsing.

The fallback to an empty dict when JSON parsing fails is appropriate and allows the indexing to complete successfully even if the backend returns malformed JSON.

139-155: LGTM: Appropriate rate limiting handling.

The 429 response is correctly marked as retryable, and the database is properly updated with the failed status and error message.

173-189: LGTM: Appropriate handling of client errors.

Non-retryable client errors (4xx) are correctly distinguished from retryable server errors. The database update and error response are consistent.
```python
                return {
                    "status": "error",
                    "error_code": "INVALID_REPOSITORY",
                    "retryable": False,
                    "message": str(e)
                }
```
Critical: Database status not updated on ValueError.
If a database record was created or updated to "pending" status (lines 80-94) before the ValueError occurs, it will remain in "pending" state permanently. This breaks retry functionality because lines 71-75 will block subsequent attempts with "indexing in progress."
🐛 Proposed fix: Update database before returning
```diff
         except ValueError as e:
+            # Update database if record exists
+            try:
+                await self.supabase.table("indexed_repositories").update({
+                    "indexing_status": "failed",
+                    "last_error": str(e)[:500]
+                }).eq("repository_full_name", repo_info.get('full_name', repo_input)).eq(
+                    "is_deleted", False
+                ).execute()
+            except Exception:
+                logger.exception("Failed to update database after ValueError")
+
             return {
                 "status": "error",
                 "error_code": "INVALID_REPOSITORY",
                 "retryable": False,
                 "message": str(e)
             }
```

Note: repo_info might not be available if parsing failed, so use repo_input as fallback or extract owner/repo before the try-except if possible.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @backend/app/services/codegraph/repo_service.py around lines 191 - 196, The
ValueError handler that returns the error dict must also mark the repository DB
record out of "pending" so retries aren't blocked; before returning in the
except ValueError block, call the same repository status update routine used
earlier to set status to a terminal state (e.g., "failed" or "error") using
repo_info when available or falling back to repo_input (or parsing owner/repo
from repo_input) so the DB is updated even when parsing fails, then return the
error dict.
```python
                return {
                    "status": "error",
                    "error_code": "BACKEND_UNAVAILABLE",
                    "retryable": True,
                    "message": "Indexing service is currently unavailable. Please try again later."
                }
```
Critical: Database status not updated on ClientError.
Similar to the ValueError case, if a database record was created or updated to "pending" status (lines 80-94) before the ClientError occurs, it will remain in "pending" state permanently, blocking retry attempts.
🐛 Proposed fix: Update database before returning
```diff
         except aiohttp.ClientError as e:
             logger.exception(f"Network error indexing {repo_input}: {e}")
+
+            # Update database if record exists
+            try:
+                error_msg = f"Network error: {str(e)}"[:500]
+                await self.supabase.table("indexed_repositories").update({
+                    "indexing_status": "failed",
+                    "last_error": error_msg
+                }).eq("repository_full_name", repo_info.get('full_name', repo_input)).eq(
+                    "is_deleted", False
+                ).execute()
+            except Exception:
+                logger.exception("Failed to update database after ClientError")
+
             return {
                 "status": "error",
                 "error_code": "BACKEND_UNAVAILABLE",
                 "retryable": True,
                 "message": "Indexing service is currently unavailable. Please try again later."
             }
```

Committable suggestion skipped: line range outside the PR's diff.
```python
                return {
                    "status": "error",
                    "error_code": "INDEXING_FAILED",
                    "retryable": False,
                    "message": "Indexing failed due to an unexpected error."
                }
```
Critical: Database status not updated on general Exception.
If a database record was created or updated to "pending" status (lines 80-94) before an unexpected exception occurs, it will remain in "pending" state permanently, blocking retry attempts.
🐛 Proposed fix: Update database before returning
```diff
         except Exception:
             logger.exception(f"Failed to index {repo_input}")
+
+            # Update database if record exists
+            try:
+                await self.supabase.table("indexed_repositories").update({
+                    "indexing_status": "failed",
+                    "last_error": "Indexing failed due to an unexpected error."
+                }).eq("repository_full_name", repo_info.get('full_name', repo_input)).eq(
+                    "is_deleted", False
+                ).execute()
+            except Exception:
+                logger.exception("Failed to update database after unexpected exception")
+
             return {
                 "status": "error",
                 "error_code": "INDEXING_FAILED",
                 "retryable": False,
                 "message": "Indexing failed due to an unexpected error."
             }
```

Committable suggestion skipped: line range outside the PR's diff.
Good catch — this is a valid issue. The pending-state inconsistency on exceptions pre-existed and affects broader state-management logic. Happy to work on that separately.

Hey @Aditya30ag @smokeyScraper During review, a pre-existing critical bug was also surfaced, which I fixed: the response body was being consumed twice (reading `await response.text()` and then `await response.json()` on the same response). CodeRabbit also flagged a broader issue where the DB state may remain "pending" on some exception paths. You can review the changes.

@the-asmit can you please address the CodeRabbit AI suggestions?

Hey @Aditya30ag, thanks for the suggestion — I understand the concern.

The remaining CodeRabbit suggestion regarding updating DB state on all exception paths affects broader indexing state-management logic (e.g. handling of records stuck in "pending"). I'm happy to take that up in a follow-up PR.
📝 Description
Currently, network and backend errors during repository indexing are returned as generic errors, preventing agents from distinguishing retryable failures from permanent ones.
This PR improves error signalling during repository indexing by making retryability explicit, while keeping the existing indexing flow unchanged.
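For illustration, a hedged sketch of the kind of status-to-error mapping this refers to, using the error codes that appear in the review discussion above; the function name and signature are assumptions, not the literal code in `repo_service.py`:

```python
import aiohttp


def classify_indexing_failure(status=None, exc=None) -> dict:
    """Hypothetical sketch: map an HTTP status or exception to an error_code plus retryable flag."""
    if exc is not None:
        if isinstance(exc, ValueError):            # e.g. an unparseable "owner/repo" input
            return {"error_code": "INVALID_REPOSITORY", "retryable": False}
        if isinstance(exc, aiohttp.ClientError):   # network-level failures
            return {"error_code": "BACKEND_UNAVAILABLE", "retryable": True}
        return {"error_code": "INDEXING_FAILED", "retryable": False}   # unexpected errors
    if status == 429:
        return {"error_code": "RATE_LIMITED", "retryable": True}       # rate limiting
    if status is not None and 500 <= status < 600:
        return {"error_code": "BACKEND_ERROR", "retryable": True}      # server errors
    return {"error_code": "INVALID_REQUEST", "retryable": False}       # other non-200 responses
```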
🔧 Changes Made
- `error_code` values for common failure scenarios
- `retryable` flag in error responses to enable agent-side decision-making

📷 Screenshots or Visual Changes (if applicable)
N/A — backend-only changes.