Skip to content

Phase 3: Exception Handling — Core Services#240

Draft
paultranvan wants to merge 19 commits intodevfrom
hardening/phase-3
Draft

Phase 3: Exception Handling — Core Services#240
paultranvan wants to merge 19 commits intodevfrom
hardening/phase-3

Conversation

@paultranvan
Copy link
Collaborator

Summary

  • Replace 51 broad except Exception handlers in components and pipeline with typed exceptions
  • Catch MilvusException → wrap in VDBError subclasses for vectordb operations
  • Distinguish retrieval failures, LLM failures, and data errors in RAG pipeline

Changes

  • vectordb.py, utils.py: 17 handlers — MilvusExceptionVDBError subclasses, SQLAlchemy → VDBError
  • indexer.py, embeddings/openai.py, chunker.py: 10 handlers — OSError, VDBError, EmbeddingError
  • 7 loaders (base, image, media, pptx, eml, serializer, marker): 19 handlers
  • pipeline.py, map_reduce.py, reranker.py, llm.py: 5 handlers — httpx errors, RayTaskError

Test plan

  • All 98 existing tests pass
  • ruff check openrag/ passes
  • VLM captioning failures degrade gracefully (empty string)
  • LLM streaming catches asyncio.CancelledError first

🤖 Generated with Claude Code

- Replace httpx.Timeout(timeout=httpx.Timeout(4 * 60.0)) with httpx.Timeout(4 * 60.0)
- Fixed in auth_callback (line 69) and on_chat_start (line 134)
- Eliminates TypeError from incorrectly nested Timeout objects
- Add Pydantic schema for file upload metadata validation
- Define mimetype (str | None) and domains (list[str]) fields
- Add field_validator to ensure domains contains only non-empty strings
- Configure model_config with extra: "allow" for backward compatibility
- Imports: Field and field_validator from pydantic
- Add SQLAlchemy URL import
- Replace f-string interpolation with URL.create() for database URL
- Properly escape special characters in credentials (username, password)
- Pass URL object to PartitionFileManager instead of f-string
…env.py

- Add SQLAlchemy URL to import
- Replace f-string interpolation with URL.create() for database URL
- Convert URL object to string with str() for set_main_option()
- Properly escape special characters in credentials
- Import FileMetadataSchema and ValidationError in routers/utils.py
- Update validate_metadata() to validate parsed JSON against schema
- Add ValidationError exception handling with formatted error messages
- Return validated.model_dump() dict for backward compatibility
- Maintain JSON syntax error handling separately from schema validation
@coderabbitai
Copy link

coderabbitai bot commented Feb 11, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hardening/phase-3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
paultranvan and others added 11 commits February 17, 2026 15:19
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace scattered HTTPException(500) raises with typed OpenRAGError
subclasses that flow through the global handler in api.py:

- Add common.py: RayActorError, FileStorageError, ToolExecutionError,
  UnexpectedError
- All routers use `except OpenRAGError: raise` passthrough pattern
- RayTaskError/TaskCancelledError → RayActorError with `from e` chaining
- OSError → FileStorageError for disk I/O failures
- Generic Exception → UnexpectedError as catch-all
- Streaming errors use _make_sse_error() helper for consistent SSE format
- utils.py: add httpx.TimeoutException→504, httpx.HTTPError→503 for
  LLM health checks (appropriate as HTTPException for external services)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FileStorageError, RayActorError, ToolExecutionError, and UnexpectedError
are not raised in this phase — they belong in phase-3 (core services).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Pipeline catches VDBError and RayTaskError separately from generic LLM errors
- Map-reduce gracefully degrades on per-chunk failures with warning log
- Added imports for VDBError and RayTaskError
- Improved error context logging in both components
…tx loaders

- base.py: Catch ValueError and binascii.Error for base64 decode failures
- base.py: VLM captioning preserves graceful degradation with BadRequestError
- image.py: Catch OSError for file I/O errors, UnidentifiedImageError for invalid formats
- media_loader.py: Catch openai.APIError for transcription API failures
- media_loader.py: Catch langdetect.LangDetectException for language detection
- pptx_loader.py: Catch AttributeError/IndexError for chart structure errors
- Reranker logs context (query snippet, doc count) on failures
- LLM streaming catches httpx.HTTPStatusError and httpx.RequestError separately
- asyncio.CancelledError caught first in streaming per Phase 2 decision
- Generic error messages prevent internal detail exposure
- Added asyncio import for cancellation handling
… VDBError subclasses

- Replace 13 broad Exception handlers with specific exception types
- Milvus operations now catch MilvusException and raise VDBError subclasses
- Exception catch-all handlers use generic non-exposing message: "An unexpected database error occurred"
- Add MilvusException handling to async_add_documents for insert operations
- Update file_exists and partition_exists to propagate VDBError
- Update list_partitions to use UnexpectedVDBError instead of re-raise
- All 93 existing tests continue passing
…ceptions

- Catch OSError, VDBError, EmbeddingError separately in add_file()
- Add specific handlers for delete_file(), update_file_metadata(), copy_file()
- Preserve cleanup logic in finally block with nested try/except
- Import VDBError and EmbeddingError from utils.exceptions
…r loaders

- eml_loader.py: Catch email.errors.MessageError for parsing, UnicodeDecodeError for text
- eml_loader.py: Catch OSError for file I/O, UnidentifiedImageError for images
- serializer.py: Catch OSError for file operations, wrap others in RuntimeError
- marker.py: Catch asyncio.CancelledError FIRST (cancellation), OSError for file I/O
- marker.py: Wrap Marker library errors in RuntimeError with generic message
- Catch openai.APIError in embedding_dimension property
- Use generic message for UnexpectedEmbeddingError
- Catch VLM APITimeoutError and APIError separately in chunker
- Use logger.exception for unexpected errors in contextualize_chunks
…BError subclasses

- Replace 4 broad Exception handlers in PartitionFileManager with specific VDBError subclasses
- Database connection errors use VDBConnectionError with generic message
- Insert operations (add_file_to_partition, set_file_domains) raise VDBInsertError
- Delete operations (remove_file_from_partition) raise VDBDeleteError
- All handlers use generic message: "An unexpected database error occurred"
- All 93 existing tests continue passing
@paultranvan paultranvan changed the base branch from hardening/phase-2 to dev February 17, 2026 16:09
paultranvan and others added 2 commits February 17, 2026 17:22
…asses

Replace all 12 RuntimeError raises introduced by phase-3 with proper
OpenRAGError subclasses that flow through the global handler:

- indexer.py: RuntimeError → UnexpectedError, VDBError/EmbeddingError
  catches widened to OpenRAGError
- llm.py: RuntimeError → UnexpectedError for streaming failures
- reranker.py: RuntimeError → UnexpectedError
- serializer.py: RuntimeError → FileStorageError (I/O) / UnexpectedError
- marker.py: RuntimeError → FileStorageError (I/O) / UnexpectedError
- ray_utils.py: RuntimeError → RayActorError (wraps RayTaskError)
- Fix pre-existing lint issues (unused vars, unsorted imports)

All exception raises now use `from e` chaining for proper tracebacks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FileStorageError, RayActorError, ToolExecutionError, and UnexpectedError
moved here from phase-2 since this is where they are first raised.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant