Conversation

@rishit-codes rishit-codes commented Jan 7, 2026

Problem
Intermittent SQLite concurrency issues (e.g. "database is locked" errors) could occur under concurrent access and reload scenarios, caused by unsafe direct connections and missing timeout configuration.

Fixes #913

Changes

  • Introduced a centralized SQLite connection context manager with WAL mode and busy timeout enabled
  • Ensured consistent concurrency settings across affected database access paths
  • Replaced unsafe direct connections in image-related operations
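The centralized context manager described above can be sketched roughly as follows. This is a minimal illustration, not the actual code from `backend/app/database/connection.py`: the `DB_PATH` constant and the exact set of PRAGMAs are assumptions.

```python
import sqlite3
from contextlib import contextmanager

DB_PATH = "pictopy_example.db"  # assumed path, for illustration only

@contextmanager
def get_db_connection():
    """Yield a SQLite connection with WAL mode and a busy timeout enabled."""
    conn = None
    try:
        conn = sqlite3.connect(DB_PATH, timeout=30.0)
        conn.execute("PRAGMA journal_mode = WAL;")    # persistent write-ahead log
        conn.execute("PRAGMA busy_timeout = 30000;")  # wait up to 30s on locks
        conn.execute("PRAGMA foreign_keys = ON;")     # enforce relational integrity
        yield conn
    except Exception:
        if conn:
            conn.rollback()
        raise
    finally:
        if conn:
            conn.close()
```

Callers wrap their queries in `with get_db_connection() as conn:` and commit explicitly after writes, so every code path gets the same timeout and WAL settings.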

Testing

  • Ran backend and sync microservice locally under reload mode
  • Verified image, folder, and face-cluster endpoints
  • No locking or startup issues observed

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling: database operations now surface errors and ensure proper rollback/cleanup instead of silently failing.
  • Refactor

    • Redesigned database connection and transaction management for better timeouts, concurrency, and reliability; reinforced integrity constraints and streamlined lifecycle handling across DB operations.



github-actions bot commented Jan 7, 2026

⚠️ No issue was linked in the PR description.
Please make sure to link an issue (e.g., 'Fixes #issue_number')


coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

Reworked the SQLite connection manager to set a 30s timeout, enable WAL and concurrency/performance PRAGMAs, enforce relational integrity PRAGMAs, and add lifecycle guards. Refactored image-related DB functions to use the unified context manager and changed error paths to raise exceptions.

Changes

Cohort / File(s) Summary
Connection manager
backend/app/database/connection.py
Rewrote get_db_connection to open connections with 30s timeout, enable WAL, set synchronous/cache/tempstore/mmap_size/busy_timeout PRAGMAs, enforce FOREIGN KEYS and immediate checks, and add null-safety guards in exception/finally blocks; docstring updated.
Images DB — connection migration
backend/app/database/images.py
Replaced direct sqlite3.connect/_connect usage with get_db_connection context manager across CRUD functions (db_create_images_table, db_bulk_insert_images, db_get_all_images, db_get_untagged_images, db_update_image_tagged_status, db_insert_image_classes_batch, db_get_images_by_folder_ids, db_delete_images_by_ids, db_toggle_image_favourite_status); unified transaction patterns and shifted error handling to raise exceptions instead of silent/boolean fallbacks.
Behavioral/error-handling changes
backend/app/database/images.py (various functions)
Error paths now log and re-raise instead of returning False or swallowing errors; transactions and commits handled via the context manager scope.
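The log-and-raise pattern the walkthrough describes can be illustrated with a hedged sketch; the function name and `items` table here are invented for illustration and are not taken from the repository:

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)

def db_delete_items_by_ids(conn: sqlite3.Connection, ids: list[int]) -> None:
    """Illustrative write helper (name and schema assumed, not from the repo)."""
    try:
        conn.executemany("DELETE FROM items WHERE id = ?", [(i,) for i in ids])
        conn.commit()
    except Exception as e:
        # New pattern: log and re-raise so callers see the failure,
        # instead of returning False and silently swallowing it.
        logger.error(f"Error deleting items: {e}")
        conn.rollback()
        raise
```

The key change is the bare `raise` at the end of the handler: the error is still logged, but it now propagates to the caller instead of being converted into a boolean fallback.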


Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

bug, backend

Suggested reviewers

  • rahulharpal1603

Poem

🐰 I hopped into code with a curious twitch,
Set timeouts long and PRAGMAs rich,
WAL keeps the rhythm, checks keep things right,
Connections guarded through day and night,
Hooray—no locks to spoil our delight! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

  • Description check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the title accurately summarizes the main change, a centralized SQLite connection context manager with WAL mode and timeout configuration to fix concurrency issues.
  • Linked Issues check ✅ Passed: all coding objectives from issue #913 are met (WAL mode enabled, busy timeout configured, connection handling centralized via the get_db_connection context manager across database operations).
  • Out of Scope Changes check ✅ Passed: all changes are directly scoped to fixing SQLite concurrency (connection manager setup, PRAGMA configuration, and refactoring of database operations to use the centralized context manager).
  • Docstring Coverage check ✅ Passed: docstring coverage is 90.00%, above the required 80.00% threshold.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @backend/app/database/images.py:
- Around lines 408-426: The file mixes the new get_db_connection() context manager (used in db_toggle_image_favourite_status) with legacy _connect() calls, causing inconsistent concurrency/locking behavior. Migrate the remaining image-related DB functions (e.g. db_create_images_table, db_bulk_insert_images, db_get_all_images, db_get_untagged_images, db_update_image_tagged_status, db_insert_image_classes_batch, db_get_images_by_folder_ids, db_delete_images_by_ids, and any others still calling _connect) to use get_db_connection() as a context manager. Replace direct _connect() usages with "with get_db_connection() as conn: cursor = conn.cursor()", rely on get_db_connection to apply WAL mode and the timeout, call conn.commit() after writes where needed, preserve the existing SQL and return semantics, and align exception logging/raising with the pattern used in db_toggle_image_favourite_status (logger.error and re-raise).
- Around lines 47-53: The file mixes two connection strategies: the legacy _connect() and the context-managed get_db_connection(). Migrate all functions that currently call _connect() (the eight functions referenced in the review) to the get_db_connection() context manager, as the existing db_toggle_image_favourite_status already does. Replace patterns like "conn = _connect(); cur = conn.cursor(); ...; conn.commit(); conn.close()" with a "with get_db_connection() as conn:" block, use conn.cursor() inside it, and remove the explicit conn.close(). Keep commit/rollback semantics consistent (commit when mutating; let exceptions propagate for rollback). Ensure PRAGMA settings already handled by get_db_connection() are not re-applied, and update any docstrings/comments to reflect the unified connection strategy.
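The migration the comment asks for can be sketched as a before/after. `get_db_connection` below is a minimal stand-in for the centralized manager, and the `images`/`isTagged` names are assumptions for illustration, not the actual schema:

```python
import sqlite3
from contextlib import contextmanager

DB_PATH = "example_images.db"  # assumed; real code points at the app database

@contextmanager
def get_db_connection():
    """Minimal stand-in for the centralized connection manager."""
    conn = sqlite3.connect(DB_PATH, timeout=30.0)
    try:
        yield conn
    finally:
        conn.close()

# Legacy pattern being replaced:
#   conn = _connect()
#   cur = conn.cursor()
#   cur.execute(...)
#   conn.commit()
#   conn.close()

def db_update_image_tagged_status(image_id: int) -> bool:
    """Migrated pattern: the context manager owns the connection lifecycle."""
    with get_db_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            "UPDATE images SET isTagged = 1 WHERE id = ?", (image_id,)
        )
        conn.commit()
        return cursor.rowcount > 0
```

The explicit `conn.close()` disappears because the context manager's `finally` block handles it, including on error paths.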
🧹 Nitpick comments (3)
backend/app/database/connection.py (3)

20-28: Both timeout parameter and busy_timeout PRAGMA are redundant.

Setting both timeout=30.0 (line 20) and PRAGMA busy_timeout = 30000 (line 28) is redundant: both configure the same 30-second wait period for locked databases. The timeout parameter alone is sufficient and slightly more efficient.

♻️ Simplify by removing redundant PRAGMA
-        conn.execute("PRAGMA busy_timeout = 30000;")  # 30 second timeout for locks

The connection-level timeout parameter already handles this.


23-23: Setting WAL mode on every connection is inefficient.

WAL mode is a database-level persistent setting, not a connection-level setting. Once enabled, it remains active for all subsequent connections until explicitly changed. Executing PRAGMA journal_mode = WAL on every connection adds unnecessary overhead.

Consider setting WAL mode once during database initialization (e.g., in a migration script or startup routine) rather than on every connection.
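A sketch of that one-time initialization, with illustrative function names. Note that unlike `journal_mode`, most PRAGMAs (`synchronous`, `foreign_keys`, `cache_size`) are per-connection and do not persist in the database file, so only WAL benefits from this treatment:

```python
import sqlite3

def init_database(db_path: str) -> None:
    """Run once at startup: WAL mode is stored in the database file itself,
    so it persists across connections and need not be set per-connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode = WAL;")
    finally:
        conn.close()

def journal_mode(db_path: str) -> str:
    """Helper to inspect the persisted journal mode of a database file."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA journal_mode;").fetchone()[0]
    finally:
        conn.close()
```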


22-28: Consider the performance overhead of setting multiple PRAGMAs on every connection.

Executing 7 PRAGMA statements on every connection has measurable overhead, especially for short-lived connections or high-frequency operations. While these settings optimize concurrency, the setup cost may outweigh the benefits for quick queries.

For frequently-used settings like WAL mode, synchronous, and foreign keys, consider:

  • Setting the persistent WAL journal mode once at database initialization (synchronous, by contrast, is per-connection and would still need to be set where it matters)
  • Applying connection-specific PRAGMAs (cache_size, mmap_size) only for long-lived or high-load connections
  • Profiling whether all 7 PRAGMAs provide measurable benefit in your workload
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3af96c and 6654ee6.

📒 Files selected for processing (2)
  • backend/app/database/connection.py
  • backend/app/database/images.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 570
File: backend/app/database/connection.py:16-24
Timestamp: 2025-10-31T17:00:50.132Z
Learning: In PictoPy backend, the user prefers not to use database connection retry logic or extended busy timeouts in the centralized get_db_connection() context manager, even though the app has concurrent access patterns via ProcessPoolExecutor and FastAPI.

Applied to files:

  • backend/app/database/connection.py
🧬 Code graph analysis (1)
backend/app/database/images.py (1)
backend/app/database/connection.py (1)
  • get_db_connection (8-43)
🔇 Additional comments (4)
backend/app/database/connection.py (3)

20-20: Verify: This contradicts prior team preferences on timeout configuration.

Based on learnings, user Hemil36 previously preferred not to use extended busy timeouts in get_db_connection(), even with concurrent access patterns. The addition of timeout=30.0 here contradicts that preference.

Please confirm whether the team's position on timeout configuration has changed, or if this should be adjusted to align with the previous decision.


25-27: Verify that large cache and mmap sizes are appropriate for your deployment environment.

The settings allocate significant memory per connection:

  • cache_size = -64000 → 64MB cache
  • mmap_size = 268435456 → 256MB memory-mapped I/O

For applications with many concurrent connections or memory-constrained environments, these values could lead to excessive memory usage or OOM errors.

Please verify:

  1. Whether these sizes align with your deployment environment's memory capacity
  2. Whether connection pooling limits are in place to prevent excessive memory consumption
  3. Whether these values have been tested under your expected peak concurrency

38-39: LGTM: Connection lifecycle guards prevent errors.

The addition of if conn: guards before rollback() and close() properly handles cases where connection initialization fails, preventing potential AttributeError exceptions.

Also applies to: 42-43

backend/app/database/images.py (1)

60-96: Improved exception handling in table creation.

The refactored db_create_images_table() now properly wraps operations in try/except/finally blocks with explicit commit and rollback, which is a good defensive programming practice.

However, note that this function still uses _connect() instead of get_db_connection(), which means it lacks the full concurrency optimizations described in the PR objectives. See the related comment on lines 47-53.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/database/images.py (2)

110-196: Consider raising exceptions for consistency with write operations.

This read operation returns an empty list on error (line 196), while write operations (db_bulk_insert_images, db_update_image_tagged_status, db_delete_images_by_ids) raise exceptions. This inconsistency makes it difficult for callers to distinguish between "no results found" and "database error occurred."

For better error visibility and debugging, consider propagating exceptions from read operations as well. If graceful degradation is intentional for reads, document this design decision in the docstring.

♻️ Proposed refactor for consistent error handling
         except Exception as e:
             logger.error(f"Error getting all images: {e}")
-            return []
+            raise

294-325: Consider raising exceptions for consistency with write operations.

Similar to db_get_all_images, this read operation returns an empty list on error (line 325) while write operations raise exceptions. This masks database errors from callers who cannot distinguish between "no images in these folders" and "database error."

For consistency and better error visibility, consider propagating exceptions.

♻️ Proposed refactor for consistent error handling
         except Exception as e:
             logger.error(f"Error getting images by folder IDs: {e}")
-            return []
+            raise
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6654ee6 and 9142348.

📒 Files selected for processing (1)
  • backend/app/database/images.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 570
File: backend/app/database/connection.py:16-24
Timestamp: 2025-10-31T17:00:50.132Z
Learning: In PictoPy backend, the user prefers not to use database connection retry logic or extended busy timeouts in the centralized get_db_connection() context manager, even though the app has concurrent access patterns via ProcessPoolExecutor and FastAPI.

Applied to files:

  • backend/app/database/images.py
🧬 Code graph analysis (1)
backend/app/database/images.py (2)
backend/app/database/connection.py (1)
  • get_db_connection (8-43)
backend/app/utils/images.py (1)
  • image_util_parse_metadata (496-513)
🔇 Additional comments (8)
backend/app/database/images.py (8)

9-9: Verify that the 30-second timeout aligns with current architectural preferences.

Based on learnings, a previous team member (Hemil36, PR 570) preferred not to use extended busy timeouts in the centralized connection manager. The current PR introduces a 30-second timeout to address issue #913. Please confirm this approach change has been reviewed and aligns with the team's updated concurrency strategy.


47-79: LGTM! Clean table creation with proper foreign key constraints.

The function correctly uses the context manager for both table creations, with appropriate IF NOT EXISTS guards and CASCADE deletes for referential integrity. The lack of explicit error handling is acceptable here since table creation errors should propagate to the caller for proper initialization flow handling.


81-107: LGTM! Proper bulk upsert with conflict resolution.

The function correctly uses the context manager and implements appropriate upsert logic. The CASE statement for isTagged ensures that once an image is tagged, it remains tagged even during re-insertion. Error handling (log and raise) is consistent with write operation patterns.
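The tag-preserving upsert the comment describes can be sketched as follows; the schema and statement here are assumptions illustrating the CASE-on-conflict pattern, not the actual PictoPy SQL:

```python
import sqlite3

# Hypothetical upsert: re-inserting an image updates its path but never
# downgrades isTagged from 1 back to 0 (schema assumed for illustration).
UPSERT = """
INSERT INTO images (id, path, isTagged)
VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    path = excluded.path,
    isTagged = CASE WHEN images.isTagged = 1 THEN 1 ELSE excluded.isTagged END
"""

def bulk_insert_images(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    conn.executemany(UPSERT, rows)
    conn.commit()
```

The CASE expression compares the stored row (`images.isTagged`) against the incoming row (`excluded.isTagged`) and keeps the tag sticky once set.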


199-238: LGTM! Clean query with proper error propagation.

The function correctly uses the context manager, filters images needing AI tagging with an appropriate JOIN, and lets exceptions propagate naturally. The metadata parsing and None-handling for folder_id are both correct.


241-262: LGTM! Proper update with rowcount validation.

The function correctly uses the context manager and returns a boolean based on whether any rows were affected. Exception handling (log and raise) is consistent with other write operations.

Note: The function returns False when image_id doesn't exist (rowcount=0), not an exception. This is a valid design choice but callers should be aware that False indicates "no rows affected" rather than an error.


265-291: LGTM! Efficient batch insert with proper conflict handling.

The function correctly uses the context manager and INSERT OR IGNORE to handle duplicate pairs gracefully. The early return for empty lists and consistent error handling (log and raise) align with other write operations.


328-355: LGTM! Proper bulk delete with CASCADE awareness.

The function correctly uses the context manager and documents the CASCADE behavior for image_classes records. The deletion count logging is valuable for audit trails, and error handling (log and raise) is consistent with other write operations.


357-376: LGTM! Proper toggle implementation with existence check.

The function correctly uses the context manager, checks for image existence before attempting the toggle, and uses the CASE expression to flip the boolean value. Error handling is consistent with other write operations.
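The existence-check-then-CASE-flip shape described above looks roughly like this; table and column names are assumed for illustration and the real function also takes a connection from the context manager:

```python
import sqlite3

def toggle_favourite(conn: sqlite3.Connection, image_id: int) -> bool:
    """Flip a boolean column in one statement (names assumed for illustration)."""
    cur = conn.execute("SELECT 1 FROM images WHERE id = ?", (image_id,))
    if cur.fetchone() is None:
        return False  # existence check before attempting the toggle
    conn.execute(
        "UPDATE images SET isFavourite = "
        "CASE WHEN isFavourite = 1 THEN 0 ELSE 1 END WHERE id = ?",
        (image_id,),
    )
    conn.commit()
    return True
```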



Development

Successfully merging this pull request may close these issues.

BUG: Improve SQLite concurrency handling to prevent database is locked errors
