Conversation

@rishit-codes rishit-codes commented Jan 7, 2026

Problem
Intermittent SQLite concurrency issues (e.g. "database is locked" errors) could occur under concurrent access and reload scenarios, caused by unsafe direct connections and missing timeout configuration.

Fixes #913

Changes

  • Introduced a centralized SQLite connection context manager with WAL mode and busy timeout enabled
  • Ensured consistent concurrency settings across affected database access paths
  • Replaced unsafe direct connections in image-related operations
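The centralized context manager described above can be sketched roughly as follows. This is a minimal illustration, not the actual code from `backend/app/database/connection.py`: the `DB_PATH` constant and the exact set of PRAGMAs are assumptions.

```python
import sqlite3
from contextlib import contextmanager

DB_PATH = "pictopy_example.db"  # assumed path, for illustration only

@contextmanager
def get_db_connection():
    """Yield a SQLite connection with WAL mode and a busy timeout enabled."""
    conn = None
    try:
        conn = sqlite3.connect(DB_PATH, timeout=30.0)
        conn.execute("PRAGMA journal_mode = WAL;")    # persistent write-ahead log
        conn.execute("PRAGMA busy_timeout = 30000;")  # wait up to 30s on locks
        conn.execute("PRAGMA foreign_keys = ON;")     # enforce relational integrity
        yield conn
    except Exception:
        if conn:
            conn.rollback()
        raise
    finally:
        if conn:
            conn.close()
```

Callers wrap their queries in `with get_db_connection() as conn:` and commit explicitly after writes, so every code path gets the same timeout and WAL settings.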

Testing

  • Ran backend and sync microservice locally under reload mode
  • Verified image, folder, and face-cluster endpoints
  • No locking or startup issues observed

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling: database operations now surface errors and ensure proper rollback/cleanup instead of silently failing.
  • Refactor

    • Redesigned database connection and transaction management for better timeouts, concurrency, and reliability; reinforced integrity constraints and streamlined lifecycle handling across DB operations.



github-actions bot commented Jan 7, 2026

⚠️ No issue was linked in the PR description.
Please make sure to link an issue (e.g., 'Fixes #issue_number')


coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

Reworked the SQLite connection manager to set a 30s timeout, enable WAL and concurrency/performance PRAGMAs, enforce relational integrity PRAGMAs, and add lifecycle guards. Refactored image-related DB functions to use the unified context manager and changed error paths to raise exceptions.

Changes

Cohort / File(s) Summary
Connection manager
backend/app/database/connection.py
Rewrote get_db_connection to open connections with 30s timeout, enable WAL, set synchronous/cache/tempstore/mmap_size/busy_timeout PRAGMAs, enforce FOREIGN KEYS and immediate checks, and add null-safety guards in exception/finally blocks; docstring updated.
Images DB — connection migration
backend/app/database/images.py
Replaced direct sqlite3.connect/_connect usage with get_db_connection context manager across CRUD functions (db_create_images_table, db_bulk_insert_images, db_get_all_images, db_get_untagged_images, db_update_image_tagged_status, db_insert_image_classes_batch, db_get_images_by_folder_ids, db_delete_images_by_ids, db_toggle_image_favourite_status); unified transaction patterns and shifted error handling to raise exceptions instead of silent/boolean fallbacks.
Behavioral/error-handling changes
backend/app/database/images.py (various functions)
Error paths now log and re-raise instead of returning False or swallowing errors; transactions and commits handled via the context manager scope.
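The log-and-raise pattern the walkthrough describes can be illustrated with a hedged sketch; the function name and `items` table here are invented for illustration and are not taken from the repository:

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)

def db_delete_items_by_ids(conn: sqlite3.Connection, ids: list[int]) -> None:
    """Illustrative write helper (name and schema assumed, not from the repo)."""
    try:
        conn.executemany("DELETE FROM items WHERE id = ?", [(i,) for i in ids])
        conn.commit()
    except Exception as e:
        # New pattern: log and re-raise so callers see the failure,
        # instead of returning False and silently swallowing it.
        logger.error(f"Error deleting items: {e}")
        conn.rollback()
        raise
```

The key change is the bare `raise` at the end of the handler: the error is still logged, but it now propagates to the caller instead of being converted into a boolean fallback.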


Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

bug, backend

Suggested reviewers

  • rahulharpal1603

Poem

🐰 I hopped into code with a curious twitch,
Set timeouts long and PRAGMAs rich,
WAL keeps the rhythm, checks keep things right,
Connections guarded through day and night,
Hooray—no locks to spoil our delight! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

  • Description check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the title accurately summarizes the main change, a centralized SQLite connection context manager with WAL mode and timeout configuration to fix concurrency issues.
  • Linked Issues check ✅ Passed: all coding objectives from issue #913 are met (WAL mode enabled, busy timeout configured, connection handling centralized via the get_db_connection context manager across database operations).
  • Out of Scope Changes check ✅ Passed: all changes are directly scoped to fixing SQLite concurrency (connection manager setup, PRAGMA configuration, and refactoring of database operations to use the centralized context manager).
  • Docstring Coverage check ✅ Passed: docstring coverage is 90.00%, above the required 80.00% threshold.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @backend/app/database/images.py:
- Around lines 408-426: The file mixes the new get_db_connection() context manager (used in db_toggle_image_favourite_status) with legacy _connect() calls, causing inconsistent concurrency/locking behavior. Migrate the remaining image-related DB functions (e.g. db_create_images_table, db_bulk_insert_images, db_get_all_images, db_get_untagged_images, db_update_image_tagged_status, db_insert_image_classes_batch, db_get_images_by_folder_ids, db_delete_images_by_ids, and any others still calling _connect) to use get_db_connection() as a context manager. Replace direct _connect() usages with "with get_db_connection() as conn: cursor = conn.cursor()", rely on get_db_connection to apply WAL mode and the timeout, call conn.commit() after writes where needed, preserve the existing SQL and return semantics, and align exception logging/raising with the pattern used in db_toggle_image_favourite_status (logger.error and re-raise).
- Around lines 47-53: The file mixes two connection strategies: the legacy _connect() and the context-managed get_db_connection(). Migrate all functions that currently call _connect() (the eight functions referenced in the review) to the get_db_connection() context manager, as the existing db_toggle_image_favourite_status already does. Replace patterns like "conn = _connect(); cur = conn.cursor(); ...; conn.commit(); conn.close()" with a "with get_db_connection() as conn:" block, use conn.cursor() inside it, and remove the explicit conn.close(). Keep commit/rollback semantics consistent (commit when mutating; let exceptions propagate for rollback). Ensure PRAGMA settings already handled by get_db_connection() are not re-applied, and update any docstrings/comments to reflect the unified connection strategy.
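The migration the comment asks for can be sketched as a before/after. `get_db_connection` below is a minimal stand-in for the centralized manager, and the `images`/`isTagged` names are assumptions for illustration, not the actual schema:

```python
import sqlite3
from contextlib import contextmanager

DB_PATH = "example_images.db"  # assumed; real code points at the app database

@contextmanager
def get_db_connection():
    """Minimal stand-in for the centralized connection manager."""
    conn = sqlite3.connect(DB_PATH, timeout=30.0)
    try:
        yield conn
    finally:
        conn.close()

# Legacy pattern being replaced:
#   conn = _connect()
#   cur = conn.cursor()
#   cur.execute(...)
#   conn.commit()
#   conn.close()

def db_update_image_tagged_status(image_id: int) -> bool:
    """Migrated pattern: the context manager owns the connection lifecycle."""
    with get_db_connection() as conn:
        cursor = conn.cursor()
        cursor.execute(
            "UPDATE images SET isTagged = 1 WHERE id = ?", (image_id,)
        )
        conn.commit()
        return cursor.rowcount > 0
```

The explicit `conn.close()` disappears because the context manager's `finally` block handles it, including on error paths.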
🧹 Nitpick comments (3)
backend/app/database/connection.py (3)

20-28: Both timeout parameter and busy_timeout PRAGMA are redundant.

Setting both timeout=30.0 (line 20) and PRAGMA busy_timeout = 30000 (line 28) is redundant: both configure the same 30-second wait period for locked databases. The timeout parameter alone is sufficient and slightly more efficient.

♻️ Simplify by removing redundant PRAGMA
-        conn.execute("PRAGMA busy_timeout = 30000;")  # 30 second timeout for locks

The connection-level timeout parameter already handles this.


23-23: Setting WAL mode on every connection is inefficient.

WAL mode is a database-level persistent setting, not a connection-level setting. Once enabled, it remains active for all subsequent connections until explicitly changed. Executing PRAGMA journal_mode = WAL on every connection adds unnecessary overhead.

Consider setting WAL mode once during database initialization (e.g., in a migration script or startup routine) rather than on every connection.
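A sketch of that one-time initialization, with illustrative function names. Note that unlike `journal_mode`, most PRAGMAs (`synchronous`, `foreign_keys`, `cache_size`) are per-connection and do not persist in the database file, so only WAL benefits from this treatment:

```python
import sqlite3

def init_database(db_path: str) -> None:
    """Run once at startup: WAL mode is stored in the database file itself,
    so it persists across connections and need not be set per-connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode = WAL;")
    finally:
        conn.close()

def journal_mode(db_path: str) -> str:
    """Helper to inspect the persisted journal mode of a database file."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA journal_mode;").fetchone()[0]
    finally:
        conn.close()
```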


22-28: Consider the performance overhead of setting multiple PRAGMAs on every connection.

Executing 7 PRAGMA statements on every connection has measurable overhead, especially for short-lived connections or high-frequency operations. While these settings optimize concurrency, the setup cost may outweigh the benefits for quick queries.

For frequently-used settings like WAL mode, synchronous, and foreign keys, consider:

  • Setting the persistent WAL journal mode once at database initialization (synchronous, by contrast, is per-connection and would still need to be set where it matters)
  • Applying connection-specific PRAGMAs (cache_size, mmap_size) only for long-lived or high-load connections
  • Profiling whether all 7 PRAGMAs provide measurable benefit in your workload
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3af96c and 6654ee6.

📒 Files selected for processing (2)
  • backend/app/database/connection.py
  • backend/app/database/images.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 570
File: backend/app/database/connection.py:16-24
Timestamp: 2025-10-31T17:00:50.132Z
Learning: In PictoPy backend, the user prefers not to use database connection retry logic or extended busy timeouts in the centralized get_db_connection() context manager, even though the app has concurrent access patterns via ProcessPoolExecutor and FastAPI.

Applied to files:

  • backend/app/database/connection.py
🧬 Code graph analysis (1)
backend/app/database/images.py (1)
backend/app/database/connection.py (1)
  • get_db_connection (8-43)
🔇 Additional comments (4)
backend/app/database/connection.py (3)

20-20: Verify: This contradicts prior team preferences on timeout configuration.

Based on learnings, user Hemil36 previously preferred not to use extended busy timeouts in get_db_connection(), even with concurrent access patterns. The addition of timeout=30.0 here contradicts that preference.

Please confirm whether the team's position on timeout configuration has changed, or if this should be adjusted to align with the previous decision.


25-27: Verify that large cache and mmap sizes are appropriate for your deployment environment.

The settings allocate significant memory per connection:

  • cache_size = -64000 → 64MB cache
  • mmap_size = 268435456 → 256MB memory-mapped I/O

For applications with many concurrent connections or memory-constrained environments, these values could lead to excessive memory usage or OOM errors.

Please verify:

  1. Whether these sizes align with your deployment environment's memory capacity
  2. Whether connection pooling limits are in place to prevent excessive memory consumption
  3. Whether these values have been tested under your expected peak concurrency

38-39: LGTM: Connection lifecycle guards prevent errors.

The addition of if conn: guards before rollback() and close() properly handles cases where connection initialization fails, preventing potential AttributeError exceptions.

Also applies to: 42-43

backend/app/database/images.py (1)

60-96: Improved exception handling in table creation.

The refactored db_create_images_table() now properly wraps operations in try/except/finally blocks with explicit commit and rollback, which is a good defensive programming practice.

However, note that this function still uses _connect() instead of get_db_connection(), which means it lacks the full concurrency optimizations described in the PR objectives. See the related comment on lines 47-53.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/database/images.py (2)

110-196: Consider raising exceptions for consistency with write operations.

This read operation returns an empty list on error (line 196), while write operations (db_bulk_insert_images, db_update_image_tagged_status, db_delete_images_by_ids) raise exceptions. This inconsistency makes it difficult for callers to distinguish between "no results found" and "database error occurred."

For better error visibility and debugging, consider propagating exceptions from read operations as well. If graceful degradation is intentional for reads, document this design decision in the docstring.

♻️ Proposed refactor for consistent error handling
         except Exception as e:
             logger.error(f"Error getting all images: {e}")
-            return []
+            raise

294-325: Consider raising exceptions for consistency with write operations.

Similar to db_get_all_images, this read operation returns an empty list on error (line 325) while write operations raise exceptions. This masks database errors from callers who cannot distinguish between "no images in these folders" and "database error."

For consistency and better error visibility, consider propagating exceptions.

♻️ Proposed refactor for consistent error handling
         except Exception as e:
             logger.error(f"Error getting images by folder IDs: {e}")
-            return []
+            raise
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6654ee6 and 9142348.

📒 Files selected for processing (1)
  • backend/app/database/images.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 570
File: backend/app/database/connection.py:16-24
Timestamp: 2025-10-31T17:00:50.132Z
Learning: In PictoPy backend, the user prefers not to use database connection retry logic or extended busy timeouts in the centralized get_db_connection() context manager, even though the app has concurrent access patterns via ProcessPoolExecutor and FastAPI.

Applied to files:

  • backend/app/database/images.py
🧬 Code graph analysis (1)
backend/app/database/images.py (2)
backend/app/database/connection.py (1)
  • get_db_connection (8-43)
backend/app/utils/images.py (1)
  • image_util_parse_metadata (496-513)
🔇 Additional comments (8)
backend/app/database/images.py (8)

9-9: Verify that the 30-second timeout aligns with current architectural preferences.

Based on learnings, a previous team member (Hemil36, PR 570) preferred not to use extended busy timeouts in the centralized connection manager. The current PR introduces a 30-second timeout to address issue #913. Please confirm this approach change has been reviewed and aligns with the team's updated concurrency strategy.


47-79: LGTM! Clean table creation with proper foreign key constraints.

The function correctly uses the context manager for both table creations, with appropriate IF NOT EXISTS guards and CASCADE deletes for referential integrity. The lack of explicit error handling is acceptable here since table creation errors should propagate to the caller for proper initialization flow handling.


81-107: LGTM! Proper bulk upsert with conflict resolution.

The function correctly uses the context manager and implements appropriate upsert logic. The CASE statement for isTagged ensures that once an image is tagged, it remains tagged even during re-insertion. Error handling (log and raise) is consistent with write operation patterns.
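The tag-preserving upsert the comment describes can be sketched as follows; the schema and statement here are assumptions illustrating the CASE-on-conflict pattern, not the actual PictoPy SQL:

```python
import sqlite3

# Hypothetical upsert: re-inserting an image updates its path but never
# downgrades isTagged from 1 back to 0 (schema assumed for illustration).
UPSERT = """
INSERT INTO images (id, path, isTagged)
VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    path = excluded.path,
    isTagged = CASE WHEN images.isTagged = 1 THEN 1 ELSE excluded.isTagged END
"""

def bulk_insert_images(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    conn.executemany(UPSERT, rows)
    conn.commit()
```

The CASE expression compares the stored row (`images.isTagged`) against the incoming row (`excluded.isTagged`) and keeps the tag sticky once set.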


199-238: LGTM! Clean query with proper error propagation.

The function correctly uses the context manager, filters images needing AI tagging with an appropriate JOIN, and lets exceptions propagate naturally. The metadata parsing and None-handling for folder_id are both correct.


241-262: LGTM! Proper update with rowcount validation.

The function correctly uses the context manager and returns a boolean based on whether any rows were affected. Exception handling (log and raise) is consistent with other write operations.

Note: The function returns False when image_id doesn't exist (rowcount=0), not an exception. This is a valid design choice but callers should be aware that False indicates "no rows affected" rather than an error.


265-291: LGTM! Efficient batch insert with proper conflict handling.

The function correctly uses the context manager and INSERT OR IGNORE to handle duplicate pairs gracefully. The early return for empty lists and consistent error handling (log and raise) align with other write operations.


328-355: LGTM! Proper bulk delete with CASCADE awareness.

The function correctly uses the context manager and documents the CASCADE behavior for image_classes records. The deletion count logging is valuable for audit trails, and error handling (log and raise) is consistent with other write operations.


357-376: LGTM! Proper toggle implementation with existence check.

The function correctly uses the context manager, checks for image existence before attempting the toggle, and uses the CASE expression to flip the boolean value. Error handling is consistent with other write operations.
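The existence-check-then-CASE-flip shape described above looks roughly like this; table and column names are assumed for illustration and the real function also takes a connection from the context manager:

```python
import sqlite3

def toggle_favourite(conn: sqlite3.Connection, image_id: int) -> bool:
    """Flip a boolean column in one statement (names assumed for illustration)."""
    cur = conn.execute("SELECT 1 FROM images WHERE id = ?", (image_id,))
    if cur.fetchone() is None:
        return False  # existence check before attempting the toggle
    conn.execute(
        "UPDATE images SET isFavourite = "
        "CASE WHEN isFavourite = 1 THEN 0 ELSE 1 END WHERE id = ?",
        (image_id,),
    )
    conn.commit()
    return True
```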



Development

Successfully merging this pull request may close these issues.

BUG: Improve SQLite concurrency handling to prevent database is locked errors
