Description
Metadata
- File(s):
frappe_gmail_thread/frappe_gmail_thread/doctype/gmail_thread/gmail_thread.py:222,249,272,372 - Category: Database / Background Jobs
- Severity: High
- Effort to Fix: Medium
- Estimated Performance Gain: 50-70%
Problem Description
The Gmail sync function (sync()) calls frappe.db.commit() after processing each individual email message. For users with active inboxes receiving 50-100 new emails, a single sync operation therefore executes 50-100 separate database commits.
Each commit:
- Forces a disk flush to the database
- Acquires and releases database locks
- Creates transaction overhead
This significantly slows down the sync process and increases database load.
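To see the scale of the difference, here is a quick framework-free sketch; count_commits is an illustrative stand-in for the sync loop, with each increment representing one frappe.db.commit():

```python
# Compare commit counts: one commit per email vs. one per batch.
# This is a model of the loop, not the app's code; the increment
# stands in for frappe.db.commit().
def count_commits(n_emails, batch_size=1):
    commits = 0
    for i in range(1, n_emails + 1):
        if i % batch_size == 0:
            commits += 1  # simulated commit inside the loop
    if n_emails % batch_size != 0:
        commits += 1  # final commit for the remaining emails
    return commits

per_email = count_commits(100)               # current behaviour: 100 commits
batched = count_commits(100, batch_size=20)  # proposed: 5 commits
```

With 100 new emails, per-email committing issues 100 commits where a batch size of 20 issues 5, a 20x reduction in disk flushes and lock round-trips.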
Code Location
Initial sync - commit per email (line 222):
gmail_thread.append("emails", email)
gmail_thread.save(ignore_permissions=True)
frappe.db.commit() # nosemgrep <-- Commit after EACH email
frappe.db.set_value(
    "Gmail Thread",
    gmail_thread.name,
    "modified",
    email.date_and_time,
    update_modified=False,
)

End of initial sync label loop (line 249):
gmail_account.reload()
gmail_account.last_historyid = max_history_id
gmail_account.save(ignore_permissions=True)
frappe.db.commit() # nosemgrep

History not found error handling (line 272):

if error.get("reason") == "notFound":
    gmail_account.last_historyid = 0
    gmail_account.save(ignore_permissions=True)
    frappe.db.commit()
    return

End of incremental sync (line 372):
gmail_account.reload()
gmail_account.last_historyid = max_history_id
gmail_account.save(ignore_permissions=True)
frappe.db.commit() # nosemgrep

Root Cause
The developer likely added per-email commits to ensure data persistence in case of mid-sync failures. However, this approach:
- Creates excessive I/O overhead
- Negates the benefits of batch processing
- Causes the sync to take much longer than necessary
- Places unnecessary load on the database during the 30-minute sync cycle
Proposed Solution
Batch commits - either commit once at the end of processing all emails for a label, or commit every N emails (e.g., 10-20) to balance durability with performance:
def sync(user=None):
    if user:
        frappe.set_user(user)
    user = frappe.session.user
    gmail_account = frappe.get_doc("Gmail Account", {"linked_user": user})
    # ... validation code ...

    BATCH_COMMIT_SIZE = 20  # Commit every N emails
    emails_processed = 0

    for label_id in label_ids:
        try:
            if not last_history_id:
                # Initial sync
                threads = gmail.users().threads().list(userId="me", labelIds=label_id).execute()
                if "threads" not in threads:
                    continue
                for thread in threads["threads"][::-1]:
                    # ... process thread and emails ...
                    for message in thread_data["messages"]:
                        # ... process email ...
                        gmail_thread.append("emails", email)
                        gmail_thread.save(ignore_permissions=True)
                        emails_processed += 1
                        if emails_processed % BATCH_COMMIT_SIZE == 0:
                            frappe.db.commit()
                            # Update history ID periodically for crash recovery
                            gmail_account.reload()
                            gmail_account.last_historyid = max_history_id
                            gmail_account.save(ignore_permissions=True)
                # Final commit for remaining emails
                frappe.db.commit()
                gmail_account.reload()
                gmail_account.last_historyid = max_history_id
                gmail_account.save(ignore_permissions=True)
            else:
                # Incremental sync - similar batching approach
                # ...
                pass
        except Exception:
            frappe.db.rollback()
            frappe.log_error(frappe.get_traceback(), "Gmail Thread Sync Error")
            continue

Implementation Steps
- Add a batch commit size constant (e.g., 20 emails)
- Track the number of emails processed in a counter
- Only commit when counter reaches batch size
- After processing all emails, perform a final commit
- Update last_historyid during batch commits for crash recovery
- Add frappe.db.rollback() in the exception handler to clean up failed transactions
- Test with accounts that have many new emails
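The steps above can be sketched as a small helper. The name BatchCommitter and the injected commit/rollback callables are hypothetical; in the real code they would be frappe.db.commit and frappe.db.rollback:

```python
# Hypothetical helper implementing the batch-commit steps above.
# commit/rollback are injected so the sketch stays framework-free.
class BatchCommitter:
    def __init__(self, commit, rollback, batch_size=20):
        self.commit = commit
        self.rollback = rollback
        self.batch_size = batch_size
        self.processed = 0

    def record(self):
        """Call after each email; commits when the batch is full."""
        self.processed += 1
        if self.processed % self.batch_size == 0:
            self.commit()  # also the point to refresh last_historyid

    def finish(self):
        """Final commit for any emails left over after the last full batch."""
        if self.processed % self.batch_size != 0:
            self.commit()

    def fail(self):
        """Clean up a failed transaction before continuing to the next label."""
        self.rollback()


# Simulate 47 emails with a batch size of 20:
commits = []
bc = BatchCommitter(lambda: commits.append(1), lambda: None, batch_size=20)
for _ in range(47):
    bc.record()
bc.finish()
# 47 emails -> batch commits at 20 and 40, plus one final commit = 3 total
```

Injecting the commit function also makes the batching logic trivially unit-testable, which the current inline-commit structure is not.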
Additional Notes
- The # nosemgrep comment suggests the developer was aware this is a pattern to avoid but added it intentionally
- Consider using frappe.db.savepoint() for finer-grained transaction control if needed
- With 10 Gmail accounts syncing every 30 minutes, this optimization could reduce database commits from 1000+ to ~50 per sync cycle
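The savepoint idea maps onto standard SQL SAVEPOINTs, which can be demonstrated with sqlite3 from the standard library (the table and the simulated failure are purely illustrative): a savepoint lets the sync discard one bad email's writes without losing the whole batch.

```python
import sqlite3

# Demonstrates the pattern frappe.db.savepoint() wraps: roll back one
# failed item's writes while keeping the rest of the batch intact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY)")

for email_id in (1, 2, 3):
    conn.execute("SAVEPOINT per_email")
    try:
        conn.execute("INSERT INTO emails VALUES (?)", (email_id,))
        if email_id == 2:
            raise ValueError("simulated processing failure")
        conn.execute("RELEASE SAVEPOINT per_email")
    except ValueError:
        # Undo only this email's writes; earlier emails are preserved.
        conn.execute("ROLLBACK TO SAVEPOINT per_email")
conn.commit()

rows = [r[0] for r in conn.execute("SELECT id FROM emails ORDER BY id")]
# rows == [1, 3]: email 2 was rolled back, 1 and 3 survived the batch commit
```

This complements batching rather than replacing it: batch commits bound the total commit count, while savepoints bound the blast radius of a single bad message.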