
Eliminate N+1 queries in scheduled tasks#51

Draft
Copilot wants to merge 2 commits into main from copilot/optimize-get-doc-usage

Conversation


Copilot AI commented Jan 13, 2026

Both sync_emails() and enable_pubsub_everyday() fetched account names with get_all(), then immediately loaded the full document for each account with get_doc() in a loop. Since sync_emails() runs every 30 minutes, this N+1 pattern created unnecessary database load.

Changes

sync.py - Eliminated get_doc() entirely

# Before: Only fetch name, then get_doc() for each account
gmail_accounts = frappe.get_all("Gmail Account", filters={"gmail_enabled": 1}, fields=["name"])
for gmail_account in gmail_accounts:
    gaccount = frappe.get_doc("Gmail Account", gmail_account.name)
    if gaccount.refresh_token:
        user = gaccount.linked_user

# After: Fetch needed fields directly
gmail_accounts = frappe.get_all("Gmail Account", filters={"gmail_enabled": 1}, 
                                fields=["name", "refresh_token", "linked_user"])
for gmail_account in gmail_accounts:
    if gmail_account.refresh_token:
        user = gmail_account.linked_user
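The attribute-style access in the "after" snippet works because frappe.get_all returns each row as a frappe._dict, a dict subclass that also exposes keys as attributes. A minimal stand-in (not the real frappe._dict) shows why the plain `if gmail_account.refresh_token:` guard still works when the field is empty:

```python
class AttrDict(dict):
    """Minimal stand-in for frappe._dict: keys readable as attributes,
    missing keys resolve to None instead of raising."""

    def __getattr__(self, key):
        return self.get(key)


# A row as get_all() might return it; values here are illustrative.
row = AttrDict(name="ACC-0001", refresh_token=None, linked_user="a@example.com")

# An unset token is None (falsy), so the loop can guard with a plain `if`:
assert not row.refresh_token
assert row.linked_user == "a@example.com"
```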

daily.py - Filter at query time instead of in Python

# Before: Load all enabled accounts
filters={"gmail_enabled": 1}

# After: Filter accounts without tokens upfront
filters={"gmail_enabled": 1, "refresh_token": ["is", "set"]}

Note: daily.py still requires get_doc() since enable_pubsub() accesses the labels child table, but the filter reduces unnecessary iterations.
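If daily.py ever needed to drop get_doc() as well, the labels child table could be fetched for all accounts in one query (e.g. a single get_all on the child doctype filtered by parenttype) and grouped in Python. A sketch of the grouping step, using illustrative row and field names rather than the actual schema:

```python
from collections import defaultdict

# Rows as a single child-table query might return them; the child doctype
# and field names ("parent", "label_id") are assumptions for illustration.
label_rows = [
    {"parent": "ACC-0001", "label_id": "INBOX"},
    {"parent": "ACC-0001", "label_id": "SENT"},
    {"parent": "ACC-0002", "label_id": "INBOX"},
]

# Group child rows by their parent account in a single pass.
labels_by_account = defaultdict(list)
for row in label_rows:
    labels_by_account[row["parent"]].append(row["label_id"])

# Each account's labels are now available without a per-account get_doc():
assert labels_by_account["ACC-0001"] == ["INBOX", "SENT"]
```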

Original prompt

This section details the original issue you should resolve

<issue_title>get_doc() in Loop for Sync and Daily Tasks</issue_title>
<issue_description>

Metadata

  • File(s):
    • frappe_gmail_thread/tasks/sync.py:12
    • frappe_gmail_thread/tasks/daily.py:20
  • Category: Database / Scheduled Tasks
  • Severity: Medium
  • Effort to Fix: Low
  • Estimated Performance Gain: 20-40%

Problem Description

Both scheduled tasks (sync_emails and enable_pubsub_everyday) fetch Gmail Account names with get_all(), then immediately load the full document with get_doc() in a loop. This creates unnecessary database queries when only specific fields are needed.

The sync_emails task runs every 30 minutes, and enable_pubsub_everyday runs daily. With many Gmail accounts, these N+1 query patterns add up.

Code Location

sync.py (line 12):

def sync_emails():
    gmail_accounts = frappe.get_all(
        "Gmail Account",
        filters={"gmail_enabled": 1},
        fields=["name"],  # Only fetches name
    )
    for gmail_account in gmail_accounts:
        gaccount = frappe.get_doc("Gmail Account", gmail_account.name)  # Full doc load
        if gaccount.refresh_token:
            user = gaccount.linked_user
            # ...

daily.py (line 20):

def enable_pubsub_everyday():
    # ...
    gmail_accounts = frappe.get_all(
        "Gmail Account", filters={"gmail_enabled": 1}, fields=["name"]
    )
    for gmail_account in gmail_accounts:
        gaccount = frappe.get_doc("Gmail Account", gmail_account.name)  # Full doc load
        try:
            enable_pubsub(gaccount)
        # ...

Root Cause

The code follows a two-step pattern:

  1. Fetch list with minimal fields
  2. Loop and load full document for each item

This is inefficient because:

  • Initial query returns only name
  • Each get_doc() requires a separate database round-trip
  • Full document loads all fields, child tables, and runs controller logic
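A toy model makes the round-trip cost concrete. The FakeDB below counts queries the way a database would see them (this is a sketch for illustration, not Frappe's actual internals): the N+1 pattern costs one list query plus one query per row, while fetching the needed fields up front costs exactly one.

```python
class FakeDB:
    """Toy database that counts round-trips for the two access patterns."""

    def __init__(self, accounts):
        self.accounts = accounts
        self.queries = 0

    def get_all(self, fields):
        self.queries += 1  # one query regardless of row count
        return [{f: acc[f] for f in fields} for acc in self.accounts]

    def get_doc(self, name):
        self.queries += 1  # one query per document load
        return next(a for a in self.accounts if a["name"] == name)


accounts = [
    {"name": f"ACC-{i}", "refresh_token": "tok", "linked_user": f"user{i}"}
    for i in range(10)
]

# N+1 pattern: 1 list query + 10 get_doc() calls = 11 round-trips
db = FakeDB(accounts)
for row in db.get_all(["name"]):
    db.get_doc(row["name"])
assert db.queries == 11

# Batched pattern: a single query fetching the needed fields
db = FakeDB(accounts)
db.get_all(["name", "refresh_token", "linked_user"])
assert db.queries == 1
```

With N accounts the savings grow linearly: N+1 queries collapse into 1 (and the real get_doc() is heavier still, since it also loads child tables and runs controller hooks).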

Proposed Solution

For sync.py - Only need refresh_token and linked_user:

def sync_emails():
    gmail_accounts = frappe.get_all(
        "Gmail Account",
        filters={"gmail_enabled": 1},
        fields=["name", "refresh_token", "linked_user"],  # Fetch needed fields
    )
    for gmail_account in gmail_accounts:
        if gmail_account.refresh_token:
            user = gmail_account.linked_user
            job_name = f"gmail_thread_sync_{user}"
            if not is_job_enqueued(job_name):
                frappe.enqueue(
                    "frappe_gmail_thread.frappe_gmail_thread.doctype.gmail_thread.gmail_thread.sync",
                    user=user,
                    queue="long",
                    job_name=job_name,
                    job_id=job_name,
                )

For daily.py - enable_pubsub needs full doc, but can batch check first:

If enable_pubsub() truly needs the full document, at minimum filter accounts that have the necessary tokens:

def enable_pubsub_everyday():
    google_settings = frappe.get_single("Google Settings")
    if not google_settings.enable:
        return
    if (
        not google_settings.custom_gmail_sync_in_realtime
        or not google_settings.custom_gmail_pubsub_topic
    ):
        return

    # Only fetch accounts with refresh tokens (they're the only ones that can use pubsub)
    gmail_accounts = frappe.get_all(
        "Gmail Account", 
        filters={
            "gmail_enabled": 1,
            "refresh_token": ["is", "set"]  # Filter out accounts without tokens
        }, 
        fields=["name"]
    )
    for gmail_account in gmail_accounts:
        gaccount = frappe.get_doc("Gmail Account", gmail_account.name)
        try:
            enable_pubsub(gaccount)
        except Exception as e:
            frappe.log_error(title="PubSub Error", message=e)
            continue

Implementation Steps

  1. sync.py:

    • Update fields to include ["name", "refresh_token", "linked_user"]
    • Remove get_doc() call
    • Access fields directly from the dict returned by get_all()
  2. daily.py:

    • Add filter for refresh_token to reduce iterations
    • If enable_pubsub() can be refactored to accept specific fields, do so
    • Otherwise, the filter optimization still reduces unnecessary iterations
  3. Test both scheduled tasks with multiple Gmail accounts

Additional Notes

  • The sync() function called by enqueue() does need the full document, but that's inside the background job
  • The scheduler only needs to check if tokens exist and get the user email
  • This pattern appears across multiple rtCamp repos (frappe-slack-connector, frappe-github-connector)</issue_description>

Comments on the Issue (you are @copilot in this section)



- sync.py: Fetch refresh_token and linked_user fields directly, eliminating N get_doc() calls
- daily.py: Filter accounts by refresh_token to reduce unnecessary document loads

Co-authored-by: mrrobot47 <25586785+mrrobot47@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize get_doc() usage in sync and daily tasks" to "Eliminate N+1 queries in scheduled tasks" on Jan 13, 2026
Copilot AI requested a review from mrrobot47 January 13, 2026 19:26
