
Eliminate N+1 queries in scheduled tasks#51

Draft
Copilot wants to merge 2 commits into main from copilot/optimize-get-doc-usage

Conversation


Copilot AI commented Jan 13, 2026

Both sync_emails() and enable_pubsub_everyday() fetched account names with get_all(), then immediately loaded the full document for each account with get_doc() in a loop. Since sync_emails() runs every 30 minutes, this N+1 pattern created unnecessary database load.

Changes

sync.py - Eliminated get_doc() entirely

# Before: Only fetch name, then get_doc() for each account
gmail_accounts = frappe.get_all("Gmail Account", filters={"gmail_enabled": 1}, fields=["name"])
for gmail_account in gmail_accounts:
    gaccount = frappe.get_doc("Gmail Account", gmail_account.name)
    if gaccount.refresh_token:
        user = gaccount.linked_user

# After: Fetch needed fields directly
gmail_accounts = frappe.get_all("Gmail Account", filters={"gmail_enabled": 1}, 
                                fields=["name", "refresh_token", "linked_user"])
for gmail_account in gmail_accounts:
    if gmail_account.refresh_token:
        user = gmail_account.linked_user
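The attribute-style access in the "after" snippet works because frappe.get_all returns each row as a frappe._dict, a dict subclass that also exposes keys as attributes. A minimal stand-in (not the real frappe._dict) shows why the plain `if gmail_account.refresh_token:` guard still works when the field is empty:

```python
class AttrDict(dict):
    """Minimal stand-in for frappe._dict: keys readable as attributes,
    missing keys resolve to None instead of raising."""

    def __getattr__(self, key):
        return self.get(key)


# A row as get_all() might return it; values here are illustrative.
row = AttrDict(name="ACC-0001", refresh_token=None, linked_user="a@example.com")

# An unset token is None (falsy), so the loop can guard with a plain `if`:
assert not row.refresh_token
assert row.linked_user == "a@example.com"
```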

daily.py - Filter at query time instead of in Python

# Before: Load all enabled accounts
filters={"gmail_enabled": 1}

# After: Filter accounts without tokens upfront
filters={"gmail_enabled": 1, "refresh_token": ["is", "set"]}

Note: daily.py still requires get_doc() since enable_pubsub() accesses the labels child table, but the filter reduces unnecessary iterations.
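If daily.py ever needed to drop get_doc() as well, the labels child table could be fetched for all accounts in one query (e.g. a single get_all on the child doctype filtered by parenttype) and grouped in Python. A sketch of the grouping step, using illustrative row and field names rather than the actual schema:

```python
from collections import defaultdict

# Rows as a single child-table query might return them; the child doctype
# and field names ("parent", "label_id") are assumptions for illustration.
label_rows = [
    {"parent": "ACC-0001", "label_id": "INBOX"},
    {"parent": "ACC-0001", "label_id": "SENT"},
    {"parent": "ACC-0002", "label_id": "INBOX"},
]

# Group child rows by their parent account in a single pass.
labels_by_account = defaultdict(list)
for row in label_rows:
    labels_by_account[row["parent"]].append(row["label_id"])

# Each account's labels are now available without a per-account get_doc():
assert labels_by_account["ACC-0001"] == ["INBOX", "SENT"]
```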

Original prompt

This section details the original issue you should resolve

<issue_title>get_doc() in Loop for Sync and Daily Tasks</issue_title>
<issue_description>

Metadata

  • File(s):
    • frappe_gmail_thread/tasks/sync.py:12
    • frappe_gmail_thread/tasks/daily.py:20
  • Category: Database / Scheduled Tasks
  • Severity: Medium
  • Effort to Fix: Low
  • Estimated Performance Gain: 20-40%

Problem Description

Both scheduled tasks (sync_emails and enable_pubsub_everyday) fetch Gmail Account names with get_all(), then immediately load the full document with get_doc() in a loop. This creates unnecessary database queries when only specific fields are needed.

The sync_emails task runs every 30 minutes, and enable_pubsub_everyday runs daily. With many Gmail accounts, these N+1 query patterns add up.

Code Location

sync.py (line 12):

def sync_emails():
    gmail_accounts = frappe.get_all(
        "Gmail Account",
        filters={"gmail_enabled": 1},
        fields=["name"],  # Only fetches name
    )
    for gmail_account in gmail_accounts:
        gaccount = frappe.get_doc("Gmail Account", gmail_account.name)  # Full doc load
        if gaccount.refresh_token:
            user = gaccount.linked_user
            # ...

daily.py (line 20):

def enable_pubsub_everyday():
    # ...
    gmail_accounts = frappe.get_all(
        "Gmail Account", filters={"gmail_enabled": 1}, fields=["name"]
    )
    for gmail_account in gmail_accounts:
        gaccount = frappe.get_doc("Gmail Account", gmail_account.name)  # Full doc load
        try:
            enable_pubsub(gaccount)
        # ...

Root Cause

The code follows a two-step pattern:

  1. Fetch list with minimal fields
  2. Loop and load full document for each item

This is inefficient because:

  • Initial query returns only name
  • Each get_doc() requires a separate database round-trip
  • Full document loads all fields, child tables, and runs controller logic
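A toy model makes the round-trip cost concrete. The FakeDB below counts queries the way a database would see them (this is a sketch for illustration, not Frappe's actual internals): the N+1 pattern costs one list query plus one query per row, while fetching the needed fields up front costs exactly one.

```python
class FakeDB:
    """Toy database that counts round-trips for the two access patterns."""

    def __init__(self, accounts):
        self.accounts = accounts
        self.queries = 0

    def get_all(self, fields):
        self.queries += 1  # one query regardless of row count
        return [{f: acc[f] for f in fields} for acc in self.accounts]

    def get_doc(self, name):
        self.queries += 1  # one query per document load
        return next(a for a in self.accounts if a["name"] == name)


accounts = [
    {"name": f"ACC-{i}", "refresh_token": "tok", "linked_user": f"user{i}"}
    for i in range(10)
]

# N+1 pattern: 1 list query + 10 get_doc() calls = 11 round-trips
db = FakeDB(accounts)
for row in db.get_all(["name"]):
    db.get_doc(row["name"])
assert db.queries == 11

# Batched pattern: a single query fetching the needed fields
db = FakeDB(accounts)
db.get_all(["name", "refresh_token", "linked_user"])
assert db.queries == 1
```

With N accounts the savings grow linearly: N+1 queries collapse into 1 (and the real get_doc() is heavier still, since it also loads child tables and runs controller hooks).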

Proposed Solution

For sync.py - Only need refresh_token and linked_user:

def sync_emails():
    gmail_accounts = frappe.get_all(
        "Gmail Account",
        filters={"gmail_enabled": 1},
        fields=["name", "refresh_token", "linked_user"],  # Fetch needed fields
    )
    for gmail_account in gmail_accounts:
        if gmail_account.refresh_token:
            user = gmail_account.linked_user
            job_name = f"gmail_thread_sync_{user}"
            if not is_job_enqueued(job_name):
                frappe.enqueue(
                    "frappe_gmail_thread.frappe_gmail_thread.doctype.gmail_thread.gmail_thread.sync",
                    user=user,
                    queue="long",
                    job_name=job_name,
                    job_id=job_name,
                )

For daily.py - enable_pubsub needs full doc, but can batch check first:

If enable_pubsub() truly needs the full document, at minimum filter accounts that have the necessary tokens:

def enable_pubsub_everyday():
    google_settings = frappe.get_single("Google Settings")
    if not google_settings.enable:
        return
    if (
        not google_settings.custom_gmail_sync_in_realtime
        or not google_settings.custom_gmail_pubsub_topic
    ):
        return

    # Only fetch accounts with refresh tokens (they're the only ones that can use pubsub)
    gmail_accounts = frappe.get_all(
        "Gmail Account", 
        filters={
            "gmail_enabled": 1,
            "refresh_token": ["is", "set"]  # Filter out accounts without tokens
        }, 
        fields=["name"]
    )
    for gmail_account in gmail_accounts:
        gaccount = frappe.get_doc("Gmail Account", gmail_account.name)
        try:
            enable_pubsub(gaccount)
        except Exception as e:
            frappe.log_error(title="PubSub Error", message=e)
            continue

Implementation Steps

  1. sync.py:

    • Update fields to include ["name", "refresh_token", "linked_user"]
    • Remove get_doc() call
    • Access fields directly from the dict returned by get_all()
  2. daily.py:

    • Add filter for refresh_token to reduce iterations
    • If enable_pubsub() can be refactored to accept specific fields, do so
    • Otherwise, the filter optimization still reduces unnecessary iterations
  3. Test both scheduled tasks with multiple Gmail accounts

Additional Notes

  • The sync() function called by enqueue() does need the full document, but that's inside the background job
  • The scheduler only needs to check if tokens exist and get the user email
  • This pattern appears across multiple rtCamp repos (frappe-slack-connector, frappe-github-connector)</issue_description>

Comments on the Issue (you are @copilot in this section)



- sync.py: Fetch refresh_token and linked_user fields directly, eliminating N get_doc() calls
- daily.py: Filter accounts by refresh_token to reduce unnecessary document loads

Co-authored-by: mrrobot47 <25586785+mrrobot47@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize get_doc() usage in sync and daily tasks" to "Eliminate N+1 queries in scheduled tasks" on Jan 13, 2026
Copilot AI requested a review from mrrobot47 January 13, 2026 19:26
