Fix N+1 query pattern in get_linked_gmail_threads API#47

Draft
Copilot wants to merge 3 commits into main from copilot/fix-n-plus-one-query-issue

Conversation

Copilot AI commented Jan 12, 2026

Fixes #43

The get_linked_gmail_threads() API exhibits a severe N+1 query pattern. For 5 threads of 10 emails each, with 2 attachments per email, it executes 106 database queries: 1 for thread names, 5 get_doc() calls, and 100 individual get_value() calls for attachment URLs.

Changes

Batch attachment fetching

  • Replace get_attachments_data(email) with get_attachments_data_batch(emails)
  • Single query fetches all file URLs using WHERE name IN (...) instead of N queries
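The difference between per-attachment lookups and a single `WHERE name IN (...)` query can be sketched outside Frappe. The snippet below is only an illustration: it uses an in-memory SQLite table as a hypothetical stand-in for the File table, and the `run` helper and sample file names are assumptions, not code from this PR.

```python
import sqlite3

# Hypothetical stand-in for the File table; only the two columns used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (name TEXT PRIMARY KEY, file_url TEXT)")
conn.executemany(
    "INSERT INTO file VALUES (?, ?)",
    [(f"FILE-{i}", f"/files/doc-{i}.pdf") for i in range(100)],
)
conn.commit()

query_count = 0


def run(sql, params=()):
    """Execute one SELECT and count it, so both strategies can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


names = [f"FILE-{i}" for i in range(100)]

# N queries: one lookup per attachment.
per_row = {}
for name in names:
    rows = run("SELECT file_url FROM file WHERE name = ?", (name,))
    per_row[name] = rows[0][0]
n_plus_one_count = query_count

# One query: WHERE name IN (?, ?, ...), then an in-memory lookup map.
query_count = 0
placeholders = ", ".join("?" for _ in names)
rows = run(f"SELECT name, file_url FROM file WHERE name IN ({placeholders})", names)
batched = dict(rows)
batched_count = query_count

print(n_plus_one_count, batched_count)
```

Both strategies produce the same name-to-URL mapping; only the number of round trips differs.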

Batch email fetching

  • Replace get_doc("Gmail Thread") loop with single get_all("Single Email CT") query
  • Fetch only required fields instead of full document hydration
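Once the child rows arrive as one flat list, they have to be regrouped by thread in memory. A minimal sketch of that step, assuming rows shaped like dictionaries with a `parent` key (the sample rows and the `group_by_parent` helper are hypothetical):

```python
from collections import defaultdict

# Rows shaped like the result of a flat child-table query; values are made up.
all_emails = [
    {"name": "e1", "parent": "THREAD-1", "subject": "Hello"},
    {"name": "e2", "parent": "THREAD-2", "subject": "Invoice"},
    {"name": "e3", "parent": "THREAD-1", "subject": "Re: Hello"},
]


def group_by_parent(rows):
    """Group child rows by their parent thread, preserving query order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["parent"]].append(row)
    return dict(grouped)


emails_by_thread = group_by_parent(all_emails)
```

Because the grouping preserves input order, any `order_by` applied in the query carries over to each thread's list.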

Eliminate controller load

  • Replace thread.get_url() with direct URL construction: f"/app/gmail-thread/{thread.name}"
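As a rough sketch, the direct construction could be wrapped in a small helper. The `thread_url` name is hypothetical, and the percent-encoding step is an added precaution that is not part of this PR; whether it is needed depends on how thread names are generated.

```python
from urllib.parse import quote


def thread_url(thread_name: str) -> str:
    """Build the desk URL for a Gmail Thread without loading the document."""
    # quote() is an assumption added here: it escapes characters such as
    # spaces or '#' that would otherwise break the path segment.
    return f"/app/gmail-thread/{quote(thread_name, safe='')}"
```

For names made up only of letters, digits, and hyphens, the result is identical to the plain f-string used in the PR.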

Before/After

# Before: 106 queries
for thread in gmail_threads:
    thread = frappe.get_doc("Gmail Thread", thread.name)  # N queries
    for email in thread.emails:
        get_attachments_data(email)  # N*M queries for attachments

# After: 3 queries
gmail_threads = frappe.get_all("Gmail Thread", fields=[...])  # 1 query
all_emails = frappe.get_all("Single Email CT", filters={"parent": ["in", ...]})  # 1 query
file_urls = frappe.get_all("File", filters={"name": ["in", ...]})  # 1 query

Query reduction: 106 → 3 (~97% reduction)
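The percentage can be checked with a one-line calculation; the `reduction_percent` helper below is purely illustrative.

```python
def reduction_percent(before: int, after: int) -> float:
    """Share of queries eliminated, as a percentage of the original count."""
    return (before - after) / before * 100


print(round(reduction_percent(106, 3), 1))  # 97.2
```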

Original prompt

This section details the original issue you should resolve.

<issue_title>N+1 Query Pattern in get_linked_gmail_threads API</issue_title>
<issue_description>

Metadata

  • File(s): frappe_gmail_thread/api/activity.py:28-63
  • Category: Database / API
  • Severity: High
  • Effort to Fix: Medium
  • Estimated Performance Gain: 60-80%

Problem Description

The get_linked_gmail_threads() API endpoint is called from the timeline of any document linked to Gmail threads. It exhibits a severe N+1 query pattern:

  1. Fetches all Gmail Thread names linked to a document
  2. Loads the full document for each thread with get_doc()
  3. For each email in each thread, calls get_attachments_data() which does a get_value() per attachment

For a document with 5 linked threads, each containing 10 emails with 2 attachments each, this results in:

  • 1 initial query (get thread names)
  • 5 get_doc() calls (full document loads with child tables)
  • 100 get_value() calls for attachment URLs (5 threads × 10 emails × 2 attachments)
  • Total: 106 queries for a single timeline load
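The count above follows a simple formula, sketched here with a hypothetical helper. It assumes, as the breakdown does, that each get_doc() call counts as one query and that every email has the same number of attachments.

```python
def n_plus_one_queries(threads: int, emails_per_thread: int, attachments_per_email: int) -> int:
    """Estimate queries issued by the unbatched endpoint for one timeline load."""
    thread_list_query = 1  # initial get_all() for thread names
    get_doc_calls = threads  # one full document load per thread
    get_value_calls = threads * emails_per_thread * attachments_per_email
    return thread_list_query + get_doc_calls + get_value_calls


print(n_plus_one_queries(5, 10, 2))  # 106
```

The attachment term dominates: doubling the number of linked threads to 10 gives 211 queries under the same assumptions.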

Code Location

Main API endpoint (lines 18-82):

@frappe.whitelist()
def get_linked_gmail_threads(doctype, docname):
    gmail_threads = frappe.get_all(
        "Gmail Thread",
        filters={
            "reference_doctype": doctype,
            "reference_name": docname,
        },
    )
    data = []
    for thread in gmail_threads:
        thread = frappe.get_doc("Gmail Thread", thread.name)  # N+1: Full doc per thread
        for email in thread.emails:
            t_data = {
                # ... build timeline data ...
                "attachments": get_attachments_data(email),  # N+1: Query per attachment
                # ...
            }
            data.append(t_data)
    return data

Attachment lookup (lines 7-15):

def get_attachments_data(email):
    attachments_data = json.loads(email.attachments_data)
    for attachment in attachments_data:
        file_doc_name = attachment.get("file_doc_name")
        if file_doc_name:
            file_url = frappe.db.get_value("File", file_doc_name, "file_url")  # Query per attachment
            attachment["file_url"] = file_url
    return attachments_data

Root Cause

  1. Using get_all() to get names then get_doc() in a loop instead of fetching data in a single query
  2. Loading full documents when only specific fields are needed
  3. Nested loop for attachments with individual get_value() calls
  4. This API is called on every timeline view, making it a hot path

Proposed Solution

Batch fetch all data upfront and process in memory:

import json
import frappe

def get_attachments_data_batch(emails):
    """Batch fetch attachment URLs for all emails at once."""
    all_file_names = []
    email_attachments_map = {}
    
    for email in emails:
        attachments_data = json.loads(email.attachments_data or "[]")
        email_attachments_map[email.name] = attachments_data
        for att in attachments_data:
            if att.get("file_doc_name"):
                all_file_names.append(att["file_doc_name"])
    
    # Single query for all file URLs
    if all_file_names:
        file_urls = frappe.get_all(
            "File",
            filters={"name": ["in", all_file_names]},
            fields=["name", "file_url"]
        )
        file_url_map = {f.name: f.file_url for f in file_urls}
        
        # Update attachments with URLs
        for email_name, attachments in email_attachments_map.items():
            for att in attachments:
                file_name = att.get("file_doc_name")
                if file_name and file_name in file_url_map:
                    att["file_url"] = file_url_map[file_name]
    
    return email_attachments_map


@frappe.whitelist()
def get_linked_gmail_threads(doctype, docname):
    # Fetch threads with needed fields in single query
    gmail_threads = frappe.get_all(
        "Gmail Thread",
        filters={
            "reference_doctype": doctype,
            "reference_name": docname,
        },
        fields=["name", "reference_doctype", "reference_name", "_liked_by"]
    )
    
    if not gmail_threads:
        return []
    
    thread_names = [t.name for t in gmail_threads]
    thread_map = {t.name: t for t in gmail_threads}
    
    # Batch fetch all emails for all threads
    all_emails = frappe.get_all(
        "Single Email CT",  # Child table DocType
        filters={"parent": ["in", thread_names]},
        fields=[
            "name", "parent", "creation", "subject", "content", "sender",
            "sender_full_name", "cc", "bcc", "recipients", "sent_or_received",
            "read_by_recipient", "date_and_time", "attachments_data"
        ],
        order_by="creation asc"
    )
    
    # Batch fetch all attachment URLs
    attachments_map = get_attachments_data_batch(all_emails)
    
    data = []
    for email in all_emails:
        thread = th...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes rtCamp/frappe-gmail-thread#43

Copilot AI and others added 2 commits January 12, 2026 15:38
- Replace get_attachments_data() with get_attachments_data_batch() for batch fetching
- Fetch threads with specific fields instead of using get_doc() loop
- Batch fetch all child emails in single query
- Batch fetch all attachment URLs in single query
- Replace thread.get_url() with direct URL construction
- Reduce queries from 106 to ~3 for typical use case

Co-authored-by: mrrobot47 <25586785+mrrobot47@users.noreply.github.com>

- Address code review feedback to make comment more descriptive
- Clarify that Single Email CT is the Gmail Thread Email child table

Co-authored-by: mrrobot47 <25586785+mrrobot47@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix N+1 query pattern in get_linked_gmail_threads API" to "Fix N+1 query pattern in get_linked_gmail_threads API" on Jan 12, 2026
Copilot AI requested a review from mrrobot47 January 12, 2026 15:44