Skip to content

Conversation

@stygmate
Copy link
Contributor

@stygmate stygmate commented Nov 5, 2025

Summary
This PR substantially improves the Django optimizer to better support polymorphic models, nested reverse relations, and Relay connections. It introduces a post-fetch prefetching pass that batches reverse relations after queryset evaluation and injects them into Django’s _prefetched_objects_cache. It also adds dedicated handling for InheritanceManager (django-polymorphic) and enhances Relay connection resolution to avoid extra queries (including totalCount).

Key changes

  • Add post-fetch prefetching
    • New module strawberry_django/postfetch.py and integration into default_qs_hook.
    • Defers certain reverse lookups to a postfetch phase, batching by FK and filling nested caches.
    • Supports both child-level and parent-level postfetch branches collected during optimization.
  • Polymorphism and InheritanceManager
    • Rewrite prefetch paths after select_subclasses so subclass instances get correct lookups and avoid redundant queries.
    • Collect and merge postfetch hints across interface/union selections and concrete types.
  • Relay/pagination
    • If a queryset is optimized-by-prefetching, use the specialized connection path, otherwise compute totalCount via a window Count(1) in the main query when requested, avoiding a separate count() query.
    • Materialize querysets that carry postfetch hints before connection construction to reuse caches in nested connections.
  • Field and fetch hooks
    • Connection/list resolvers can reuse an existing _prefetched_objects_cache on the source instance.
    • Single-object fields avoid .get() when a prefetched result cache is present to prevent LIMIT queries discarding prefetched data.
  • Tests and docs
    • Extensive new tests for polymorphism, Relay, and InheritanceManager, including postfetch branches and query behavior.
    • Documentation updates in docs/guide/optimizer.md.

Configuration

  • OptimizerConfig retains existing flags (enable_only, enable_select_related, enable_prefetch_related, enable_annotate, enable_nested_relations_prefetch) and adds support pathways for postfetch_prefetch and parent-branch hints carried via OptimizerStore.
  • No breaking changes to the public API; the postfetch mechanism is internal and triggered automatically by the optimizer.

Performance and behavior

  • Reduces N+1s on nested reverse relations (particularly under polymorphic models and nested connections) by batching and caching after evaluation.
  • Avoids redundant or path-mismatched prefetches when using InheritanceManager.select_subclasses by rewriting prefetch paths relative to the subclass.
  • Prevents extra count() queries for totalCount by using SQL window functions when the field is selected.

Related issues

Notes

  • The PR keeps the optimizer conservative on already-evaluated querysets and lists (no re-optimization, respects result_cache).
  • The new logic is exercised by the added test suites, especially under tests/polymorphism*, tests/polymorphism_relay*, and tests/polymorphism_inheritancemanager*.

Review note
This one’s a heavyweight and touches the optimizer’s core. It needs a reviewer who knows the library well and can take the time for a slow, careful pass @bellini666 ☕🙂. Also, if anyone wants to experiment with this version in their projects, please do—real-world mileage reports are pure gold.

…subtype-specific fields to validate query structure
…els and schema for ProjectNote and ArtProjectNote (contain missing optimisations)
…s, including edge cases for unknown relations and skipped hints
…eliable postfetch batching with polymorphic relations
…ry optimization for nested polymorphic relations
…nhance postfetch optimization for polymorphic relations
…r polymorphic relations with inline fragments and FK chaining
…ss nested relations and improve polymorphic relation handling
… subclasses with selected fields/prefetches are included, and enhance nested prefetch path resolution
…polymorphic relations with inline fragments and FK chains
…ith English equivalents for consistency and clarity in polymorphic relation tests
…fy `default_qs_hook` logic by delegating postfetch optimizations
…tests to validate optimized query handling and N+1 prevention across various polymorphic relations
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @stygmate, your pull request is larger than the review limit of 150000 diff characters

@codecov
Copy link

codecov bot commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 72.60504% with 163 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.60%. Comparing base (6456a7b) to head (cc7c87d).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
strawberry_django/postfetch.py 62.36% 105 Missing ⚠️
strawberry_django/optimizer.py 76.92% 51 Missing ⚠️
strawberry_django/pagination.py 80.00% 4 Missing ⚠️
strawberry_django/fields/field.py 94.64% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #808      +/-   ##
==========================================
- Coverage   89.71%   87.60%   -2.11%     
==========================================
  Files          45       46       +1     
  Lines        4298     4882     +584     
==========================================
+ Hits         3856     4277     +421     
- Misses        442      605     +163     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@stygmate
Copy link
Contributor Author

stygmate commented Nov 6, 2025

I realized that under certain conditions, evaluating a queryset was loading all the records from the tables. I’ll look for fixes.

benjaminderei and others added 9 commits November 6, 2025 09:46
…full, first, last) to validate query optimization and ensure stable query counts
…al connection pages and remove redundant queryset evaluation
…d postfetch logic

Streamline error handling by replacing generic exceptions with specific ones where applicable, utilize `contextlib.suppress` for cleaner code, and improve import clarity for postfetch utilities.
…c and reduced code duplication

Simplify optimizer by moving nested logic into helper functions and reusing shared utilities for prefetch path extraction, rewriting, and inheritance handling. Streamline postfetch logic with a dedicated `__prefetch_child_root` function.
@stygmate
Copy link
Contributor Author

stygmate commented Nov 6, 2025

I fixed the bug I detected earlier today. I also did quite a bit of refactoring to meet Pyright and Ruff requirements and to make my changes clearer.

@patrick91 patrick91 requested a review from bellini666 November 7, 2025 09:42
@stygmate
Copy link
Contributor Author

stygmate commented Nov 7, 2025

I have code branches that aren’t covered by tests. Given the complexity of handling optimizations related to polymorphism, I worked iteratively (hence the large number of commits), and some parts of the code may have become unnecessary. That said, I’m seeing very promising results.

benjaminderei and others added 7 commits November 9, 2025 08:25
… implement tests for excessive materialization and postfetch optimization in both Relay and standard polymorphism scenarios.
…er usage

Simplify error handling with `contextlib.suppress`, replace private `_db` attribute with public `db` property, and enhance code clarity in pagination and total count resolution logic.
…up docstrings in polymorphism tests

Normalize variable naming (`N` to `n`), enhance assertions with additional checks for data presence, and refine docstring formatting for clarity and consistency across test modules.
…-database setups

Improve handling of database aliases (`db_alias`) in postfetch logic to ensure correct querying across multiple databases. Adjust related instance grouping, manager usage, and `_prefetched_objects_cache` injection logic for robustness.
…tfetch logic

Replace exception handling with conditional checks for `using` method and streamline alias assignment logic by removing redundant `contextlib.suppress` blocks in database alias determination. Improve code clarity and maintainability.
Copy link
Member

@bellini666 bellini666 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some initial comments, but still need to take another look after.

Also, I hate doing this, but would it be possible to split this into multiple PRs? We have so much going on here that it is really hard to understand what is going on in all the changes

# If a projects query exists, ensure it does NOT batch across multiple company ids.
# It's acceptable that no projects query is executed if data was served from cache
# after page-level postfetch populated it.
if projects_sql:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: does this if makes sense in a test context? I mean, this should be deterministic, unless this was a pytest parametrized value

Comment on lines +65 to +84
# Collect all details.text under ArtProjectType nodes
details_texts = set()
for c_edge in companies:
company_node = c_edge.get("node") or {}
projects_conn = company_node.get("projects") or {}
for p_edge in projects_conn.get("edges", []):
node = (p_edge or {}).get("node") or {}
if node.get("__typename") != "ArtProjectType":
continue
art_notes_conn = node.get("artNotes") or {}
for n_edge in art_notes_conn.get("edges", []):
note_node = (n_edge or {}).get("node") or {}
details_conn = note_node.get("details") or {}
for d_edge in details_conn.get("edges", []):
d_node = (d_edge or {}).get("node") or {}
text = d_node.get("text")
if text:
details_texts.add(text)

assert {"d11", "d12", "d21"}.issubset(details_texts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: maybe we should just do a simpler assert result.data == {...}? Seems easier to visualize the output, and the code is less complex

# If a projects query exists, ensure it does NOT batch across multiple company ids.
# It's acceptable that no projects query is executed if data was served from cache
# after page-level postfetch populated it.
if projects_sql:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I started reviews from bottom to top. Ditto from same suggestion in the other similar test

Comment on lines +55 to +66
assert isinstance(companies, list)
assert companies
art_projects = [
p for p in companies[0]["projects"] if p["__typename"] == "ArtProjectType"
]
details_texts = {
d["text"]
for p in art_projects
for n in p.get("artNotes", [])
for d in n.get("details", [])
}
assert {"d11", "d12", "d21"}.issubset(details_texts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I commented about this below, I think just doing result.data == {...} is better

# If a projects query exists, ensure it does NOT batch across multiple company ids.
# It's acceptable that no projects query is executed if data was served from cache
# after page-level postfetch populated it.
if projects_sql:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: ditto below

Comment on lines +345 to +351
from strawberry_django.queryset import (
get_queryset_config as _get_qs_cfg,
)

cfg = _get_qs_cfg(qs)
if not getattr(cfg, "postfetch_prefetch", None):
return cache[use_cache_key]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Ditto

Also, can't we just import those at module-level? (don't remember if they would cause import loops)


# Helper to apply early SQL pagination for simple forward root connections
def _apply_early_pagination(qs: models.QuerySet):
import contextlib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: this should be imported at module level

Comment on lines +548 to +549
# Build ORDER BY expressions as Django does for window pagination
from django.db import DEFAULT_DB_ALIAS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: ditto

Comment on lines +556 to +558
# Annotate row number and total count (global partition)
from django.db.models import Count, Window
from django.db.models.functions import RowNumber
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: ditto

Comment on lines +560 to +570
qs = qs.annotate(
_strawberry_row_number=Window(
RowNumber(),
partition_by=None,
order_by=order_by_exprs,
),
_strawberry_total_count=Window(
Count(1),
partition_by=None,
),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: this is duplicating some code that we have on pagination module. Also, maybe this whole function should be there, as it isnot related to the base field itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants