Improve optimizer for polymorphism and connections: post-fetch prefetching, InheritanceManager support, and fewer N+1s #808

stygmate · 2025-11-05T11:57:46Z

Summary
This PR substantially improves the Django optimizer to better support polymorphic models, nested reverse relations, and Relay connections. It introduces a post-fetch prefetching pass that batches reverse relations after queryset evaluation and injects them into Django’s _prefetched_objects_cache. It also adds dedicated handling for InheritanceManager (django-model-utils) and enhances Relay connection resolution to avoid extra queries (including totalCount).

Key changes

Add post-fetch prefetching
- New module strawberry_django/postfetch.py and integration into default_qs_hook.
- Defers certain reverse lookups to a postfetch phase, batching by FK and filling nested caches.
- Supports both child-level and parent-level postfetch branches collected during optimization.
Polymorphism and InheritanceManager
- Rewrite prefetch paths after select_subclasses so subclass instances get correct lookups and avoid redundant queries.
- Collect and merge postfetch hints across interface/union selections and concrete types.
Relay/pagination
- If a queryset is optimized by prefetching, use the specialized connection path; otherwise, when totalCount is requested, compute it via a window Count(1) in the main query, avoiding a separate count() query.
- Materialize querysets that carry postfetch hints before connection construction to reuse caches in nested connections.
Field and fetch hooks
- Connection/list resolvers can reuse an existing _prefetched_objects_cache on the source instance.
- Single-object fields avoid .get() when a prefetched result cache is present, so that a LIMIT query does not bypass and discard the prefetched data.
Tests and docs
- Extensive new tests for polymorphism, Relay, and InheritanceManager, including postfetch branches and query behavior.
- Documentation updates in docs/guide/optimizer.md.
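To illustrate the post-fetch idea described under "Add post-fetch prefetching", here is a minimal, framework-free sketch. It is not the code in strawberry_django/postfetch.py: the `Company` and `Project` classes, `fetch_projects`, and `postfetch` are hypothetical stand-ins, and the cache holds plain lists, whereas Django stores querysets in `_prefetched_objects_cache`. The point is only the shape of the technique: evaluate the parents first, run one batched lookup keyed by FK, then attach the grouped children to each parent.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Company:
    id: int
    # Stand-in for Django's per-instance prefetch cache (assumption: plain dict of lists).
    _prefetched_objects_cache: dict = field(default_factory=dict)


@dataclass
class Project:
    id: int
    company_id: int


def fetch_projects(company_ids, table):
    """Hypothetical single batched lookup (think WHERE company_id IN (...))."""
    wanted = set(company_ids)
    return [row for row in table if row.company_id in wanted]


def postfetch(parents, table, cache_name="projects"):
    """Group children by FK and attach them to the already-loaded parents."""
    children = fetch_projects([p.id for p in parents], table)
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child.company_id].append(child)
    for parent in parents:
        # Parents without children still get an empty list,
        # so a later access does not need its own lookup.
        parent._prefetched_objects_cache[cache_name] = by_parent.get(parent.id, [])


companies = [Company(1), Company(2), Company(3)]
project_table = [Project(10, 1), Project(11, 1), Project(12, 2), Project(13, 99)]
postfetch(companies, project_table)
```

After `postfetch` runs, each company carries its own list of projects, and the lookup happened once for the whole page rather than once per company.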

Configuration

OptimizerConfig retains existing flags (enable_only, enable_select_related, enable_prefetch_related, enable_annotate, enable_nested_relations_prefetch) and adds support pathways for postfetch_prefetch and parent-branch hints carried via OptimizerStore.
No breaking changes to the public API; the postfetch mechanism is internal and triggered automatically by the optimizer.

Performance and behavior

Reduces N+1s on nested reverse relations (particularly under polymorphic models and nested connections) by batching and caching after evaluation.
Avoids redundant or path-mismatched prefetches when using InheritanceManager.select_subclasses by rewriting prefetch paths relative to the subclass.
Prevents extra count() queries for totalCount by using SQL window functions when the field is selected.
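As a rough illustration of the window-function approach to totalCount, the sketch below uses the standard-library sqlite3 module rather than the Django ORM, so it shows the SQL idea only and not the PR's implementation. The `project` table and its rows are made up, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO project (name) VALUES (?)",
    [(f"p{i}",) for i in range(1, 8)],
)

# COUNT(1) OVER () is evaluated over the whole result set before LIMIT is applied,
# so every returned row carries the total alongside the page data.
rows = conn.execute(
    "SELECT id, name, COUNT(1) OVER () AS total_count "
    "FROM project ORDER BY id LIMIT 3"
).fetchall()

page = [(row[0], row[1]) for row in rows]
# An empty page yields no rows and therefore no count; 0 is a placeholder here.
total_count = rows[0][2] if rows else 0
```

One caveat of this pattern: when the requested page is empty (for example, an offset past the end), no row carries the count, so a real implementation would need a fallback such as a separate count query.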

Related issues

Likely fixes or improves: Polymorphic interfaces break prefetch when a nested field has filters applied #593, N+1 when using a prefetch_related without explicit ordering #772, N+1 queries due to queryset.count() if cache is empty #788.
Likely improves generic N+1 patterns around connections/fragments: N+1 if query with a fragment and custom connection class (introduced in 0.59.0) #771.
Possibly affected (needs validation): Wrong connection total_count when use DISTINCT #792 (DISTINCT counts), Optimizer: annotations do not work in nested fields when select_related is used. #743/Query Optimizer Annotate not working when used in nested query #549 (nested annotations).

Notes

The PR keeps the optimizer conservative on already-evaluated querysets and lists (no re-optimization, respects result_cache).
The new logic is exercised by the added test suites, especially under tests/polymorphism*, tests/polymorphism_relay*, and tests/polymorphism_inheritancemanager*.

Review note
This one’s a heavyweight and touches the optimizer’s core. It needs a reviewer who knows the library well and can take the time for a slow, careful pass @bellini666 ☕🙂. Also, if anyone wants to experiment with this version in their projects, please do—real-world mileage reports are pure gold.

…s in optimizer

… prefetch path rewriting

…ery optimization

…alidation

…subtype-specific fields to validate query structure

…els and schema for ProjectNote and ArtProjectNote (contain missing optimisations)

…d ensure valid query paths

…ions and subtype-specific prefetching

…s, including edge cases for unknown relations and skipped hints

…orphic relations with postfetch handling

…eliable postfetch batching with polymorphic relations

… optimizations

…ance manager relay

…d connections and managers

…ry optimization for nested polymorphic relations

…nhance postfetch optimization for polymorphic relations

…r polymorphic relations with inline fragments and FK chaining

…ss nested relations and improve polymorphic relation handling

…g with new polymorphic relations

… subclasses with selected fields/prefetches are included, and enhance nested prefetch path resolution

…polymorphic relations with inline fragments and FK chains

…ith English equivalents for consistency and clarity in polymorphic relation tests

…fy `default_qs_hook` logic by delegating postfetch optimizations

…tests to validate optimized query handling and N+1 prevention across various polymorphic relations

sourcery-ai

Sorry @stygmate, your pull request is larger than the review limit of 150000 diff characters

for more information, see https://pre-commit.ci

codecov · 2025-11-05T12:00:18Z

Codecov Report

❌ Patch coverage is 72.60504% with 163 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.60%. Comparing base (6456a7b) to head (cc7c87d).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
strawberry_django/postfetch.py	62.36%	105 Missing ⚠️
strawberry_django/optimizer.py	76.92%	51 Missing ⚠️
strawberry_django/pagination.py	80.00%	4 Missing ⚠️
strawberry_django/fields/field.py	94.64%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #808      +/-   ##
==========================================
- Coverage   89.71%   87.60%   -2.11%     
==========================================
  Files          45       46       +1     
  Lines        4298     4882     +584     
==========================================
+ Hits         3856     4277     +421     
- Misses        442      605     +163

stygmate · 2025-11-06T07:53:37Z

I realized that under certain conditions, evaluating a queryset was loading all the records from the tables. I’ll look for fixes.

…full, first, last) to validate query optimization and ensure stable query counts

…al connection pages and remove redundant queryset evaluation

…e_optimizer

…d postfetch logic Streamline error handling by replacing generic exceptions with specific ones where applicable, utilize `contextlib.suppress` for cleaner code, and improve import clarity for postfetch utilities.

…c and reduced code duplication Simplify optimizer by moving nested logic into helper functions and reusing shared utilities for prefetch path extraction, rewriting, and inheritance handling. Streamline postfetch logic with a dedicated `__prefetch_child_root` function.

…tests and improve docstring clarity

…d upgrade Relay nodes for consistency

stygmate · 2025-11-06T15:52:43Z

I fixed the bug I detected earlier today. I also did quite a bit of refactoring to meet Pyright and Ruff requirements and to make my changes clearer.

…orphic relation tests to improve clarity and simplify code

…using backslash for multi-line statement

…tion tests for clarity and consistency

stygmate · 2025-11-07T15:23:57Z

I have code branches that aren’t covered by tests. Given the complexity of handling optimizations related to polymorphism, I worked iteratively (hence the large number of commits), and some parts of the code may have become unnecessary. That said, I’m seeing very promising results.

… implement tests for excessive materialization and postfetch optimization in both Relay and standard polymorphism scenarios.

… resolution strategies

…er usage Simplify error handling with `contextlib.suppress`, replace private `_db` attribute with public `db` property, and enhance code clarity in pagination and total count resolution logic.

…up docstrings in polymorphism tests Normalize variable naming (`N` to `n`), enhance assertions with additional checks for data presence, and refine docstring formatting for clarity and consistency across test modules.

…-database setups Improve handling of database aliases (`db_alias`) in postfetch logic to ensure correct querying across multiple databases. Adjust related instance grouping, manager usage, and `_prefetched_objects_cache` injection logic for robustness.

…tfetch logic Replace exception handling with conditional checks for `using` method and streamline alias assignment logic by removing redundant `contextlib.suppress` blocks in database alias determination. Improve code clarity and maintainability.

bellini666

I left some initial comments, but I still need to take another look afterwards.

Also, I hate doing this, but would it be possible to split this into multiple PRs? There is so much going on here that it is really hard to follow all the changes.

bellini666 · 2025-11-22T11:50:01Z

tests/polymorphism_relay/test_excessive_materialization.py

+    # If a projects query exists, ensure it does NOT batch across multiple company ids.
+    # It's acceptable that no projects query is executed if data was served from cache
+    # after page-level postfetch populated it.
+    if projects_sql:


question: does this `if` make sense in a test context? I mean, the result should be deterministic, unless this was a pytest parametrized value.

bellini666 · 2025-11-22T11:52:27Z

tests/polymorphism_inheritancemanager_relay/test_parent_postfetch.py

+    # Collect all details.text under ArtProjectType nodes
+    details_texts = set()
+    for c_edge in companies:
+        company_node = c_edge.get("node") or {}
+        projects_conn = company_node.get("projects") or {}
+        for p_edge in projects_conn.get("edges", []):
+            node = (p_edge or {}).get("node") or {}
+            if node.get("__typename") != "ArtProjectType":
+                continue
+            art_notes_conn = node.get("artNotes") or {}
+            for n_edge in art_notes_conn.get("edges", []):
+                note_node = (n_edge or {}).get("node") or {}
+                details_conn = note_node.get("details") or {}
+                for d_edge in details_conn.get("edges", []):
+                    d_node = (d_edge or {}).get("node") or {}
+                    text = d_node.get("text")
+                    if text:
+                        details_texts.add(text)
+
+    assert {"d11", "d12", "d21"}.issubset(details_texts)


suggestion: maybe we should just do a simpler assert result.data == {...}? Seems easier to visualize the output, and the code is less complex
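A small sketch of why the exact comparison can be preferable, using made-up response data (the payload shape and `ResearchProjectType` are assumptions for illustration; only `ArtProjectType` and the `d11`/`d12`/`d21` values come from the test above). The subset check still passes when the response contains an unexpected extra row, while the equality check catches it.

```python
import copy


def collect_detail_texts(data):
    """Subset-style extraction, similar in spirit to the test under review."""
    return {
        d["text"]
        for c in data["companies"]
        for p in c["projects"]
        if p["__typename"] == "ArtProjectType"
        for n in p.get("artNotes", [])
        for d in n.get("details", [])
    }


# Hypothetical expected payload; a real test would compare against result.data.
expected = {
    "companies": [
        {
            "projects": [
                {
                    "__typename": "ArtProjectType",
                    "artNotes": [
                        {"details": [{"text": "d11"}, {"text": "d12"}]},
                        {"details": [{"text": "d21"}]},
                    ],
                },
                {"__typename": "ResearchProjectType"},
            ]
        }
    ]
}

# Simulate a response that leaks one stray detail row.
actual = copy.deepcopy(expected)
actual["companies"][0]["projects"][0]["artNotes"][1]["details"].append(
    {"text": "unexpected"}
)

subset_check_passes = {"d11", "d12", "d21"}.issubset(collect_detail_texts(actual))
exact_check_passes = actual == expected
```

Here `subset_check_passes` is true even though the response is wrong, and `exact_check_passes` is false, which is the stricter behaviour the suggestion is after.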

bellini666 · 2025-11-22T11:55:28Z

tests/polymorphism_inheritancemanager_relay/test_excessive_materialization.py

+    # If a projects query exists, ensure it does NOT batch across multiple company ids.
+    # It's acceptable that no projects query is executed if data was served from cache
+    # after page-level postfetch populated it.
+    if projects_sql:


suggestion: I started reviewing from bottom to top. Ditto: same suggestion as in the other similar test.

bellini666 · 2025-11-22T12:23:13Z

tests/polymorphism_inheritancemanager/test_parent_postfetch.py

+    assert isinstance(companies, list)
+    assert companies
+    art_projects = [
+        p for p in companies[0]["projects"] if p["__typename"] == "ArtProjectType"
+    ]
+    details_texts = {
+        d["text"]
+        for p in art_projects
+        for n in p.get("artNotes", [])
+        for d in n.get("details", [])
+    }
+    assert {"d11", "d12", "d21"}.issubset(details_texts)


suggestion: I commented on this below; I think just asserting result.data == {...} is better.

bellini666 · 2025-11-22T12:23:57Z

tests/polymorphism_inheritancemanager/test_excessive_materialization.py

+    # If a projects query exists, ensure it does NOT batch across multiple company ids.
+    # It's acceptable that no projects query is executed if data was served from cache
+    # after page-level postfetch populated it.
+    if projects_sql:


issue: same as my comment below

bellini666 · 2025-11-22T12:39:51Z