Skip to content

Conversation

@nicktobey
Copy link
Contributor

@nicktobey nicktobey commented Sep 9, 2025

#1470 was supposed to, among other things, add restrictions to when we generate CachedResults nodes to cache subquery results. However, the added check is overly broad, and as a result it became impossible to actually cache subquery results. Currently, there is not a single plan test that contains a CachedResults node in the output plan.

The cacheSubqueryAliasesInJoins function has a comment:

//The left-most child of a join root is an exception that cannot be cached.

No rationale is given for this. Looking at it, it seems like we used to generate CachedResults nodes before we finished resolving references in the query, and now we wait until after. So it's possible that this is the reason for the restriction, and the entire check is no longer necessary.

Either way, the implementation is more restrictive than the comment would suggest. Due to how the algorithm tracks state via function parameters, it doesn't propagate state from a child node to it's parents/siblings, and the flag for recording that it's encountered a subqeury gets unset. As a result, no Subqueries will actually be cached.

This PR fixes the tree walk to correctly remember once it's encountered a subquery and allow subsequent subqueries to be cached. This is still more broad than the comment would suggest, since it doesn't require that this subquery appear in the left-most child of the join.

Among the changed plan tests, we see that CachedResults nodes are now emitted. Most of them are the children of HashLookups, but there are some that are the children of InnerJoin and SemiJoin nodes where one of the things being joined is a subquery.

@nicktobey nicktobey changed the title Generate Cached Relax restriction that was preventing us from caching the result of subqueries. Sep 9, 2025
@nicktobey
Copy link
Contributor Author

nicktobey commented Sep 9, 2025

I experimented in another branch with removing thefoundFirstRel flag from the function entirely, but it resulted in:

  • Plans where the top level node was unnecessarily cached.
  • Test queries where the cache wasn't being disposed afterward.

So there are definitely compelling reasons to not unilaterally cache subqueries. If we have a pass that removes pointless subqueries, it may not handle these cases.

The PR as it currently stands seems objectively more correct, even if I don't fully understand the motivation of the original issue. I can ask Jason for more context when he gets back.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't find any fault with this, LGTM

…t in the exact same plans, but requires less tracking.
@nicktobey nicktobey merged commit 5fb8788 into main Sep 9, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants