SNOW-1869388 add memoization to to_selectable#2815
Conversation
tests/unit/conftest.py
Outdated
| fake_session = mock.create_autospec(Session) | ||
| fake_session._cte_optimization_enabled = False | ||
| fake_session._analyzer = mock_analyzer | ||
| fake_session._lock = mock.MagicMock() |
There was a problem hiding this comment.
Can we have a more descriptive name to this lock -- similar to _plan_lock below?
There was a problem hiding this comment.
this lock is actually part of Session object and used as a general purpose lock for serialization:
sfc-gh-helmeleegy
left a comment
There was a problem hiding this comment.
Looks good, thanks!
| selectable._is_valid_for_replacement = ( | ||
| snowflake_plan._is_valid_for_replacement | ||
| ) | ||
| self._to_selectable_memo_dict[snowflake_plan._uuid] = selectable |
There was a problem hiding this comment.
we probably should remember the original logical plan id instead of the resolved plan id, because the different snowflake plan might be created for the same orignal plan node
| return self._to_selectable_memo_dict[plan._uuid] | ||
| snowflake_plan = self.resolve(plan) | ||
| selectable = SelectSnowflakePlan(snowflake_plan, analyzer=self) | ||
| selectable._is_valid_for_replacement = ( |
There was a problem hiding this comment.
in fact, it should always be true in query generator instead of propogating from the snowlfake plan
| selectable = SelectSnowflakePlan(snowflake_plan, analyzer=self) | ||
| selectable._is_valid_for_replacement = snowflake_plan._is_valid_for_replacement | ||
| return selectable | ||
| with self.session._lock: |
There was a problem hiding this comment.
why do we need a lock here, but not at other place in query_generator?
There was a problem hiding this comment.
We don't need it. query generator is not shared between threads so we should be good. That's why it is not added in other places.
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1869388
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
The process of adding WithQueryBlocks during repeated subquery elimination optimization takes place as follows:
When parents for the WithQueryBlocks are updated, if the direct parent is a Selectable, we convert the SnowflakePlan into a Selectable using to_selectable. In case the WithQueryBlock has multiple parents, which by construction will always be true, a new Selectable object is created for each parent. See example below where new selectable nodes are highlighted in gray:
When large query block is applied on this plan, and the WithQueryBlock is chosen at eligible, since all these blocks have same statistics, they will all be eligible together and possible be materialized we create redundant materializations. The correct construction for such a case should look like the plan shown below: