Skip to content

Comments

optimizer: Make EQProp call MSE::reduce only after a change in reduce_expr#31979

Merged
ggevay merged 2 commits intoMaterializeInc:mainfrom
ggevay:eqprop-less-reduce
Mar 21, 2025
Merged

optimizer: Make EQProp call MSE::reduce only after a change in reduce_expr#31979
ggevay merged 2 commits intoMaterializeInc:mainfrom
ggevay:eqprop-less-reduce

Conversation

@ggevay
Copy link
Contributor

@ggevay ggevay commented Mar 21, 2025

We have a CPU profile where a user running some very complex ad-hoc queries made envd quite busy with optimization. (Slack discussion.) The profile is showing that most of the time is spent in MirScalarExpr::reduce calls from EquivalencePropagation.

This PR modifies EquivalencePropagation to call reduce only if there was a recent change in the expression. Specifically, all of EQProp's calls to reduce are just after a reduce_expr, which conveniently returns a bool that indicates whether it made any changes.

There are no EXPLAIN changes across all of our slts. I've also done a run on all slts where I added a temporary assert that the reduces that we are skipping wouldn't do anything. This failed only one time across all of our slts: in joins.slt:1275. But looking at the EXPLAINs of this query on main vs. the PR, the final plan does not change even for this query.

Added a feature flag, enabled by default.

The first commit is the actual functionality, the second just adds the feature flag and wires it up.

Motivation

  • This PR speeds up the optimizer.

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ggevay ggevay requested a review from frankmcsherry March 21, 2025 14:52
@ggevay ggevay requested review from a team as code owners March 21, 2025 14:52
@ggevay ggevay requested a review from aljoscha March 21, 2025 14:52
@ggevay ggevay force-pushed the eqprop-less-reduce branch from f8baa21 to 763d310 Compare March 21, 2025 14:54
@ggevay ggevay added the A-CLUSTER Topics related to the CLUSTER layer label Mar 21, 2025
@ggevay ggevay marked this pull request as draft March 21, 2025 14:56
@ggevay ggevay force-pushed the eqprop-less-reduce branch from 763d310 to 3b3fdea Compare March 21, 2025 15:00
@ggevay ggevay marked this pull request as ready for review March 21, 2025 15:01
Copy link
Contributor

@frankmcsherry frankmcsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really tell what this does to the overall behavior, but I think MSE::reduce doesn't have specific behavior in any case (e.g. not necessarily idempotent; may or may not perform various simplifications) so it should be equally correct w/o calling it, and the behavior of the whole thing can't rely on it being called or not. E.g. no clue whether we produce the same plans but have to run EQProp more often to tickle reductions we miss for some reason.

@ggevay
Copy link
Contributor Author

ggevay commented Mar 21, 2025

My thinking is that we do call reduce in the same fixpoint loops with EQProp, as well as shortly before those fixpoint loops. So, when EQProp itself is not making any changes, then these other calls should be enough.

@ggevay ggevay merged commit 48a3645 into MaterializeInc:main Mar 21, 2025
83 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-CLUSTER Topics related to the CLUSTER layer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants