-
Notifications
You must be signed in to change notification settings - Fork 488
adapter: fix update sorting in apply #34714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adapter: fix update sorting in apply #34714
Conversation
The previous apply update sorting logic assumed that ID order equals dependency order for "derived items". That's not correct anymore: An altered materialized view can depend on items with greater catalog IDs. To ensure Materialize doesn't panic during startup when trying to parse such altered materialized views, this commit changes the item update sorting to use a topological sort for derived items instead.
The topological sorting in apply was using an inefficient quadratic algorithm. That was fine when we were only sorting connection items but is not fine anymore now that all most catalog items need to be sorted topologically. This commit changes the topo sort implementation to use Kahn's algorithm (https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm), which does the job in time linear in the number of items and dependency edges.
384632f to
eab89da
Compare
| sort_items_topological(&mut connections); | ||
| sort_items_topological(&mut derived_items); | ||
|
|
||
| // Other groups we can simply sort by ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this bite us again at some point? 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be! Also the grouping might become wrong in the future. As the comment at the top says, ideally we would just throw everything into a single topo search. The comment says we don't know how to find the dependencies, which I'm not quite sure what that means. But also we'd have to do parse the SQL of each object for that, which makes me hesitant to try doing more than we need here.
(Random thought: Would be nice if we didn't store the SQL but a parsed format that we don't have to re-parse all the time...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had that random thought as well, but then again you're now stuck with a format that you have to stick to or have an evolvability problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd imagine this would be encoded as part of mz-catalog-protos, for which we already have a migration framework in place to evolve that format.
|
TFTR! |
The previous apply update sorting logic assumed that ID order equals dependency order for "derived items". That's not correct anymore: An altered materialized view can depend on items with greater catalog IDs.
To ensure Materialize doesn't panic during startup when trying to parse such altered materialized views, this PR changes the item update sorting to use a topological sort for derived items instead.
Additionally, I noticed that the existing topo sort implementation was inefficient, which seemed problematic given the larger amount of items it now has to process. Thus I added a second commit that replaces it by a more efficient implementation using Kahn's algorithm.
Motivation
Fixes https://github.com/MaterializeInc/database-issues/issues/10018
Tips for reviewer
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.