-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: Rewrite joins from SQL as query-level join hints #9561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #9561 +/- ##
==========================================
+ Coverage 84.09% 84.10% +0.01%
==========================================
Files 230 230
Lines 84395 84409 +14
==========================================
+ Hits 70969 70991 +22
+ Misses 13426 13418 -8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's disallow to run "external" queries that contain join hints. Join hints should only be accepted for queries that have an "internal" origin, i.e., initiated by Cube as part of the SQL API processing. Consequently, a REST API query with join hints should always fail with a clear error message.
You can discuss this with @ovr.
CubeScan with empty members just should not manifest during rewrites: either it's AllMembers, or specific members, but never empty
CROSS JOIN does not guarantee same ordering as inputs
…an, without CrossJoin
529d615 to
6943d50
Compare
We discussed it with @paveltiunov, and decided that it’s ok to land it like this, and decide how to properly expose it in public API later |
This allows to keep join structure in some cases. Consider query like this: ``` SELECT cube2.dim, cube3.dim FROM cube1 JOIN cube2 ON cube1.__cubeJoinField = cube2.__cubeJoinField JOIN cube3 ON cube1.__cubeJoinField = cube3.__cubeJoinField GROUP BY 1, 2 ``` It references members only from `cube2` and `cube3`, but join structure actually has `cube1` as root, and it is completely missing from query. Now every join will generate separate join hint during rewrite, and then they will get to JS side as a query-level join hints. Supporting changes: * Remove unused MergedMembersReplacer * Remove unused rewrite for CrossJoin on empty CubeScan CubeScan with empty members just should not manifest during rewrites: either it's `AllMembers`, or specific members, but never empty * Simplify CrossJoin to CubeScan rewrite * Drop ordering from CubeScan during CrossJoin rewrite CROSS JOIN does not guarantee same ordering as inputs * Rewrite join on `__cubeJoinField` directly to CubeScan, without intermediate CrossJoin
Check List
Description of Changes Made (if issue reference is not provided)
This allows to keep join structure in some cases. Consider query like this:
It references members only from
cube2andcube3, but join structure actually hascube1as root, and it is completely missing from query.Now every join will generate separate join hint during rewrite, and then they will get to JS side as a query-level join hints.