Add DistributedPlanError::NonDistributable rule and do not distribute SHOW COLUMNS #195
adriangb
left a comment
Looks good to me! In theory I think we could partially distribute plans, e.g. if there's a join and only one subtree has one of these non distributable plans we can still distribute the other subtree. But I don't have a use case for that at the moment and it can just be a future improvement.
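A rough sketch of the partial-distribution idea, under stated assumptions: `Node`, `is_distributable`, and `distribute_partially` are hypothetical names for illustration, not types from this project or from DataFusion. The sketch wraps each maximal subtree that contains no non-distributable node and leaves the rest of the plan local.

```rust
/// Toy plan tree (hypothetical; not a real ExecutionPlan).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// Stand-in for a non-distributable node such as table introspection.
    Introspection,
    Scan(&'static str),
    Join(Box<Node>, Box<Node>),
    /// Marks a subtree that would be executed in a distributed fashion.
    Remote(Box<Node>),
}

/// A subtree is distributable only if none of its nodes is non-distributable.
fn is_distributable(n: &Node) -> bool {
    match n {
        Node::Introspection => false,
        Node::Scan(_) => true,
        Node::Join(l, r) => is_distributable(l) && is_distributable(r),
        Node::Remote(inner) => is_distributable(inner),
    }
}

/// Wrap the largest distributable subtrees and keep everything else local.
fn distribute_partially(n: Node) -> Node {
    if is_distributable(&n) {
        return Node::Remote(Box::new(n));
    }
    match n {
        Node::Join(l, r) => Node::Join(
            Box::new(distribute_partially(*l)),
            Box::new(distribute_partially(*r)),
        ),
        // A non-distributable leaf stays as it is.
        other => other,
    }
}
```

For a join whose left side is an introspection node and whose right side is a plain scan, only the scan ends up wrapped in `Remote`, which is the behavior described above.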
What I found challenging is to send
| CoalesceBatchesExec: target_batch_size=8192 |
| FilterExec: table_name@2 = weather |
| RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 |
| StreamingTableExec: partition_sizes=1, projection=[table_catalog, table_schema, table_name, column_name, is_nullable, data_type] |
So a non-distributed plan can still include NetworkCoalesceExec? I had assumed it was exclusive to distributed plans. Was the intended fix to ensure that plans with StreamingTableExec remain non-distributed?
Oh wait, you are right, there's something wrong here. Let me fix it
nice catch! 👍
… COLUMNS operations
8baa2e8 to 3c7e489
Right, agreed, we shouldn't even try to serialize a
We could distribute this as:
But I don't think this should be a priority until we have some use case where there is a non-serializable node and a subtree that is expensive / benefits from distributing.
#191.
Certain nodes cannot be distributed, such as those related to table introspection. This PR adds a new controlled error in the distributed planner that makes it skip distribution for the whole query, and uses it for StreamingTableExec.
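A minimal sketch of the fallback pattern described here, assuming a toy plan tree: only the name `DistributedPlanError::NonDistributable` comes from the PR title, while `Plan`, `check_distributable`, and `plan_query` are hypothetical and do not reflect the real planner's API. The point is that the error is caught by the planner and turned into "keep the original plan", rather than being surfaced to the user.

```rust
/// Toy plan tree (hypothetical; the real planner works on physical plan nodes).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Stand-in for StreamingTableExec, used by SHOW COLUMNS.
    StreamingTable,
    Scan(String),
    Filter(Box<Plan>),
    Join(Box<Plan>, Box<Plan>),
    /// Marks a plan that has been rewritten for distributed execution.
    Distributed(Box<Plan>),
}

/// Controlled error: signals "do not distribute", not a query failure.
#[derive(Debug, PartialEq)]
enum DistributedPlanError {
    NonDistributable(&'static str),
}

/// Walk the tree and report the first node that cannot be distributed.
fn check_distributable(plan: &Plan) -> Result<(), DistributedPlanError> {
    match plan {
        Plan::StreamingTable => {
            Err(DistributedPlanError::NonDistributable("StreamingTableExec"))
        }
        Plan::Scan(_) => Ok(()),
        Plan::Filter(input) | Plan::Distributed(input) => check_distributable(input),
        Plan::Join(left, right) => {
            check_distributable(left)?;
            check_distributable(right)
        }
    }
}

/// Distribute when possible; on NonDistributable, return the plan untouched.
fn plan_query(plan: Plan) -> Plan {
    match check_distributable(&plan) {
        Ok(()) => Plan::Distributed(Box::new(plan)),
        Err(DistributedPlanError::NonDistributable(_)) => plan,
    }
}
```

With this shape, a plan such as `Plan::Filter(Box::new(Plan::StreamingTable))` comes back from `plan_query` unchanged, while a plain `Plan::Scan` is wrapped in `Plan::Distributed`.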