-
Notifications
You must be signed in to change notification settings - Fork 14
Support for substrait plans #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3d380ae to
5808d24
Compare
src/flight.rs
Outdated
| request: Request<FlightDescriptor>, | ||
| ) -> Result<Response<FlightInfo>, Status>; | ||
|
|
||
| async fn get_flight_info_substrait( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also name this get_flight_info_substrait_plan got it to match with the name in FlightSqlService?
| async fn do_get_statement( | ||
| &self, | ||
| ticket: arrow_flight::sql::TicketStatementQuery, | ||
| ticket: TicketStatementQuery, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| schema, | ||
| explain_data: None, | ||
| }) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactor
| let error_msg = format!("{:?}", result.unwrap_err()); | ||
| assert!(error_msg.contains("worker") || error_msg.contains("address")); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test
NGA-TRAN
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very nice, Lia.
Have you considered supporting EXPLAIN for Substrait input as well? It could return the logical, physical, and distributed plans (including execution stages) without actually running them. Might be a great follow-up PR if you're aiming for transparency and introspection tooling around Substrait ingestion.
I'll give it a try 👍 |
|
Okay, I was going to do it in this PR, but it ended up being a bigger refactor than expected. I'll leave it for a follow-up PR. I'll leave some work on this branch |
This PR adds support to input substrait plans, the substrait plan
select_onecorresponding toselect 1 as test_coltranslates to the distributed plan:Even if the distributed plan is built, most substrait queries will fail when the plan is assigned to the workers, because distributed-datafusion doesn’t support serializing DataSourceExec physical nodes. (see here).
For a standard sql string query that executes successfully, the distributed plan for the same query would be:
I had trouble finding a Substrait plan in which triggred a physical plan with
PlaceholderRowExecinstead ofDataSourceExec. Because of the way the consumer handles a virtual table, it’s quite difficult to force the physical plan to include PlaceholderRowExec rather than DataSourceExec. I think we can account for this issue in a separate PR, for now I'm not sure if it should be a fix in Datafusion or in datafusion-distributed.