-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Feat : added truncate table support #19633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@ethan-tyler, please review this PR. Thank you! |
ethan-tyler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding TRUNCATE @Nachiket-Roy! This threads through WriteOp, TableProvider, physical planner, and protobuf correctly. Looking nicely done.
Couple things I noticed:
Schema: DmlStatement is constructed with an empty schema instead of using DmlStatement::new(), which sets {count: UInt64}. Breaks the DML contract and causes a proto roundtrip mismatch.
Silent option ignoring: Options like CASCADE, RESTART IDENTITY, PARTITION are dropped silently via ... TRUNCATE TABLE t CASCADE would succeed without actually cascading. Worth rejecting explicitly.
Once those are in, adding TRUNCATE to roundtrip_logical_plan_dml and some negative tests for unsupported options would round it out.
datafusion/sql/src/statement.rs
Outdated
| Ok(LogicalPlan::Dml(DmlStatement { | ||
| table_name: table.clone(), | ||
| target: source, | ||
| op: WriteOp::Truncate, | ||
| input: Arc::new(LogicalPlan::EmptyRelation(EmptyRelation { | ||
| produce_one_row: false, | ||
| schema: DFSchemaRef::new(DFSchema::empty()), | ||
| })), | ||
| output_schema: DFSchemaRef::new(DFSchema::empty()), | ||
| })) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use DmlStatement::new() rather than constructing the struct directly. The constructor sets output_schema: make_count_schema() which gives you the {count: UInt64} that DML ops return.
Also noticed this causes a proto roundtrip mismatch: encode uses the struct directly but decode uses the constructor (at mod.rs:978), so you'd get different plans before/after serialization.
Something like:
| Ok(LogicalPlan::Dml(DmlStatement { | |
| table_name: table.clone(), | |
| target: source, | |
| op: WriteOp::Truncate, | |
| input: Arc::new(LogicalPlan::EmptyRelation(EmptyRelation { | |
| produce_one_row: false, | |
| schema: DFSchemaRef::new(DFSchema::empty()), | |
| })), | |
| output_schema: DFSchemaRef::new(DFSchema::empty()), | |
| })) | |
| } | |
| Ok(LogicalPlan::Dml(DmlStatement::new( | |
| table.clone(), | |
| source, | |
| WriteOp::Truncate, | |
| Arc::new(LogicalPlan::EmptyRelation(EmptyRelation { | |
| produce_one_row: false, | |
| schema: DFSchemaRef::new(DFSchema::empty()), | |
| })), | |
| ))) |
datafusion/sql/src/statement.rs
Outdated
| Statement::Truncate { table_names, .. } => { | ||
| if table_names.len() != 1 { | ||
| return not_impl_err!( | ||
| "TRUNCATE with multiple tables is not supported" | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .. here silently drops partitions, identity, cascade, on_cluster, and target.only. If someone writes TRUNCATE TABLE t CASCADE expecting cascade behavior, they'd get a normal truncate with no indication the option was ignored.
Worth rejecting these explicitly? Something like:
| Statement::Truncate { table_names, .. } => { | |
| if table_names.len() != 1 { | |
| return not_impl_err!( | |
| "TRUNCATE with multiple tables is not supported" | |
| ); | |
| } | |
| Statement::Truncate { table_names, partitions, identity, cascade, on_cluster, .. } => { | |
| if table_names.len() != 1 { | |
| return not_impl_err!("TRUNCATE with multiple tables is not supported"); | |
| } | |
| let target = &table_names[0]; | |
| if target.only { | |
| return not_impl_err!("TRUNCATE with ONLY is not supported"); | |
| } | |
| if partitions.is_some() { | |
| return not_impl_err!("TRUNCATE with PARTITION is not supported"); | |
| } | |
| if identity.is_some() { | |
| return not_impl_err!("TRUNCATE with RESTART/CONTINUE IDENTITY is not supported"); | |
| } | |
| if cascade.is_some() { | |
| return not_impl_err!("TRUNCATE with CASCADE/RESTRICT is not supported"); | |
| } | |
| if on_cluster.is_some() { | |
| return not_impl_err!("TRUNCATE with ON CLUSTER is not supported"); | |
| } | |
| // ... rest | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add the option rejection, some negative tests would round it out:
statement error TRUNCATE with CASCADE/RESTRICT is not supported
TRUNCATE TABLE t1 CASCADE;
statement error TRUNCATE with multiple tables is not supported
TRUNCATE TABLE t1, t2;
datafusion/catalog/src/table.rs
Outdated
| /// Returns an [`ExecutionPlan`] producing a single row with `count` (UInt64), | ||
| /// representing the number of rows removed. | ||
| async fn truncate(&self, _state: &dyn Session) -> Result<Arc<dyn ExecutionPlan>> { | ||
| not_impl_err!("TRUNCATE not supported for {}", self.table_type()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: DELETE/UPDATE messages say "... for {} table" but this one drops "table"
|
@ethan-tyler Thanks for the thorough review. I've addressed all your comments, Please take another look. |
| WriteOp::Truncate, | ||
| Arc::new(LogicalPlan::EmptyRelation(EmptyRelation { | ||
| produce_one_row: false, | ||
| schema: DFSchemaRef::new(DFSchema::empty()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// The relation that determines the tuples to add/remove/modify the schema must match with table_schema |
the schema must match with table_schema. Is this important here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike DELETE or UPDATE, TRUNCATE is a table-level operation with no filters, assignments. The input relation is only a placeholder to satisfy the DML plan shape and is never inspected or executed for row data. Because no rows or columns are read, the usual “input schema must match table schema” invariant does not apply here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified this and you're right that TRUNCATE doesn't use the input.
Physical planner shows:
- DELETE/UPDATE: extract filters/assignments from input
- TRUNCATE: ignores input entirely, just calls provider.truncate()
So functionally the empty schema is fine. The doc comment is aspirational. Could either keep as-is and update the DmlStatement docs to note TRUNCATE is special, or use the table schema for consistency (extra code, no functional benefit). Former seems fine.
ethan-tyler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the follow-up @Nachiket-Roy DmlStatement::new() fix and option rejection look good.
Few things on closer look:
- Test contract: CaptureTruncateProvider returns EmptyExec (zero rows), it checks the hook was called but doesn't validate the "returns one row with count" contract.
- Proto roundtrip: Adding TRUNCATE to roundtrip_logical_plan_dml would catch regressions on the proto mapping.
- Negative tests: CASCADE and multi-table are covered, PARTITION, ONLY, and IDENTITY are also rejected but not tested.
| async fn truncate(&self, _state: &dyn Session) -> Result<Arc<dyn ExecutionPlan>> { | ||
| *self.truncate_called.lock().unwrap() = true; | ||
|
|
||
| Ok(Arc::new(EmptyExec::new(Arc::new(Schema::new(vec![ | ||
| Field::new("count", DataType::UInt64, false), | ||
| ]))))) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableProvider::truncate docs say it returns an ExecutionPlan producing {count: UInt64}. This test returns EmptyExec which produces zero rows, it validates the hook was called, but not the contract.
Could tighten this to return an actual row and assert on it:
| async fn truncate(&self, _state: &dyn Session) -> Result<Arc<dyn ExecutionPlan>> { | |
| *self.truncate_called.lock().unwrap() = true; | |
| Ok(Arc::new(EmptyExec::new(Arc::new(Schema::new(vec![ | |
| Field::new("count", DataType::UInt64, false), | |
| ]))))) | |
| } | |
| } | |
| // Add these imports: | |
| use arrow::array::UInt64Array; | |
| use arrow::record_batch::RecordBatch; | |
| use datafusion_physical_plan::test::TestMemoryExec; | |
| // Then update the truncate impl: | |
| async fn truncate(&self, _state: &dyn Session) -> Result<Arc<dyn ExecutionPlan>> { | |
| *self.truncate_called.lock().unwrap() = true; | |
| let schema = Arc::new(Schema::new(vec![ | |
| Field::new("count", DataType::UInt64, false), | |
| ])); | |
| let batch = RecordBatch::try_new( | |
| Arc::clone(&schema), | |
| vec![Arc::new(UInt64Array::from(vec![0u64]))], | |
| )?; | |
| Ok(Arc::new(TestMemoryExec::try_new(&[vec![batch]], schema, None)?)) | |
| } |
| WriteOp::Delete => protobuf::dml_node::Type::Delete, | ||
| WriteOp::Update => protobuf::dml_node::Type::Update, | ||
| WriteOp::Ctas => protobuf::dml_node::Type::Ctas, | ||
| WriteOp::Truncate => protobuf::dml_node::Type::Truncate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You added the proto enum and conversions and worth adding a roundtrip test to lock it in.
In datafusion/proto/tests/cases/roundtrip_logical_plan.rs, the roundtrip_logical_plan_dml test covers INSERT, DELETE, UPDATE, CTAS.
Adding TRUNCATE there would catch any future regressions:
// In the queries array:
"TRUNCATE TABLE test_table",
|
|
||
| # Test TRUNCATE with multiple tables | ||
| statement error TRUNCATE with multiple tables is not supported | ||
| TRUNCATE TABLE t1, t2; No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have negative tests for CASCADE and multi-table. The planner also rejects PARTITION, ONLY, IDENTITY, and ON CLUSTER.
Might be worth covering those too:
statement error TRUNCATE with PARTITION is not supported
TRUNCATE TABLE t1 PARTITION (p1);
statement error TRUNCATE with ONLY is not supported
TRUNCATE ONLY t1;
statement error TRUNCATE with RESTART/CONTINUE IDENTITY is not supported
TRUNCATE TABLE t1 RESTART IDENTITY;
|
@ethan-tyler Thanks again for the review! I’ve tightened the CaptureTruncateProvider test to return a single {count: UInt64} row, added negative tests covering PARTITION, ONLY, and IDENTITY (in addition to CASCADE and multi-table), and added TRUNCATE to roundtrip_logical_plan_dml to lock in the proto roundtrip behavior. |
|
@ethan-tyler I tried tightening the test using TestMemoryExec::with_new_children not implementedThis happens because physical optimizer rules run and require |
@Nachiket-Roy My bad - should have validated that suggestion first. TestMemoryExec doesn't implement with_new_children, so it panics during filter pushdown. Better to stick with EmptyExec: This matches what CaptureDeleteProvider and CaptureUpdateProvider already do. These tests just verify the hook gets called and validating provider output is the implementor's responsibility. The "proper" way to return an actual count is DmlResultExec, but it's private to the memory table module. Making it public would be a separate discussion. Sorry for the detour, the original implementation was fine. |
ethan-tyler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - approving, thanks for the contribution!
Which issue does this PR close?
Rationale for this change
DataFusion recently added TableProvider hooks for row-level DML operations such as DELETE and UPDATE, but TRUNCATE TABLE was still unsupported.
What changes are included in this PR?
This PR adds planning and integration support for TRUNCATE TABLE in DataFusion, completing another part of the DML surface alongside existing DELETE and UPDATE support.
Specifically, it includes:
The implementation follows the same structure and conventions as the existing DELETE and UPDATE DML support.
Execution semantics are delegated to individual TableProvider implementations via the new hook.
Are these changes tested?
Yes. The PR includes:
SQL logic tests that verify:
These tests mirror the existing testing strategy used for unsupported DELETE and UPDATE operations.
Are there any user-facing changes?
Yes. Users can now execute TRUNCATE TABLE statements in DataFusion for tables whose TableProvider supports the new truncate() hook. Tables that do not support TRUNCATE will return a clear NotImplemented error.