Conversation
zhuqi-lucas
commented
Sep 3, 2025
- Merge upstream DF 49.0.2, and resolve confilcts
- Merge df-48-stream branch and resolve confilcts.
this is too noisy and not helpful yet, we don't have a fully implemented alternative (cherry picked from commit 0183244) Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
…ec of the correct size (apache#16995) (apache#17068) * apache#16994 Ensure CooperativeExec#maintains_input_order returns a Vec of the correct size * apache#16994 Extend default ExecutionPlan invariant checks Add checks that verify the length of the vectors returned by methods that need to return a value per child. (cherry picked from commit 2968331)
* Add ExecutionPlan::reset_state * Update datafusion/sqllogictest/test_files/cte.slt * Add reference * fmt * add to upgrade guide * add explain plan, implement in more plans * fmt * only explain --------- Co-authored-by: Robert Ream <robert@stably.io>
* Preserve equivalence properties during projection pushdown (apache#17129) * Adds parquet data diffs --------- Co-authored-by: Matthew Kim <38759997+friendlymatthew@users.noreply.github.com>
…17123) (apache#17174) * Pass the input schema to stats_projection for ProjectionExpr * Adds a test * fmt * clippy --------- Co-authored-by: Haresh Khanna <kh.haresh@gmail.com>
* fix: string_agg not respecting ORDER BY * Fix equality of parametrizable ArrayAgg function (apache#17065) The `ArrayAgg` struct is stateful, therefore it must implement `AggregateUDFImpl::equals` and `hash_value` functions. * Implement AggregateUDFImpl::equals and AggregateUDFImpl::hash_value for ArrayAgg * Implement alternative fix * Remove 'use std::any::Any' * Add sqllogictest for string_agg plan * Revert as_any to their original implementations --------- Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Update to version 49.0.1 * Add changelog for 49.0.1 * Fix sqllogictests * update configs * Update with PR * prettier * Fix slt race condition * Tweak release notes
…-features integration-tests --locked
…e#17270) Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…17274) * fix: align `array_has` null buffer for scalar (apache#17272) * fix: align `array_has` null buffer for scalar * merge
* Update versio to 49.0.2 * Add changelog * update configuration docs
…alescePartitionsExec missed fetch)
…eserving_variants_preserves_fetch
* Simplify predicates in filter * add slt test * Use BtreeMap to make tests stable * process edge coner * add doc for simplify_predicates.rs * add as_literal to make code neat * reorgnize file * reduce clone call
…BY clause (apache#16257) * Add order by clause to limit query for consistent results * test: update explain plan
Make CI green
| // We can make sure that `plan` has an ordering because | ||
| // `SortPreservingMergeExec` requires ordering to be constructed. | ||
| // If there is no ordering, `SortPreservingMergeExec::new` will panic | ||
| let ordering = plan.output_ordering().cloned().unwrap(); |
There was a problem hiding this comment.
Here to use unwrap(), cc @xudong963 @matthewmturner i am not sure if it's 100% safe, need another look, before DF 49, we have a Lex default to apply:
if fetch.is_some() {
let ordering = plan
.output_ordering()
.cloned()
.unwrap_or_else(LexOrdering::default);
let plan = Arc::new(
SortPreservingMergeExec::new(ordering, plan).with_fetch(fetch.take()),
);
optimized_distribution_ctx =
DistributionContext::new(plan, data, vec![optimized_distribution_ctx]);
}But after DF 49, we don't have LexOrdering::default now. So i believe, we can use unwrap directly because when we construct it, it should be there.
There was a problem hiding this comment.
Can we use the spm above here? Just give it a fetch
There was a problem hiding this comment.
Good point, will address it.
| 04)------AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))] | ||
| 05)--------ProjectionExec: expr=[] | ||
| 06)----------CoalesceBatchesExec: target_batch_size=8192 | ||
| <<<<<<< HEAD |
There was a problem hiding this comment.
Here we resolve the conflicts which this branch not handling before.
| SortPreservingMergeExec::new(ordering.clone(), Arc::clone(&input.plan)) | ||
| .with_fetch(fetch.take()), | ||
| SortPreservingMergeExec::new(req.clone(), Arc::clone(&input.plan)) | ||
| .with_fetch(*fetch), |
There was a problem hiding this comment.
Why remove the take here?
There was a problem hiding this comment.
Good catch, it's a mistake change, fetch take should have better performance. Addressed in latest PR.
| // We can make sure that `plan` has an ordering because | ||
| // `SortPreservingMergeExec` requires ordering to be constructed. | ||
| // If there is no ordering, `SortPreservingMergeExec::new` will panic | ||
| let ordering = plan.output_ordering().cloned().unwrap(); |
There was a problem hiding this comment.
Can we use the spm above here? Just give it a fetch
|
Thank you @xudong963 for review, addressed comments in latest PR. |