-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Refactor: Update enforce_sorting tests to use insta snapshots for easier updates #17900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
let join = hash_join_exec(left_input, right_input, on, None, &JoinType::Inner)?; | ||
let physical_plan = sort_exec([sort_expr("a", &join.schema())].into(), join); | ||
|
||
let test = EnforceSortingTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see that the plans did not change by turning on whitespace blind diff: https://github.com/apache/datafusion/pull/17900/files?w=1
let expected_optimized: Vec<&str> =optimized_plan_string.trim().lines().collect(); | ||
|
||
// return a string with both input and optimized plan | ||
format!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This somewhat crazy formatting is designed to make it easier to review the diff on github
9311077
to
ecfc0f8
Compare
ecfc0f8
to
1cac670
Compare
|
||
/// Runs the enforce sorting test and returns a string with the input and | ||
/// optimized plan as strings for snapshot comparison using insta | ||
fn run(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this mimics / replicates the logic of assert_optimized_plan!
I was not able to remove that macro yet because there are several parameterized plans that use it that I have to migrate somehow separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed!
Since I was also working on that, I'll push a few commits to this branch to speed up the merge. Feel free to drop them if they don't fit |
// should not add a sort at the output of the union, input plan should not be changed | ||
assert_optimized!(expected_input, expected_input, physical_plan, true); | ||
Input / Optimized Plan: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if those two are the same, i think there's no need to duplicate the plans - so i assert just one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also i removed expect_no_change
, because it was duplicated and actually didn't catch some cases (see 0fd01c7)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very clever -- thank you
Input Plan: | ||
SortExec: expr=[a@2 ASC], preserve_partitioning=[false] | ||
HashJoinExec: mode=Partitioned, join_type=Inner, on=[(col_a@0, c@2)] | ||
DataSourceExec: partitions=1, partition_sizes=[0] | ||
DataSourceExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], output_ordering=[a@0 ASC], file_type=parquet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is definitely just a personal preference but I find merged strings easier to follow than array representation with []
and ""
.
Before it would make sense, because it'd allow you to copy and paste the plan - but now this is managed by insta so no need use debug representation in my opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- I was using the debug representation to try and make review easier (to see that the plans hadn't changed)
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Split the rstest parameterized test into two separate functions: - test_with_lost_ordering_unbounded - test_with_lost_ordering_bounded Each tests both repartition_sorts modes (false and true) using insta snapshots. Replaced the incomplete existing test_with_lost_ordering_bounded which only tested one mode. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
@@ -2525,1213 +2696,1255 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()> { | |||
"avg".to_string(), | |||
function_arg_unordered, | |||
); | |||
struct TestCase<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this test I asked claude to write a script that confirms before and after are the same:
https://gist.github.com/blaginin/606a2b6d9ba48627d0bdf5b9aa6d1c57
i manually changed something in "after" and it have caught it. also i randomly verified some cases
.with_expected_description( | ||
"// Should adjust the requirement in the third input of the union so\n\ | ||
// that it is not unnecessarily fine.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to ignore, but I think it's not the best idea to mix comments and snapshots - it'll now be repeated twice in the code, making it messier, and it won't be clear for newcomers which one to update
I personally don’t see an issue with just adding the comment after the assert. I did that separately in: blaginin@ecdf6c3 in case you’ll like it🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it! Applied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you!! wrote some comments, and you may need to check the changes i pushed here 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @blaginin -- I think this test is now much easier to deal with ❤️
// should not add a sort at the output of the union, input plan should not be changed | ||
assert_optimized!(expected_input, expected_input, physical_plan, true); | ||
Input / Optimized Plan: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very clever -- thank you
.with_expected_description( | ||
"// Should adjust the requirement in the third input of the union so\n\ | ||
// that it is not unnecessarily fine.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it! Applied
@@ -2480,6 +2547,73 @@ async fn test_window_partial_constant_and_set_monotonicity() -> Result<()> { | |||
.into(); | |||
let source = parquet_exec_with_sort(input_schema.clone(), vec![ordering]) as _; | |||
|
|||
// Macro for testing window function optimization with snapshots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thank you very much for helping to push this along @blaginin 🙏 |
Which issue does this PR close?
core
tests toinsta
#15791Rationale for this change
While working on the arrow 57 upgrade,
I had to change quite few tests in enforce_sorting and the fact they are
not yet ported to insta means it was very painful to do.
What changes are included in this PR?
rewrite
enforce_sorting
so it uses insta for plan snapshots rather than hard coded vec<&str>Also avoid the use of a macro for easier maintenance (and hopefully faster compile time)
Are these changes tested?
Yes, by existing tests
Are there any user-facing changes?