@marc-pydantic
Contributor

This test ensures #205 has the desired effect on distributed execution. I also snuck in a few docs and a single typo fix to make up for the ones I am adding 😬.

The test has a limited shelf life, since the fixes from apache/datafusion#18303 will mask the issue again, but for now it is better than nothing.

Comment on lines +159 to +162
/// Test that multiple first_value() aggregations work correctly in distributed queries.
// TODO: Once https://github.com/apache/datafusion/pull/18303 is merged, this test will lose
// meaning, since the PR above will mask the underlying problem. Different queries or
// a new approach must be used in this case.
Contributor

Should we just check for duplicate column names directly then? We could make a MemoryExec or something
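
To illustrate what a direct check could look like, here is a minimal sketch that assumes the output column names are already available as plain strings. The function name `duplicate_column_names` is hypothetical, and the sketch deliberately uses only the standard library rather than any DataFusion API; how the names would be extracted from a plan or schema is left open.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of this PR): returns every column name that
/// appears more than once, in the order in which the first repeat is seen.
fn duplicate_column_names<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut reported = HashSet::new();
    let mut duplicates = Vec::new();
    for &name in names {
        // `insert` returns false when the value was already in the set, so the
        // first condition detects a repeat and the second reports it only once.
        if !seen.insert(name) && reported.insert(name) {
            duplicates.push(name);
        }
    }
    duplicates
}
```

A test could then assert that the returned list is empty for the schema under test, independent of which query produced it.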

Contributor Author

I can look into that as a follow-up. I want to resolve the remaining issues with distributed activation first; this can be a side quest.

Contributor

Sure, yes. We'll see what the maintainers here think.

Collaborator

I think this is fine. It's still a good test. If you could file an issue and put the number in the TODO, that would help us track it :)

Collaborator

@jayshrivastava left a comment

LGTM! Thanks for this


// Print them out, the error message from `assert_eq` is otherwise hard to read.
println!("{}", expected_result);
println!("{}", actual_result);
Collaborator

nit: It would be preferable to print these only on failure. I think it's fine if we do:

if actual_result.to_string() != expected_result {
    println!("{}", expected_result);
    println!("{}", actual_result);
    panic!(...)
}
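
For reference, the suggestion above could be wrapped in a small helper along these lines. This is only a sketch: the name `assert_results_match`, the labels, and the panic message are hypothetical, and it assumes the actual result implements `Display` while the expected result is already a string, as the snippet above implies.

```rust
use std::fmt::Display;

/// Hypothetical helper (not part of this PR): compares the rendered actual
/// result against the expected text and prints both only when they differ.
fn assert_results_match(expected: &str, actual: &impl Display) {
    let actual = actual.to_string();
    if actual != expected {
        // Print both results in full; the `assert_eq` failure output is
        // otherwise hard to read for multi-line tables.
        println!("expected:\n{expected}");
        println!("actual:\n{actual}");
        panic!("actual result does not match expected result");
    }
}
```

A passing comparison then produces no output, and a failing one shows both results before the test panics.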

Collaborator

@gabotechs left a comment

👍 makes sense. Thanks!

@gabotechs merged commit 1bc4841 into datafusion-contrib:main on Oct 29, 2025
4 checks passed