-
Notifications
You must be signed in to change notification settings - Fork 14
Bigger TPCH tests #122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bigger TPCH tests #122
Conversation
| dbg!(&err_proto); | ||
| Some(err_proto.to_datafusion_err()) | ||
| } | ||
| Ok(err_proto) => Some(err_proto.to_datafusion_err()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but saw some dbg! statement pending here.
tests/common.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the contents of this file to tpch_validation_test.rs. As the common module is not hidden behind the "integration" flag, compilations can fail as the functions used by it are do hidden behind the "integration" flag.
As this is not really common code, and it's just used in tpch_validation_test.rs, I just moved it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Thanks!
|
|
||
| // Context 1: Non-distributed execution. | ||
| let config1 = SessionConfig::new().with_target_partitions(3); | ||
| let config1 = SessionConfig::new().with_target_partitions(15); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting keeping the number odd (15) to validate test behavior?
For unit tests like with_target_partitions, I believe we already have good coverage across different partition counts. For this result validation test, though, it makes sense to use a more structured setup—like setting the number of workers to a power of 2 (which you’ve set to 8 👍), and the number of partitions to a multiple of that. So I’d suggest using 16 partitions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of workers does not matter that much here, as all of them are localhost (this test currently uses 2).
The rationale for using 15, and odd number, is to stress out the case when the last task needs to deal with an odd number of partitions (1 in this case), something that was not happening before and we probably should test anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually because these are validation tests, it is good to keep it odd to capture edge bugs. For benchmarking and performance tests, we want the settings that makes more sense
tests/common.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Thanks!
| // TODO: Consider adjusting the partitions and batch sizes. | ||
| $arrow::new($generator::new(sf, part, parts)).with_batch_size(1000), | ||
| &format!("{part}"), | ||
| &data_dir, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏽
.github/workflows/ci.yml
Outdated
| steps: | ||
| - uses: actions/checkout@v4 | ||
| - uses: ./.github/actions/setup | ||
| - run: cargo test --features integration --features tpch --test tpch_validation_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mega nit: since tpch depends on integration in the Cargo.toml, I think you can remove --features integration here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! I didn't knew you could do that
170b4e6 to
898fbf0
Compare
898fbf0 to
c01c11c
Compare
Uses a bigger scale factor in the TPCH tests so that queries get more heavily distributed.
As these tests are more expensive now, the run only with the
tpchfeature, and there's a dedicated workflow in GitHub actions specific for it.