Skip to content

Conversation

@gabotechs
Copy link
Collaborator

Uses a bigger scale factor in the TPCH tests so that queries get more heavily distributed.

As these tests are more expensive now, the run only with the tpch feature, and there's a dedicated workflow in GitHub actions specific for it.

dbg!(&err_proto);
Some(err_proto.to_datafusion_err())
}
Ok(err_proto) => Some(err_proto.to_datafusion_err()),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but saw some dbg! statement pending here.

tests/common.rs Outdated
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the contents of this file to tpch_validation_test.rs. As the common module is not hidden behind the "integration" flag, compilations can fail as the functions used by it are do hidden behind the "integration" flag.

As this is not really common code, and it's just used in tpch_validation_test.rs, I just moved it there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks!


// Context 1: Non-distributed execution.
let config1 = SessionConfig::new().with_target_partitions(3);
let config1 = SessionConfig::new().with_target_partitions(15);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting keeping the number odd (15) to validate test behavior?

For unit tests like with_target_partitions, I believe we already have good coverage across different partition counts. For this result validation test, though, it makes sense to use a more structured setup—like setting the number of workers to a power of 2 (which you’ve set to 8 👍), and the number of partitions to a multiple of that. So I’d suggest using 16 partitions here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of workers does not matter that much here, as all of them are localhost (this test currently uses 2).

The rationale for using 15, and odd number, is to stress out the case when the last task needs to deal with an odd number of partitions (1 in this case), something that was not happening before and we probably should test anyways.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually because these are validation tests, it is good to keep it odd to capture edge bugs. For benchmarking and performance tests, we want the settings that makes more sense

tests/common.rs Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks!

// TODO: Consider adjusting the partitions and batch sizes.
$arrow::new($generator::new(sf, part, parts)).with_batch_size(1000),
&format!("{part}"),
&data_dir,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏽

steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/setup
- run: cargo test --features integration --features tpch --test tpch_validation_test
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mega nit: since tpch depends on integration in the Cargo.toml, I think you can remove --features integration here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! I didn't knew you could do that

Base automatically changed from gabrielmusat/improve-explain to main September 5, 2025 05:56
@gabotechs gabotechs force-pushed the gabrielmusat/bigger-tpch-tests branch from 170b4e6 to 898fbf0 Compare September 5, 2025 05:57
@gabotechs gabotechs force-pushed the gabrielmusat/bigger-tpch-tests branch from 898fbf0 to c01c11c Compare September 5, 2025 06:03
@gabotechs gabotechs merged commit ea561d2 into main Sep 5, 2025
4 checks passed
@gabotechs gabotechs deleted the gabrielmusat/bigger-tpch-tests branch September 5, 2025 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants