Closed
Description
Our TPC-H scale 1 validation currently depends on Python, datafusion-cli, and the TPC-H data generator, which makes it unsuitable for CI. With the newly added infrastructure, we can now streamline this test and integrate it into CI. We should modify the test so that it can run in CI.
Other changes for this validation:
- We do not need to run `datafusion-cli` to get the single-node result either. We can run the queries directly from `SessionContext`. See/use the `execute_sql_single_node` function in the PR above.
- I do not think we need a lot of data (scale 1) to validate the results either. I suspect we can generate scale 0.01 (or smaller/larger), which is large enough for validation but small enough to check the data files in, so the test does not have to regenerate data on every run (in CI). We can replace the tpch_small files in tpch/data/ with these files and use them for different test purposes. If the tpch-generate CLI cannot generate a scale < 1, we can also write a script that reduces the scale 1 data files while still keeping the data needed to return meaningful results for all 22 queries (I did this with one file before and am happy to show how to do this).
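As a rough illustration of the fallback in the second bullet (shrinking scale 1 files without breaking joins), here is a minimal sketch. It assumes pipe-delimited text files as produced by the TPC-H `dbgen` tool; if the checked-in files are Parquet, the same idea applies but the I/O differs. The helper names `sample_rows`, `column_values`, and `filter_by_keys` are hypothetical and not part of this repository. It is not the script mentioned above, only one possible approach: sample a parent table, then keep only the child rows whose foreign key survived the sampling.

```rust
use std::collections::HashSet;

/// Keep every `step`-th non-empty row of a pipe-delimited table
/// (rows 0, step, 2*step, ...). A `step` of 0 is treated as 1.
fn sample_rows(table: &str, step: usize) -> Vec<&str> {
    table
        .lines()
        .filter(|line| !line.trim().is_empty())
        .step_by(step.max(1))
        .collect()
}

/// Collect the distinct values found in column `col` of the given rows.
fn column_values<'a>(rows: &[&'a str], col: usize) -> HashSet<&'a str> {
    rows.iter()
        .copied()
        .filter_map(|row| row.split('|').nth(col))
        .collect()
}

/// Keep only the rows whose column `col` holds one of the retained keys,
/// so that joins against the sampled parent table still find matches.
fn filter_by_keys<'a>(table: &'a str, col: usize, keys: &HashSet<&str>) -> Vec<&'a str> {
    table
        .lines()
        .filter(|row| row.split('|').nth(col).map_or(false, |k| keys.contains(k)))
        .collect()
}
```

For example, one could sample `orders` with `sample_rows`, collect the retained order keys with `column_values`, and pass them to `filter_by_keys` for `lineitem`. Uniform sampling alone does not guarantee that every one of the 22 queries returns rows; queries with selective predicates may need specific rows retained by hand.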