
Commit ce5218b

add tpc-ds tests and property-based testing utilities (#231)
* add tpc-ds tests and property-based testing utilities

  This change introduces a new `property_based.rs` test utility which lets us evaluate correctness using properties. Properties are useful when we do not know the expected output of a test (e.g. if we were to fuzz the database with randomized data or randomized queries, we could only verify the output using properties).

  The two oracles are:
  - ResultSetOracle: Compares the result set between single-node and distributed datafusion
  - OrderingOracle: Uses plan properties to figure out the expected ordering and asserts it

  This change does not introduce a fuzz test, but it does introduce a TPC-DS test. This test randomly generates data using the duckdb CLI and runs 99 queries on a distributed cluster. The query outputs are validated against single-node datafusion using test utils in `metamorphic.rs`. This test also randomizes the test cluster parameters, since there's no harm in doing so.

  Next steps:
  - Add fuzzing
    - Now that we have property-based testing utils, we can properly fuzz the project using SQLancer
    - SQLancer produces INSERT and SELECT statements which we could point at a datafusion distributed cluster and verify against single-node datafusion
    - Although it doesn't support nested select statements, 70% of the queries were valid datafusion queries, meaning these are good test cases for us
  - Add metrics oracle to validate output_rows metric / metrics propagation

* add tpcds-randomized-test to ci

  This commit adds tpcds-randomized-test to ci. It relies on the duckdb cli for tpc-ds database generation. It also saves the artifacts if the test fails so we can reproduce issues.
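The two oracle properties described above can be illustrated with a minimal, dependency-free sketch. This is not the code in `property_based.rs`: the names `Row`, `same_result_set`, and `is_sorted_by_columns` are hypothetical, rows are modeled as plain integer vectors rather than Arrow record batches, and the sort columns are passed in by hand, whereas the commit message says the real OrderingOracle derives the expected ordering from plan properties.

```rust
use std::collections::HashMap;

/// Hypothetical row model: one integer per column.
/// (An assumption for illustration; the real tests operate on query results.)
type Row = Vec<i64>;

/// Result-set property: both runs return the same multiset of rows.
/// Row order is ignored, but duplicate counts must match.
fn same_result_set(single_node: &[Row], distributed: &[Row]) -> bool {
    // Count how many times each distinct row appears.
    fn counts(rows: &[Row]) -> HashMap<&Row, usize> {
        let mut m = HashMap::new();
        for r in rows {
            *m.entry(r).or_insert(0) += 1;
        }
        m
    }
    counts(single_node) == counts(distributed)
}

/// Ordering property: rows are in non-decreasing order when compared
/// lexicographically on the given column indices (ascending only).
fn is_sorted_by_columns(rows: &[Row], sort_cols: &[usize]) -> bool {
    rows.windows(2).all(|w| {
        let a: Vec<&i64> = sort_cols.iter().map(|&c| &w[0][c]).collect();
        let b: Vec<&i64> = sort_cols.iter().map(|&c| &w[1][c]).collect();
        a <= b
    })
}
```

In a test, the first check would be applied to the output of the same query run single-node and distributed, and the second only when the plan promises an ordering. Neither check needs a precomputed expected result, which is what makes this style usable with randomized data or queries.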
1 parent f6dfaa6 commit ce5218b


110 files changed, 7330 additions and 387 deletions

.github/actions/setup/action.yml

Lines changed: 1 addition & 0 deletions
@@ -29,3 +29,4 @@ runs:
     - uses: Swatinem/rust-cache@v2
       with:
         key: ${{ runner.os }}-${{ inputs.targets }}-rust-cache
+        prefix-key: v1-rust

.github/workflows/ci.yml

Lines changed: 18 additions & 0 deletions
@@ -9,6 +9,7 @@ on:
 env:
   CARGO_TERM_COLOR: always
   RUST_BACKTRACE: 1
+  TPCDS_SCALE_FACTOR: 0.5 # 0.5 scale factor produces a data dir which is 124MB
 
 concurrency:
   group: ${{ github.ref }}
@@ -40,6 +41,23 @@ jobs:
       - uses: ./.github/actions/setup
       - run: cargo test --features tpch --test 'tpch_*'
 
+  tpcds-test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          lfs: true
+      - uses: ./.github/actions/setup
+      - name: Install DuckDB CLI
+        run: |
+          curl https://install.duckdb.org | sh
+          mkdir -p $HOME/.local/bin
+          mv /home/runner/.duckdb/cli/latest/duckdb $HOME/.local/bin/
+          echo "$HOME/.local/bin" >> $GITHUB_PATH
+      - name: Run TPC-DS test
+        id: test
+        run: cargo test --features tpcds --test tpcds_test
+
   format-check:
     runs-on: ubuntu-latest
     steps:
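The workflow exports `TPCDS_SCALE_FACTOR` to every job, but the visible part of the diff does not show how the test consumes it. As a rough sketch only, a test helper might read it along these lines; the function names, the validation rule, and the fallback of `1.0` are all assumptions rather than anything taken from the commit.

```rust
use std::env;

/// Hypothetical helper: parse a scale factor from an optional raw string.
/// Falls back to `default` when the value is missing, unparsable,
/// non-finite, or not strictly positive.
fn parse_scale_factor(raw: Option<&str>, default: f64) -> f64 {
    raw.and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite() && *v > 0.0)
        .unwrap_or(default)
}

/// Hypothetical wrapper: read `TPCDS_SCALE_FACTOR` from the environment.
/// The default of 1.0 is an assumption, not a value from the commit.
fn scale_factor_from_env() -> f64 {
    parse_scale_factor(env::var("TPCDS_SCALE_FACTOR").ok().as_deref(), 1.0)
}
```

Keeping the parsing separate from the environment lookup lets the fallback behaviour be unit-tested without mutating process-wide state.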

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@
 /benchmarks/data/
 testdata/tpch/*
 !testdata/tpch/queries
+testdata/tpcds/data/
