Collaborator

@NGA-TRAN NGA-TRAN commented Jul 1, 2025

#11

The TPC-H result validation tests are included in the integration test suite but marked with #[ignore] to prevent them from running during routine cargo test. Once we have a CI pipeline in place, we should make sure these tests are executed as part of it.
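
For readers unfamiliar with the mechanism, here is a minimal sketch of how an `#[ignore]`d test behaves. The helper `validate_all_queries` and its return value are hypothetical stand-ins, not the code in this PR; the real test starts a cluster and runs the queries.

```rust
// Hypothetical stand-in for the expensive validation work. It only returns
// the number of queries it would have checked (an assumed value).
fn validate_all_queries() -> usize {
    21
}

#[cfg(test)]
mod tests {
    use super::*;

    // Skipped by a plain `cargo test`; runs only when `--ignored` (or
    // `--include-ignored`) is passed to the test binary.
    #[test]
    #[ignore = "needs TPC-H data and a running cluster"]
    fn test_tpch_validation_all_queries() {
        assert_eq!(validate_all_queries(), 21);
    }
}
```

A plain `cargo test` reports such a test as ignored. Passing `-- --ignored` together with a test-name filter runs only the ignored tests whose names match.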

q16 is excluded due to this DF bug

Run command:

cargo test --test tpch_validation test_tpch_validation_all_queries -- --ignored --nocapture

Output:

running 1 test
🎯 Starting comprehensive TPC-H validation test
🧹 Cleaning up existing processes on test ports...
🔧 Checking for tpchgen-cli...
✅ tpchgen-cli already installed
🔧 Checking for TPC-H data at /tmp/tpch_s1...
✅ TPC-H data already exists
🔧 Checking Python Flight SQL packages...
✅ Python packages already installed
🚀 Starting distributed DataFusion cluster...
✅ Using existing distributed-datafusion binary
  Starting worker 1 on port 40401...
  Starting worker 2 on port 40402...
  Starting worker 3 on port 40403...
  Starting proxy on port 40400...
✅ Cluster started successfully
⏳ Waiting for cluster to be ready...
✅ Cluster is ready after 1 attempts!
📋 Found 21 TPC-H queries to validate (excluding q16 due to known issues)
🔍 [1/21] Testing q1... ✅ PASS (4/4 rows, DF: 8.47s, Dist: 3.54s)
🔍 [2/21] Testing q10... ✅ PASS (20/20 rows, DF: 3.85s, Dist: 1.98s)
🔍 [3/21] Testing q11... ✅ PASS (665/665 rows, DF: 0.83s, Dist: 0.73s)
🔍 [4/21] Testing q12... ✅ PASS (2/2 rows, DF: 3.02s, Dist: 1.63s)
🔍 [5/21] Testing q13... ✅ PASS (42/42 rows, DF: 3.05s, Dist: 1.29s)
🔍 [6/21] Testing q14... ✅ PASS (1/1 rows, DF: 2.43s, Dist: 1.31s)
🔍 [7/21] Testing q15... ✅ PASS (1/1 rows, DF: 4.15s, Dist: 1.95s)
🔍 [8/21] Testing q17... ✅ PASS (1/1 rows, DF: 4.76s, Dist: 2.71s)
🔍 [9/21] Testing q18... ✅ PASS (9/9 rows, DF: 6.98s, Dist: 3.01s)
🔍 [10/21] Testing q19... test test_tpch_validation_all_queries has been running for over 60 seconds
✅ PASS (1/1 rows, DF: 4.42s, Dist: 2.07s)
🔍 [11/21] Testing q2... ✅ PASS (100/100 rows, DF: 0.96s, Dist: 1.17s)
🔍 [12/21] Testing q20... ✅ PASS (162/162 rows, DF: 2.94s, Dist: 1.57s)
🔍 [13/21] Testing q21... ✅ PASS (100/100 rows, DF: 7.55s, Dist: 2.89s)
🔍 [14/21] Testing q22... ✅ PASS (7/7 rows, DF: 0.89s, Dist: 0.69s)
🔍 [15/21] Testing q3... ✅ PASS (10/10 rows, DF: 2.98s, Dist: 1.50s)
🔍 [16/21] Testing q4... ✅ PASS (5/5 rows, DF: 2.30s, Dist: 1.11s)
🔍 [17/21] Testing q5... ✅ PASS (5/5 rows, DF: 3.68s, Dist: 1.64s)
🔍 [18/21] Testing q6... ✅ PASS (1/1 rows, DF: 2.17s, Dist: 1.20s)
🔍 [19/21] Testing q7... ✅ PASS (4/4 rows, DF: 4.46s, Dist: 2.17s)
🔍 [20/21] Testing q8... ✅ PASS (2/2 rows, DF: 3.93s, Dist: 1.92s)
🔍 [21/21] Testing q9... ✅ PASS (175/175 rows, DF: 5.18s, Dist: 2.44s)

📊 TPC-H Validation Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Passed: 21 / 21 (100.0%)
❌ Failed: 0
⏱️  Total time: 117.56s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🎉 All TPC-H validation tests passed successfully!
🧹 Cleaning up cluster processes...
✅ Cluster cleanup complete
test test_tpch_validation_all_queries ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 121.50s
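
The `4/4 rows` figures above read as "matching rows / expected rows". As a rough sketch of what such a comparison could look like (an assumption for illustration, not necessarily how the test in this PR is written), the hypothetical helpers `cells_match` and `matching_rows` below compare two result sets position by position, treating numeric cells as equal within a tolerance:

```rust
/// Returns true when two cells match: numerically within `tol` if both parse
/// as floats, or by exact (trimmed) string equality.
fn cells_match(a: &str, b: &str, tol: f64) -> bool {
    let (a, b) = (a.trim(), b.trim());
    match (a.parse::<f64>(), b.parse::<f64>()) {
        (Ok(x), Ok(y)) => (x - y).abs() <= tol || a == b,
        _ => a == b,
    }
}

/// Counts rows that match position by position. A row matches when both
/// sides have the same number of cells and every cell pair matches.
/// Assumes both result sets are already in the same order.
fn matching_rows(expected: &[Vec<String>], actual: &[Vec<String>], tol: f64) -> usize {
    expected
        .iter()
        .zip(actual.iter())
        .filter(|(e, a)| {
            e.len() == a.len() && e.iter().zip(a.iter()).all(|(x, y)| cells_match(x, y, tol))
        })
        .count()
}
```

Under this sketch a query would pass only when both result sets have the same length and `matching_rows` equals that length. Comparing in order relies on the query having a deterministic `ORDER BY`; results without one would need to be sorted first.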

Notes:

Just a heads-up: don't read too much into the runtimes here. I’m seeing similar execution times regardless of the number of workers (1, 2, or 3), which suggests there might be overhead in how datafusion-cli handles query execution and result processing. These tests aren’t aimed at benchmarking runtime; they’re primarily for functional validation. I’ll be digging into the performance angle separately.

@NGA-TRAN NGA-TRAN requested a review from robtandy July 1, 2025 17:09
@NGA-TRAN NGA-TRAN marked this pull request as draft July 1, 2025 21:23
@NGA-TRAN NGA-TRAN changed the title from "Add result validation script for all TPC-H queries at SF=1" to "Validate TPC-H query results using integration tests" Jul 8, 2025
@NGA-TRAN NGA-TRAN marked this pull request as ready for review July 8, 2025 03:19
@NGA-TRAN NGA-TRAN requested review from LiaCastaneda and gabotechs and removed request for LiaCastaneda, gabotechs and robtandy July 8, 2025 03:19
Collaborator

robtandy commented Jul 8, 2025

Thank you @NGA-TRAN!

@@ -0,0 +1,1224 @@
use std::fs;
Collaborator Author

Looking ahead, once we add more integration tests, I expect we’ll move functions primarily used for TPC-H testing into a dedicated file or module to keep things organized.

@NGA-TRAN NGA-TRAN merged commit 49f0653 into main Jul 8, 2025
3 checks passed
Collaborator

LiaCastaneda commented Jul 8, 2025

should we include ignored tests on ci.yml?

Collaborator Author

NGA-TRAN commented Jul 8, 2025

> should we include ignored tests on ci.yml?

There are a few ignored tests. Can we include just one of them? I would like to include this test:

cargo test --test tpch_validation test_tpch_validation_all_queries -- --ignored --nocapture

@gabotechs gabotechs deleted the ntran/validate_correctness branch August 4, 2025 14:48