README.md: 20 additions & 79 deletions
@@ -138,7 +138,9 @@ cargo build --release
 
 ### Running Tests
 
-Run all tests:
+#### Basic Tests
+
+Run all unit tests (fast - excludes TPC-H validation):
 
 ```bash
 cargo test
@@ -150,6 +152,20 @@ Run tests with output:
 cargo test -- --nocapture
 ```
 
+#### TPC-H Validation Integration Tests
+
+Run comprehensive TPC-H validation tests that compare distributed DataFusion against regular DataFusion. No prerequisites needed - the tests handle everything automatically!
+
+```bash
+# Run all TPC-H validation tests (manual - excluded from cargo test for speed)
+cargo test --test tpch_validation test_tpch_validation_all_queries -- --ignored --nocapture
+
+# Run single query test for debugging
+cargo test --test tpch_validation test_tpch_validation_single_query -- --ignored --nocapture
+```
+
+**Note:** TPC-H validation tests are marked with `#[ignore]` to keep `cargo test` fast for development. Run them manually when needed for validation.
+
 ## Usage
 
 With the code now built and ready, the next step is to set up the server and execute queries. To do that, we'll need a schema and some data—so for this example, we'll use the TPC-H schema and queries.
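
For readers unfamiliar with the `#[ignore]` attribute that the added note relies on, the pattern can be sketched roughly as follows. This is a minimal illustration and not code from this repository: `rows_match`, `fast_unit_test`, and `slow_validation_test` are hypothetical names, and the real TPC-H tests presumably do far more setup.

```rust
/// Hypothetical stand-in for a result comparison; not the project's real code.
fn rows_match(expected: &[i64], actual: &[i64]) -> bool {
    expected == actual
}

#[cfg(test)]
mod tests {
    use super::rows_match;

    // Runs on every plain `cargo test`.
    #[test]
    fn fast_unit_test() {
        assert!(rows_match(&[1, 2, 3], &[1, 2, 3]));
    }

    // Skipped by plain `cargo test`; runs only with `cargo test -- --ignored`
    // (or together with the other tests via `cargo test -- --include-ignored`).
    #[test]
    #[ignore]
    fn slow_validation_test() {
        assert!(!rows_match(&[1, 2, 3], &[1, 2]));
    }
}
```

With this layout, a plain `cargo test` reports the ignored test as skipped, which is why the commands in the hunk above pass `--ignored` after the `--` separator.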
@@ -323,84 +339,6 @@ The system supports various configuration options through environment variables:
 - `DFRAY_TABLES`: Comma-separated list of tables in format `name:format:path`
 - `DFRAY_VIEWS`: Semicolon-separated list of CREATE VIEW SQL statements
 
-## TPC-H Query Validation
-
-To validate that your distributed cluster is working correctly, you can use the automated validation script that compares results between DataFusion CLI (single-node) and the distributed system:
-
-```bash
-# Run validation with default settings (2 workers, /tmp/tpch_s1 data)
[...]
-The script will warn you if the running cluster has a different number of workers than requested, and automatically handles missing dependencies and data generation.
-
-<!-- TODO: Merge this section into the above -->
-## TPC-H Validation Tests
-
-The project includes comprehensive TPC-H validation tests that automatically compare results between regular DataFusion and distributed DataFusion to ensure correctness. These tests are completely self-contained and handle all setup automatically:
-
-```bash
-# Run all TPC-H validation tests (fully automated)
-cargo test --lib tpch_validation_tests -- --nocapture
-
-# Run single query test for debugging
-cargo test --lib test_tpch_validation_single_query -- --ignored --nocapture
-```
-
-**What the tests do automatically:**
-- ✅ Kill existing processes on ports 40400-40402
-- ✅ Install `tpchgen-cli` if not available
-- ✅ Generate TPC-H data at `/tmp/tpch_s1` if not present