
Commit 720d77e

chore: extract comparison tool from fuzzer
Parent: f381c3d


2 files changed: 22 additions, 3 deletions


dev/benchmarks/tpcbench.py

Lines changed: 6 additions & 3 deletions
```diff
@@ -91,9 +91,12 @@ def main(benchmark: str, data_path: str, query_path: str, iterations: int, outpu
 df.explain()
 
 if write_path is not None:
-    output_path = f"{write_path}/q{query}"
-    df.coalesce(1).write.mode("overwrite").parquet(output_path)
-    print(f"Query {query} results written to {output_path}")
+    if len(df.columns) > 0:
+        output_path = f"{write_path}/q{query}"
+        df.coalesce(1).write.mode("overwrite").parquet(output_path)
+        print(f"Query {query} results written to {output_path}")
+    else:
+        print(f"Skipping write: DataFrame has no schema for {output_path}")
 else:
     rows = df.collect()
     print(f"Query {query} returned {len(rows)} rows")
```
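Within the lines shown, `output_path` is assigned only inside the new `if len(df.columns) > 0:` branch, but the new `else` branch also interpolates it. Unless `main` sets it earlier (not visible in this hunk), reaching that `else` raises `UnboundLocalError` when no previous query has been written, and otherwise reports the previous query's path. A minimal sketch of the same guard with the path computed before the check follows; `write_query_result` is a hypothetical helper name, and `df` is assumed to be a PySpark `DataFrame` (only `.columns` and `.coalesce(1).write.mode(...).parquet(...)` are used).

```python
from typing import Optional


def write_query_result(df, write_path: str, query: int) -> Optional[str]:
    """Write one query result as a single Parquet output, skipping column-less results.

    Hypothetical helper: `df` is assumed to be a PySpark DataFrame, or any object
    exposing `.columns` and `.coalesce(1).write.mode(...).parquet(...)`.
    Returns the output path if a write happened, otherwise None.
    """
    # Compute the path before branching so both branches can safely reference it.
    output_path = f"{write_path}/q{query}"
    if len(df.columns) > 0:
        df.coalesce(1).write.mode("overwrite").parquet(output_path)
        print(f"Query {query} results written to {output_path}")
        return output_path
    # No columns means there is no schema to write as Parquet, so skip it.
    print(f"Skipping write: DataFrame has no schema for {output_path}")
    return None
```

Calling such a helper in place of the inner `if`/`else` would leave the outer `if write_path is not None:` / `else: df.collect()` structure unchanged.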

fuzz-testing/README.md

Lines changed: 16 additions & 0 deletions
````diff
@@ -103,3 +103,19 @@ $SPARK_HOME/bin/spark-submit \
 ```
 
 Note that the output filename is currently hard-coded as `results-${System.currentTimeMillis()}.md`
+
+### Compare existing datasets
+
+To compare a pair of existing datasets you can use a comparison tool.
+The example below is for TPC-H queries results generated by pure Spark and Comet
+
+
+```shell
+$SPARK_HOME/bin/spark-submit \
+--master $SPARK_MASTER \
+--class org.apache.comet.fuzz.ComparisonToolMain
+target/comet-fuzz-spark3.5_2.12-0.12.0-SNAPSHOT-jar-with-dependencies.jar \
+compareParquet --input-spark-folder=/tmp/tpch/spark --input-comet-folder=/tmp/tpch/comet
+```
+
+The tool takes a pair of existing folders of the same layout and compares subfolders treating them as parquet based datasets
````
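The last added README sentence describes the tool's input contract: two root folders with the same layout, where each subfolder is treated as one Parquet dataset. A minimal Python sketch of that pairing step follows, assuming subfolders are matched by name (for example the `q{query}` folders that `tpcbench.py` writes under `write_path`). It is not the `ComparisonToolMain` implementation, which runs on the JVM and whose source is not part of this diff; `pair_subfolders` is a hypothetical name.

```python
from pathlib import Path
from typing import List, Tuple


def pair_subfolders(spark_root: str, comet_root: str) -> Tuple[List[Tuple[Path, Path]], List[str]]:
    """Match same-named subfolders under two root folders.

    Hypothetical helper (assumes matching by folder name). Returns (pairs, mismatched):
    `pairs` holds (spark_dir, comet_dir) tuples to compare as Parquet datasets,
    `mismatched` lists subfolder names present under only one root.
    """
    # Only directories count as datasets; stray files at the root are ignored.
    spark_dirs = {p.name: p for p in Path(spark_root).iterdir() if p.is_dir()}
    comet_dirs = {p.name: p for p in Path(comet_root).iterdir() if p.is_dir()}
    common = sorted(spark_dirs.keys() & comet_dirs.keys())
    pairs = [(spark_dirs[name], comet_dirs[name]) for name in common]
    # A name on one side only means the two layouts differ.
    mismatched = sorted(spark_dirs.keys() ^ comet_dirs.keys())
    return pairs, mismatched
```

Each returned pair is what would then be read as two Parquet datasets and compared; a non-empty `mismatched` list signals that the two folders do not share the same layout.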
