Commit 4b54653

docs: use tpchgen-rs for TPC-H data generation (#1390)
1 parent 7c62227 commit 4b54653

2 files changed: 22 additions and 44 deletions

benchmarks/README.md

Lines changed: 22 additions & 3 deletions
@@ -29,11 +29,30 @@ These benchmarks are derived from the [TPC-H][1] benchmark.
 
 ## Generating Test Data
 
-TPC-H data can be generated using the `tpch-gen.sh` script, which creates a Docker image containing the TPC-DS data
-generator.
+TPC-H data can be generated using [tpchgen-rs](https://github.com/clflushopt/tpchgen-rs), a fast TPC-H data generator written in Rust.
 
+### Installation
+
+Install via pip:
+```bash
+pip install tpchgen-cli
+```
+
+Or via cargo:
+```bash
+cargo install tpchgen-cli
+```
+
+### Generating Data
+
+Generate SF=1 data in Parquet format:
+```bash
+tpchgen-cli -s 1 --format parquet --output-dir data
+```
+
+For larger scale factors (e.g., SF=10):
 ```bash
-./tpch-gen.sh
+tpchgen-cli -s 10 --format parquet --output-dir data
 ```
 
 Data will be generated into the `data` subdirectory and will not be checked in because this directory has been added

benchmarks/tpch-gen.sh

Lines changed: 0 additions & 41 deletions
This file was deleted.
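
Since the commit removes `benchmarks/tpch-gen.sh`, a thin wrapper around the new command could fill the same role. The following is a minimal sketch and not part of the commit: the function name `gen_tpch` and the `DRY_RUN` variable are hypothetical, and it uses only the `tpchgen-cli` flags that appear in the README diff. Restricting the scale factor to positive integers is a choice made for this sketch, not a documented limit of the tool.

```bash
# Hypothetical wrapper (not part of the commit): builds the tpchgen-cli
# command from the README diff for a given scale factor and output directory.
gen_tpch() {
  sf="${1:-1}"       # scale factor, defaults to SF=1
  out="${2:-data}"   # output directory, defaults to ./data

  # Sketch-only restriction: accept positive integers without leading zeros.
  case "$sf" in
    ''|*[!0-9]*|0*)
      echo "scale factor must be a positive integer, got: $sf" >&2
      return 1
      ;;
  esac

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "tpchgen-cli -s $sf --format parquet --output-dir $out"
  else
    tpchgen-cli -s "$sf" --format parquet --output-dir "$out"
  fi
}
```

Calling `DRY_RUN=1 gen_tpch 10` prints the SF=10 command without generating any data; calling `gen_tpch 10` without `DRY_RUN` runs it and requires `tpchgen-cli` to be installed.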
