Commit a913dc9

Fix guide for benchmarking
1 parent 4a18ab8 commit a913dc9

File tree

1 file changed

+37
-9
lines changed

docs/GettingStarted/Benchmarking.md

Lines changed: 37 additions & 9 deletions
@@ -9,25 +9,53 @@ LingoDB supports common OLAP benchmarks such as TPC-H, TPC-DS, JOB and SSB.
99
* Do *not* manually create Apache Arrow files, but instead use the `sql` command to define tables and import data. If you miss relevant metadata information (e.g., primary keys), LingoDB will not be able to apply many optimizations and performance will be suboptimal.
1010
* Use a release build of LingoDB for benchmarking. Debug builds are significantly slower.
1111

12-
## Data Generation
13-
For some benchmarks, the LingoDB repository contains scripts to generate data and load them:
12+
13+
## Manual steps
14+
### Data Generation
15+
For some benchmarks, the LingoDB repository contains scripts to generate data:
1416
```sh
15-
# LINGODB_BINARY_DIR is the directory containing at least the `sql` binary
16-
# OUTPUT_DIR is the directory where the database should be stored
17+
# DATA_DIR is the directory where the csv files should be stored
1718
# SF is the scale factor, e.g., 1 for 1GB, 10 for 10GB, etc.
1819

1920
# Generate TPC-H database
20-
bash tools/generate/tpch.sh LINGODB_BINARY_DIR OUTPUT_DIR SF
21+
bash tools/generate/tpch.sh DATA_DIR SF
2122
# Generate TPC-DS database
22-
bash tools/generate/tpcds.sh LINGODB_BINARY_DIR OUTPUT_DIR SF
23+
bash tools/generate/tpcds.sh DATA_DIR SF
2324
# Generate JOB database
24-
bash tools/generate/job.sh LINGODB_BINARY_DIR OUTPUT_DIR
25+
bash tools/generate/job.sh DATA_DIR
2526
# Generate SSB database
26-
bash tools/generate/ssb.sh LINGODB_BINARY_DIR OUTPUT_DIR SF
27+
bash tools/generate/ssb.sh DATA_DIR SF
28+
```
29+
30+
### Data Loading
31+
```sh
32+
# DB_DIR is the directory where the final database files should be stored
33+
cd DATA_DIR
34+
[LINGODB_BINARY_DIR]/sql DB_DIR < [REPO_ROOT]/resources/sql/tpch/initialize.sql # replace tpch with tpcds, job, ssb ...
2735
```
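The two manual steps can be chained in one helper. The following is only a sketch: `prepare_db` is a hypothetical function (not part of LingoDB), it assumes `REPO_ROOT` points at the repository checkout, that `LINGODB_BINARY_DIR` is set, that both directories are given as absolute paths, and that creating them up front is harmless.

```sh
# Hypothetical helper, not shipped with LingoDB.
# Usage: prepare_db BENCH SF DATA_DIR DB_DIR   (use absolute paths)
prepare_db() {
  bench="$1"; sf="$2"; data_dir="$3"; db_dir="$4"
  repo_root="${REPO_ROOT:-$PWD}"                       # assumption: repository checkout
  bin_dir="${LINGODB_BINARY_DIR:?set LINGODB_BINARY_DIR}"

  # Assumption: pre-creating both directories does no harm.
  mkdir -p "$data_dir" "$db_dir" || return 1

  # Generate the csv files; job.sh takes no scale factor.
  if [ "$bench" = job ]; then
    bash "$repo_root/tools/generate/job.sh" "$data_dir" || return 1
  else
    bash "$repo_root/tools/generate/$bench.sh" "$data_dir" "$sf" || return 1
  fi

  # Load from within DATA_DIR, as in the manual loading step.
  ( cd "$data_dir" &&
    "$bin_dir/sql" "$db_dir" < "$repo_root/resources/sql/$bench/initialize.sql" )
}
```

For example, `prepare_db tpch 1 /tmp/tpch-1-csv /tmp/tpch-1-db` would generate TPC-H at scale factor 1 and load it; the two paths are placeholders.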
36+
37+
## Makefile target
38+
There is a convenience Makefile target for both generating and loading a set of datasets.
39+
```sh
40+
make resources/data/[NAME]-[SF]/.stamp
41+
42+
# For example
43+
44+
make resources/data/tpch-1/.stamp
45+
make resources/data/tpch-10/.stamp
46+
make resources/data/tpcds-1/.stamp
47+
make resources/data/ssb-1/.stamp
48+
49+
make resources/data/job/.stamp
50+
51+
```
52+
This prepares the requested dataset at the given scale factor in the `resources/data/[NAME]-[SF]` directory.
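When several datasets are needed, the target names can be built in a loop. This is a small sketch: `stamp_target` is a hypothetical helper that only formats the path pattern shown above, and the script prints the `make` commands rather than running them.

```sh
# Hypothetical helper: formats the stamp path for a dataset.
# The scale factor is optional, since the JOB target has none.
stamp_target() {
  name="$1"
  sf="${2:-}"
  if [ -n "$sf" ]; then
    printf 'resources/data/%s-%s/.stamp\n' "$name" "$sf"
  else
    printf 'resources/data/%s/.stamp\n' "$name"
  fi
}

# Print one make command per dataset.
for sf in 1 10; do
  echo "make $(stamp_target tpch "$sf")"
done
echo "make $(stamp_target job)"
```

Piping the printed lines into `sh` from the repository root would run the targets one after another.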
53+
54+
## Running LingoDB
55+
2856
Afterward, queries can be run, for example, with the `sql` command, which also reports execution times when the `LINGODB_SQL_REPORT_TIMES` environment variable is set:
2957
```sh
30-
LINGODB_SQL_REPORT_TIMES=1 sql OUTPUT_DIR
58+
LINGODB_SQL_REPORT_TIMES=1 sql DB_DIR
3159
sql>select count(*) from lineitem;
3260
| count |
3361
----------------------------------
