Commit 9a6ce8e

hekaisheng, xuye.qin, and wjsi authored
Add TPC-H benchmarks (#2937)
Co-authored-by: xuye.qin <[email protected]> Co-authored-by: Wenjun Si <[email protected]>
1 parent fd034fb commit 9a6ce8e

24 files changed, with 1666 additions and 33 deletions

.codacy.yml

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@ exclude_paths:
  - 'versioneer.py'
  - '*.min.js'
  - '**/tests/**'
- - 'asv_bench/**'
+ - 'benchmarks/**'

.codecov.yml

Lines changed: 2 additions & 0 deletions
@@ -47,3 +47,5 @@ ignore:
  - "mars/deploy/kubedl"
  # proxima related things
  - "mars/learn/proxima/**"
+ # benchmarks
+ - "benchmarks/**"

.github/workflows/benchmark-ci.yml

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ jobs:
  - name: Run ASV benchmarks
  run: |
  source ./ci/reload-env.sh
- cd asv_bench
+ cd benchmarks/asv_bench
  asv check -E existing
  git remote add upstream https://github.com/mars-project/mars.git
  git fetch upstream
@@ -61,5 +61,5 @@ jobs:
  uses: actions/upload-artifact@v2
  with:
  name: Benchmarks log
- path: asv_bench/benchmarks.log
+ path: benchmarks/asv_bench/benchmarks.log
  if: failure()

.github/workflows/platform-ci.yml

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ jobs:
  fi
  if [ -n "$WITH_RAY" ]; then
  pip install ray[default]==1.9.2
- pip install xgboost_ray==0.1.5
+ pip install "xgboost_ray==0.1.5" "xgboost<1.6.0"
  fi
  if [ -n "$RUN_DASK" ]; then
  pip install dask[complete] mimesis sklearn

.gitignore

Lines changed: 3 additions & 3 deletions
@@ -96,6 +96,6 @@ docs/source/savefig/

  # Unit / Performance Testing #
  ##############################
- asv_bench/env/
- asv_bench/html/
- asv_bench/results/
+ benchmarks/asv_bench/env/
+ benchmarks/asv_bench/html/
+ benchmarks/asv_bench/results/

CODEOWNERS

Lines changed: 22 additions & 22 deletions
@@ -1,34 +1,34 @@
  # Each line is a component followed by one or more owners.

  * @qinxuye @wjsi @hekaisheng
- /.github @wjsi @qinxuye
- /bin @wjsi @qinxuye
- /ci @wjsi @qinxuye
+ /.github @wjsi @qinxuye @hekaisheng
+ /bin @wjsi @qinxuye @hekaisheng
+ /ci @wjsi @qinxuye @hekaisheng
  /docs @qinxuye @hekaisheng @wjsi
  /mars/ @qinxuye @hekaisheng @wjsi
  /mars/core @qinxuye @hekaisheng @wjsi
  /mars/dataframe @hekaisheng @qinxuye @wjsi
  /mars/deploy @wjsi @hekaisheng @qinxuye
- /mars/deploy/oscar/ray* @chaokunyang @fyrestone @qinxuye
+ /mars/deploy/oscar/ray* @chaokunyang @fyrestone @qinxuye @hekaisheng @wjsi
  /mars/learn @qinxuye @hekaisheng @wjsi
- /mars/lib @wjsi @qinxuye
- /mars/optimization @hekaisheng @qinxuye
- /mars/oscar @wjsi @qinxuye
- /mars/oscar/backends/ray @chaokunyang @fyrestone @qinxuye
+ /mars/lib @wjsi @qinxuye @hekaisheng
+ /mars/optimization @hekaisheng @qinxuye @wjsi
+ /mars/oscar @wjsi @qinxuye @hekaisheng
+ /mars/oscar/backends/ray @chaokunyang @fyrestone @qinxuye @hekaisheng @wjsi
  /mars/remote @qinxuye @wjsi @hekaisheng
- /mars/serialization @qinxuye @wjsi
- /mars/services/ @wjsi @qinxuye
- /mars/services/cluster @wjsi @qinxuye
- /mars/services/lifecycle @wjsi @qinxuye
- /mars/services/meta @wjsi @qinxuye
- /mars/services/scheduling @wjsi @qinxuye
- /mars/services/session @wjsi @qinxuye
- /mars/services/storage @hekaisheng @qinxuye
- /mars/services/subtask @wjsi @qinxuye
- /mars/services/task @wjsi @qinxuye
- /mars/services/web @wjsi @qinxuye
- /mars/storage @hekaisheng @qinxuye
- /mars/storage/vineyard.py @sighingnow @acezen @qinxuye
- /mars/storage/ray.py @chaokunyang @fyrestone @qinxuye
+ /mars/serialization @qinxuye @wjsi @hekaisheng
+ /mars/services/ @wjsi @qinxuye @hekaisheng
+ /mars/services/cluster @wjsi @qinxuye @hekaisheng
+ /mars/services/lifecycle @wjsi @qinxuye @hekaisheng
+ /mars/services/meta @wjsi @qinxuye @hekaisheng
+ /mars/services/scheduling @wjsi @qinxuye @hekaisheng
+ /mars/services/session @wjsi @qinxuye @hekaisheng
+ /mars/services/storage @hekaisheng @qinxuye @wjsi
+ /mars/services/subtask @wjsi @qinxuye @hekaisheng
+ /mars/services/task @wjsi @qinxuye @hekaisheng
+ /mars/services/web @wjsi @qinxuye @hekaisheng
+ /mars/storage @hekaisheng @qinxuye @wjsi
+ /mars/storage/vineyard.py @sighingnow @qinxuye @hekaisheng @wjsi
+ /mars/storage/ray.py @chaokunyang @fyrestone @qinxuye @hekaisheng @wjsi
  /mars/tensor @hekaisheng @qinxuye @wjsi
  /mars/tests @wjsi @hekaisheng @qinxuye

azure-pipelines.yml

Lines changed: 2 additions & 2 deletions
@@ -148,7 +148,7 @@ jobs:

  # special check for __init__.py
  grep -A 10000 '\[flake8\]' setup.cfg | awk '!/(F401|F811|__init__\.py)/' > flake8_init.ini
- flake8 --config=flake8_init.ini
+ flake8 mars --config=flake8_init.ini

  # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
  flake8 mars --config="default" --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
@@ -158,7 +158,7 @@ jobs:
  set -e
  source ./ci/reload-env.sh

- black --check --diff --verbose mars asv_bench
+ black --check --diff --verbose mars benchmarks
  displayName: 'Check code style with black'

  - bash: |
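
Reviewer note on the `awk` filter in the first hunk above: `awk '!/(F401|F811|__init__\.py)/'` keeps every line that does not match the pattern. A minimal Python sketch of the same filtering follows. The sample `[flake8]` section is invented for illustration and is not the project's actual `setup.cfg`.

```python
import re

# Same pattern as the awk filter: any line mentioning F401, F811,
# or __init__.py is dropped; every other line is kept.
DROP = re.compile(r"(F401|F811|__init__\.py)")


def filter_flake8_section(lines):
    """Keep only the lines that do not match DROP, mirroring awk '!/.../'."""
    return [line for line in lines if not DROP.search(line)]


# Hypothetical [flake8] section, for illustration only.
sample = [
    "[flake8]",
    "max-line-length = 100",
    "per-file-ignores =",
    "    __init__.py: F401",
    "    mars/lib/*: F811",
]

filtered = filter_flake8_section(sample)
print("\n".join(filtered))
```

The filtered text is what ends up in `flake8_init.ini`. The commit then passes `mars` explicitly to the `flake8` run that uses this file, presumably so the check does not also pick up the new `benchmarks` directory.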

benchmarks/README.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+ # TPC-H Queries
+
+ TPC-H is a benchmark suite of business-oriented ad-hoc queries that simulate real-world questions. It is commonly used to measure how well database tools answer them.
+
+ More information can be found [here](http://www.tpc.org/tpch/).
+
+ ## Generating TPC-H Data in Parquet Format
+
+ ### 1. Download and Install tpch-dbgen
+
+ ```
+ git clone https://github.com/mars-project/tpch-dbgen
+ cd tpch-dbgen
+ make
+ cd ../
+ ```
+
+ ### 2. Generate Data
+
+ Usage:
+
+ ```
+ usage: python gen_data.py [-h] --folder FOLDER [--SF N] [--validate_dataset]
+
+ -h, --help Show this help message and exit
+ --folder FOLDER: output folder name (can be a local folder or an S3 bucket)
+ --SF N: data size in GB (Default 1)
+ --validate_dataset: Validate each parquet dataset with pyarrow.parquet.ParquetDataset (Default True)
+ ```
+
+ Examples:
+
+ Generate 1GB of data locally:
+
+ `python gen_data.py --SF 1 --folder SF1`
+
+ Generate 1TB of data and upload it to an S3 bucket:
+
+ `python gen_data.py --SF 1000 --folder s3://bucket-name/`
+
+ NOTES:
+
+ - This script assumes `tpch-dbgen` is in the same directory. If you downloaded it to another location, make sure to update `tpch_dbgen_location` in the script with the new location.
+
+ - If using an S3 bucket, install `s3fs` and add your AWS credentials.
+
+ ## Mars
+
+ ### Installation
+
+ Follow the instructions [here](https://docs.pymars.org/en/latest/installation/index.html).
+
+ ### Running queries
+
+ Use
+
+ `python tpch/run_queries.py --folder folder_path --endpoint mars_endpoint`
+
+ ```
+ usage: python run_queries.py [-h] --folder FOLDER [--endpoint ENDPOINT]
+
+ optional arguments:
+ -h, --help show this help message and exit
+ --folder FOLDER The folder containing TPCH data
+ --endpoint ENDPOINT Endpoint to connect to; if not provided, a local cluster is created
+ ```
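
Reviewer note on the `run_queries.py` usage text in the README above: the help output describes a required `--folder` option and an optional `--endpoint` option. A minimal `argparse` sketch that would accept that command line is shown here. It is reconstructed from the usage text only; `build_parser` is a hypothetical name, and the real script in the commit may be organized differently.

```python
import argparse


def build_parser():
    # Hypothetical parser reconstructed from the README usage text;
    # the actual run_queries.py may define its options differently.
    parser = argparse.ArgumentParser(prog="run_queries.py")
    parser.add_argument(
        "--folder",
        required=True,
        help="The folder containing TPCH data",
    )
    parser.add_argument(
        "--endpoint",
        default=None,
        help="Endpoint to connect to; if not provided, a local cluster is created",
    )
    return parser


# Equivalent to: python run_queries.py --folder SF1
args = build_parser().parse_args(["--folder", "SF1"])
print(args.folder, args.endpoint)
```

With no `--endpoint` given, `args.endpoint` stays `None`, which is the case where the README says a local cluster is created.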

benchmarks/__init__.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+ #!/usr/bin/env python
+ # -*- coding: utf-8 -*-
+ # Copyright 1999-2021 Alibaba Group Holding Ltd.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.

asv_bench/asv.conf.json renamed to benchmarks/asv_bench/asv.conf.json

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@

  // The URL or local path of the source code repository for the
  // project being benchmarked
- "repo": "..",
+ "repo": "../../",

  // The Python project's subdirectory in your repo. If missing or
  // the empty string, the project is assumed to be located at the root
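
Reviewer note on the `"repo"` change above: moving `asv.conf.json` one directory deeper means the relative path to the repository root needs one more `..` component. The sketch below checks that both settings point at the same place, assuming the relative path is resolved against the directory that holds `asv.conf.json`. `resolve_repo` is a hypothetical helper written for this note, not part of asv.

```python
import posixpath

# Directory holding asv.conf.json before and after the rename,
# relative to the repository root.
ASV_DIR_OLD = "asv_bench"
ASV_DIR_NEW = "benchmarks/asv_bench"


def resolve_repo(conf_dir, repo_setting):
    """Resolve a relative "repo" value against the config file's directory."""
    return posixpath.normpath(posixpath.join(conf_dir, repo_setting))


old_root = resolve_repo(ASV_DIR_OLD, "..")
new_root = resolve_repo(ASV_DIR_NEW, "../../")
print(old_root, new_root)
```

Both values normalize to the repository root, so the benchmarked project location is unchanged by the move.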
