Commit 69f387c

committed
update
1 parent f271585 commit 69f387c

3 files changed: +65 -11 lines changed

tpch/README.md

Lines changed: 16 additions & 11 deletions
@@ -19,24 +19,26 @@
 
 # Benchmarking DataFusion Ray on Kubernetes
 
-This is a rough guide to deploying and benchmarking DataFusion Ray on Kubernetes.
+This is a rough guide to deploying and benchmarking DataFusion Ray on Kubernetes as part of the development process.
 
-set up new venv
+## Building Wheels
 
-```shell
-python3 -m venv venv
-source venv/bin/activate
-pip3 install maturin
-pip3 install ray
-pip3 install ray[default]
-```
+Follow the instructions in the [contributor guide] to set up a development environment and then build the project
+using the following command.
 
-Build the project.
+[contributor guide]: ../docs/contributing.md
 
 ```shell
 maturin build --strip
 ```
 
+## Create a Ray Cluster
+
+Create a `datafusion-ray.yaml` file using the following template. It is important that the Ray Docker image uses the
+same Python version that was used to build the wheels. This example yaml assumes that the TPC-H data files are
+available locally on each node in the cluster at the path `/mnt/bigdata`. If the data is stored on object storage then
+the `volume` and `volumeMount` sections can be removed.
+
 ```yaml
 apiVersion: ray.io/v1alpha1
 kind: RayCluster
@@ -93,11 +95,14 @@ spec:
 claimName: ray-pvc
 ```
 
+Run the following command to create the cluster:
+
 ```shell
 kubectl apply -f datafusion-ray.yaml
 ```
 
-set up port forwarding on head node 8265
+Once the cluster is running, set up port forwarding on port 8265 on the head node and then run the following
+command to run the benchmarks.
 
 ```shell
 ray job submit --address='http://localhost:8265' \
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
{
2+
"engine": "datafusion-ray",
3+
"benchmark": "tpch",
4+
"settings": {
5+
"concurrency": 8,
6+
"batch_size": 8182,
7+
"prefetch_buffer_size": 0,
8+
"partitions_per_worker": null
9+
},
10+
"data_path": "file:///mnt/bigdata/tpch/sf100",
11+
"queries": {
12+
"2": 8.547899007797241
13+
}
14+
}
Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
{
2+
"engine": "datafusion-ray",
3+
"benchmark": "tpch",
4+
"settings": {
5+
"concurrency": 8,
6+
"batch_size": 8182,
7+
"prefetch_buffer_size": 0,
8+
"partitions_per_worker": null
9+
},
10+
"data_path": "file:///mnt/bigdata/tpch/sf100",
11+
"queries": {
12+
"1": 7.222118854522705,
13+
"2": 8.797776937484741,
14+
"3": 11.183124780654907,
15+
"4": 7.6282007694244385,
16+
"5": 20.619840383529663,
17+
"6": 3.466888427734375,
18+
"7": 29.999598026275635,
19+
"8": 22.716665267944336,
20+
"9": 38.37256050109863,
21+
"10": 25.540525197982788,
22+
"11": 6.4380128383636475,
23+
"12": 10.021047592163086,
24+
"13": 8.462335348129272,
25+
"14": 3.810248851776123,
26+
"15": 0.8309383392333984,
27+
"16": 3.692992925643921,
28+
"17": 32.96640610694885,
29+
"18": 50.401840925216675,
30+
"19": 4.988840818405151,
31+
"20": 7.992424011230469,
32+
"21": 47.60438632965088,
33+
"22": 3.4463324546813965
34+
}
35+
}
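
The result files in this commit share one JSON layout: a `queries` object that maps each TPC-H query number to a timing. As a minimal sketch of how such a file might be summarized, the Python below totals the timings and finds the slowest query. The `summarize` helper is hypothetical and not part of the commit, the sample holds only three of the 22 entries, and the timings are assumed (not stated in the commit) to be seconds.

```python
import json


def summarize(results: dict) -> dict:
    """Aggregate per-query timings from one benchmark result document."""
    # "queries" maps a TPC-H query number (as a string) to its elapsed time.
    queries = results["queries"]
    slowest = max(queries, key=queries.get)
    return {
        "num_queries": len(queries),
        "total": sum(queries.values()),
        "slowest_query": slowest,
        "slowest_time": queries[slowest],
    }


# Three entries copied from the 22-query result file; a real script would
# load the whole file with json.load() (its path is not shown in the commit).
sample = json.loads("""
{
    "engine": "datafusion-ray",
    "benchmark": "tpch",
    "queries": {
        "6": 3.466888427734375,
        "15": 0.8309383392333984,
        "18": 50.401840925216675
    }
}
""")

summary = summarize(sample)
# Timings are assumed to be in seconds.
print(f"{summary['num_queries']} queries, total {summary['total']:.2f} s")
print(f"slowest: q{summary['slowest_query']} at {summary['slowest_time']:.2f} s")
```

Running the same helper over both result files would also allow a per-query comparison, for example of query 2, which appears in each.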
