22
33Automatically creates the appropriate AWS infrastructure for running benchmarks.
44
5+ ---
6+
57# Deploy
68
79## Prerequisites
@@ -38,6 +40,8 @@ npm run cdk deploy
3840npm run sync-bucket
3941```
4042
43+ ---
44+
4145# Connect to instances
4246
4347## Prerequisites
@@ -59,24 +63,61 @@ sudo ln -s /usr/local/sessionmanagerplugin/bin/session-manager-plugin /usr/local
5963
6064## Port Forward
6165
62- After performing a CDK deploy, a CNF output will be printed to stdout with instructions for port-forwarding
63- to all the machines, something like this:
66+ After performing a CDK deploy, a CFN output will be printed to stdout with instructions for port-forwarding to the instances.
6467
6568``` shell
66- # instance-0 (forward port 8000 to localhost:8000)
67- aws ssm start-session --target i-04ed9f331dcfae4b6 --document-name AWS-StartPortForwardingSession --parameters " portNumber=8000,localPortNumber=8000"
69+ export INSTANCE_ID=i-0000000000000000
70+
71+ aws ssm start-session --target $INSTANCE_ID --document-name AWS-StartPortForwardingSession --parameters "portNumber=9000,localPortNumber=9000"
6872```
6973
70- Just port-forwarding the first instance is enough for making queries.
74+ Just port-forwarding the first instance is enough for issuing queries.
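
The port-forward command above can be wrapped in a small shell function so the instance ID and port are passed as arguments. This is only a sketch: `port_forward` and the `DRY_RUN` variable are hypothetical names that do not exist in the repository, and the default port of 9000 is assumed from the example above.

```shell
# Hypothetical helper, not part of the repository.
# Usage: port_forward INSTANCE_ID [PORT]
# With DRY_RUN set, the command is printed instead of executed.
port_forward() {
  local instance_id="$1"
  local port="${2:-9000}"  # assumed default, taken from the example above

  if [ -z "$instance_id" ]; then
    echo "usage: port_forward INSTANCE_ID [PORT]" >&2
    return 1
  fi

  # The same port is used on the instance and on localhost.
  local params="portNumber=${port},localPortNumber=${port}"

  if [ -n "${DRY_RUN:-}" ]; then
    echo "aws ssm start-session --target $instance_id --document-name AWS-StartPortForwardingSession --parameters $params"
  else
    aws ssm start-session --target "$instance_id" \
      --document-name AWS-StartPortForwardingSession \
      --parameters "$params"
  fi
}
```

After sourcing the function, `port_forward "$INSTANCE_ID"` would open the session, while `DRY_RUN=1 port_forward "$INSTANCE_ID"` only prints the command for review.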
7175
7276## Connect
7377
7478After performing a CDK deploy, a CFN output will be printed to stdout with instructions for connecting
7579to all the machines, something like this:
7680
7781``` shell
78- # instance-0
79- aws ssm start-session --target i-00000000000000000
82+ export INSTANCE_ID=i-0000000000000000
83+
84+ aws ssm start-session --target $INSTANCE_ID
85+ ```
86+
87+ The logs can be streamed with:
88+
89+ ``` shell
90+ sudo journalctl -u worker.service -f -o cat
91+ ```
92+
93+ ---
94+
95+ # Running benchmarks
96+
97+ There's a script that runs the TPC-H benchmarks against the remote cluster.
98+
99+ In one terminal, port-forward one of the machines in the cluster, for example:
100+
101+ ``` shell
102+ export INSTANCE_ID=i-0000000000000000
103+ aws ssm start-session --target $INSTANCE_ID --document-name AWS-StartPortForwardingSession --parameters "portNumber=9000,localPortNumber=9000"
104+ ```
105+
106+ In another terminal, navigate to the benchmarks/cdk folder:
107+
108+ ``` shell
109+ cd benchmarks/cdk
80110```
81111
82- Just running one of those commands in the terminal will connect you to the EC2 instance
112+ Then run the benchmarking script:
113+
114+ ``` shell
115+ npm run datafusion-bench
116+ ```
117+
118+ Arguments can be passed to run the benchmarks against different scale factors and with different configurations,
119+ for example:
120+
121+ ``` shell
122+ npm run datafusion-bench -- --sf 10 --files-per-task 4 --query 7
123+ ```
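
To run several queries in a row, the command above can be looped over a list of query numbers. The sketch below assumes only the `--sf`, `--files-per-task` and `--query` flags shown above; `bench_queries` and `DRY_RUN` are hypothetical names, and the value 4 for `--files-per-task` is copied from the example rather than a recommended setting.

```shell
# Hypothetical helper, not part of the repository.
# Usage: bench_queries SCALE_FACTOR QUERY [QUERY...]
# With DRY_RUN set, the commands are printed instead of executed.
bench_queries() {
  local sf="$1"
  shift
  local q
  for q in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "npm run datafusion-bench -- --sf $sf --files-per-task 4 --query $q"
    else
      # Stop at the first failing query so the error is not lost in later output.
      npm run datafusion-bench -- --sf "$sf" --files-per-task 4 --query "$q" || return 1
    fi
  done
}
```

For example, `bench_queries 10 1 7` would run queries 1 and 7 at scale factor 10 from the benchmarks/cdk folder, with the port-forward still open in the other terminal.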