Commit ceb8747

Start readme, runsh and benchmark.sh
1 parent 03f4dae commit ceb8747

3 files changed: +55 -0 lines changed

presto/README.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# PrestoDB

Presto is a distributed SQL query engine for big data.
- [GitHub](https://github.com/prestodb/presto)
- [Homepage](https://prestodb.io)

The benchmarks are based on Presto version `0.287`.

We assume that a Presto cluster is already running. For more information, visit [Getting Started](https://prestodb.io/getting-started/).

----------
## Steps

1. Download the Parquet file and upload it to an S3 bucket, e.g. `s3://your-bucket/clickbench-parquet/hits/hits.parquet`.
2. Create a new schema for the dataset, then create the `hits` table in that schema using the `create.sql` file. Append the following to the end of the file so that the table reads the Parquet file on S3.
```
WITH (
format = 'PARQUET',
external_location = 's3a://your-bucket/clickbench-parquet/hits/'
);
```
3. Connect to your Presto coordinator and execute `run.sh`, which uses presto-cli to run the queries.
4. The Presto UI is one way to get detailed information on the queries, including their runtime.
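
Steps 1 and 2 use two forms of the same location: the `s3://` URL of the uploaded object and the `s3a://` directory passed as `external_location`. The sketch below derives the second from the first and prints the `WITH` clause to append to `create.sql`. The function names `external_location` and `with_clause` are hypothetical helpers, not part of this commit, and the bucket name is the placeholder from the steps above.

```bash
#!/bin/bash

# Derive the Hive external_location (a directory with the s3a:// scheme)
# from the s3:// URL of the uploaded Parquet object.
external_location() {
    local object_url="$1"
    local dir="${object_url%/*}/"   # drop the file name, keep a trailing slash
    echo "s3a://${dir#s3://}"       # replace the s3:// prefix with s3a://
}

# Print the WITH clause to append to create.sql.
with_clause() {
    printf "WITH (\n    format = 'PARQUET',\n    external_location = '%s'\n);\n" \
        "$(external_location "$1")"
}

with_clause "s3://your-bucket/clickbench-parquet/hits/hits.parquet"
```

Keeping the derivation in one place avoids a mismatch between the upload path and the table definition, since `external_location` must point at the directory rather than at the file itself.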

presto/benchmark.sh

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
#!/bin/bash


PRESTO_VERSION=0.287

# Set the URL to download
PRESTO_BIN="https://repo1.maven.org/maven2/com/facebook/presto/presto-server/${PRESTO_VERSION}/presto-server-${PRESTO_VERSION}.tar.gz"

# Update the package index of the base image and install wget, python and less
sudo apt-get update
sudo apt-get install -y wget python less

# Download Presto and unpack it to /opt/presto (writing to /opt needs root)
wget --quiet "${PRESTO_BIN}"
sudo mkdir -p /opt
sudo tar -xf "presto-server-${PRESTO_VERSION}.tar.gz" -C /opt
rm "presto-server-${PRESTO_VERSION}.tar.gz"
sudo ln -s "/opt/presto-server-${PRESTO_VERSION}" /opt/presto

# Download the dataset
wget --no-verbose --continue 'https://datasets.clickhouse.com/hits_compatible/hits.parquet'

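
`benchmark.sh` downloads only the server tarball, while `run.sh` expects a `presto-cli` binary. As a sketch, both download URLs can be built from the one version variable. The `server_url` and `cli_url` helpers are hypothetical, and the CLI path is an assumption: it presumes the executable jar is published on Maven Central under the `presto-cli` artifact with an `-executable` suffix, so verify it before relying on it.

```bash
#!/bin/bash

PRESTO_VERSION=0.287
MAVEN_BASE=https://repo1.maven.org/maven2/com/facebook/presto

# URL of the server tarball (same pattern as PRESTO_BIN in benchmark.sh).
server_url() {
    echo "${MAVEN_BASE}/presto-server/$1/presto-server-$1.tar.gz"
}

# Assumed URL of the executable CLI jar; not taken from this commit.
cli_url() {
    echo "${MAVEN_BASE}/presto-cli/$1/presto-cli-$1-executable.jar"
}

echo "server: $(server_url "$PRESTO_VERSION")"
echo "cli:    $(cli_url "$PRESTO_VERSION")"
```

The sketch only prints the URLs. Fetching the CLI jar and making it executable would be a separate step.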

presto/run.sh

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
#!/bin/bash


TRIES=3
# Read queries.sql line by line; -r keeps backslashes in the queries intact
while IFS= read -r query; do
    # Write the query as JSON to query.json (the presto-cli call below does not read this file)
    echo "{\"sql\":\"$query option(timeoutMs=300000)\"}"| tr -d ';' > query.json
    for i in $(seq 1 $TRIES); do
        ./opt/presto-cli --server 127.0.0.1:8080 --schema "clickbench_parquet" --session offset_clause_enabled=true --catalog "hive" --execute "${query}"
    done
done < queries.sql
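
The loop above runs each query `TRIES` times but does not record how long each run takes. A minimal sketch of a client-side timer follows. The helper name `time_ms` is hypothetical, and it assumes GNU `date` with `%N` (nanoseconds), as found on Linux. The `sleep` call stands in for the presto-cli invocation.

```bash
#!/bin/bash

# Run a command, discard its output, and print the wall-clock
# duration in milliseconds.
time_ms() {
    local start end
    start=$(date +%s%N)          # nanoseconds since the epoch (GNU date)
    "$@" > /dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Stand-in command; in run.sh this would wrap the presto-cli call.
time_ms sleep 0.1
```

A timing taken this way includes CLI startup and network overhead, so it will generally be higher than the execution time reported by the Presto UI.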
