Commit 6fab847

Update README instructions
1 parent 5721546 commit 6fab847

3 files changed: 30 additions, 8 deletions

README.md

Lines changed: 14 additions & 8 deletions
@@ -58,21 +58,27 @@ Spark 3.2.x is the recommended runtime to use. The rest of the instructions are
 To place Spark under `/opt/`:
 
 ```bash
-curl https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz | sudo tar -xz -C /opt/
-export SPARK_HOME="/opt/spark-3.2.2-bin-hadoop3.2"
-export PATH="${SPARK_HOME}/bin":"${PATH}"
+scripts/get-spark-to-opt.sh
 ```
 
-To place under `~/`:
+To place it under `${HOME}/`:
 
 ```bash
-curl https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz | tar -xz -C ~/
-export SPARK_HOME=~/spark-3.2.2-bin-hadoop3.2
-export PATH="${SPARK_HOME}/bin":"${PATH}"
+scripts/get-spark-to-home.sh
 ```
 
 Both Java 8 and Java 11 are supported.
 
+#### Building the project
+
+Run:
+
+```bash
+scripts/build.sh
+```
+
+#### Running the generator
+
 Once you have Spark in place and built the JAR file, run the generator as follows:
 
 ```bash
@@ -90,7 +96,7 @@ The runtime configuration arguments determine the amount of memory, number of th
 ./tools/run.py --help
 ```
 
-To generate a single `part-*.csv` file, reduce the parallelism (number of Spark partitions) to 1.
+To generate a single `part-*` file, reduce the parallelism (number of Spark partitions) to 1.
 
 ```bash
 ./tools/run.py --parallelism 1 -- --format csv --scale-factor 0.003 --mode interactive

scripts/get-spark-to-home.sh

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+#!/bin/bash
+
+set -eu
+cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+
+curl https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz | tar -xz -C ${HOME}/
+export SPARK_HOME="${HOME}/spark-3.2.2-bin-hadoop3.2"
+export PATH="${SPARK_HOME}/bin":"${PATH}"

scripts/get-spark-to-opt.sh

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+#!/bin/bash
+
+set -eu
+cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+
+curl https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz | sudo tar -xz -C /opt/
+export SPARK_HOME="/opt/spark-3.2.2-bin-hadoop3.2"
+export PATH="${SPARK_HOME}/bin":"${PATH}"
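
Note on the two new scripts: the README previously had readers paste the `export` lines into their own shell, whereas the commit moves them into scripts that are executed as `scripts/get-spark-to-*.sh`. An executed script runs in a child process, so its `export` statements do not change the caller's environment. The following is a minimal sketch of that behavior, not part of the commit; the variable name `DEMO_SPARK_HOME` is hypothetical and stands in for `SPARK_HOME` so the demonstration does not touch a real setting.

```bash
#!/bin/bash
# Hypothetical demonstration: an export made in a child process does not
# reach the calling shell, while an export made in the current shell does.

unset DEMO_SPARK_HOME

# Case 1: export inside a child process, comparable to executing
# scripts/get-spark-to-home.sh rather than sourcing it.
bash -c 'export DEMO_SPARK_HOME="${HOME}/spark-3.2.2-bin-hadoop3.2"'
child_result="${DEMO_SPARK_HOME:-unset}"

# Case 2: export in the current shell, comparable to the README's
# original copy-and-paste instructions.
export DEMO_SPARK_HOME="${HOME}/spark-3.2.2-bin-hadoop3.2"
current_result="${DEMO_SPARK_HOME}"

echo "after child process:        ${child_result}"
echo "after current-shell export: ${current_result}"
```

If that reading is right, `SPARK_HOME` and `PATH` would still need to be exported in the shell that later runs `./tools/run.py`, for example by repeating the two `export` lines there. Sourcing the scripts instead would also apply their `set -eu` and `cd` to the interactive shell, which may not be desirable.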
