Commit dc3a544

Add more details/sample command to EMR script
1 parent d58f2b0 commit dc3a544


tools/emr/README.md

Lines changed: 47 additions & 3 deletions
@@ -41,7 +41,13 @@ aws s3 cp target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}-jar-wit
 ```bash
 JOB_NAME=MyTest
 SCALE_FACTOR=10
-./tools/emr/submit_datagen_job.py --bucket ${BUCKET_NAME} ${JOB_NAME} ${SCALE_FACTOR} csv raw
+./tools/emr/submit_datagen_job.py \
+--bucket \
+${BUCKET_NAME} \
+${JOB_NAME} \
+${SCALE_FACTOR} \
+csv \
+raw
 ```
 
 Note: scale factors below 1 are not supported.
@@ -51,7 +57,38 @@ Note: scale factors below 1 are not supported.
 To use spot instances, add the `--use-spot` argument:
 
 ```bash
-./tools/emr/submit_datagen_job.py --use-spot --bucket ${BUCKET_NAME} ${JOB_NAME} ${SCALE_FACTOR} csv raw
+./tools/emr/submit_datagen_job.py \
+--use-spot \
+--bucket \
+${BUCKET_NAME} \
+${JOB_NAME} \
+${SCALE_FACTOR} \
+csv \
+raw
+```
+
+### Sample command
+
+Generate the BI data set with the following configuration:
+
+* use spot instances
+* in the `csv-composite-projected-fk` format (`--explode-edges`)
+* compress CSVs with `gzip`, and
+* generate factors.
+
+```bash
+./tools/emr/submit_datagen_job.py \
+--use-spot \
+--bucket \
+${BUCKET_NAME} \
+${JOB_NAME} \
+${SCALE_FACTOR} \
+csv \
+bi \
+-- \
+--explode-edges \
+--format-options compression=gzip \
+--generate-factors
 ```
 
 ### Using a different Spark / EMR version
### Using a different Spark / EMR version
@@ -61,7 +98,14 @@ Make sure that you have uploaded the right JAR first.
 
 ```bash
 PLATFORM_VERSION=2.12_spark3.1
-./tools/emr/submit_datagen_job.py --bucket ${BUCKET_NAME} --platform-version ${PLATFORM_VERSION} --emr-release emr-6.2.0 ${JOB_NAME} ${SCALE_FACTOR} csv raw
+./tools/emr/submit_datagen_job.py \
+--bucket ${BUCKET_NAME} \
+--platform-version ${PLATFORM_VERSION} \
+--emr-release emr-6.2.0 \
+${JOB_NAME} \
+${SCALE_FACTOR} \
+csv \
+raw
 ```
 
 ### Using a parameter file
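
The README text in this diff notes that scale factors below 1 are not supported. A minimal sketch of a local guard that could run before submitting a job is shown here. The function `check_scale_factor` is hypothetical and not part of the repository, and whether `submit_datagen_job.py` performs a similar check itself is not stated in the diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of the repository).
# Returns 0 if the scale factor is a plain decimal number >= 1,
# 1 if it is a number below 1, and 2 if it is not a number at all.
check_scale_factor() {
    local sf="$1"
    case "$sf" in
        # Reject empty input, non-digit characters, more than one dot,
        # and a leading or trailing dot.
        ''|*[!0-9.]*|*.*.*|.*|*.)
            echo "scale factor must be a number, got: '${sf}'" >&2
            return 2
            ;;
    esac
    # A value below 1 has a zero integer part (for example 0 or 0.3).
    local int_part="${sf%%.*}"
    if [ "$int_part" -ge 1 ]; then
        return 0
    fi
    echo "scale factors below 1 are not supported, got: ${sf}" >&2
    return 1
}

# Possible usage, reusing the variables from the README:
#   check_scale_factor "${SCALE_FACTOR}" || exit 1
#   ./tools/emr/submit_datagen_job.py --bucket "${BUCKET_NAME}" "${JOB_NAME}" "${SCALE_FACTOR}" csv raw
```

Failing fast on the local machine avoids starting a cluster for a job that cannot produce output.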
