@@ -41,7 +41,13 @@ aws s3 cp target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}-jar-wit
 ```bash
 JOB_NAME=MyTest
 SCALE_FACTOR=10
-./tools/emr/submit_datagen_job.py --bucket ${BUCKET_NAME} ${JOB_NAME} ${SCALE_FACTOR} csv raw
+./tools/emr/submit_datagen_job.py \
+    --bucket \
+    ${BUCKET_NAME} \
+    ${JOB_NAME} \
+    ${SCALE_FACTOR} \
+    csv \
+    raw
 ```
 
 Note: scale factors below 1 are not supported.
@@ -51,7 +57,38 @@ Note: scale factors below 1 are not supported.
 To use spot instances, add the `--use-spot` argument:
 
 ```bash
-./tools/emr/submit_datagen_job.py --use-spot --bucket ${BUCKET_NAME} ${JOB_NAME} ${SCALE_FACTOR} csv raw
+./tools/emr/submit_datagen_job.py \
+    --use-spot \
+    --bucket \
+    ${BUCKET_NAME} \
+    ${JOB_NAME} \
+    ${SCALE_FACTOR} \
+    csv \
+    raw
+```
+
+### Sample command
+
+Generate the BI data set with the following configuration:
+
+* use spot instances
+* in the `csv-composite-projected-fk` format (`--explode-edges`)
+* compress CSVs with `gzip`, and
+* generate factors.
+
+```bash
+./tools/emr/submit_datagen_job.py \
+    --use-spot \
+    --bucket \
+    ${BUCKET_NAME} \
+    ${JOB_NAME} \
+    ${SCALE_FACTOR} \
+    csv \
+    bi \
+    -- \
+    --explode-edges \
+    --format-options compression=gzip \
+    --generate-factors
 ```
 
 ### Using a different Spark / EMR version
@@ -61,7 +98,14 @@ Make sure that you have uploaded the right JAR first.
 
 ```bash
 PLATFORM_VERSION=2.12_spark3.1
-./tools/emr/submit_datagen_job.py --bucket ${BUCKET_NAME} --platform-version ${PLATFORM_VERSION} --emr-release emr-6.2.0 ${JOB_NAME} ${SCALE_FACTOR} csv raw
+./tools/emr/submit_datagen_job.py \
+    --bucket ${BUCKET_NAME} \
+    --platform-version ${PLATFORM_VERSION} \
+    --emr-release emr-6.2.0 \
+    ${JOB_NAME} \
+    ${SCALE_FACTOR} \
+    csv \
+    raw
 ```
 
 ### Using a parameter file
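The README text touched by this diff notes that scale factors below 1 are not supported, and the new sample command places the trailing options after `--`. As a rough sketch of how a wrapper could check the scale factor before assembling that same argument list, the snippet below only prints the command instead of submitting anything. The helper names `is_supported_scale_factor` and `build_submit_command` and the bucket name `my-bucket` are hypothetical and not part of the repository; the argument order simply mirrors the sample command in the diff.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only when the scale factor is a number >= 1.
is_supported_scale_factor() {
    local sf="$1"
    # Accept plain non-negative decimals only (e.g. 1, 10, 0.3).
    [[ "$sf" =~ ^[0-9]+(\.[0-9]+)?$ ]] || return 1
    # awk performs the numeric comparison, including fractional values.
    awk -v sf="$sf" 'BEGIN { exit !(sf >= 1) }'
}

# Hypothetical helper: prints the submit command (dry run, nothing is executed).
build_submit_command() {
    local bucket="$1" job="$2" sf="$3"
    shift 3
    local cmd=(./tools/emr/submit_datagen_job.py --use-spot --bucket "$bucket" "$job" "$sf" csv bi)
    # Any remaining arguments are appended after "--", as in the sample command.
    if (( $# > 0 )); then
        cmd+=(-- "$@")
    fi
    printf '%s\n' "${cmd[*]}"
}

BUCKET_NAME=my-bucket   # placeholder value, an assumption for illustration
JOB_NAME=MyTest
SCALE_FACTOR=10

if is_supported_scale_factor "$SCALE_FACTOR"; then
    build_submit_command "$BUCKET_NAME" "$JOB_NAME" "$SCALE_FACTOR" \
        --explode-edges --format-options compression=gzip --generate-factors
else
    echo "Scale factor ${SCALE_FACTOR} is not supported (must be at least 1)." >&2
fi
```

Building the command in a bash array keeps each argument as a separate word, which is one way to avoid quoting problems if a job or bucket name ever contains unusual characters.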