Commit 5474289

Merge pull request #356 from ldbc/refine-readmes
2 parents 55dcdb9 + 8dc8be1

3 files changed: +9 -23 lines changed

README.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -36,9 +36,9 @@ You can build the JAR with both Maven and SBT.
 sbt assembly
 ```
 
-:warning: When using SBT, change the path of the JAR file in the instructions provided in the README (`target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar` -> `./target/scala-2.11/ldbc_snb_datagen-assembly-${DATAGEN_VERSION}.jar`).
+:warning: When using SBT, change the path of the JAR file in the instructions provided in the README (`target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar` -> `./target/scala-2.12/ldbc_snb_datagen-assembly-${DATAGEN_VERSION}.jar`).
 
-### Install tools
+### Install Python tools
 
 Some of the build utilities are written in Python. To use them, you have to create a Python virtual environment
 and install the dependencies.
````
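The warning in this hunk is worth spelling out: the Maven and SBT builds emit the assembly JAR at different paths. A minimal sketch of the two paths from the README, with `DATAGEN_VERSION` and `PLATFORM_VERSION` as hypothetical placeholder values:

```python
# Placeholder values; the real ones come from the build configuration.
DATAGEN_VERSION = "0.4.0"          # hypothetical version number
PLATFORM_VERSION = "2.12_spark3.1"

# Maven places the assembly here...
maven_jar = f"target/ldbc_snb_datagen_{PLATFORM_VERSION}-{DATAGEN_VERSION}.jar"
# ...while SBT's `assembly` task writes it under the Scala version directory.
sbt_jar = f"./target/scala-2.12/ldbc_snb_datagen-assembly-{DATAGEN_VERSION}.jar"

print(maven_jar)
print(sbt_jar)
```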

tools/emr/README.md

Lines changed: 5 additions & 19 deletions

````diff
@@ -24,19 +24,8 @@ In AWS IAM, add the following roles with **Create Role** | **AWS service** | **E
 
 ## Install the required libraries
 
-Make sure you use pip 21.1 or newer.
+Install the required libraries as described in the [main README](../../README.md#install-python-tools).
 
-1. From `tools`, run:
-
-   ```
-   pip install -e .
-   ```
-
-1. Package the JAR. Make sure you use Java 8:
-
-   ```bash
-   ./tools/build.sh
-   ```
 ## Submitting a job
 
 1. Upload the JAR to S3. (We don't version the JARs yet, so you can only make sure that you run the intended code this way :( )
@@ -67,15 +56,12 @@ To use spot instances, add the `--use-spot` argument:
 
 ### Using a different Spark / EMR version
 
-
-
-We use EMR 6.3.0 by default, which contains Spark 3.1. You can use a different version by specifying it with the `--emr-version` option.
-EMR 5.33.0 is the recommended EMR version to be used with Spark 2.4.
-Make sure that you have uploaded the right JAR first!
+We use EMR 6.3.0 by default, which packages Spark 3.1. You can use a different version by specifying it with the `--emr-version` option.
+Make sure that you have uploaded the right JAR first.
 
 ```bash
-PLATFORM_VERSION=2.11_spark2.4
-./tools/emr/submit_datagen_job.py --bucket ${BUCKET_NAME} --platform-version ${PLATFORM_VERSION} --emr-release emr-5.33.0 ${JOB_NAME} ${SCALE_FACTOR} csv raw
+PLATFORM_VERSION=2.12_spark3.1
+./tools/emr/submit_datagen_job.py --bucket ${BUCKET_NAME} --platform-version ${PLATFORM_VERSION} --emr-release emr-6.2.0 ${JOB_NAME} ${SCALE_FACTOR} csv raw
 ```
 
 ### Using a parameter file
````
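For context, the submit command the EMR README now recommends can be sketched as an argument list in Python. This is a sketch only: the bucket, job name, and scale factor values are placeholders standing in for the `${BUCKET_NAME}`, `${JOB_NAME}`, and `${SCALE_FACTOR}` environment variables in the README.

```python
# Placeholder values standing in for the environment variables in the README.
bucket_name = "my-datagen-bucket"   # hypothetical bucket
job_name = "sf100-test"             # hypothetical job name
scale_factor = "100"
platform_version = "2.12_spark3.1"

# Mirrors the command shown in the EMR README; per that README,
# --use-spot could be appended to request spot instances.
cmd = [
    "./tools/emr/submit_datagen_job.py",
    "--bucket", bucket_name,
    "--platform-version", platform_version,
    "--emr-release", "emr-6.2.0",
    job_name, scale_factor, "csv", "raw",
]
print(" ".join(cmd))
```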

tools/emr/submit_datagen_job.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -238,13 +238,13 @@ def submit_datagen_job(name,
                         help='EC2 key name for SSH connection')
 parser.add_argument('--platform-version',
                     default=defaults['platform_version'],
-                    help='The spark platform the JAR is compiled for formatted like {scala.compat.version}_spark{spark.compat.version}, e.g. 2.11_spark2.4, 2.12_spark3.1')
+                    help='The spark platform the JAR is compiled for formatted like {scala.compat.version}_spark{spark.compat.version}, e.g. 2.12_spark3.1')
 parser.add_argument('--version',
                     default=defaults['version'],
                     help='LDBC SNB Datagen library version')
 parser.add_argument('--emr-release',
                     default=defaults['emr_release'],
-                    help='The EMR release to use. E.g emr-5.33.0, emr-6.3.0')
+                    help='The EMR release to use. E.g. emr-6.3.0')
 parser.add_argument('-y', '--yes',
                     default=defaults['yes'],
                     action='store_true',
```
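The help-string edits above are plain `argparse` changes. A self-contained sketch of how the two touched options behave (not the real script: the defaults here are assumed literals, whereas `submit_datagen_job.py` reads them from a `defaults` dict):

```python
import argparse

# Sketch of the two options touched by this commit; the default values
# below are assumptions for illustration only.
parser = argparse.ArgumentParser()
parser.add_argument('--platform-version',
                    default='2.12_spark3.1',
                    help='The spark platform the JAR is compiled for, formatted like '
                         '{scala.compat.version}_spark{spark.compat.version}, e.g. 2.12_spark3.1')
parser.add_argument('--emr-release',
                    default='emr-6.3.0',
                    help='The EMR release to use. E.g. emr-6.3.0')

# Options left unspecified fall back to their defaults.
args = parser.parse_args(['--emr-release', 'emr-6.2.0'])
print(args.platform_version)  # 2.12_spark3.1 (default)
print(args.emr_release)       # emr-6.2.0
```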

0 commit comments