Commit 5474289

Merge pull request #356 from ldbc/refine-readmes
2 parents 55dcdb9 + 8dc8be1

3 files changed: +9 -23 lines changed

README.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -36,9 +36,9 @@ You can build the JAR with both Maven and SBT.
 sbt assembly
 ```
 
-:warning: When using SBT, change the path of the JAR file in the instructions provided in the README (`target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar` -> `./target/scala-2.11/ldbc_snb_datagen-assembly-${DATAGEN_VERSION}.jar`).
+:warning: When using SBT, change the path of the JAR file in the instructions provided in the README (`target/ldbc_snb_datagen_${PLATFORM_VERSION}-${DATAGEN_VERSION}.jar` -> `./target/scala-2.12/ldbc_snb_datagen-assembly-${DATAGEN_VERSION}.jar`).
 
-### Install tools
+### Install Python tools
 
 Some of the build utilities are written in Python. To use them, you have to create a Python virtual environment
 and install the dependencies.
````
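The warning in this hunk is worth spelling out: the Maven and SBT builds emit the assembly JAR at different paths. A minimal sketch of the two paths from the README, with `DATAGEN_VERSION` and `PLATFORM_VERSION` as hypothetical placeholder values:

```python
# Placeholder values; the real ones come from the build configuration.
DATAGEN_VERSION = "0.4.0"          # hypothetical version number
PLATFORM_VERSION = "2.12_spark3.1"

# Maven places the assembly here...
maven_jar = f"target/ldbc_snb_datagen_{PLATFORM_VERSION}-{DATAGEN_VERSION}.jar"
# ...while SBT's `assembly` task writes it under the Scala version directory.
sbt_jar = f"./target/scala-2.12/ldbc_snb_datagen-assembly-{DATAGEN_VERSION}.jar"

print(maven_jar)
print(sbt_jar)
```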

tools/emr/README.md

Lines changed: 5 additions & 19 deletions

````diff
@@ -24,19 +24,8 @@ In AWS IAM, add the following roles with **Create Role** | **AWS service** | **E
 
 ## Install the required libraries
 
-Make sure you use pip 21.1 or newer.
+Install the required libraries as described in the [main README](../../README.md#install-python-tools).
 
-1. From `tools`, run:
-
-   ```
-   pip install -e .
-   ```
-
-1. Package the JAR. Make sure you use Java 8:
-
-   ```bash
-   ./tools/build.sh
-   ```
 ## Submitting a job
 
 1. Upload the JAR to S3. (We don't version the JARs yet, so you can only make sure that you run the intended code this way :( )
@@ -67,15 +56,12 @@ To use spot instances, add the `--use-spot` argument:
 
 ### Using a different Spark / EMR version
 
-
-
-We use EMR 6.3.0 by default, which contains Spark 3.1. You can use a different version by specifying it with the `--emr-version` option.
-EMR 5.33.0 is the recommended EMR version to be used with Spark 2.4.
-Make sure that you have uploaded the right JAR first!
+We use EMR 6.3.0 by default, which packages Spark 3.1. You can use a different version by specifying it with the `--emr-version` option.
+Make sure that you have uploaded the right JAR first.
 
 ```bash
-PLATFORM_VERSION=2.11_spark2.4
-./tools/emr/submit_datagen_job.py --bucket ${BUCKET_NAME} --platform-version ${PLATFORM_VERSION} --emr-release emr-5.33.0 ${JOB_NAME} ${SCALE_FACTOR} csv raw
+PLATFORM_VERSION=2.12_spark3.1
+./tools/emr/submit_datagen_job.py --bucket ${BUCKET_NAME} --platform-version ${PLATFORM_VERSION} --emr-release emr-6.2.0 ${JOB_NAME} ${SCALE_FACTOR} csv raw
 ```
 
 ### Using a parameter file
````
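For context, the submit command the EMR README now recommends can be sketched as an argument list in Python. This is a sketch only: the bucket, job name, and scale factor values are placeholders standing in for the `${BUCKET_NAME}`, `${JOB_NAME}`, and `${SCALE_FACTOR}` environment variables in the README.

```python
# Placeholder values standing in for the environment variables in the README.
bucket_name = "my-datagen-bucket"   # hypothetical bucket
job_name = "sf100-test"             # hypothetical job name
scale_factor = "100"
platform_version = "2.12_spark3.1"

# Mirrors the command shown in the EMR README; per that README,
# --use-spot could be appended to request spot instances.
cmd = [
    "./tools/emr/submit_datagen_job.py",
    "--bucket", bucket_name,
    "--platform-version", platform_version,
    "--emr-release", "emr-6.2.0",
    job_name, scale_factor, "csv", "raw",
]
print(" ".join(cmd))
```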

tools/emr/submit_datagen_job.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -238,13 +238,13 @@ def submit_datagen_job(name,
                         help='EC2 key name for SSH connection')
 parser.add_argument('--platform-version',
                     default=defaults['platform_version'],
-                    help='The spark platform the JAR is compiled for formatted like {scala.compat.version}_spark{spark.compat.version}, e.g. 2.11_spark2.4, 2.12_spark3.1')
+                    help='The spark platform the JAR is compiled for formatted like {scala.compat.version}_spark{spark.compat.version}, e.g. 2.12_spark3.1')
 parser.add_argument('--version',
                     default=defaults['version'],
                     help='LDBC SNB Datagen library version')
 parser.add_argument('--emr-release',
                     default=defaults['emr_release'],
-                    help='The EMR release to use. E.g emr-5.33.0, emr-6.3.0')
+                    help='The EMR release to use. E.g. emr-6.3.0')
 parser.add_argument('-y', '--yes',
                     default=defaults['yes'],
                     action='store_true',
```
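The help-string edits above are plain `argparse` changes. A self-contained sketch of how the two touched options behave (not the real script: the defaults here are assumed literals, whereas `submit_datagen_job.py` reads them from a `defaults` dict):

```python
import argparse

# Sketch of the two options touched by this commit; the default values
# below are assumptions for illustration only.
parser = argparse.ArgumentParser()
parser.add_argument('--platform-version',
                    default='2.12_spark3.1',
                    help='The spark platform the JAR is compiled for, formatted like '
                         '{scala.compat.version}_spark{spark.compat.version}, e.g. 2.12_spark3.1')
parser.add_argument('--emr-release',
                    default='emr-6.3.0',
                    help='The EMR release to use. E.g. emr-6.3.0')

# Options left unspecified fall back to their defaults.
args = parser.parse_args(['--emr-release', 'emr-6.2.0'])
print(args.platform_version)  # 2.12_spark3.1 (default)
print(args.emr_release)       # emr-6.2.0
```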

0 commit comments