Commit 5a84bb5

Move environment variables to Dockerfile, add docs to README.md
1 parent 7fcd872 commit 5a84bb5

3 files changed, 37 additions, 10 deletions

Dockerfile

Lines changed: 5 additions & 0 deletions
@@ -11,4 +11,9 @@ WORKDIR /opt/ldbc_snb_datagen
 RUN mvn -DskipTests clean assembly:assembly
 
 ENV HADOOP_CLIENT_OPTS '-Xmx8G'
+ENV DATAGEN_SCALE_FACTOR 'snb.interactive.1'
+ENV DATAGEN_PERSON_SERIALIZER 'ldbc.snb.datagen.serializer.snb.interactive.CSVPersonSerializer'
+ENV DATAGEN_INVARIANT_SERIALIZER 'ldbc.snb.datagen.serializer.snb.interactive.CSVInvariantSerializer'
+ENV DATAGEN_PERSON_ACTIVITY_SERIALIZER 'ldbc.snb.datagen.serializer.snb.interactive.CSVPersonActivitySerializer'
+
 CMD /opt/ldbc_snb_datagen/docker_run.sh

README.md

Lines changed: 27 additions & 0 deletions
@@ -33,6 +33,33 @@ cd $LDBC_SNB_DATAGEN_HOME
 ./run.sh
 ```
 
+## Docker image
+
+The image can be built with the provided Dockerfile.
+To build it, run the following command from the project directory:
+```
+docker build . --tag ldbc/datagen
+```
+
+### Running
+
+The generator writes its results to the `/opt/ldbc_snb_datagen/social_network/` directory inside the container. To keep the generated data, mount a host directory at that path:
+
+```
+mkdir datagen_output
+
+docker run --rm --mount type=bind,source="$(pwd)/datagen_output/",target="/opt/ldbc_snb_datagen/social_network/" ldbc/datagen
+```
+
+### Options
+
+The container image can be customized with environment variables passed to the `docker run` command. The following options are available:
+* `HADOOP_CLIENT_OPTS`: The standard Hadoop environment variable holding the Java options for the Hadoop client. Default is `-Xmx8G`, which gives the client an 8 GB maximum heap.
+* `DATAGEN_SCALE_FACTOR`: The scale factor of the generated dataset. Default is `snb.interactive.1`.
+* `DATAGEN_PERSON_SERIALIZER`: The serializer used for Person objects. Default is `ldbc.snb.datagen.serializer.snb.interactive.CSVPersonSerializer`.
+* `DATAGEN_INVARIANT_SERIALIZER`: The serializer used for Invariant objects. Default is `ldbc.snb.datagen.serializer.snb.interactive.CSVInvariantSerializer`.
+* `DATAGEN_PERSON_ACTIVITY_SERIALIZER`: The serializer used for Person activity objects. Default is `ldbc.snb.datagen.serializer.snb.interactive.CSVPersonActivitySerializer`.
+
 <!-- **Datasets** -->
 
 <!-- Publicly available datasets can be found at the LDBC-SNB Amazon Bucket. These datasets are the official SNB datasets and were generated using version 0.2.6. They are available in the three official supported serializers: CSV, CSVMergeForeign and TTL. The bucket is configured in "Requester Pays" mode, thus in order to access them you need a properly set up AWS client.
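The Options list describes overriding the image defaults at `docker run` time. Below is a minimal sketch of a hypothetical Bash helper, `datagen_run` (not part of the repository), that combines the bind mount from the Running section with a `--env NAME=value` flag for each documented variable that is set in the caller's shell; variables left unset are not passed, so the `ENV` defaults baked in by the Dockerfile apply. The `DRY_RUN=1` switch and the optional output-directory argument are also assumptions of this sketch.

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): run the ldbc/datagen image
# with the README's bind mount and forward any overrides set in the calling shell.
datagen_run() {
  # Host directory mounted at /opt/ldbc_snb_datagen/social_network/ (assumed default).
  local output_dir="${1:-$(pwd)/datagen_output}"
  local -a args
  args=(docker run --rm
        --mount "type=bind,source=${output_dir}/,target=/opt/ldbc_snb_datagen/social_network/")

  local var
  for var in HADOOP_CLIENT_OPTS DATAGEN_SCALE_FACTOR DATAGEN_PERSON_SERIALIZER \
             DATAGEN_INVARIANT_SERIALIZER DATAGEN_PERSON_ACTIVITY_SERIALIZER; do
    # ${!var} is Bash indirect expansion: the value of the variable named by $var.
    # Unset or empty variables are skipped, so the image's ENV defaults apply.
    if [[ -n "${!var:-}" ]]; then
      args+=(--env "${var}=${!var}")
    fi
  done
  args+=(ldbc/datagen)

  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the assembled command for inspection (not shell-escaped).
    printf '%s\n' "${args[*]}"
  else
    mkdir -p "$output_dir"
    "${args[@]}"
  fi
}
```

For example, `DATAGEN_SCALE_FACTOR=snb.interactive.10 datagen_run` would generate a larger dataset, assuming the datagen version in the image defines a `snb.interactive.10` scale factor; prefixing `DRY_RUN=1` only prints the resulting `docker run` command.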

docker_run.sh

Lines changed: 5 additions & 10 deletions
@@ -1,17 +1,12 @@
 #!/bin/bash
 
-# Variables for the default settings
-DEFAULT_SCALE_FACTOR=snb.interactive.1
-DEFAULT_PERSON_SERIALIZER=ldbc.snb.datagen.serializer.snb.interactive.CSVPersonSerializer
-DEFAULT_INVARIANT_SERIALIZER=ldbc.snb.datagen.serializer.snb.interactive.CSVPersonSerializer
-DEFAULT_PERSON_ACTIVITY_SERIALIZER=ldbc.snb.datagen.serializer.snb.interactive.CSVPersonSerializer
-
 # Parameter serialization
 PARAMS_FILE=params.ini
-echo "ldbc.snb.datagen.generator.scaleFactor:${SCALE_FACTOR:-$DEFAULT_SCALE_FACTOR}" > ${PARAMS_FILE}
-echo "ldbc.snb.datagen.serializer.personSerializer:${PERSON_SERIALIZER:-$DEFAULT_SERIALIZER}" >> ${PARAMS_FILE}
-echo "ldbc.snb.datagen.serializer.invariantSerializer:${INVARIANT_SERIALIZER:-$DEFAULT_INVARIANT_SERIALIZER}" >> ${PARAMS_FILE}
-echo "ldbc.snb.datagen.serializer.personActivitySerializer:${PERSON_ACTIVITY_SERIALIZER:-$DEFAULT_PERSON_ACTIVITY_SERIALIZER}" >> ${PARAMS_FILE}
+echo "ldbc.snb.datagen.generator.numThreads":$(nproc) > ${PARAMS_FILE}
+echo "ldbc.snb.datagen.generator.scaleFactor:${DATAGEN_SCALE_FACTOR}" >> ${PARAMS_FILE}
+echo "ldbc.snb.datagen.serializer.personSerializer:${DATAGEN_PERSON_SERIALIZER}" >> ${PARAMS_FILE}
+echo "ldbc.snb.datagen.serializer.invariantSerializer:${DATAGEN_INVARIANT_SERIALIZER}" >> ${PARAMS_FILE}
+echo "ldbc.snb.datagen.serializer.personActivitySerializer:${DATAGEN_PERSON_ACTIVITY_SERIALIZER}" >> ${PARAMS_FILE}
 
 # Running the generator
 /opt/hadoop-2.6.0/bin/hadoop jar /opt/ldbc_snb_datagen/target/ldbc_snb_datagen-0.2.7-jar-with-dependencies.jar /opt/ldbc_snb_datagen/params.ini
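To preview the `params.ini` that the new `docker_run.sh` writes, the parameter-serialization step can be reproduced outside the container. This is a minimal sketch assuming Bash and GNU coreutils (`nproc`, `mktemp`). The `:=` fallbacks copy the Dockerfile `ENV` defaults so the snippet runs without Docker; they are an addition of this sketch, since inside the image the Dockerfile already sets these variables. It writes to a temporary file instead of `params.ini`.

```shell
#!/bin/bash
# Sketch: reproduce docker_run.sh's parameter serialization outside the container.
set -euo pipefail

# Fallback values copied from the Dockerfile ENV lines (this sketch's addition;
# in the container the Dockerfile already sets these variables).
: "${DATAGEN_SCALE_FACTOR:=snb.interactive.1}"
: "${DATAGEN_PERSON_SERIALIZER:=ldbc.snb.datagen.serializer.snb.interactive.CSVPersonSerializer}"
: "${DATAGEN_INVARIANT_SERIALIZER:=ldbc.snb.datagen.serializer.snb.interactive.CSVInvariantSerializer}"
: "${DATAGEN_PERSON_ACTIVITY_SERIALIZER:=ldbc.snb.datagen.serializer.snb.interactive.CSVPersonActivitySerializer}"

# Write to a temporary file rather than params.ini.
PARAMS_FILE="$(mktemp)"

# Same key:value lines, in the same order, as the new docker_run.sh.
{
  echo "ldbc.snb.datagen.generator.numThreads:$(nproc)"
  echo "ldbc.snb.datagen.generator.scaleFactor:${DATAGEN_SCALE_FACTOR}"
  echo "ldbc.snb.datagen.serializer.personSerializer:${DATAGEN_PERSON_SERIALIZER}"
  echo "ldbc.snb.datagen.serializer.invariantSerializer:${DATAGEN_INVARIANT_SERIALIZER}"
  echo "ldbc.snb.datagen.serializer.personActivitySerializer:${DATAGEN_PERSON_ACTIVITY_SERIALIZER}"
} > "${PARAMS_FILE}"

cat "${PARAMS_FILE}"
```

Exporting a different `DATAGEN_SCALE_FACTOR` before running the sketch changes only the `scaleFactor` line, which is how a `docker run` environment override reaches the generator through `docker_run.sh`.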
