Commit 57a14ef

Merge pull request #80 from hegyibalint/master
Add Docker support
2 parents cdb3213 + a7015f9 commit 57a14ef

3 files changed, with 60 additions and 1 deletion

Dockerfile

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+FROM openjdk:8-jdk-alpine
+
+# Download hadoop
+WORKDIR /opt
+RUN apk add bash curl maven python
+RUN curl -L 'http://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz' | tar -xz
+
+ENV HADOOP_CLIENT_OPTS '-Xmx8G'
+
+# Copy the project
+COPY . /opt/ldbc_snb_datagen
+WORKDIR /opt/ldbc_snb_datagen
+# Remove sample parameters
+RUN rm params*.ini
+# Build jar bundle
+RUN mvn -DskipTests clean assembly:assembly
+
+CMD /opt/ldbc_snb_datagen/docker_run.sh
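
The `ENV HADOOP_CLIENT_OPTS '-Xmx8G'` line only sets a default for the image. Docker's `--env` flag on `docker run` can override an image-level `ENV` for a single container, so the heap size could probably be changed without editing the Dockerfile and rebuilding. A minimal sketch, assuming the image is tagged `ldbc/datagen` and the mounts match the README; the helper name `datagen_cmd` and its arguments are hypothetical and not part of this commit. It only prints the command so that it can be inspected before running.

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): print a `docker run` command
# that overrides the image's default HADOOP_CLIENT_OPTS for one container.
datagen_cmd() {
    local heap="${1:-8G}"    # 8G mirrors the default set in the Dockerfile
    local base="${2:-$PWD}"  # host directory assumed to hold params.ini
    # Each argument is printed followed by a space; paths containing spaces
    # would need quoting before the printed line is pasted into a shell.
    printf '%s ' docker run --rm \
        --env "HADOOP_CLIENT_OPTS=-Xmx${heap}" \
        --mount "type=bind,source=${base}/datagen_output/,target=/opt/ldbc_snb_datagen/social_network/" \
        --mount "type=bind,source=${base}/params.ini,target=/opt/ldbc_snb_datagen/params.ini" \
        ldbc/datagen
    printf '\n'
}
```

For example, `datagen_cmd 4G` prints the command with `-Xmx4G` in place of the default.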

README.md

Lines changed: 23 additions & 1 deletion
@@ -33,7 +33,29 @@ cd $LDBC_SNB_DATAGEN_HOME
 ./run.sh
 ```
 
-<!-- **Datasets** -->
+## Docker image
+
+The image can be built with the provided Dockerfile.
+To build, execute the following command from the repository directory:
+```
+docker build . --tag ldbc/datagen
+```
+
+### Options
+
+To configure the amount of memory available, set the `HADOOP_CLIENT_OPTS` variable in the Dockerfile. The default value is `-Xmx8G`.
+
+### Running
+
+In order to run the container, a `params.ini` file is required. For reference, please see the `params*.ini` files in the repository. The file will be mounted in the container by the `--mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini"` option. If required, the source path can be set to a different path.
+
+The container will output its results in the `/opt/ldbc_snb_datagen/social_network/` directory. In order to save the results of the generation, a directory must be mounted in the container from the host:
+
+```
+mkdir datagen_output
+
+docker run --rm --mount type=bind,source="$(pwd)/datagen_output/",target="/opt/ldbc_snb_datagen/social_network/" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen
+```
 
 <!-- Publicly available datasets can be found at the LDBC-SNB Amazon Bucket. These datasets are the official SNB datasets and were generated using version 0.2.6. They are available in the three official supported serializers: CSV, CSVMergeForeign and TTL. The bucket is configured in "Requester Pays" mode, thus in order to access them you need a properly set up AWS client.
 * http://ldbc-snb.s3.amazonaws.com/ -->
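
The README's run command bind-mounts two host paths with `--mount type=bind`. Unlike the older `-v` syntax, `--mount` reports an error when the source path does not exist, so it can help to check both paths before starting the container. A minimal sketch of such a check; the function name `preflight` is hypothetical and not part of this commit, and it assumes `params.ini` and `datagen_output` live in the same host directory, as in the README.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of the commit): make sure both
# bind-mount sources from the README exist on the host.
preflight() {
    local workdir="${1:-$PWD}"  # host directory assumed to hold params.ini
    if [ ! -f "$workdir/params.ini" ]; then
        echo "missing $workdir/params.ini" >&2
        return 1
    fi
    # -p makes this a no-op when the directory already exists.
    mkdir -p "$workdir/datagen_output"
}
```

A typical use would be to call `preflight` and start the README's `docker run` command only if it returns successfully.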

docker_run.sh

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+set -e
+
+if [ ! -f /opt/ldbc_snb_datagen/params.ini ]; then
+    echo "The params.ini file is not present"
+    exit 1
+fi
+
+# Running the generator
+/opt/hadoop-2.6.0/bin/hadoop jar /opt/ldbc_snb_datagen/target/ldbc_snb_datagen-0.2.7-jar-with-dependencies.jar /opt/ldbc_snb_datagen/params.ini
+
+# Cleanup
+rm -f m*personFactors*
+rm -f .m*personFactors*
+rm -f m*activityFactors*
+rm -f .m*activityFactors*
+rm -f m0friendList*
+rm -f .m0friendList*
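
Because the script uses `set -e`, a failing `hadoop jar` invocation ends the script before the cleanup lines run, which may leave intermediate files behind in a mounted directory. One possible alternative, shown here as a sketch rather than as the commit's approach, is to register the cleanup with `trap ... EXIT` so that it runs on every exit path. The function names `cleanup` and `run_with_cleanup` are hypothetical.

```shell
#!/bin/bash
# Sketch only (not part of the commit): run the cleanup whether or not the
# wrapped command succeeds.
cleanup() {
    # Same patterns as docker_run.sh, resolved in the current directory.
    rm -f m*personFactors* .m*personFactors* \
          m*activityFactors* .m*activityFactors* \
          m0friendList* .m0friendList*
}

run_with_cleanup() {
    (
        # The EXIT trap fires when this subshell ends, on success or failure.
        trap cleanup EXIT
        "$@"
    )
    # The function returns the wrapped command's exit status.
}
```

In the script this would wrap the generator call, for example `run_with_cleanup /opt/hadoop-2.6.0/bin/hadoop jar ...` with the same arguments as above.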
