Commit 2248579

Merge pull request #98 from ldbc/batching-3.2.1
Move to Hadoop 3.2.1
2 parents 0687e92 + 44004c4 commit 2248579

206 files changed (+4344 / -5785 lines)

.travis.yml

Lines changed: 3 additions & 3 deletions
@@ -13,10 +13,10 @@ before_install:
 - docker build . --tag ldbc/datagen
 install: true
 script:
-- cp params-csv.ini params.ini
+- cp params-csv-basic.ini params.ini
 - docker run --rm --mount type=bind,source="$(pwd)/",target="/opt/ldbc_snb_datagen/out" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen
-- "[[ `md5sum social_network/*.csv | sort | md5sum` == 'fa046e4c44e4c3e8f6858720c45d80ed -' ]]"
-- "[[ `md5sum substitution_parameters/interactive_* | sort | md5sum` == '5cba23795df372c19688b05c5a9f318f -' ]]"
+- md5sum social_network/*.csv | sort
+- "[[ `md5sum social_network/*.csv | sort | md5sum` == 'ee9e6dd99bf7c3459f4c6156d355f5ea -' ]]"
 - mkdir out
 - cp -r substitution_parameters out/
 notifications:
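The CI check above reduces all generated CSV files to one fingerprint: it hashes each file, sorts the resulting `md5sum` lines so the outcome does not depend on the order in which files are listed, and hashes that listing again. A minimal sketch of the same idea as a reusable function, for reproducing the check locally; `aggregate_md5` is a hypothetical name, not part of the repository, and it prints only the hex digest (dropping the trailing ` -` that `md5sum` appends when it reads standard input).

```bash
#!/usr/bin/env bash
# Sketch: one fingerprint over many files, mirroring the CI step
#   md5sum social_network/*.csv | sort | md5sum
# aggregate_md5 is a hypothetical helper, not part of the repository.
set -euo pipefail

aggregate_md5() {
  # Hash every file, sort the "<hash>  <path>" lines, then hash that listing.
  # cut keeps only the hex digest, dropping the trailing "-" for stdin input.
  md5sum "$@" | sort | md5sum | cut -d' ' -f1
}

# Usage (assumes the generator already wrote the CSV files):
#   aggregate_md5 social_network/*.csv
```

Because the listing still contains the paths, renaming an output file changes the fingerprint even when its contents are identical.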

Dockerfile

Lines changed: 1 addition & 8 deletions
@@ -1,11 +1,4 @@
-FROM openjdk:8-jdk-stretch
-
-# Download hadoop
-WORKDIR /opt
-RUN apt-get update
-RUN apt-get install -y bash curl maven python
-RUN curl -L 'http://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz' | tar -xz
-RUN curl -L 'https://julialang-s3.julialang.org/bin/linux/x64/1.2/julia-1.2.0-linux-x86_64.tar.gz' | tar -xz
+FROM ldbc/datagen-base:latest
 
 # Copy the project
 COPY . /opt/ldbc_snb_datagen

README.md

Lines changed: 18 additions & 9 deletions
@@ -26,7 +26,7 @@ The LDBC-SNB Data Generator (Datagen) is the responsible of providing the data s
 Initialize the `params.ini` file as needed. For example, to generate the basic CSV files, issue:
 
 ```bash
-cp params-csv.ini params.ini
+cp params-csv-basic.ini params.ini
 ```
 
 There are three main ways to run Datagen, each using a different approach to configure the amount of memory available.
@@ -38,15 +38,15 @@ There are three main ways to run Datagen, each using a different approach to con
 ### Pseudo-distributed Hadoop node
 
 To configure the amount of memory available, set the `HADOOP_CLIENT_OPTS` environment variable.
-To grab Hadoop, extract it, and set the environment values to sensible defaults, and generate the data as specified in the `params-csv.ini` file, run the following script:
+To grab Hadoop, extract it, and set the environment values to sensible defaults, and generate the data as specified in the `params-csv-params.ini` template file, run the following script:
 
 ```bash
-cp params-csv.ini params.ini
-wget http://archive.apache.org/dist/hadoop/core/hadoop-2.9.2/hadoop-2.9.2.tar.gz
-tar xf hadoop-2.9.2.tar.gz
+cp params-csv-basic.ini params.ini
+wget http://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/hadoop-3.2.1.tar.gz
+tar xf hadoop-3.2.1.tar.gz
 export HADOOP_CLIENT_OPTS="-Xmx2G"
-# set this to the Hadoop 2.9.2 directory
-export HADOOP_HOME=`pwd`/hadoop-2.9.2
+# set this to the Hadoop 3.2.1 directory
+export HADOOP_HOME=`pwd`/hadoop-3.2.1
 # set this to the repository's directory
 export LDBC_SNB_DATAGEN_HOME=`pwd`
 ./run.sh
@@ -66,10 +66,13 @@ docker build . --tag ldbc/datagen
 
 Set the `params.ini` in the repository as for the pseudo-distributed case. The file will be mounted in the container by the `--mount type=bind,source="$(pwd)/params.ini,target="/opt/ldbc_snb_datagen/params.ini"` option. If required, the source path can be set to a different path.
 
-The container outputs its results in the `/opt/ldbc_snb_datagen/out/` directory which contains two sub-directories, `social_network/` and `subsitution_parameters`. In order to save the results of the generation, a directory must be mounted in the container from the host. The driver requires the results be in the datagen repository directory. To generate the data, run the following command which includes changing the owner (`chown`) of the Docker-mounted volumes:
+The container outputs its results in the `/opt/ldbc_snb_datagen/out/` directory which contains two sub-directories, `social_network/` and `substitution_parameters`. In order to save the results of the generation, a directory must be mounted in the container from the host. The driver requires the results be in the datagen repository directory. To generate the data, run the following command which includes changing the owner (`chown`) of the Docker-mounted volumes.
+
+:warning: This removes the previously generated `social_network` directory:
 
 ```bash
-docker run --rm --mount type=bind,source="$(pwd)/",target="/opt/ldbc_snb_datagen/out" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen && \
+rm -rf social_network/ substitution_parameters && \
+docker run --rm --mount type=bind,source="$(pwd)/",target="/opt/ldbc_snb_datagen/out" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen; \
 sudo chown -R $USER:$USER social_network/ substitution_parameters/
 ```
 
@@ -79,6 +82,12 @@ If you need to raise the memory limit, use the `-e HADOOP_CLIENT_OPTS="-Xmx..."`
 
 Instructions are currently not provided.
 
+### Graph schema
+
+The graph schema is as follows:
+
+![](https://raw.githubusercontent.com/ldbc/ldbc_snb_docs/dev/figures/schema-comfortable.png)
+
 ### Community provided tools
 
 * **[Apache Flink Loader:](https://github.com/s1ck/ldbc-flink-import)** A loader of LDBC datasets for Apache Flink.
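The pseudo-distributed instructions spell out the Hadoop version in the download URL, the archive name, the comment and `HADOOP_HOME`, which is why this upgrade had to touch several lines. A minimal sketch, assuming the archive.apache.org layout used above (`hadoop-<version>/hadoop-<version>.tar.gz`), that derives all of them from a single variable; `HADOOP_VERSION`, `hadoop_url` and `setup_hadoop` are illustrative names only, and `run.sh` does not read them.

```bash
#!/usr/bin/env bash
# Sketch: keep the Hadoop version in one place for the pseudo-distributed setup.
# HADOOP_VERSION, hadoop_url and setup_hadoop are illustrative names;
# run.sh does not read them.
set -euo pipefail

HADOOP_VERSION="${HADOOP_VERSION:-3.2.1}"

# Build the download URL, following the hadoop-<v>/hadoop-<v>.tar.gz layout
# used by the README's wget command.
hadoop_url() {
  local v="$1"
  echo "http://archive.apache.org/dist/hadoop/core/hadoop-${v}/hadoop-${v}.tar.gz"
}

# Download and unpack Hadoop, then export HADOOP_HOME (requires wget and tar).
setup_hadoop() {
  wget "$(hadoop_url "$HADOOP_VERSION")"
  tar xf "hadoop-${HADOOP_VERSION}.tar.gz"
  HADOOP_HOME="$(pwd)/hadoop-${HADOOP_VERSION}"
  export HADOOP_HOME
}
```

After `setup_hadoop`, the remaining steps (`HADOOP_CLIENT_OPTS`, `LDBC_SNB_DATAGEN_HOME`, `./run.sh`) would stay as written in the README.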

base-docker-image/Dockerfile

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+FROM openjdk:8-jdk-stretch
+
+# Download hadoop
+WORKDIR /opt
+RUN apt-get update
+RUN apt-get install -y bash curl maven python
+RUN curl -L 'http://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/hadoop-3.2.1.tar.gz' | tar -xz
+RUN curl -L 'https://julialang-s3.julialang.org/bin/linux/x64/1.2/julia-1.2.0-linux-x86_64.tar.gz' | tar -xz
+
+# Copy the project
+COPY . /opt/ldbc_snb_datagen
+WORKDIR /opt/ldbc_snb_datagen
+# Remove sample parameters
+# Build jar bundle
+RUN mvn -DskipTests clean assembly:assembly
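With this change the main `Dockerfile` starts `FROM ldbc/datagen-base:latest`, so that image has to exist locally (or be pullable) before `docker build . --tag ldbc/datagen` can succeed. A minimal sketch of the two-step build, assuming the base image is built with `base-docker-image/` as its build context; the visible part of the commit does not show the exact command. `build_images` and the `DOCKER` override are hypothetical, and the override only exists so the commands can be printed instead of executed.

```bash
#!/usr/bin/env bash
# Sketch: build the base image first, then the datagen image on top of it.
# Assumption: base-docker-image/ is the build context of ldbc/datagen-base.
# build_images and DOCKER are hypothetical; DOCKER=echo only prints commands.
set -euo pipefail

build_images() {
  local docker="${DOCKER:-docker}"
  # Base image: JDK 8, Hadoop 3.2.1, Julia and a Maven build of the copied files.
  "$docker" build base-docker-image --tag ldbc/datagen-base:latest
  # Main image: FROM ldbc/datagen-base:latest plus the project sources.
  "$docker" build . --tag ldbc/datagen
}

# Usage from the repository root:
#   build_images              # runs docker
#   DOCKER=echo build_images  # dry run, prints both commands
```

Splitting the images this way means Hadoop and Julia are downloaded once into the base layer instead of on every rebuild of the project image.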

base-docker-image/pom.xml

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
+    http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <groupId>ldbc.snb.datagen</groupId>
+    <artifactId>ldbc_snb_datagen</artifactId>
+    <version>0.4.0-SNAPSHOT</version>
+    <packaging>jar</packaging>
+
+    <properties>
+        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+        <maven.compiler.source>1.8</maven.compiler.source>
+        <maven.compiler.target>1.8</maven.compiler.target>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>junit</groupId>
+            <artifactId>junit</artifactId>
+            <version>4.12</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>xerces</groupId>
+            <artifactId>xercesImpl</artifactId>
+            <version>2.9.1</version>
+        </dependency>
+        <dependency>
+            <groupId>xalan</groupId>
+            <artifactId>xalan</artifactId>
+            <version>2.7.1</version>
+        </dependency>
+        <dependency>
+            <groupId>org.jdom</groupId>
+            <artifactId>jdom</artifactId>
+            <version>1.1.3</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-client</artifactId>
+            <version>3.2.1</version>
+        </dependency>
+        <dependency>
+            <groupId>ca.umontreal.iro</groupId>
+            <artifactId>ssj</artifactId>
+            <version>2.5</version>
+        </dependency>
+        <dependency>
+            <groupId>com.google.code.gson</groupId>
+            <artifactId>gson</artifactId>
+            <version>2.2.4</version>
+        </dependency>
+        <dependency>
+            <groupId>org.codehaus.groovy</groupId>
+            <artifactId>groovy</artifactId>
+            <version>2.1.6</version>
+        </dependency>
+        <dependency>
+            <groupId>org.codehaus.groovy</groupId>
+            <artifactId>groovy-templates</artifactId>
+            <version>2.1.6</version>
+        </dependency>
+        <dependency>
+            <groupId>org.codehaus.groovy</groupId>
+            <artifactId>groovy-jsr223</artifactId>
+            <version>2.1.6</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-math3</artifactId>
+            <version>3.4.1</version>
+        </dependency>
+        <dependency>
+            <groupId>org.python</groupId>
+            <artifactId>jython</artifactId>
+            <version>2.7.0</version>
+        </dependency>
+        <dependency>
+            <groupId>org.roaringbitmap</groupId>
+            <artifactId>RoaringBitmap</artifactId>
+            <version>0.6.18</version>
+        </dependency>
+    </dependencies>
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-compiler-plugin</artifactId>
+                <version>3.5.1</version>
+                <configuration>
+                    <source>1.8</source>
+                    <target>1.8</target>
+                </configuration>
+            </plugin>
+            <plugin>
+                <groupId>org.walkmod.maven.plugins</groupId>
+                <artifactId>walkmod-maven-plugin</artifactId>
+                <version>1.0.3</version>
+                <executions>
+                    <execution>
+                        <phase>generate-sources</phase>
+                        <goals>
+                            <goal>apply</goal>
+                        </goals>
+                        <configuration>
+                            <chains>pmd</chains>
+                            <properties>configurationFile=rulesets/java/basic.xml</properties>
+                        </configuration>
+                    </execution>
+                </executions>
+            </plugin>
+            <plugin>
+                <artifactId>maven-assembly-plugin</artifactId>
+                <version>2.4</version>
+                <configuration>
+                    <archive>
+                        <manifest>
+                            <mainClass>ldbc.snb.datagen.LdbcDatagen</mainClass>
+                        </manifest>
+                    </archive>
+                    <descriptorRefs>
+                        <descriptorRef>jar-with-dependencies</descriptorRef>
+                    </descriptorRefs>
+                </configuration>
+            </plugin>
+        </plugins>
+    </build>
+</project>
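The `hadoop-client` dependency in this pom (3.2.1) matches the Hadoop release downloaded in the base image and in the README. Because the two versions live in different files, a small consistency check can catch a half-finished future upgrade. A minimal sketch, assuming the pom keeps `<version>` on the line directly after `<artifactId>hadoop-client</artifactId>` as above, and assuming `$HADOOP_HOME/bin/hadoop version` prints `Hadoop <version>` on its first line; the helper names are hypothetical, and a real XML parser would be more robust than `grep`/`sed`.

```bash
#!/usr/bin/env bash
# Sketch: check that the hadoop-client version in a pom.xml matches the Hadoop
# installation in $HADOOP_HOME. All helper names are hypothetical; the grep/sed
# parsing assumes <version> directly follows the hadoop-client artifactId.
set -euo pipefail

# Print the <version> on the line after <artifactId>hadoop-client</artifactId>.
pom_hadoop_version() {
  grep -A1 '<artifactId>hadoop-client</artifactId>' "$1" \
    | sed -n 's:.*<version>\(.*\)</version>.*:\1:p'
}

# Print "3.2.1" from a first output line such as "Hadoop 3.2.1".
# awk reads all input, so the hadoop process never hits a closed pipe.
installed_hadoop_version() {
  "$1/bin/hadoop" version | awk 'NR == 1 { print $2 }'
}

# check_hadoop_versions POM HADOOP_HOME: non-zero status on a mismatch.
check_hadoop_versions() {
  local pom_v installed_v
  pom_v=$(pom_hadoop_version "$1")
  installed_v=$(installed_hadoop_version "$2")
  if [[ "$pom_v" != "$installed_v" ]]; then
    echo "pom.xml uses hadoop-client $pom_v but $2 is Hadoop $installed_v" >&2
    return 1
  fi
}

# Usage: check_hadoop_versions pom.xml "$HADOOP_HOME"
```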

check-md5sums-csv-basic.sh

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+set -e
+
+[[ `md5sum social_network/static/organisation_0_0.csv | cut -d' ' -f1` == 'e6164941fce2c182ec303c84568e9282' ]]
+[[ `md5sum social_network/static/organisation_isLocatedIn_place_0_0.csv | cut -d' ' -f1` == 'f78cd5fe83cb7d860aae5f2350a89d78' ]]
+[[ `md5sum social_network/static/place_0_0.csv | cut -d' ' -f1` == 'af21fd950c4a5766391d50e1c8ee6a16' ]]
+[[ `md5sum social_network/static/place_isPartOf_place_0_0.csv | cut -d' ' -f1` == '3ec20dfa7de510165910e201aa0cef8d' ]]
+[[ `md5sum social_network/static/tag_0_0.csv | cut -d' ' -f1` == '8d9697e4c0a8bcf8cf8a8c4e3e26c688' ]]
+[[ `md5sum social_network/static/tag_hasType_tagclass_0_0.csv | cut -d' ' -f1` == '0333fee682a3d371ff932d782e318bd8' ]]
+[[ `md5sum social_network/static/tagclass_0_0.csv | cut -d' ' -f1` == '91f1225d669ddf277ade821a3d5136dc' ]]
+[[ `md5sum social_network/static/tagclass_isSubclassOf_tagclass_0_0.csv | cut -d' ' -f1` == '4b7b5342a2e97ab2042f319bf2a96557' ]]
+
+[[ `md5sum social_network/dynamic/comment_0_0.csv | cut -d' ' -f1` == 'd1856b88d75eda099b8aec09710376fb' ]]
+[[ `md5sum social_network/dynamic/comment_hasCreator_person_0_0.csv | cut -d' ' -f1` == '91b5e4cc2a80dfd8e3d807c2a5733c5e' ]]
+[[ `md5sum social_network/dynamic/comment_hasTag_tag_0_0.csv | cut -d' ' -f1` == 'f434548899ea8218ef2e351988d0f117' ]]
+[[ `md5sum social_network/dynamic/comment_isLocatedIn_place_0_0.csv | cut -d' ' -f1` == 'ab411baf2055fa625bb7a703e3e4a745' ]]
+[[ `md5sum social_network/dynamic/comment_replyOf_comment_0_0.csv | cut -d' ' -f1` == '4918a8ca7e6dd3404642c6b2a67744aa' ]]
+[[ `md5sum social_network/dynamic/comment_replyOf_post_0_0.csv | cut -d' ' -f1` == '5a3aa006fd811c94cb7d02e84df2c468' ]]
+[[ `md5sum social_network/dynamic/forum_0_0.csv | cut -d' ' -f1` == 'a96fa4c161f51320819d67af695bf7b7' ]]
+[[ `md5sum social_network/dynamic/forum_containerOf_post_0_0.csv | cut -d' ' -f1` == '4d3b696b6aa4903ce3422f0e2ca3037e' ]]
+[[ `md5sum social_network/dynamic/forum_hasMember_person_0_0.csv | cut -d' ' -f1` == 'b9b5327212d6e12734da6d9e7340b62b' ]]
+[[ `md5sum social_network/dynamic/forum_hasModerator_person_0_0.csv | cut -d' ' -f1` == '67089c7a61cddccdda15086b694dae7c' ]]
+[[ `md5sum social_network/dynamic/forum_hasTag_tag_0_0.csv | cut -d' ' -f1` == 'de076e0181c59885b72755f20e73b0e3' ]]
+[[ `md5sum social_network/dynamic/person_0_0.csv | cut -d' ' -f1` == '62a02e0a67e645ee9248e19c5522584d' ]]
+[[ `md5sum social_network/dynamic/person_email_emailaddress_0_0.csv | cut -d' ' -f1` == 'afedc552761a9b59d9a78ecfe4490291' ]]
+[[ `md5sum social_network/dynamic/person_hasInterest_tag_0_0.csv | cut -d' ' -f1` == '42e0e4d51cd4fb56b18cd20834216118' ]]
+[[ `md5sum social_network/dynamic/person_isLocatedIn_place_0_0.csv | cut -d' ' -f1` == 'ff540b6cd8d20e5f21461531404c6861' ]]
+[[ `md5sum social_network/dynamic/person_knows_person_0_0.csv | cut -d' ' -f1` == 'e8ae7a263e040f9b2b475ebc1e788fe9' ]]
+[[ `md5sum social_network/dynamic/person_likes_comment_0_0.csv | cut -d' ' -f1` == '6d9412934d7e9882fc4154ec74ecf952' ]]
+[[ `md5sum social_network/dynamic/person_likes_post_0_0.csv | cut -d' ' -f1` == 'e0d460d75f5e60b32e0bb544d2d83951' ]]
+[[ `md5sum social_network/dynamic/person_speaks_language_0_0.csv | cut -d' ' -f1` == '474b809d7e756dae80f592f454d33fc9' ]]
+[[ `md5sum social_network/dynamic/person_studyAt_organisation_0_0.csv | cut -d' ' -f1` == 'cb2313acb51295ebe7fee6176467d2f5' ]]
+[[ `md5sum social_network/dynamic/person_workAt_organisation_0_0.csv | cut -d' ' -f1` == 'ef5a7e95d9870e5715e7f9f4706ee349' ]]
+[[ `md5sum social_network/dynamic/post_0_0.csv | cut -d' ' -f1` == '29c261d1b8f87e255e723b58174225da' ]]
+[[ `md5sum social_network/dynamic/post_hasCreator_person_0_0.csv | cut -d' ' -f1` == 'f9d258616bb971dc7636e880fda1d146' ]]
+[[ `md5sum social_network/dynamic/post_hasTag_tag_0_0.csv | cut -d' ' -f1` == '1be77b94b9559f4a17a04b996df36638' ]]
+[[ `md5sum social_network/dynamic/post_isLocatedIn_place_0_0.csv | cut -d' ' -f1` == 'fabbb5c524946bc65416a7ffbdd065c5' ]]
+
+[[ `md5sum social_network/updateStream_0_0_forum.csv | cut -d' ' -f1` == '7e00243f68a8171974eabe4ac37df86b' ]]
+[[ `md5sum social_network/updateStream_0_0_person.csv | cut -d' ' -f1` == '2e1f44e6d48112a9fd87092206153b57' ]]
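Every line of the script above repeats the same `md5sum | cut` comparison, and under `set -e` the first mismatch ends the script without saying which file differed. A minimal sketch of a table-driven helper that names the offending file and keeps checking the rest; `expect_md5` and `run_checks` are hypothetical names, and only two of the expected hashes from the script are repeated here.

```bash
#!/usr/bin/env bash
# Sketch: table-driven variant of the per-file md5 checks above.
# expect_md5 and run_checks are hypothetical; the two hashes below are copied
# from check-md5sums-csv-basic.sh.
set -euo pipefail

failures=0

# expect_md5 FILE HASH: compare FILE's md5 with HASH and report any mismatch.
expect_md5() {
  local actual
  # A missing or unreadable file makes the pipeline fail (pipefail).
  actual=$(md5sum "$1" | cut -d' ' -f1) || actual="<unreadable>"
  if [[ "$actual" != "$2" ]]; then
    echo "MISMATCH $1: expected $2, got $actual" >&2
    failures=$((failures + 1))
  fi
}

run_checks() {
  expect_md5 social_network/static/organisation_0_0.csv e6164941fce2c182ec303c84568e9282
  expect_md5 social_network/static/place_0_0.csv af21fd950c4a5766391d50e1c8ee6a16
  # ... the remaining files and hashes from check-md5sums-csv-basic.sh
  # Non-zero status if any check failed, so CI still fails overall.
  [[ "$failures" -eq 0 ]]
}
```

Collecting all mismatches in one run makes it easier to tell an intended change in the generator's output from a single corrupted file.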

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+set -e
+
+[[ `md5sum social_network/static/organisation_0_0.csv | cut -d' ' -f1` == 'ebe578bcc856dc7bfea164b4fa194cbc' ]]
+[[ `md5sum social_network/static/place_0_0.csv | cut -d' ' -f1` == 'a4f5c04146330d3a11b0441522cd6d74' ]]
+[[ `md5sum social_network/static/tag_0_0.csv | cut -d' ' -f1` == '2ecf6319e808d1cdd45d917ddaab31e1' ]]
+[[ `md5sum social_network/static/tagclass_0_0.csv | cut -d' ' -f1` == '2a9c260bd69680a82a6369b62e846c71' ]]
+
+[[ `md5sum social_network/dynamic/comment_0_0.csv | cut -d' ' -f1` == '8fc7a251596079be643a5cd6b3163b5d' ]]
+[[ `md5sum social_network/dynamic/comment_hasTag_tag_0_0.csv | cut -d' ' -f1` == 'f434548899ea8218ef2e351988d0f117' ]]
+[[ `md5sum social_network/dynamic/forum_0_0.csv | cut -d' ' -f1` == 'ed77ae23647dbb76ebc9915af5e7e34e' ]]
+[[ `md5sum social_network/dynamic/forum_hasMember_person_0_0.csv | cut -d' ' -f1` == 'b9b5327212d6e12734da6d9e7340b62b' ]]
+[[ `md5sum social_network/dynamic/forum_hasTag_tag_0_0.csv | cut -d' ' -f1` == 'de076e0181c59885b72755f20e73b0e3' ]]
+[[ `md5sum social_network/dynamic/person_0_0.csv | cut -d' ' -f1` == '23e5a120ddd42abeaa541b8be7758334' ]]
+[[ `md5sum social_network/dynamic/person_hasInterest_tag_0_0.csv | cut -d' ' -f1` == '42e0e4d51cd4fb56b18cd20834216118' ]]
+[[ `md5sum social_network/dynamic/person_knows_person_0_0.csv | cut -d' ' -f1` == 'e8ae7a263e040f9b2b475ebc1e788fe9' ]]
+[[ `md5sum social_network/dynamic/person_likes_comment_0_0.csv | cut -d' ' -f1` == '6d9412934d7e9882fc4154ec74ecf952' ]]
+[[ `md5sum social_network/dynamic/person_likes_post_0_0.csv | cut -d' ' -f1` == 'e0d460d75f5e60b32e0bb544d2d83951' ]]
+[[ `md5sum social_network/dynamic/person_studyAt_organisation_0_0.csv | cut -d' ' -f1` == 'cb2313acb51295ebe7fee6176467d2f5' ]]
+[[ `md5sum social_network/dynamic/person_workAt_organisation_0_0.csv | cut -d' ' -f1` == 'ef5a7e95d9870e5715e7f9f4706ee349' ]]
+[[ `md5sum social_network/dynamic/post_0_0.csv | cut -d' ' -f1` == '11a2d5e2d26fbd60631b15a100d46eda' ]]
+[[ `md5sum social_network/dynamic/post_hasTag_tag_0_0.csv | cut -d' ' -f1` == '1be77b94b9559f4a17a04b996df36638' ]]
+
+[[ `md5sum social_network/updateStream_0_0_forum.csv | cut -d' ' -f1` == '7e00243f68a8171974eabe4ac37df86b' ]]
+[[ `md5sum social_network/updateStream_0_0_person.csv | cut -d' ' -f1` == '2e1f44e6d48112a9fd87092206153b57' ]]
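Both check scripts hard-code one `[[ ... ]]` line per output file, so every intended change to the generator's output means editing hashes by hand. GNU coreutils' `md5sum` can instead record a manifest from a known-good run and verify against it later with `md5sum -c`. A minimal sketch of that alternative, assuming GNU `md5sum`, `sort -z` and `xargs -r`; the function names and the manifest file name are hypothetical, and this is not how the repository currently checks its output.

```bash
#!/usr/bin/env bash
# Sketch: manifest-based output checks with GNU md5sum.
# write_manifest, verify_manifest and the manifest file name are hypothetical.
set -euo pipefail

# Record "<hash>  <path>" for every CSV under DIR, in a locale-independent order.
write_manifest() {
  local dir="$1" manifest="$2"
  find "$dir" -type f -name '*.csv' -print0 \
    | LC_ALL=C sort -z \
    | xargs -0 -r md5sum > "$manifest"
}

# Re-hash the listed files. --quiet suppresses the per-file "OK" lines;
# md5sum -c exits non-zero if any file is missing or differs.
verify_manifest() {
  md5sum -c --quiet "$1"
}

# Usage:
#   write_manifest social_network expected-md5sums.txt   # after a known-good run
#   verify_manifest expected-md5sums.txt                 # later, e.g. in CI
```

Regenerating the manifest after an intentional change is then a single command, and a failed verification lists every file that differs.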
