Commit 992feb2

Merge branch 'master' into build-docker-in-travis
2 parents: 11fde97 + 4f296a6

File tree

4 files changed (+38, −45 lines)


.travis.yml

Lines changed: 17 additions & 2 deletions

```diff
@@ -16,10 +16,25 @@ script:
   - mvn test | grep "^\\[" | tee mvn.log
   # test if the output of the mvn command contained an "[INFO] BUILD SUCCESS" entry
   - grep 'BUILD SUCCESS' mvn.log
-  - ls -al test_data
-  #- if [ "$TRAVIS_BRANCH" = "master" -a "$TRAVIS_PULL_REQUEST" = "false" ]; then ./deploy.sh; fi
+  # generate SF1 data set
+  - wget -q http://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz
+  - tar xf hadoop-2.6.0.tar.gz
+  - export HADOOP_CLIENT_OPTS="-Xmx2G"
+  - export HADOOP_HOME=`pwd`/hadoop-2.6.0
+  - export LDBC_SNB_DATAGEN_HOME=`pwd`
+  - ./run.sh
+  - mkdir out
+  - cp -r substitution_parameters out/
 notifications:
   slack: ldbcouncil:OrBanrJ7l0EHQbj8T5YdJYhd
   email: false
   on_success: change
   on_failure: always
+deploy:
+  provider: pages
+  skip-cleanup: true
+  local-dir: out
+  github-token: $GITHUB_TOKEN
+  keep: true
+  on:
+    branch: master
```
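The CI change above keeps the `grep 'BUILD SUCCESS' mvn.log` line as the build gate: Travis fails the job as soon as any `script` command exits non-zero, and `grep` exits non-zero when its pattern is absent. A minimal standalone sketch of that mechanism (the sample log content is illustrative, not real Maven output):

```shell
# grep's exit status acts as an assertion on the Maven log:
# exit 0 when the marker line is present, exit 1 otherwise.
printf '[INFO] Scanning for projects...\n[INFO] BUILD SUCCESS\n' > mvn.log
if grep -q 'BUILD SUCCESS' mvn.log; then
  echo "build ok"        # prints "build ok" for this sample log
else
  echo "build failed"
fi
```
In the Travis config itself no `if` is needed; the bare `grep` command failing is enough to fail the build.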

README.md

Lines changed: 20 additions & 13 deletions

````diff
@@ -6,11 +6,11 @@ LDBC-SNB Data Generator
 [![Build Status](https://travis-ci.org/ldbc/ldbc_snb_datagen.svg?branch=master)](https://travis-ci.org/ldbc/ldbc_snb_datagen)
 [![Codacy Badge](https://api.codacy.com/project/badge/Grade/5b0c677c9c4c4de3b6af15f118c9212c)](https://www.codacy.com/app/ArnauPrat/ldbc_snb_datagen?utm_source=github.com&utm_medium=referral&utm_content=ldbc/ldbc_snb_datagen&utm_campaign=Badge_Grade)

-The LDBC-SNB Data Generator (DATAGEN) is the responsible of providing the data sets used by all the LDBC benchmarks. This data generator is designed to produce directed labeled graphs that mimic the characteristics of those graphs of real data. A detailed description of the schema produced by datagen, as well as the format of the output files, can be found in the latest version of official [LDBC SNB specification document](https://github.com/ldbc/ldbc_snb_docs).
+The LDBC-SNB Data Generator (Datagen) is the responsible of providing the data sets used by all the LDBC benchmarks. This data generator is designed to produce directed labeled graphs that mimic the characteristics of those graphs of real data. A detailed description of the schema produced by Datagen, as well as the format of the output files, can be found in the latest version of official [LDBC SNB specification document](https://github.com/ldbc/ldbc_snb_docs).


-ldbc_snb_datagen is part of the [LDBC project](http://www.ldbcouncil.org/).
-ldbc_snb_datagen is GPLv3 licensed, to see detailed information about this license read the `LICENSE.txt` file.
+`ldbc_snb_datagen` is part of the [LDBC project](http://www.ldbcouncil.org/).
+`ldbc_snb_datagen` is GPLv3 licensed, to see detailed information about this license read the `LICENSE.txt` file.

 * **[Releases](https://github.com/ldbc/ldbc_snb_datagen/releases)**
 * **[Configuration](https://github.com/ldbc/ldbc_snb_datagen/wiki/Configuration)**
@@ -21,6 +21,13 @@ ldbc_snb_datagen is GPLv3 licensed, to see detailed information about this licen

 ## Quick start

+There are three main ways to run Datagen:
+(1) using a pseudo-distributed Hadoop installation,
+(2) running the same setup in a Docker image,
+(3) running on a distributed Hadoop cluster.
+
+### Pseudo-distributed Hadoop node
+
 ```bash
 wget http://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz
 tar xf hadoop-2.6.0.tar.gz
@@ -33,36 +40,36 @@ cd $LDBC_SNB_DATAGEN_HOME
 ./run.sh
 ```

-## Docker image
+### Docker image

 The image can be simply built with the provided Dockerfile.
 To build, execute the following command from the repository directory:
-```
+
+```bash
 docker build . --tag ldbc/datagen
 ```

-### Options
+#### Configuration

 To configure the amount of memory available, set the `HADOOP_CLIENT_OPTS` variable in the Dockerfile. The default value is `-Xmx8G`.

-### Running
+#### Running

 In order to run the container, a `params.ini` file is required. For reference, please see the `params*.ini` files in the repository. The file will be mounted in the container by the `--mount type=bind,source="$(pwd)/params.ini,target="/opt/ldbc_snb_datagen/params.ini"` option. If required, the source path can be set to a different path.

-The container will output it's results in the `/opt/ldbc_snb_datagen/social_network/` directory. In order to save the results of the generation, a directory must be mounted in the container from the host:
+The container will output its results in the `/opt/ldbc_snb_datagen/social_network/` directory. In order to save the results of the generation, a directory must be mounted in the container from the host:

-```
+```bash
 mkdir datagen_output
-
 docker run --rm --mount type=bind,source="$(pwd)/datagen_output/",target="/opt/ldbc_snb_datagen/social_network/" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen
 ```

 If the memory limit has to be raised, the `-e HADOOP_CLIENT_OPTS="-Xmx..."` parameter can override the default `-Xmx8G` value.

-<!-- Publicly available datasets can be found at the LDBC-SNB Amazon Bucket. These datasets are the official SNB datasets and were generated using version 0.2.6. They are available in the three official supported serializers: CSV, CSVMergeForeign and TTL. The bucket is configured in "Requester Pays" mode, thus in order to access them you need a properly set up AWS client.
-* http://ldbc-snb.s3.amazonaws.com/ -->
+### Hadoop cluster

-**Community provided tools**
+Instructions are currently not provided. (TBD)

+### Community provided tools

 * **[Apache Flink Loader:](https://github.com/s1ck/ldbc-flink-import)** A loader of LDBC datasets for Apache Flink.
````
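The README text above leaves the `HADOOP_CLIENT_OPTS` heap size to the user (default `-Xmx8G`). A hedged helper sketch for Linux hosts that derives a value from total RAM; the one-half ratio and the variable names are illustrative choices, not a project recommendation:

```shell
# Illustrative helper: pick an -Xmx value for HADOOP_CLIENT_OPTS as
# roughly half of total RAM in GiB. Linux-only: reads /proc/meminfo.
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
heap_gb=$(( total_kb / 1024 / 1024 / 2 ))
if [ "$heap_gb" -lt 1 ]; then heap_gb=1; fi   # floor at 1 GiB
echo "HADOOP_CLIENT_OPTS=-Xmx${heap_gb}G"
```
The resulting value would be passed to the container as `-e HADOOP_CLIENT_OPTS="-Xmx${heap_gb}G"`, overriding the Dockerfile default as described above.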

deploy.sh

Lines changed: 0 additions & 29 deletions
This file was deleted.

docker_run.sh

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-#o!/bin/bash
+#!/bin/bash

 set -e
```
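The one-character fix above restores a valid shebang in `docker_run.sh`, whose next line is `set -e`. A small self-contained demo of why `set -e` matters there (the `/tmp` path and file name are illustrative only):

```shell
# With `set -e`, a bash script aborts at the first failing command
# instead of silently continuing past errors.
cat > /tmp/demo_set_e.sh <<'EOF'
#!/bin/bash
set -e
false                 # first failing command: script aborts here
echo "never reached"
EOF
chmod +x /tmp/demo_set_e.sh
if /tmp/demo_set_e.sh; then status=ok; else status=failed; fi
echo "$status"        # prints "failed"
```
Without `set -e`, the same script would print "never reached" and exit 0, masking the failure.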

0 commit comments
