Commit 4f296a6

authored
Update README.md
1 parent 01d3105 commit 4f296a6

1 file changed

+20
-13
lines changed

README.md

Lines changed: 20 additions & 13 deletions
@@ -6,11 +6,11 @@ LDBC-SNB Data Generator
66
[![Build Status](https://travis-ci.org/ldbc/ldbc_snb_datagen.svg?branch=master)](https://travis-ci.org/ldbc/ldbc_snb_datagen)
77
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/5b0c677c9c4c4de3b6af15f118c9212c)](https://www.codacy.com/app/ArnauPrat/ldbc_snb_datagen?utm_source=github.com&utm_medium=referral&utm_content=ldbc/ldbc_snb_datagen&utm_campaign=Badge_Grade)
88

9-
The LDBC-SNB Data Generator (DATAGEN) is the responsible of providing the data sets used by all the LDBC benchmarks. This data generator is designed to produce directed labeled graphs that mimic the characteristics of those graphs of real data. A detailed description of the schema produced by datagen, as well as the format of the output files, can be found in the latest version of official [LDBC SNB specification document](https://github.com/ldbc/ldbc_snb_docs).
9+
The LDBC-SNB Data Generator (Datagen) is responsible for providing the data sets used by all the LDBC benchmarks. This data generator is designed to produce directed labeled graphs that mimic the characteristics of real-world graph data. A detailed description of the schema produced by Datagen, as well as the format of the output files, can be found in the latest version of the official [LDBC SNB specification document](https://github.com/ldbc/ldbc_snb_docs).
1010

1111

12-
ldbc_snb_datagen is part of the [LDBC project](http://www.ldbcouncil.org/).
13-
ldbc_snb_datagen is GPLv3 licensed, to see detailed information about this license read the `LICENSE.txt` file.
12+
`ldbc_snb_datagen` is part of the [LDBC project](http://www.ldbcouncil.org/).
13+
`ldbc_snb_datagen` is GPLv3 licensed, to see detailed information about this license read the `LICENSE.txt` file.
1414

1515
* **[Releases](https://github.com/ldbc/ldbc_snb_datagen/releases)**
1616
* **[Configuration](https://github.com/ldbc/ldbc_snb_datagen/wiki/Configuration)**
@@ -21,6 +21,13 @@ ldbc_snb_datagen is GPLv3 licensed, to see detailed information about this licen
2121

2222
## Quick start
2323

24+
There are three main ways to run Datagen:
25+
(1) using a pseudo-distributed Hadoop installation,
26+
(2) running the same setup in a Docker image,
27+
(3) running on a distributed Hadoop cluster.
28+
29+
### Pseudo-distributed Hadoop node
30+
2431
```bash
2532
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz
2633
tar xf hadoop-2.6.0.tar.gz
@@ -33,36 +40,36 @@ cd $LDBC_SNB_DATAGEN_HOME
3340
./run.sh
3441
```
3542

36-
## Docker image
43+
### Docker image
3744

3845
The image can be simply built with the provided Dockerfile.
3946
To build, execute the following command from the repository directory:
40-
```
47+
48+
```bash
4149
docker build . --tag ldbc/datagen
4250
```
4351

44-
### Options
52+
#### Configuration
4553

4654
To configure the amount of memory available, set the `HADOOP_CLIENT_OPTS` variable in the Dockerfile. The default value is `-Xmx8G`.
4755

48-
### Running
56+
#### Running
4957

5058
In order to run the container, a `params.ini` file is required. For reference, please see the `params*.ini` files in the repository. The file is mounted in the container with the `--mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini"` option. If required, the source path can be set to a different path.
5159

52-
The container will output it's results in the `/opt/ldbc_snb_datagen/social_network/` directory. In order to save the results of the generation, a directory must be mounted in the container from the host:
60+
The container will output its results in the `/opt/ldbc_snb_datagen/social_network/` directory. In order to save the results of the generation, a directory must be mounted in the container from the host:
5361

54-
```
62+
```bash
5563
mkdir datagen_output
56-
5764
docker run --rm --mount type=bind,source="$(pwd)/datagen_output/",target="/opt/ldbc_snb_datagen/social_network/" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen
5865
```
5966

6067
If the memory limit has to be raised, the `-e HADOOP_CLIENT_OPTS="-Xmx..."` parameter can override the default `-Xmx8G` value.
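
As a minimal sketch of the override described above, the following script assembles the same `docker run` command with an extra `-e HADOOP_CLIENT_OPTS=...` argument. The `HEAP`, `HADOOP_OPTS`, `OUT_DIR`, `PARAMS` and `DRY_RUN` variables are hypothetical names introduced only for this illustration, and `16G` is an example value, not a recommendation.

```bash
#!/bin/sh
# Sketch only: HEAP, HADOOP_OPTS, OUT_DIR, PARAMS and DRY_RUN are illustrative
# names, not part of Datagen. 16G is an arbitrary example heap size.
HEAP="${HEAP:-16G}"
HADOOP_OPTS="-Xmx${HEAP}"
OUT_DIR="$(pwd)/datagen_output"
PARAMS="$(pwd)/params.ini"

# Build the argument list once so that quoting is preserved when it is reused.
set -- run --rm \
  -e "HADOOP_CLIENT_OPTS=${HADOOP_OPTS}" \
  --mount "type=bind,source=${OUT_DIR}/,target=/opt/ldbc_snb_datagen/social_network/" \
  --mount "type=bind,source=${PARAMS},target=/opt/ldbc_snb_datagen/params.ini" \
  ldbc/datagen

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only print the command that would be executed.
  echo "docker $*"
else
  # Create the output directory on the host, then start the container.
  mkdir -p "${OUT_DIR}"
  docker "$@"
fi
```

By default the script only prints the command. Running it with `DRY_RUN=0` (and, for example, `HEAP=32G`) launches the container with the larger heap.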
6168

62-
<!-- Publicly available datasets can be found at the LDBC-SNB Amazon Bucket. These datasets are the official SNB datasets and were generated using version 0.2.6. They are available in the three official supported serializers: CSV, CSVMergeForeign and TTL. The bucket is configured in "Requester Pays" mode, thus in order to access them you need a properly set up AWS client.
63-
* http://ldbc-snb.s3.amazonaws.com/ -->
69+
### Hadoop cluster
6470

65-
**Community provided tools**
71+
Instructions are currently not provided. (TBD)
6672

73+
### Community provided tools
6774

6875
* **[Apache Flink Loader:](https://github.com/s1ck/ldbc-flink-import)** A loader of LDBC datasets for Apache Flink.
