Commit 47f1c15

Update README
1 parent 397bd89 commit 47f1c15

1 file changed

+20
-18
lines changed

README.md

Lines changed: 20 additions & 18 deletions
@@ -21,16 +21,27 @@ The LDBC-SNB Data Generator (Datagen) is the responsible of providing the data s
2121

2222
## Quick start
2323

24-
There are three main ways to run Datagen:
25-
(1) using a pseudo-distributed Hadoop installation,
26-
(2) running the same setup in a Docker image,
27-
(3) running on a distributed Hadoop cluster.
24+
### Configuration
25+
26+
Initialize the `params.ini` file as needed. For example, to generate the basic CSV files, issue:
27+
28+
```bash
29+
cp params-csv.ini params.ini
30+
```
31+
32+
There are three main ways to run Datagen, each using a different approach to configure the amount of memory available.
33+
34+
1. using a pseudo-distributed Hadoop installation,
35+
2. running the same setup in a Docker image,
36+
3. running on a distributed Hadoop cluster.
2837

2938
### Pseudo-distributed Hadoop node
3039

31-
To grab Hadoop, extract it, and set the environment values to sensible defaults, and generate the data as specified in the `params.ini` file, run the following script:
40+
To configure the amount of memory available, set the `HADOOP_CLIENT_OPTS` environment variable.
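
As a minimal sketch of that setting, the heap size can be kept in one place and turned into the JVM `-Xmx` flag. `DATAGEN_HEAP` is a hypothetical variable introduced only for this illustration (Datagen does not read it); the `2G` fallback mirrors the value used in this README.

```bash
# DATAGEN_HEAP is a hypothetical helper variable, not something Datagen reads.
# Fall back to 2G, the value used elsewhere in this README.
DATAGEN_HEAP="${DATAGEN_HEAP:-2G}"

# Hadoop client JVMs pick up their options from HADOOP_CLIENT_OPTS.
export HADOOP_CLIENT_OPTS="-Xmx${DATAGEN_HEAP}"

echo "HADOOP_CLIENT_OPTS=${HADOOP_CLIENT_OPTS}"
```

To raise the limit for a single run, set the hypothetical variable before sourcing the snippet, e.g. `DATAGEN_HEAP=8G`.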
41+
To download Hadoop, extract it, set the environment variables to sensible defaults, and generate the data as specified in the `params-csv.ini` file, run the following script:
3242

3343
```bash
44+
cp params-csv.ini params.ini
3445
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.9.2/hadoop-2.9.2.tar.gz
3546
tar xf hadoop-2.9.2.tar.gz
3647
export HADOOP_CLIENT_OPTS="-Xmx2G"
@@ -42,27 +53,18 @@ export LDBC_SNB_DATAGEN_HOME=`pwd`
4253
```
4354

4455
### Docker image
45-
SNB datagen images are available via [DockerHub](https://hub.docker.com/r/ldbc/datagen/) where you may find both the latest version of the generator as well as previous stable versions.
56+
57+
SNB datagen images are available via [Docker Hub](https://hub.docker.com/r/ldbc/datagen/) where you may find both the latest version of the generator as well as previous stable versions.
4658

4759
Alternatively, the image can be built with the provided Dockerfile. To build, execute the following command from the repository directory:
4860

4961
```bash
5062
docker build . --tag ldbc/datagen
5163
```
5264

53-
#### Configuration
54-
55-
To configure the amount of memory available, set the `HADOOP_CLIENT_OPTS` variable in the Dockerfile. The default value is `-Xmx2G`. If you are using the precompiled image, you can use the `-e HADOOP_CLIENT_OPTS=` flag when running (as described below).
56-
57-
Initialize the `params.ini` file as needed. For example, to generate the basic CSV files, issue:
58-
59-
```bash
60-
cp params-csv.ini params.ini
61-
```
62-
6365
#### Running
6466

65-
In order to run the container, a `params.ini` file is required. For reference, please see the `params*.ini` files in the repository. The file will be mounted in the container by the `--mount type=bind,source="$(pwd)/params.ini,target="/opt/ldbc_snb_datagen/params.ini"` option. If required, the source path can be set to a different path.
67+
Set up the `params.ini` file in the repository as in the pseudo-distributed case. The file is mounted into the container by the `--mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini"` option. If required, the source can be set to a different path.
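
The mount option is easy to mistype, so here is a minimal sketch that assembles it from a source path. The function name `params_mount_opt` is hypothetical and not part of Datagen; the target path is the one given above. It assumes the source path is absolute and contains no commas, since commas separate the fields of `--mount`.

```bash
# Hypothetical helper (not part of Datagen): print the bind-mount option
# for a params file, defaulting to params.ini in the current directory.
params_mount_opt() {
  src="${1:-$(pwd)/params.ini}"
  printf '%s\n' "--mount type=bind,source=${src},target=/opt/ldbc_snb_datagen/params.ini"
}

# Show the option that would be passed to `docker run`.
params_mount_opt
```

Passing a different absolute path as the first argument, for example a copy of `params-csv.ini` kept elsewhere, changes only the `source=` field.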
6668

6769
The container writes its results to the `/opt/ldbc_snb_datagen/out/` directory, which contains two sub-directories, `social_network/` and `substitution_parameters/`. To save the results of the generation, a directory must be mounted in the container from the host. The driver requires the results to be in the datagen repository directory. To generate the data, run the following command, which includes changing the owner (`chown`) of the Docker-mounted volumes:
6870

@@ -75,7 +77,7 @@ If you need to raise the memory limit, use the `-e HADOOP_CLIENT_OPTS="-Xmx..."`
7577

7678
### Hadoop cluster
7779

78-
Instructions are currently not provided. (TBD)
80+
Instructions are currently not provided.
7981

8082
### Community provided tools
8183
