README.md: 20 additions and 18 deletions
````diff
@@ -21,16 +21,27 @@ The LDBC-SNB Data Generator (Datagen) is the responsible of providing the data s
 
 ## Quick start
 
-There are three main ways to run Datagen:
-(1) using a pseudo-distributed Hadoop installation,
-(2) running the same setup in a Docker image,
-(3) running on a distributed Hadoop cluster.
+### Configuration
+
+Initialize the `params.ini` file as needed. For example, to generate the basic CSV files, issue:
+
+```bash
+cp params-csv.ini params.ini
+```
+
+There are three main ways to run Datagen, each using a different approach to configure the amount of memory available.
+
+1. using a pseudo-distributed Hadoop installation,
+2. running the same setup in a Docker image,
+3. running on a distributed Hadoop cluster.
 
 ### Pseudo-distributed Hadoop node
 
-To grab Hadoop, extract it, and set the environment values to sensible defaults, and generate the data as specified in the `params.ini` file, run the following script:
+To configure the amount of memory available, set the `HADOOP_CLIENT_OPTS` environment variable.
+To grab Hadoop, extract it, and set the environment values to sensible defaults, and generate the data as specified in the `params-csv.ini` file, run the following script:
````
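The new `HADOOP_CLIENT_OPTS` line does not show what a value looks like. A minimal sketch, assuming a Bash shell and an 8 GB heap (both are assumptions, not values from the README), exports the variable before the generation script runs:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Extra JVM options for Hadoop client commands; -Xmx sets the maximum heap.
# -Xmx8G is an assumed value: choose one that fits the machine's RAM.
export HADOOP_CLIENT_OPTS="-Xmx8G"
echo "HADOOP_CLIENT_OPTS=${HADOOP_CLIENT_OPTS}"

# Next, run the pseudo-distributed generation script from the README
# in this same shell (its file name is not part of this diff).
```

Because the variable is exported, Hadoop client commands started from this shell inherit it, unless the generation script itself overwrites it while setting its defaults.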
````diff
-SNB datagen images are available via [DockerHub](https://hub.docker.com/r/ldbc/datagen/) where you may find both the latest version of the generator as well as previous stable versions.
+
+SNB datagen images are available via [Docker Hub](https://hub.docker.com/r/ldbc/datagen/) where you may find both the latest version of the generator as well as previous stable versions.
 
 Alternatively, the image can be built with the provided Dockerfile. To build, execute the following command from the repository directory:
 
 ```bash
 docker build . --tag ldbc/datagen
 ```
 
-#### Configuration
-
-To configure the amount of memory available, set the `HADOOP_CLIENT_OPTS` variable in the Dockerfile. The default value is `-Xmx2G`. If you are using a the precompiled image, you can the `-e HADOOP_CLIENT_OPTS=` flag when running (as described below).
-
-Initialize the `params.ini` file as needed. For example, to generate the basic CSV files, issue:
-
-```bash
-cp params-csv.ini params.ini
-```
-
 #### Running
 
-In order to run the container, a `params.ini`file is required. For reference, please see the `params*.ini` files in the repository. The file will be mounted in the container by the `--mount type=bind,source="$(pwd)/params.ini,target="/opt/ldbc_snb_datagen/params.ini"` option. If required, the source path can be set to a different path.
+Set the `params.ini`in the repository as for the pseudo-distributed case. The file will be mounted in the container by the `--mount type=bind,source="$(pwd)/params.ini,target="/opt/ldbc_snb_datagen/params.ini"` option. If required, the source path can be set to a different path.
 
 The container outputs its results in the `/opt/ldbc_snb_datagen/out/` directory which contains two sub-directories, `social_network/` and `subsitution_parameters`. In order to save the results of the generation, a directory must be mounted in the container from the host. The driver requires the results be in the datagen repository directory. To generate the data, run the following command which includes changing the owner (`chown`) of the Docker-mounted volumes:
 
@@ -75,7 +77,7 @@ If you need to raise the memory limit, use the `-e HADOOP_CLIENT_OPTS="-Xmx..."`
````
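The Running text and the `-Xmx...` hunk header describe a container run that combines a bind-mounted `params.ini`, a host directory for `/opt/ldbc_snb_datagen/out/`, and a memory limit passed with `-e HADOOP_CLIENT_OPTS` (the removed Docker text gives `-Xmx2G` as the former default). The README's full command lies outside this diff, so the following is only a sketch: the 4 GB heap, the `out` host directory, and `--rm` are assumptions of this example. It quotes each `--mount` value as a whole, because the option quoted in the text contains an unbalanced double quote.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed values: heap size and host output directory are illustrative.
heap="-Xmx4G"
out_dir="$(pwd)/out"

# Build the command as an array so paths with spaces stay single arguments.
datagen_cmd=(
  docker run --rm
  -e "HADOOP_CLIENT_OPTS=${heap}"
  --mount "type=bind,source=$(pwd)/params.ini,target=/opt/ldbc_snb_datagen/params.ini"
  --mount "type=bind,source=${out_dir},target=/opt/ldbc_snb_datagen/out"
  ldbc/datagen
)

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  # Default: only print the assembled command, shell-quoted.
  printf '%q ' "${datagen_cmd[@]}"
  printf '\n'
else
  # Bind mounts made with --mount fail if the host path does not exist.
  mkdir -p "${out_dir}"
  "${datagen_cmd[@]}"
fi
```

Saved as a script, it prints the command by default; setting `DRY_RUN=0` runs the container. The README's own command also changes the owner of the mounted volumes, a step this sketch omits.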