The LDBC-SNB Data Generator (DATAGEN) is responsible for providing the data sets used by all the LDBC benchmarks. This data generator is designed to produce directed labeled graphs that mimic the characteristics of real-world graphs. A detailed description of the schema produced by datagen, as well as the format of the output files, can be found in the latest version of the official [LDBC SNB specification document](https://github.com/ldbc/ldbc_snb_docs).
+
The LDBC-SNB Data Generator (Datagen) is responsible for providing the data sets used by all the LDBC benchmarks. This data generator is designed to produce directed labeled graphs that mimic the characteristics of real-world graphs. A detailed description of the schema produced by Datagen, as well as the format of the output files, can be found in the latest version of the official [LDBC SNB specification document](https://github.com/ldbc/ldbc_snb_docs).
-
ldbc_snb_datagen is part of the [LDBC project](http://www.ldbcouncil.org/).
-
ldbc_snb_datagen is GPLv3 licensed. For detailed information about this license, see the `LICENSE.txt` file.
+
`ldbc_snb_datagen` is part of the [LDBC project](http://www.ldbcouncil.org/).
+
`ldbc_snb_datagen` is GPLv3 licensed. For detailed information about this license, see the `LICENSE.txt` file.
The image can be simply built with the provided Dockerfile.
To build, execute the following command from the repository directory:
-
```
+
+
```bash
docker build . --tag ldbc/datagen
```
-
###Options
+
#### Configuration
To configure the amount of memory available, set the `HADOOP_CLIENT_OPTS` variable in the Dockerfile. The default value is `-Xmx8G`.
-
### Running
+
#### Running
To run the container, a `params.ini` file is required. For reference, please see the `params*.ini` files in the repository. The file is mounted in the container by the `--mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini"` option. If required, the source path can be set to a different path.
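As a rough illustration of pointing the mount at a different source path, the sketch below builds the `--mount` value for an arbitrary parameter file. The `params_mount` helper is hypothetical (it is not part of the repository), and it assumes the source should be passed as an absolute path, as the `$(pwd)` prefix in the option above does.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the --mount value for a given params file.
# The target path is the one the container expects, taken from the text above.
params_mount() {
  local src="$1"
  local abs
  # Resolve the file's directory to an absolute path (the directory must exist).
  abs="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
  printf 'type=bind,source=%s,target=/opt/ldbc_snb_datagen/params.ini' "$abs"
}
```

It could then be used as `docker run --rm --mount "$(params_mount path/to/my-params.ini)" ... ldbc/datagen`, where `path/to/my-params.ini` is a placeholder for your own file.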
-
The container will output it's results in the `/opt/ldbc_snb_datagen/social_network/` directory. In order to save the results of the generation, a directory must be mounted in the container from the host:
+
The container will output its results in the `/opt/ldbc_snb_datagen/social_network/` directory. In order to save the results of the generation, a directory must be mounted in the container from the host:
-
```
+
```bash
mkdir datagen_output
-
docker run --rm --mount type=bind,source="$(pwd)/datagen_output/",target="/opt/ldbc_snb_datagen/social_network/" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen
```
If the memory limit has to be raised, the `-e HADOOP_CLIENT_OPTS="-Xmx..."` parameter can override the default `-Xmx8G` value.
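A minimal sketch of such an override, reusing the mounts from the command above. The `-Xmx16G` value is only an illustrative assumption; choose a limit that fits the host. The script prints the assembled command and only runs it when the (hypothetical) `RUN=1` switch is set.

```bash
#!/usr/bin/env bash
set -euo pipefail

HEAP_OPTS="-Xmx16G"                 # assumed example value, not a recommendation
OUT_DIR="$(pwd)/datagen_output"     # host directory that receives the results
PARAMS="$(pwd)/params.ini"          # host parameter file

# Assemble the docker run command as an array so quoting stays intact.
cmd=(docker run --rm
  -e "HADOOP_CLIENT_OPTS=${HEAP_OPTS}"
  --mount "type=bind,source=${OUT_DIR}/,target=/opt/ldbc_snb_datagen/social_network/"
  --mount "type=bind,source=${PARAMS},target=/opt/ldbc_snb_datagen/params.ini"
  ldbc/datagen)

# Print the command for inspection; execute it only when RUN=1 is set.
printf '%q ' "${cmd[@]}"; echo
if [[ "${RUN:-0}" == "1" ]]; then
  mkdir -p "${OUT_DIR}"
  "${cmd[@]}"
fi
```

Passing the variable with `-e` changes it for that container run only, so the default in the Dockerfile stays untouched.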
-
<!-- Publicly available datasets can be found at the LDBC-SNB Amazon Bucket. These datasets are the official SNB datasets and were generated using version 0.2.6. They are available in the three official supported serializers: CSV, CSVMergeForeign and TTL. The bucket is configured in "Requester Pays" mode, thus in order to access them you need a properly set up AWS client.
-
* http://ldbc-snb.s3.amazonaws.com/ -->
+
### Hadoop cluster
-
**Community provided tools**
+
Instructions are currently not provided. (TBD)
+
### Community provided tools
* **[Apache Flink Loader:](https://github.com/s1ck/ldbc-flink-import)** A loader of LDBC datasets for Apache Flink.