Social Network Benchmark: DBGEN dataset generator and QGEN workload generator
+Social Network Benchmark: [DBGEN](https://github.com/ldbc/ldbc_socialnet_bm/tree/master/ldbc_socialnet_dbgen) dataset generator and QGEN workload generator
+The LDBC Social Network Dataset Generator (SNDG) is responsible for providing the data sets used by all the LDBC benchmarks. It is designed to produce directed labeled graphs that mimic the characteristics of graphs found in real data. A detailed description of the generator can be found in the following pages:
+
+* In **[Data Schema](https://github.com/ldbc/ldbc_socialnet_bm/wiki/Data-Schema)**, a description of the schema of the data produced by the generator.
+* In **[Data Generation Process](https://github.com/ldbc/ldbc_socialnet_bm/wiki/Data-Generation)**, information about how the data is generated.
+* In **[Data Output](https://github.com/ldbc/ldbc_socialnet_bm/wiki/Data-Output)**, a description of the contents and the format of the files produced by the generator.
+
+
 ldbc_socialnet_dbgen is part of the LDBC project (http://www.ldbc.eu/).
 ldbc_socialnet_dbgen is GPLv3 licensed; for detailed information about this license, read LICENSE.txt.

-This software was built using Apache Hadoop version 1.0.3, and we do not guarantee compatibility with newer releases.
-You can download Hadoop 1.0.3 from http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/

+## Requirements
+
+This software is built using Apache Hadoop version 1.0.3, and we do not guarantee compatibility with newer releases.
+You can download Hadoop 1.0.3 from [here](http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/). To configure your Hadoop machine or cluster, please visit [here](http://hadoop.apache.org/docs/stable/index.html).

-## Compilation

-The compilation uses Apache Maven to automatically detect and download the necessary dependencies. See: maven.apache.org.
+## Compilation

-Make sure you are in your ldbc_socialnet_bm/ldbc_socialnet_dbgen/ project folder.
-To generate the jar containing all the dependencies the following maven instruction is used:
+The compilation uses [Apache Maven](http://maven.apache.org) to automatically detect and download the necessary dependencies. Make sure you are in your ldbc_socialnet_bm/ldbc_socialnet_dbgen/ project folder.
+To generate the jar containing all the dependencies, type

+```
 mvn assembly:assembly
+```

-This can lead to the generation of two jars in the target folder the default one called ldbc_socialnet_dbgen-<Version-Number>.jar or the one containing all the dependencies inside the jar called ldbc_socialnet_dbgen.jar.
+This can generate two jars in the target folder: the default one, called ldbc_socialnet_dbgen-\<Version-Number\>.jar, and the one containing all the dependencies, called ldbc_socialnet_dbgen.jar.


 ## Configuration

-* Configure your Hadoop machine or cluster. For more information on how to do it, please refer to its official page: http://hadoop.apache.org/docs/stable/index.html
+The SNDG is configured by means of the ldbc\_socialnet\_bm/ldbc\_socialnet\_dbgen/_params.ini_ file. Set its parameters to meet your needs. The file has the following format:

-* Configure the params.ini to your needs. This file contains:
-  - numtotalUser: The number of users the social network will have. It should be greater than 1000.
-  - startYear: The first year.
-  - numYears: The length of the period, in years.
-  - serializerType: The serializer type, which must be one of these three values: ttl (Turtle format), n3 (N3 format), csv (comma-separated values).
-  - rdfOutputFileName: The base name for the files generated in RDF format (Turtle and N3).
+```
+numtotalUser: #The number of users the social network will have. It should be greater than 1000.
+startYear: #The first year.
+numYears: #The length of the period, in years.
+serializerType: #The serializer type, which must be one of these three values: ttl (Turtle format), n3 (N3 format), csv (comma-separated values).
+rdfOutputFileName: #The base name for the files generated in RDF format (Turtle and N3).
+```

-This configuration will generate for the startYear-01-01 to the (startYear+numYears)-01-01 period activity in the simulated social network for the amount of users configured.
+This configuration will generate a database of the simulated social network's activity for the configured number of users, covering the period from startYear-01-01 to (startYear+numYears)-01-01.
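As a worked example of the period described above, the following shell sketch derives the start and end dates from two values. The variable names and the values 2010 and 3 are assumptions chosen for illustration; they are not taken from params.ini.

```shell
#!/bin/sh
# Hypothetical values for illustration; the real ones are set in params.ini.
start_year=2010
num_years=3

# The generated activity spans startYear-01-01 to (startYear+numYears)-01-01.
period_start="${start_year}-01-01"
period_end="$((start_year + num_years))-01-01"

echo "Activity is generated from ${period_start} to ${period_end}"
```

With these assumed values, the period runs from 2010-01-01 to 2013-01-01.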


 ## Execution
@@ -41,11 +52,13 @@ Terminology:
 * $HADOOP_HOME is used to refer to the hadoop-1.0.3 folder in your system.
 * $LDBC_SOCIALNET_DBGEN_HOME is used to refer to the ldbc_socialnet_dbgen folder in your system.

-The execution instruction is:
+To execute the generator, please type:

+```
 $HADOOP_HOME/bin/hadoop jar $LDBC_SOCIALNET_DBGEN_HOME/ldbc_socialnet_dbgen.jar hadoop_input_folder hadoop_output_folder Num_machines_ldbc_will_use $LDBC_SOCIALNET_DBGEN_HOME/ Final_output_folder
+```

-You can refer to the run.sh script to see a clearer example of how to run it.
+In ldbc\_socialnet\_bm/ldbc\_socialnet\_dbgen/run.sh you can find a full example of how to compile and execute the SNDG.
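The invocation above can be sketched as a small dry-run script. The default paths, the folder names, and the machine count are hypothetical placeholders and are not taken from run.sh; the script only prints the assembled command so it can be checked before it is run by hand.

```shell
#!/bin/sh
# Assumed default locations; override them through the environment if they differ.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-1.0.3}"
LDBC_SOCIALNET_DBGEN_HOME="${LDBC_SOCIALNET_DBGEN_HOME:-/opt/ldbc_socialnet_bm/ldbc_socialnet_dbgen}"

# Hypothetical argument values, in the order the generator expects them.
input_dir="dbgen_input"          # hadoop_input_folder
output_dir="dbgen_output"        # hadoop_output_folder
num_machines=1                   # Num_machines_ldbc_will_use
final_dir="dbgen_final_output"   # Final_output_folder

# Assemble the command line without executing it.
cmd="$HADOOP_HOME/bin/hadoop jar $LDBC_SOCIALNET_DBGEN_HOME/ldbc_socialnet_dbgen.jar $input_dir $output_dir $num_machines $LDBC_SOCIALNET_DBGEN_HOME/ $final_dir"

echo "$cmd"
```

Printing the command first makes it easy to confirm that both environment variables resolve to the intended folders.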


 ## Output
-The generator will create CSV files [with the following format](https://github.com/ldbc/ldbc_socialnet_bm/wiki/Generated-CSV-Files)
+The generator can create data in three formats: CSV, TTL and N3. For more information, please check the [wiki](https://github.com/ldbc/ldbc_socialnet_bm/wiki/Data-Output).