Commit 1b89140

remove "Two Pass" stats to be consistent with our current configuration and integration.
1 parent c5cc670 commit 1b89140

1 file changed

+14
-23
lines changed

docs/deploy-and-configure/requirements/graph-insights-sizing.md

Lines changed: 14 additions & 23 deletions
@@ -10,33 +10,25 @@ This section is intended to provide assistance in estimating the required memory
1010
The following is a statistical evaluation of RDF Graph Insights on the indexing speed and the memory requirements.
1111
For this, we considered altogether 26 datasets with up to 352M triples.
1212
The benchmark has been conducted for different JVM memory allocations in order to roughly estimate the memory requirements to support a desired amount of triples.
13-
Moreover, it compares our two indexing methods, namely the "one pass" and "two pass" approaches.
14-
In particular, we generate the index (compressed dictionary and triples) in a single parsing iteration (namely "one pass": faster, higher memory consumption) or in two separate parsing iterations ("two pass": slower, but less memory consumption).
13+
We generate the index (compressed dictionary and triples) in a single parsing iteration (namely "one pass": faster, higher memory consumption).
1514

1615
For the experiments, we used Graph Insights v16.0.1 and conducted them on an Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz, 6 cores with 2 threads per core, 64 GB DDR3 @ 1334MHz.
1716

1817
## Memory and Disk Space Requirements
1918

20-
The following table should be read as a lookup table: Assuming the JVM is allocated with a certain amount of memory (**JVM Memory (GB)**), how many triples can you expect to be able to index (**Max. Num. Triples**) with RDF Graph Insights? Please note that a comparison of memory consumption of "one pass" against "two pass" for a specific memory setting should be treated with caution, as the results often refer to a different number of datasets.
19+
The following table should be read as a lookup table: Assuming the JVM is allocated with a certain amount of memory (**JVM Memory (GB)**), how many triples can you expect to be able to index (**Max. Num. Triples**) with RDF Graph Insights?
2120
Furthermore, this table also lists the initial memory allocation for loading an existing index into Graph Insights for exploration.
2221
Since Graph Insights uses caching for performance reasons the latter will increase over time up to the given allocation limit.
2322

24-
| Approach | JVM Memory (GB) | Num. Datasets | Max. Num. Triples | Max. Num. Classes | Max. Num. Instances | Indexing (MB) | Exploration after restart (MB) |
25-
| :-------: | --------------: | ------------: | ----------------: | ----------------: | ------------------: | ------------: | ----------------------------: |
26-
| One Pass | 1 | 16 | 5,344,375 | 809 | 776,845 | 787 | 185 |
27-
| Two Pass | 1 | 16 | 5,344,414 | 809 | 776,845 | 741 | 184 |
28-
| One Pass | 5 | 19 | 19,903,402 | 809 | 1,583,073 | 2,839 | 238 |
29-
| Two Pass | 5 | 19 | 19,903,402 | 809 | 1,583,073 | 4,174 | 238 |
30-
| One Pass | 10 | 22 | 72,820,690 | 809 | 5,401,200 | 9,749 | 1,485 |
31-
| Two Pass | 10 | 23 | 152,528,023 | 809 | 19,837,857 | 10,011 | 2,749 |
32-
| One Pass | 15 | 22 | 72,820,690 | 809 | 5,401,200 | 10,446 | 1,485 |
33-
| Two Pass | 15 | 24 | 158,962,783 | 10,193 | 19,837,857 | 14,827 | 3,633 |
34-
| One Pass | 20 | 23 | 152,528,023 | 809 | 19,837,857 | 19,852 | 2,846 |
35-
| Two Pass | 20 | 25 | 257,831,425 | 10,193 | 50,567,464 | 19,421 | 5,833 |
36-
| One Pass | 30 | 24 | 158,962,783 | 10,193 | 19,837,857 | 28,582 | 3,762 |
37-
| Two Pass | 30 | 25 | 257,831,425 | 10,193 | 50,567,464 | 28,736 | 5,834 |
38-
| One Pass | 40 | 24 | 158,962,783 | 10,193 | 19,837,857 | 36,022 | 4,043 |
39-
| Two Pass | 40 | 26 | 352,417,591 | 10,193 | 50,567,464 | 40,689 | 17,059 |
23+
| JVM Memory (GB) | Num. Datasets | Max. Num. Triples | Max. Num. Classes | Max. Num. Instances | Indexing (MB) | Exploration after restart (MB) |
24+
| --------------: | ------------: | ----------------: | ----------------: | ------------------: | ------------: | ----------------------------: |
25+
| 1 | 16 | 5,344,375 | 809 | 776,845 | 787 | 185 |
26+
| 5 | 19 | 19,903,402 | 809 | 1,583,073 | 2,839 | 238 |
27+
| 10 | 22 | 72,820,690 | 809 | 5,401,200 | 9,749 | 1,485 |
28+
| 15 | 22 | 72,820,690 | 809 | 5,401,200 | 10,446 | 1,485 |
29+
| 20 | 23 | 152,528,023 | 809 | 19,837,857 | 19,852 | 2,846 |
30+
| 30 | 24 | 158,962,783 | 10,193 | 19,837,857 | 28,582 | 3,762 |
31+
| 40 | 24 | 158,962,783 | 10,193 | 19,837,857 | 36,022 | 4,043 |
4032

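As a reading aid for the lookup table after this change, the following is a minimal sketch of how one might pick a JVM allocation for a given dataset size. The function name `min_jvm_memory_gb` and the `SIZING_TABLE` constant are hypothetical and not part of Graph Insights. The values are copied from the **JVM Memory (GB)** and **Max. Num. Triples** columns. They are the largest datasets indexed in this benchmark rather than hard limits, so treat the result as a rough estimate.

```python
from typing import Optional

# (JVM memory in GB, max. number of triples indexed in the benchmark),
# copied from the "one pass" rows of the sizing table.
SIZING_TABLE = [
    (1, 5_344_375),
    (5, 19_903_402),
    (10, 72_820_690),
    (15, 72_820_690),
    (20, 152_528_023),
    (30, 158_962_783),
    (40, 158_962_783),
]


def min_jvm_memory_gb(num_triples: int) -> Optional[int]:
    """Return the smallest benchmarked JVM allocation (GB) that indexed
    at least `num_triples` triples, or None if the benchmark has no such row."""
    for memory_gb, max_triples in SIZING_TABLE:  # rows are sorted by memory
        if num_triples <= max_triples:
            return memory_gb
    return None


if __name__ == "__main__":
    print(min_jvm_memory_gb(10_000_000))
```

A result of `None` only means the benchmark contains no dataset of that size for the "one pass" approach. It does not show that indexing would fail.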
4133
### Disk Space Considerations
4234

@@ -56,10 +48,9 @@ Assuming our dataset has 10,000,000 triples and we are using the "one pass" appr
5648
As can be seen, the maximum indexing throughput is much higher (factor `~3`) since the individual speed depends on the dataset and its inherent characteristics such as the depth of the class and property hierarchy or the number of object property assertions in connection with the reasoning mode of Graph Insights.
5749
As soon as an index has been created and saved on disk, it only takes a fraction of the indexing time to load it into memory for all subsequent calls of Graph Insights.
5850

59-
| Approach | Mean Triples / Second | Maximum Triples / Second |
60-
| :------: | --------------------: | -----------------------: |
61-
| One pass | 50.5K | 182.0K |
62-
| Two pass | 31.0K | 104.3K |
51+
| Mean Triples / Second | Maximum Triples / Second |
52+
| --------------------: | -----------------------: |
53+
| 50.5K | 182.0K |
6354

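The throughput figures that remain after this change can be turned into a rough indexing-time estimate. The sketch below assumes that "K" means 1,000 triples and that throughput stays constant over the whole run. The helper `estimate_indexing_seconds` is a hypothetical name, not a Graph Insights API. Actual speed depends on the dataset characteristics described above.

```python
# Throughput values from the table, assuming "K" denotes 1,000 triples.
MEAN_TRIPLES_PER_SECOND = 50_500
MAX_TRIPLES_PER_SECOND = 182_000


def estimate_indexing_seconds(num_triples: int, triples_per_second: float) -> float:
    """Estimate indexing time as triples divided by a constant throughput."""
    if triples_per_second <= 0:
        raise ValueError("triples_per_second must be positive")
    return num_triples / triples_per_second


if __name__ == "__main__":
    triples = 10_000_000
    typical = estimate_indexing_seconds(triples, MEAN_TRIPLES_PER_SECOND)
    best = estimate_indexing_seconds(triples, MAX_TRIPLES_PER_SECOND)
    print(f"typical: {typical:.0f} s, best case: {best:.0f} s")
```

For a dataset of 10,000,000 triples this gives roughly 198 seconds at the mean rate and roughly 55 seconds at the maximum rate.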
6455
## CPU Usage
6556
