The graph clustering framework VieClus -- Vienna Graph Clustering. Part of the [KaHIP](https://github.com/KaHIP) organization.
**VieClus** (Vienna Graph Clustering) is a memetic algorithm for high-quality graph clustering that optimizes modularity. It is a state-of-the-art solver for computing clusterings with very high modularity values. Part of the [KaHIP](https://github.com/KaHIP) organization.

Graph clustering is the problem of detecting tightly connected regions of a graph. Depending on the task, knowledge about the structure of the graph can reveal information such as voter behavior, the formation of new trends, existing terrorist groups and recruitment, or a natural partitioning of data records onto pages. Further application areas include the study of protein interaction, gene expression networks, fraud detection, program optimization, and the spread of epidemics. Possible applications are plentiful, as almost all systems containing interacting or coexisting entities can be modeled as a graph.

| | |
|:--|:--|
|**What it solves**| Graph clustering: detecting tightly connected regions (communities) in a graph |
|**Parallel**| Optional MPI support for parallel evolutionary search |

<p align="center">
  <img src="./img/example_clustering.png"
       alt="Example: a graph with three detected clusters (red, cyan, yellow)"
       width="400">
</p>

## Quick Start

This is the release of our memetic algorithm, VieClus (Vienna Graph Clustering), for the graph clustering problem. A key component of our contribution is a set of natural recombine operators that employ ensemble clusterings as well as multi-level techniques.

In our experimental evaluation, we show that **our algorithm successfully improves or reproduces all entries of the 10th DIMACS implementation challenge** under consideration in a small amount of time. In fact, for most of the small instances, we can improve the old benchmark result in less than a minute.

Moreover, while the previous best results for different instances were computed by a variety of solvers, our algorithm can now be used as a single tool to compute them. **In short, our solver is currently the best modularity-based clustering algorithm available.**

MPI enables the parallel evolutionary algorithm, which typically yields better solutions.

```python
print(f"Modularity: {modularity}")  # e.g. 0.41
print(f"Clustering: {clustering}")  # e.g. [0, 0, 0, 1, 1, 1]
```

Prerequisites:

- OpenMPI (http://www.open-mpi.org/). Note: because progress threads were removed in OpenMPI > 1.8, use an OpenMPI version < 1.8 or Intel MPI to obtain a scalable parallel algorithm.

| Option | Description | Default |
|:--|:--|:--|
|`<graph-file>`| Path to graph in METIS format (see [Graph Format](#graph-format)) |*required*|
|`--time_limit=<double>`| Time limit in seconds. Must be > 0 to enable evolutionary recombination. |`0`|
|`--seed=<int>`| Random seed |`0`|
|`--output_filename=<string>`| Output file for the clustering |`tmpclustering`|
|`--help`| Print help ||
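
When driving VieClus from a script, it can help to assemble the command line programmatically. The sketch below only uses the options from the table above; the helper name `build_vieclus_command`, the binary path `./deploy/vieclus` (assumed by analogy with `./deploy/evaluator` and `./deploy/graphchecker`), and launching MPI runs through OpenMPI's `mpirun -n` are assumptions, not documented VieClus interfaces.

```python
import shlex
from typing import List, Optional


def build_vieclus_command(
    graph_file: str,
    time_limit: float = 0.0,  # must be > 0 to enable evolutionary recombination
    seed: int = 0,
    output_filename: str = "tmpclustering",
    mpi_processes: Optional[int] = None,
    binary: str = "./deploy/vieclus",  # assumption: location of the built binary
) -> List[str]:
    """Hypothetical helper: build an argv list from the documented CLI options."""
    cmd = [
        binary,
        graph_file,
        f"--time_limit={time_limit}",
        f"--seed={seed}",
        f"--output_filename={output_filename}",
    ]
    if mpi_processes is not None:
        # Assumption: parallel runs are launched with OpenMPI's mpirun.
        cmd = ["mpirun", "-n", str(mpi_processes)] + cmd
    return cmd


if __name__ == "__main__":
    cmd = build_vieclus_command("mygraph.graph", time_limit=60, seed=1, mpi_processes=4)
    print(shlex.join(cmd))
    # mpirun -n 4 ./deploy/vieclus mygraph.graph --time_limit=60 --seed=1 --output_filename=tmpclustering
    # To actually run it (requires a built binary):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list instead of a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.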

#### Without MPI (NOMPI)

If you do not have MPI installed or only need single-process execution, you can compile without MPI support. The algorithm then runs on a single process using a pseudo-MPI layer.

**Included tools:**

| Program | Description |
|:--------|:-----------|
|`vieclus`| Main clustering algorithm |
|`graphchecker`| Validate that a graph file is correctly formatted |
|`evaluator`| Compute modularity of a given clustering: `./deploy/evaluator <graph> --input_partition=<clustering>`|
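
The `evaluator` reports modularity. For reference, the standard modularity of a clustering $\mathcal{C}$ of a graph with total edge weight $m$ is

$$Q = \sum_{C \in \mathcal{C}} \left( \frac{e(C)}{m} - \left( \frac{\mathrm{vol}(C)}{2m} \right)^2 \right),$$

where $e(C)$ is the total weight of edges with both endpoints in $C$ and $\mathrm{vol}(C)$ is the sum of the weighted degrees of the vertices in $C$. The Python sketch below computes this quantity; it illustrates the formula and is not the evaluator's source. The function name `modularity` and its input (0-indexed vertices, each undirected edge listed once as `(u, v, weight)`) are hypothetical choices for this example.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight), 0-indexed, each undirected edge listed once


def modularity(edges: Sequence[Edge], clustering: List[int]) -> float:
    """Compute Q = sum over clusters C of e(C)/m - (vol(C)/(2m))^2."""
    m = 0.0                      # total edge weight
    inside = defaultdict(float)  # e(C): weight of edges with both ends in C
    vol = defaultdict(float)     # vol(C): sum of weighted degrees in C
    for u, v, w in edges:
        m += w
        cu, cv = clustering[u], clustering[v]
        vol[cu] += w
        vol[cv] += w
        if cu == cv:
            inside[cu] += w
    if m == 0:
        return 0.0  # undefined without edges; this sketch returns 0 by convention
    # Clusters without incident edges contribute 0, so iterating over vol is enough.
    return sum(inside[c] / m - (vol[c] / (2 * m)) ** 2 for c in vol)


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
    edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
             (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
             (2, 3, 1.0)]
    print(round(modularity(edges, [0, 0, 0, 1, 1, 1]), 4))  # 0.3571 (= 5/14)
```

Putting all vertices into a single cluster always gives $Q = 0$, which is a quick sanity check for any modularity implementation.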

---

## Graph Format

VieClus uses the **METIS graph format**, the same format used by [KaHIP](https://github.com/KaHIP/KaHIP), Metis, Chaco, and the 10th DIMACS Implementation Challenge.

### Input format

A plain text file with `n + 1` lines (excluding comments). Lines starting with `%` are comments and are skipped.

**Header line:**

```
n m [f]
```

- `n` = number of vertices, `m` = number of undirected edges
- `f` = format flag (optional): `0` = unweighted, `1` = edge weights, `10` = node weights, `11` = both

**Vertex lines (one per vertex):**
Each of the following `n` lines describes one vertex's adjacency list. For an unweighted graph (`f` omitted or `0`), a line lists only the neighbor IDs; with node weights (`f=10` or `f=11`), the line starts with the vertex weight. For `f=1` (edge weights):

```
v1 w1 v2 w2 ...
```

where `v_i` are neighbor IDs (**1-indexed**) and `w_i` are edge weights.

**Example** (4 vertices, 5 edges, unweighted):

```
4 5
2 3
1 3 4
1 2 4
2 3
```
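
To make the format concrete, here is a minimal Python reader for the layout described above. It is a sketch, not part of VieClus: the function name `parse_metis` is hypothetical, it assumes at most one node weight per vertex when `f` is `10` or `11`, and it returns each undirected edge once with 0-indexed endpoints.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight) with 0-indexed endpoints


def parse_metis(text: str) -> Tuple[int, List[Edge]]:
    """Parse METIS text into (n, edges); each undirected edge is returned once."""
    # Drop comment lines but keep blank ones: a blank vertex line means "no neighbors".
    rows = [ln for ln in text.splitlines() if not ln.lstrip().startswith("%")]
    while rows and not rows[0].strip():
        rows.pop(0)  # skip blank lines before the header
    header = rows[0].split()
    n, m = int(header[0]), int(header[1])
    fmt = (header[2] if len(header) > 2 else "0").zfill(3)
    has_edge_weights = fmt[-1] == "1"  # f = 1 or 11
    has_node_weights = fmt[-2] == "1"  # f = 10 or 11

    edges: List[Edge] = []
    for u, row in enumerate(rows[1:n + 1]):
        tokens = row.split()
        if has_node_weights:
            tokens = tokens[1:]  # assumption: exactly one node weight leads the line
        step = 2 if has_edge_weights else 1
        for i in range(0, len(tokens), step):
            v = int(tokens[i]) - 1  # neighbor IDs are 1-indexed in the file
            w = float(tokens[i + 1]) if has_edge_weights else 1.0
            if u < v:  # each edge is listed by both endpoints; keep one copy
                edges.append((u, v, w))
    if len(edges) != m:
        raise ValueError(f"header says m={m}, but found {len(edges)} edges")
    return n, edges


if __name__ == "__main__":
    n, edges = parse_metis("4 5\n2 3\n1 3 4\n1 2 4\n2 3\n")
    print(n, len(edges))  # 4 5
```

Blank lines after the header are deliberately kept, because an isolated vertex is written as an empty adjacency line.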

### Output format

The clustering output file contains `n` lines. The `i`-th line holds the cluster ID of the `i`-th vertex of the input graph. Cluster IDs are numbered consecutively starting from 0.
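
A short Python sketch for loading such a file and checking the two properties stated above (one ID per vertex, IDs consecutive from 0). The function name `read_clustering` is hypothetical and not a VieClus API.

```python
from typing import List


def read_clustering(text: str, n: int) -> List[int]:
    """Parse a clustering file in which the i-th line holds the cluster ID of the i-th vertex."""
    ids = [int(tok) for tok in text.split()]
    if len(ids) != n:
        raise ValueError(f"expected {n} cluster IDs, found {len(ids)}")
    used = set(ids)
    # Consecutive numbering from 0 means the used IDs are exactly {0, 1, ..., k-1}.
    if used != set(range(len(used))):
        raise ValueError("cluster IDs are not numbered consecutively from 0")
    return ids


if __name__ == "__main__":
    print(read_clustering("0\n0\n1\n1\n", 4))  # [0, 0, 1, 1]
```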

### Validating your graph

```bash
./deploy/graphchecker mygraph.graph
```
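
The kinds of consistency problems a graph checker looks for can be sketched in Python. This is an illustration only, not the logic of `graphchecker`: the function name `check_metis` is hypothetical, and it handles only unweighted graphs (`f` omitted or `0`). It checks that every neighbor ID is in range, that there are no self-loops or duplicate neighbors, that every edge appears in both endpoints' lines, and that the edge count matches `m`.

```python
from typing import List


def check_metis(text: str) -> List[str]:
    """Return problems found in an unweighted METIS graph; an empty list means none were found."""
    # Drop comment lines; keep blank ones, since a blank vertex line means "no neighbors".
    rows = [ln for ln in text.splitlines() if not ln.lstrip().startswith("%")]
    while rows and not rows[0].strip():
        rows.pop(0)
    if not rows:
        return ["missing header line"]
    header = rows[0].split()
    n, m = int(header[0]), int(header[1])
    if len(header) > 2 and int(header[2]) != 0:
        return ["weighted graphs are not handled by this sketch"]

    problems: List[str] = []
    body = rows[1:n + 1]
    if len(body) < n:
        problems.append(f"expected {n} vertex lines, found {len(body)}")
    adj = [set() for _ in range(n)]
    for u, row in enumerate(body):
        for tok in row.split():
            v = int(tok) - 1  # neighbor IDs are 1-indexed in the file
            if not 0 <= v < n:
                problems.append(f"vertex {u + 1}: neighbor {v + 1} out of range")
            elif v == u:
                problems.append(f"vertex {u + 1}: self-loop")
            elif v in adj[u]:
                problems.append(f"vertex {u + 1}: duplicate neighbor {v + 1}")
            else:
                adj[u].add(v)
    # Every undirected edge must appear in both endpoints' adjacency lists.
    for u in range(n):
        for v in sorted(adj[u]):
            if u not in adj[v]:
                problems.append(f"edge ({u + 1}, {v + 1}) is missing its reverse entry")
    entries = sum(len(s) for s in adj)
    if entries != 2 * m:
        problems.append(f"header says m={m}, but adjacency lists hold {entries} entries (expected {2 * m})")
    return problems


if __name__ == "__main__":
    print(check_metis("4 5\n2 3\n1 3 4\n1 2 4\n2 3\n"))  # []
```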

---

## How It Works

VieClus is a **memetic algorithm** that combines evolutionary search with multilevel graph clustering techniques:

1. **Multilevel approach**: The graph is recursively coarsened, an initial clustering is computed on the smallest graph, and local search improves the clustering at each level during uncoarsening.
2. **Evolutionary recombination**: A population of clusterings is maintained. Two parent clusterings are combined using an *ensemble clustering* overlay, in which two vertices end up in the same cluster only if both parents place them in the same cluster.
3. **Parallel search** (with MPI): Multiple processes explore the solution space independently and exchange high-quality individuals, improving diversity and convergence.
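
The overlay rule in step 2 can be illustrated in a few lines of Python. This is a sketch of the rule as stated above only; the function name `overlay` is hypothetical, and how VieClus builds and further processes the overlay internally is described in the paper and not reproduced here. Each distinct pair (cluster in parent A, cluster in parent B) becomes one cluster of the result.

```python
from typing import Dict, List, Tuple


def overlay(parent_a: List[int], parent_b: List[int]) -> List[int]:
    """Overlay two clusterings: vertices share a cluster iff they share one in both parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must cluster the same vertex set")
    ids: Dict[Tuple[int, int], int] = {}
    result: List[int] = []
    for a, b in zip(parent_a, parent_b):
        # Each distinct (cluster in A, cluster in B) pair becomes one overlay cluster,
        # numbered consecutively from 0 in order of first appearance.
        key = (a, b)
        if key not in ids:
            ids[key] = len(ids)
        result.append(ids[key])
    return result


if __name__ == "__main__":
    print(overlay([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # [0, 0, 1, 2, 3, 3]
```

The overlay is never coarser than either parent: it refines both, so any cluster boundary present in one parent is kept.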

More time and more MPI processes generally yield better modularity values. For details, see the [paper](https://arxiv.org/abs/1802.07034).

---

## Related Projects

| Project | Description |
|:--------|:-----------|
|[KaHIP](https://github.com/KaHIP/KaHIP)| Karlsruhe High Quality Graph Partitioning (flagship framework) |
|[CluStRE](https://github.com/KaHIP/CluStRE)| Fast streaming graph clustering |