Commit 361e109

Rewrite README for clarity, agent-friendliness, and structured documentation
1 parent 54ba21c commit 361e109

1 file changed: +234 -63 lines changed

README.md

Lines changed: 234 additions & 63 deletions
@@ -19,80 +19,120 @@ VieClus v1.2
 <img src="./logo/vieclus-logo.svg" alt="VieClus Logo" width="900"/>
 </p>
 
-The graph clustering framework VieClus -- Vienna Graph Clustering. Part of the [KaHIP](https://github.com/KaHIP) organization.
+**VieClus** (Vienna Graph Clustering) is a memetic algorithm for high-quality graph clustering that optimizes modularity. It is the state-of-the-art solver for achieving the highest possible modularity values. Part of the [KaHIP](https://github.com/KaHIP) organization.
 
-Graph clustering is the problem of detecting tightly connected regions of a
-graph. Depending on the task, knowledge about the structure of the graph can
-reveal information such as voter behavior, the formation of new trends, existing
-terrorist groups and recruitment or a natural partitioning of
-data records onto pages. Further application areas
-include the study of protein interaction, gene
-expression networks, fraud
-detection, program optimization and the spread of
-epidemics---possible applications are plentiful, as
-almost all systems containing interacting or coexisting entities can be modeled
-as a graph.
+| | |
+|:--|:--|
+| **What it solves** | Graph clustering: detecting tightly connected regions (communities) in a graph |
+| **Objective** | Maximize [modularity](https://en.wikipedia.org/wiki/Modularity_(networks)) |
+| **Key result** | Improves or reproduces **all entries** of the 10th DIMACS Implementation Challenge |
+| **Algorithm** | Memetic (evolutionary) algorithm with multilevel techniques and ensemble recombination |
+| **Interfaces** | CLI, Python (`pip install vieclus`), C/C++ library |
+| **Parallel** | Optional MPI support for parallel evolutionary search |
 
+<p align="center">
+<img src="./img/example_clustering.png"
+alt="Example: a graph with three detected clusters (red, cyan, yellow)"
+width="400">
+</p>
 
+## Quick Start
 
-This is the release of our memetic algorithm, VieClus (Vienna Graph Clustering), to tackle the graph clustering problem.
-A key component of our contribution are natural recombine operators that employ ensemble clusterings as well as multi-level techniques.
-In our experimental evaluation, we show that **our algorithm successfully improves or reproduces all entries of the 10th DIMACS implementation challenge** under consideration in a small amount of time. In fact, for most of the small instances, we can improve the old benchmark result in less than a minute.
-Moreover, while the previous best result for different instances has been computed by a variety of solvers, our algorithm can now be used as a single tool to compute the result. **In short our solver is the currently best modularity based clustering algorithm available.**
+### Install
 
-Installation Notes
-=====
+| Method | Command |
+|:-------|:--------|
+| **Homebrew** (macOS/Linux) | `brew install KaHIP/kahip/vieclus` |
+| **pip** (Python) | `pip install vieclus` |
+| **Build from source** | `./compile_withcmake.sh` (with MPI) or `./compile_withcmake.sh NOMPI` |
 
-### Install via Homebrew
+### Run
 
+**Command line:**
 ```bash
-brew install KaHIP/kahip/vieclus
+# With MPI (better solution quality through parallel evolutionary search)
+mpirun -n 4 ./deploy/vieclus examples/astro-ph.graph --time_limit=60
+
+# Without MPI
+./deploy/vieclus examples/astro-ph.graph --time_limit=60
 ```
 
-### C++ Command Line Tool
+**Python (quickest way to get started):**
+```python
+import vieclus
 
-VieClus can be compiled with or without MPI support.
+g = vieclus.vieclus_graph()
+g.set_num_nodes(6)
+g.add_undirected_edge(0, 1, 5)
+g.add_undirected_edge(1, 2, 5)
+g.add_undirected_edge(0, 2, 5)
+g.add_undirected_edge(3, 4, 5)
+g.add_undirected_edge(4, 5, 5)
+g.add_undirected_edge(3, 5, 5)
+g.add_undirected_edge(2, 3, 1) # weak bridge between two communities
 
-#### With MPI (recommended for best solution quality)
+vwgt, xadj, adjcwgt, adjncy = g.get_csr_arrays()
+modularity, clustering = vieclus.cluster(vwgt, xadj, adjcwgt, adjncy, time_limit=1.0)
 
-MPI enables the parallel evolutionary algorithm which typically yields better solutions.
+print(f"Modularity: {modularity}") # e.g. 0.41
+print(f"Clustering: {clustering}") # e.g. [0, 0, 0, 1, 1, 1]
+```
 
-Prerequisites:
-- OpenMPI (http://www.open-mpi.org/) -- note: due to removed progress threads in OpenMPI > 1.8, please use an OpenMPI version < 1.8 or Intel MPI to obtain a scalable parallel algorithm.
+---
+
+## Command Line Usage
 
-```bash
-./compile_withcmake.sh
-mpirun -n 2 ./deploy/vieclus examples/astro-ph.graph --time_limit=60
 ```
+mpirun -n P vieclus <graph-file> [options]
+```
+
+| Option | Description | Default |
+|:-------|:-----------|:--------|
+| `<graph-file>` | Path to graph in METIS format (see [Graph Format](#graph-format)) | *required* |
+| `--time_limit=<double>` | Time limit in seconds. Must be > 0 to enable evolutionary recombination. | `0` |
+| `--seed=<int>` | Random seed | `0` |
+| `--output_filename=<string>` | Output file for the clustering | `tmpclustering` |
+| `--help` | Print help | |
 
-#### Without MPI (NOMPI)
+**Included tools:**
 
-If you do not have MPI installed or only need single-process execution, you can compile without MPI support. The algorithm will run on a single process using a pseudo-MPI layer.
+| Program | Description |
+|:--------|:-----------|
+| `vieclus` | Main clustering algorithm |
+| `graphchecker` | Validate that a graph file is correctly formatted |
+| `evaluator` | Compute modularity of a given clustering: `./deploy/evaluator <graph> --input_partition=<clustering>` |
 
+**Example workflow:**
 ```bash
-./compile_withcmake.sh NOMPI
-./deploy/vieclus examples/astro-ph.graph --time_limit=60
-```
+# 1. Check your graph file
+./deploy/graphchecker mygraph.graph
 
-No additional dependencies are required for the NOMPI build.
+# 2. Cluster it (4 MPI processes, 60 second time limit)
+mpirun -n 4 ./deploy/vieclus mygraph.graph --time_limit=60 --output_filename=result.clustering
 
-For a description of the graph format please have a look into the manual.
+# 3. Evaluate the result
+./deploy/evaluator mygraph.graph --input_partition=result.clustering
+```
 
-Python Interface
-=====
+---
 
-You can install the Python interface via pip:
-``pip install vieclus``
+## Python Interface
+
+Install via pip:
+```bash
+pip install vieclus
+```
 
 Or build from source:
-``pip install .``
+```bash
+pip install .
+```
 
-### Example: Using the vieclus_graph class
+### Using the `vieclus_graph` helper class
 
 ```python
 import vieclus
 
-# Build a graph using the vieclus_graph helper class
 g = vieclus.vieclus_graph()
 g.set_num_nodes(6)
 
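The rewritten README names modularity as the objective VieClus maximizes. Its Quick Start clusters two weight-5 triangles joined by a weight-1 bridge. To make the objective concrete, here is a minimal pure-Python sketch of the standard weighted (Newman) modularity, Q = sum over clusters c of [w_in(c)/m - (d(c)/(2m))^2]. Here m is the total edge weight, w_in(c) is the edge weight inside cluster c, and d(c) is the summed weighted degree of c's nodes. The sketch assumes VieClus uses this standard definition. `modularity` is a hypothetical helper and does not call the vieclus package.

```python
from collections import defaultdict


def modularity(n, edges, clustering):
    """Weighted Newman modularity of `clustering` (hypothetical helper).

    `edges` lists each undirected edge once as a 0-indexed (u, v, weight)
    tuple; `clustering[v]` is the cluster ID of node v.
    """
    m = sum(w for _, _, w in edges)        # total edge weight
    degree = [0] * n                        # weighted degree per node
    internal = defaultdict(float)           # edge weight inside each cluster
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        if clustering[u] == clustering[v]:
            internal[clustering[u]] += w
    cluster_degree = defaultdict(float)     # summed node degree per cluster
    for v in range(n):
        cluster_degree[clustering[v]] += degree[v]
    # Q = sum over clusters of (internal weight / m) - (cluster degree / 2m)^2
    return sum(internal[c] / m - (cluster_degree[c] / (2 * m)) ** 2
               for c in cluster_degree)


# The Quick Start graph: two weight-5 triangles joined by a weight-1 bridge.
edges = [(0, 1, 5), (1, 2, 5), (0, 2, 5),
         (3, 4, 5), (4, 5, 5), (3, 5, 5),
         (2, 3, 1)]
print(round(modularity(6, edges, [0, 0, 0, 1, 1, 1]), 3))  # 0.468
print(modularity(6, edges, [0] * 6))                        # 0.0
```

Under this definition the two-community split [0, 0, 0, 1, 1, 1] scores 30/31 - 1/2, about 0.468. Putting every node into a single cluster always scores exactly 0.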
@@ -105,24 +145,22 @@ g.add_undirected_edge(4, 5, 5)
 g.add_undirected_edge(3, 5, 5)
 g.add_undirected_edge(2, 3, 1) # weak bridge between two communities
 
-# Convert to CSR format and cluster
 vwgt, xadj, adjcwgt, adjncy = g.get_csr_arrays()
-modularity, clustering = vieclus.cluster(vwgt, xadj, adjcwgt, adjncy,
-time_limit=1.0)
+modularity, clustering = vieclus.cluster(vwgt, xadj, adjcwgt, adjncy, time_limit=1.0)
 
 print(f"Modularity: {modularity}")
 print(f"Clustering: {clustering}")
 ```
 
-### Example: Using raw CSR arrays
+### Using raw CSR arrays
 
 ```python
 import vieclus
 
 # Graph in METIS CSR format (same as KaHIP)
-xadj = [0, 2, 5, 7, 9, 12]
-adjncy = [1, 4, 0, 2, 4, 1, 3, 2, 4, 0, 1, 3]
-vwgt = [1, 1, 1, 1, 1]
+xadj = [0, 2, 5, 7, 9, 12]
+adjncy = [1, 4, 0, 2, 4, 1, 3, 2, 4, 0, 1, 3]
+vwgt = [1, 1, 1, 1, 1]
 adjcwgt = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
 
 modularity, clustering = vieclus.cluster(vwgt, xadj, adjcwgt, adjncy,
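The raw-CSR example passes the METIS-style compressed adjacency directly to `vieclus.cluster`. The neighbors of node v are `adjncy[xadj[v]:xadj[v+1]]`, and `adjcwgt` holds the matching edge weights. Every undirected edge is stored twice, once from each endpoint, which is why 6 edges produce 12 entries. The sketch below builds these arrays from a plain edge list. `edges_to_csr` is a hypothetical helper for illustration. In the package itself, `vieclus_graph.get_csr_arrays()` fills this role, and its neighbor ordering may differ.

```python
def edges_to_csr(n, edges):
    """Build METIS-style CSR arrays (vwgt, xadj, adjcwgt, adjncy).

    Hypothetical helper: `edges` lists each undirected edge once as a
    0-indexed (u, v, weight) tuple, and unit node weights are assumed.
    """
    neighbors = [[] for _ in range(n)]
    for u, v, w in edges:
        if u == v:
            # Assumption: self-loops are not allowed in METIS-format graphs.
            raise ValueError(f"self-loop at node {u}")
        neighbors[u].append((v, w))        # store the edge from both ends
        neighbors[v].append((u, w))

    xadj, adjncy, adjcwgt = [0], [], []
    for v in range(n):
        for u, w in sorted(neighbors[v]):  # ascending neighbor IDs
            adjncy.append(u)
            adjcwgt.append(w)
        xadj.append(len(adjncy))           # v's neighbors end here
    vwgt = [1] * n
    return vwgt, xadj, adjcwgt, adjncy


# The 5-node graph from the raw-CSR example, written as an edge list.
edges = [(0, 1, 1), (0, 4, 1), (1, 2, 1), (1, 4, 1), (2, 3, 1), (3, 4, 1)]
vwgt, xadj, adjcwgt, adjncy = edges_to_csr(5, edges)
print(xadj)    # [0, 2, 5, 7, 9, 12]
print(adjncy)  # [1, 4, 0, 2, 4, 1, 3, 2, 4, 0, 1, 3]
```

With sorted neighbor lists, the sketch reproduces the example's `xadj` and `adjncy` exactly. It returns the arrays in the argument order `vieclus.cluster` expects. It does not merge duplicate edges, so each edge should appear only once in the input.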
@@ -134,25 +172,131 @@ print(f"Modularity: {modularity}")
 print(f"Clustering: {clustering}")
 ```
 
-### Parameters
+### API Reference
 
-The `vieclus.cluster` function takes the following arguments:
+**`vieclus.cluster(vwgt, xadj, adjcwgt, adjncy, **kwargs)`**
 
 | Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `vwgt` | list | *required* | Node weights (length n) |
-| `xadj` | list | *required* | CSR index array (length n+1) |
-| `adjcwgt` | list | *required* | Edge weights (length m) |
-| `adjncy` | list | *required* | CSR adjacency array (length m) |
+|:----------|:-----|:--------|:------------|
+| `vwgt` | list[int] | *required* | Node weights (length n) |
+| `xadj` | list[int] | *required* | CSR index array (length n+1) |
+| `adjcwgt` | list[int] | *required* | Edge weights (length m) |
+| `adjncy` | list[int] | *required* | CSR adjacency array (length m) |
 | `suppress_output` | bool | `True` | Suppress console output |
 | `seed` | int | `0` | Random seed |
 | `time_limit` | float | `1.0` | Time limit in seconds |
 | `cluster_upperbound` | int | `0` | Max cluster size (0 = no limit) |
 
-Returns a tuple `(modularity, clustering)` where `modularity` is a float in [-1, 1] and `clustering` is a list of cluster IDs for each node.
+**Returns:** `(modularity: float, clustering: list[int])` where modularity is in [-1, 1] and clustering maps each node to a cluster ID.
 
-Release Notes
-=====
+**`vieclus.vieclus_graph`** -- helper class for building graphs (same interface as `kahip.kahip_graph`):
+
+| Method | Description |
+|:-------|:-----------|
+| `set_num_nodes(n)` | Set the number of nodes |
+| `add_undirected_edge(u, v, weight)` | Add an undirected edge with weight |
+| `get_csr_arrays()` | Returns `(vwgt, xadj, adjcwgt, adjncy)` ready for `vieclus.cluster()` |
+
+---
+
+## C/C++ Library
+
+Link against `libvieclus_static.a` and include `vieclus_interface.h`:
+
+```cpp
+#include "vieclus_interface.h"
+
+int n = 5;
+int xadj[] = {0, 2, 5, 7, 9, 12};
+int adjncy[] = {1, 4, 0, 2, 4, 1, 3, 2, 4, 0, 1, 3};
+
+int clustering[5];
+double modularity;
+int num_clusters;
+
+vieclus_clustering(&n, NULL, xadj, NULL, adjncy,
+true, // suppress_output
+0, // seed
+10.0, // time_limit
+0, // cluster_upperbound (0 = no limit)
+&modularity, &num_clusters, &clustering[0]);
+```
+
+Build with:
+```bash
+./compile_withcmake.sh NOMPI
+g++ -std=c++11 my_program.cpp -I interface/ -L build/ -lvieclus_static -lpthread -fopenmp -o my_program
+```
+
+---
+
+## Graph Format
+
+VieClus uses the **METIS graph format**, the same format used by [KaHIP](https://github.com/KaHIP/KaHIP), Metis, Chaco, and the 10th DIMACS Implementation Challenge.
+
+### Input format
+
+A plain text file with `n + 1` lines (excluding comments). Lines starting with `%` are comments and are skipped.
+
+**Header line:**
+```
+n m [f]
+```
+- `n` = number of vertices, `m` = number of undirected edges
+- `f` = format flag (optional): `0` = unweighted, `1` = edge weights, `10` = node weights, `11` = both
+
+**Vertex lines (one per vertex):**
+Each of the following `n` lines describes one vertex's adjacency list. For `f=1` (edge weights):
+```
+v1 w1 v2 w2 ...
+```
+where `v_i` are neighbor IDs (**1-indexed**) and `w_i` are edge weights.
+
+**Example** (4 vertices, 5 edges, unweighted):
+```
+4 5
+2 3
+1 3 4
+1 2 4
+2 3
+```
+
+### Output format
+
+The clustering output file contains `n` lines. Line `i` contains the cluster ID of vertex `i` (0-indexed). Cluster IDs are numbered consecutively from 0.
+
+### Validating your graph
+
+```bash
+./deploy/graphchecker mygraph.graph
+```
+
+---
+
+## How It Works
+
+VieClus is a **memetic algorithm** that combines evolutionary search with multilevel graph clustering techniques:
+
+1. **Multilevel approach**: The graph is recursively coarsened, an initial clustering is computed on the smallest graph, and local search improves the clustering at each level during uncoarsening.
+2. **Evolutionary recombination**: A population of clusterings is maintained. Two parent clusterings are combined using an *ensemble clustering* overlay, where two vertices end up in the same cluster only if they agree in both parents.
+3. **Parallel search** (with MPI): Multiple processes explore the solution space independently and exchange high-quality individuals, improving diversity and convergence.
+
+More time and more MPI processes generally yield better modularity values. For details, see the [paper](https://arxiv.org/abs/1802.07034).
+
+---
+
+## Related Projects
+
+| Project | Description |
+|:--------|:-----------|
+| [KaHIP](https://github.com/KaHIP/KaHIP) | Karlsruhe High Quality Graph Partitioning (flagship framework) |
+| [CluStRE](https://github.com/KaHIP/CluStRE) | Fast streaming graph clustering |
+| [KaMinPar](https://github.com/KaHIP/KaMinPar) | Shared-memory parallel graph partitioner |
+| [KaHyPar](https://github.com/kahypar) | Karlsruhe Hypergraph Partitioning |
+
+---
+
+## Release Notes
 
 ### v1.2
 - Added Python interface (`pip install vieclus`) with pybind11 bindings
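The Graph Format section describes the METIS text layout:

- A header `n m [f]`.
- One line per vertex listing its 1-indexed neighbors, interleaved with edge weights when `f=1`.
- Lines starting with `%`, which are comments.

A minimal Python reader that turns such a file into the 0-indexed CSR arrays used by the Python interface might look like the following. `read_metis` is hypothetical and supports only the `f=0` and `f=1` cases. It performs only a basic edge-count check, so the bundled `graphchecker` remains the tool for validating real inputs.

```python
def read_metis(text):
    """Parse METIS text (f = 0 or f = 1) into 0-indexed CSR arrays.

    Returns (vwgt, xadj, adjcwgt, adjncy). Hypothetical helper: the
    node-weight formats (f = 10, 11) are not supported in this sketch.
    """
    # Drop comment lines but keep empty ones: an isolated vertex has one.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("%")]
    header = lines[0].split()
    n, m = int(header[0]), int(header[1])
    fmt = int(header[2]) if len(header) > 2 else 0
    if fmt not in (0, 1):
        raise ValueError(f"format flag {fmt} not supported in this sketch")
    if len(lines) < n + 1:
        raise ValueError(f"expected {n} vertex lines, found {len(lines) - 1}")

    step = 2 if fmt == 1 else 1              # (neighbor, weight) pairs if f = 1
    xadj, adjncy, adjcwgt = [0], [], []
    for line in lines[1:n + 1]:
        tokens = [int(t) for t in line.split()]
        for i in range(0, len(tokens), step):
            adjncy.append(tokens[i] - 1)     # 1-indexed on disk, 0-indexed here
            adjcwgt.append(tokens[i + 1] if fmt == 1 else 1)
        xadj.append(len(adjncy))

    # Each undirected edge is listed by both of its endpoints.
    if len(adjncy) != 2 * m:
        raise ValueError(f"header says {m} edges, lists hold {len(adjncy) // 2}")
    return [1] * n, xadj, adjcwgt, adjncy


example = """% 4 vertices, 5 edges, unweighted
4 5
2 3
1 3 4
1 2 4
2 3
"""
vwgt, xadj, adjcwgt, adjncy = read_metis(example)
print(xadj)    # [0, 2, 5, 8, 10]
print(adjncy)  # [1, 2, 0, 2, 3, 0, 1, 3, 1, 2]
```

The example string is the README's four-vertex graph with one comment line added. The comment line exercises the skipping rule.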
@@ -169,8 +313,36 @@ Release Notes
 ### v1.0
 - Initial release of the memetic graph clustering algorithm
 
-Licence
-=====
+---
+
+## Building from Source
+
+### With MPI (recommended for best solution quality)
+
+MPI enables the parallel evolutionary algorithm which typically yields better solutions.
+
+**Prerequisites:** OpenMPI or Intel MPI
+
+```bash
+git clone https://github.com/KaHIP/VieClus.git
+cd VieClus
+./compile_withcmake.sh
+```
+
+Binaries are placed in `./deploy/`.
+
+### Without MPI
+
+```bash
+./compile_withcmake.sh NOMPI
+```
+
+No additional dependencies beyond a C++11 compiler and CMake 3.10+.
+
+---
+
+## Licence
+
 The program is licenced under MIT licence.
 If you publish results using our algorithms, please acknowledge our work by quoting the following paper:
 
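The binaries built with `compile_withcmake.sh` read graphs in METIS format, so a custom graph has to be serialized before it can be clustered from the command line. The sketch below writes a 0-indexed, edge-weighted edge list as METIS text with `f = 1`. It converts neighbor IDs to 1-indexed form and lists each edge from both endpoints. `write_metis` and the file name in the note after the block are hypothetical.

```python
def write_metis(n, edges):
    """Serialize an undirected graph as METIS text with edge weights (f = 1).

    Hypothetical helper: `edges` lists each undirected edge once as a
    0-indexed (u, v, weight) tuple.
    """
    neighbors = [[] for _ in range(n)]
    for u, v, w in edges:
        neighbors[u].append((v, w))         # each edge appears on both lines
        neighbors[v].append((u, w))
    lines = [f"{n} {len(edges)} 1"]         # header: n m f
    for v in range(n):
        pairs = sorted(neighbors[v])        # ascending neighbor IDs
        # METIS vertex IDs are 1-indexed on disk; isolated vertices get "".
        lines.append(" ".join(f"{u + 1} {w}" for u, w in pairs))
    return "\n".join(lines) + "\n"


# The Quick Start graph: two weight-5 triangles joined by a weight-1 bridge.
edges = [(0, 1, 5), (1, 2, 5), (0, 2, 5),
         (3, 4, 5), (4, 5, 5), (3, 5, 5),
         (2, 3, 1)]
print(write_metis(6, edges), end="")
```

Suppose the output is saved to a file such as `twotriangles.graph` (hypothetical name). It can then be checked with `./deploy/graphchecker twotriangles.graph` and clustered with `./deploy/vieclus twotriangles.graph --time_limit=10`.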
@@ -187,4 +359,3 @@ If you publish results using our algorithms, please acknowledge our work by quot
 doi = {10.4230/LIPIcs.SEA.2018.3}
 }
 ```
-
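The How It Works section describes recombination through an ensemble clustering overlay. In the overlay, two vertices end up in the same cluster only if both parent clusterings agree. The overlay step alone can be sketched by grouping vertices on their pair of parent cluster IDs. The result is then renumbered consecutively from 0, matching the documented output format. This sketch covers only the overlay and omits the multilevel refinement that the README describes around it. `overlay` is a hypothetical name, not part of the VieClus API.

```python
def overlay(parent_a, parent_b):
    """Overlay of two clusterings of the same vertex set (hypothetical helper).

    Vertices u and v share an overlay cluster iff they share a cluster in
    both parents. IDs are renumbered consecutively from 0 in order of
    first appearance.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must cluster the same number of vertices")
    new_id = {}
    result = []
    for a, b in zip(parent_a, parent_b):
        key = (a, b)                     # the vertex's cluster in each parent
        if key not in new_id:
            new_id[key] = len(new_id)    # next consecutive ID
        result.append(new_id[key])
    return result


parent_a = [0, 0, 0, 1, 1, 1]   # {0,1,2} | {3,4,5}
parent_b = [0, 0, 1, 1, 2, 2]   # {0,1} | {2,3} | {4,5}
print(overlay(parent_a, parent_b))  # [0, 0, 1, 2, 3, 3]
```

The overlay refines both parents, so it usually has more and smaller clusters than either one. Here it has four clusters, versus two and three in the parents.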