Rewrite README for clarity, agent-friendliness, and structured documentation

schulzchristian · schulzchristian · commit 361e1090ace4 · 2026-03-10T10:08:54.000+01:00
diff --git a/README.md b/README.md
@@ -19,80 +19,120 @@ VieClus v1.2
   <img src="./logo/vieclus-logo.svg" alt="VieClus Logo" width="900"/>
 </p>
 
-The graph clustering framework VieClus -- Vienna Graph Clustering. Part of the [KaHIP](https://github.com/KaHIP) organization.
+**VieClus** (Vienna Graph Clustering) is a memetic algorithm for high-quality graph clustering that maximizes modularity. In the experimental evaluation it improved or reproduced the best known modularity values on the 10th DIMACS Implementation Challenge instances under consideration. Part of the [KaHIP](https://github.com/KaHIP) organization.
 
-Graph clustering is the problem of detecting tightly connected regions of a
-graph. Depending on the task, knowledge about the structure of the graph can
-reveal information such as voter behavior, the formation of new trends, existing
-terrorist groups and recruitment or a natural partitioning of
-data records onto pages. Further application areas
-include the study of protein interaction, gene
-expression networks, fraud
-detection, program optimization and the spread of
-epidemics---possible applications are plentiful, as
-almost all systems containing interacting or coexisting entities can be modeled
-as a graph. 
+| | |
+|:--|:--|
+| **What it solves** | Graph clustering: detecting tightly connected regions (communities) in a graph |
+| **Objective** | Maximize [modularity](https://en.wikipedia.org/wiki/Modularity_(networks)) |
+| **Key result** | Improves or reproduces **all entries** of the 10th DIMACS Implementation Challenge under consideration in the evaluation |
+| **Algorithm** | Memetic (evolutionary) algorithm with multilevel techniques and ensemble recombination |
+| **Interfaces** | CLI, Python (`pip install vieclus`), C/C++ library |
+| **Parallel** | Optional MPI support for parallel evolutionary search |
 
+<p align="center">
+<img src="./img/example_clustering.png"
+  alt="Example: a graph with three detected clusters (red, cyan, yellow)"
+  width="400">
+</p>
 
+## Quick Start
 
-This is the release of our memetic algorithm, VieClus (Vienna Graph Clustering), to tackle the graph clustering problem. 
-A key component of our contribution are natural recombine operators that employ ensemble clusterings as well as multi-level techniques. 
-In our experimental evaluation, we show that **our algorithm successfully improves or reproduces all entries of the 10th DIMACS implementation challenge** under consideration in a small amount of time. In fact, for most of the small instances, we can improve the old benchmark result in less than a minute.
-Moreover, while the previous best result for different instances has been computed by a variety of solvers, our algorithm can now be used as a single tool to compute the result. **In short our solver is the currently best modularity based clustering algorithm available.**
+### Install
 
-Installation Notes
-=====
+| Method | Command |
+|:-------|:--------|
+| **Homebrew** (macOS/Linux) | `brew install KaHIP/kahip/vieclus` |
+| **pip** (Python) | `pip install vieclus` |
+| **Build from source** | `./compile_withcmake.sh` (with MPI) or `./compile_withcmake.sh NOMPI` |
 
-### Install via Homebrew
+### Run
 
+**Command line:**
 ```bash
-brew install KaHIP/kahip/vieclus
+# With MPI (better solution quality through parallel evolutionary search)
+mpirun -n 4 ./deploy/vieclus examples/astro-ph.graph --time_limit=60
+
+# Without MPI
+./deploy/vieclus examples/astro-ph.graph --time_limit=60
 ```
 
-### C++ Command Line Tool
+**Python (quickest way to get started):**
+```python
+import vieclus
 
-VieClus can be compiled with or without MPI support.
+g = vieclus.vieclus_graph()
+g.set_num_nodes(6)
+g.add_undirected_edge(0, 1, 5)
+g.add_undirected_edge(1, 2, 5)
+g.add_undirected_edge(0, 2, 5)
+g.add_undirected_edge(3, 4, 5)
+g.add_undirected_edge(4, 5, 5)
+g.add_undirected_edge(3, 5, 5)
+g.add_undirected_edge(2, 3, 1)  # weak bridge between two communities
 
-#### With MPI (recommended for best solution quality)
+vwgt, xadj, adjcwgt, adjncy = g.get_csr_arrays()
+modularity, clustering = vieclus.cluster(vwgt, xadj, adjcwgt, adjncy, time_limit=1.0)
 
-MPI enables the parallel evolutionary algorithm which typically yields better solutions.
+print(f"Modularity: {modularity}")   # e.g. ~0.47
+print(f"Clustering: {clustering}")   # e.g. [0, 0, 0, 1, 1, 1]
+```
 
-Prerequisites:
-- OpenMPI (http://www.open-mpi.org/) -- note: due to removed progress threads in OpenMPI > 1.8, please use an OpenMPI version < 1.8 or Intel MPI to obtain a scalable parallel algorithm.
+---
+
+## Command Line Usage
 
-```bash
-./compile_withcmake.sh
-mpirun -n 2 ./deploy/vieclus examples/astro-ph.graph --time_limit=60
 ```
+mpirun -n P vieclus <graph-file> [options]
+```
+
+| Option | Description | Default |
+|:-------|:-----------|:--------|
+| `<graph-file>` | Path to graph in METIS format (see [Graph Format](#graph-format)) | *required* |
+| `--time_limit=<double>` | Time limit in seconds. Must be > 0 to enable evolutionary recombination. | `0` |
+| `--seed=<int>` | Random seed | `0` |
+| `--output_filename=<string>` | Output file for the clustering | `tmpclustering` |
+| `--help` | Print help | |
 
-#### Without MPI (NOMPI)
+**Included tools:**
 
-If you do not have MPI installed or only need single-process execution, you can compile without MPI support. The algorithm will run on a single process using a pseudo-MPI layer.
+| Program | Description |
+|:--------|:-----------|
+| `vieclus` | Main clustering algorithm |
+| `graphchecker` | Validate that a graph file is correctly formatted |
+| `evaluator` | Compute modularity of a given clustering: `./deploy/evaluator <graph> --input_partition=<clustering>` |
 
+**Example workflow:**
 ```bash
-./compile_withcmake.sh NOMPI
-./deploy/vieclus examples/astro-ph.graph --time_limit=60
-```
+# 1. Check your graph file
+./deploy/graphchecker mygraph.graph
 
-No additional dependencies are required for the NOMPI build.
+# 2. Cluster it (4 MPI processes, 60 second time limit)
+mpirun -n 4 ./deploy/vieclus mygraph.graph --time_limit=60 --output_filename=result.clustering
 
-For a description of the graph format please have a look into the manual.
+# 3. Evaluate the result
+./deploy/evaluator mygraph.graph --input_partition=result.clustering
+```
 
-Python Interface
-=====
+---
 
-You can install the Python interface via pip:
-``pip install vieclus``
+## Python Interface
+
+Install via pip:
+```bash
+pip install vieclus
+```
 
 Or build from source:
-``pip install .``
+```bash
+pip install .
+```
 
-### Example: Using the vieclus_graph class
+### Using the `vieclus_graph` helper class
 
 ```python
 import vieclus
 
-# Build a graph using the vieclus_graph helper class
 g = vieclus.vieclus_graph()
 g.set_num_nodes(6)
 
@@ -105,24 +145,22 @@ g.add_undirected_edge(4, 5, 5)
 g.add_undirected_edge(3, 5, 5)
 g.add_undirected_edge(2, 3, 1)  # weak bridge between two communities
 
-# Convert to CSR format and cluster
 vwgt, xadj, adjcwgt, adjncy = g.get_csr_arrays()
-modularity, clustering = vieclus.cluster(vwgt, xadj, adjcwgt, adjncy,
-                                         time_limit=1.0)
+modularity, clustering = vieclus.cluster(vwgt, xadj, adjcwgt, adjncy, time_limit=1.0)
 
 print(f"Modularity: {modularity}")
 print(f"Clustering: {clustering}")
 ```
 
-### Example: Using raw CSR arrays
+### Using raw CSR arrays
 
 ```python
 import vieclus
 
 # Graph in METIS CSR format (same as KaHIP)
-xadj   = [0, 2, 5, 7, 9, 12]
-adjncy = [1, 4, 0, 2, 4, 1, 3, 2, 4, 0, 1, 3]
-vwgt   = [1, 1, 1, 1, 1]
+xadj    = [0, 2, 5, 7, 9, 12]
+adjncy  = [1, 4, 0, 2, 4, 1, 3, 2, 4, 0, 1, 3]
+vwgt    = [1, 1, 1, 1, 1]
 adjcwgt = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
 
 modularity, clustering = vieclus.cluster(vwgt, xadj, adjcwgt, adjncy,
@@ -134,25 +172,131 @@ print(f"Modularity: {modularity}")
 print(f"Clustering: {clustering}")
 ```
 
-### Parameters
+### API Reference
 
-The `vieclus.cluster` function takes the following arguments:
+**`vieclus.cluster(vwgt, xadj, adjcwgt, adjncy, **kwargs)`**
 
 | Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `vwgt` | list | *required* | Node weights (length n) |
-| `xadj` | list | *required* | CSR index array (length n+1) |
-| `adjcwgt` | list | *required* | Edge weights (length m) |
-| `adjncy` | list | *required* | CSR adjacency array (length m) |
+|:----------|:-----|:--------|:------------|
+| `vwgt` | list[int] | *required* | Node weights (length n) |
+| `xadj` | list[int] | *required* | CSR index array (length n+1) |
+| `adjcwgt` | list[int] | *required* | Edge weights (length m) |
+| `adjncy` | list[int] | *required* | CSR adjacency array (length m) |
 | `suppress_output` | bool | `True` | Suppress console output |
 | `seed` | int | `0` | Random seed |
 | `time_limit` | float | `1.0` | Time limit in seconds |
 | `cluster_upperbound` | int | `0` | Max cluster size (0 = no limit) |
 
-Returns a tuple `(modularity, clustering)` where `modularity` is a float in [-1, 1] and `clustering` is a list of cluster IDs for each node.
+**Returns:** `(modularity: float, clustering: list[int])`, where `modularity` lies in [-0.5, 1] and `clustering[i]` is the cluster ID of node `i`.
 
-Release Notes
-=====
+**`vieclus.vieclus_graph`** -- helper class for building graphs (same interface as `kahip.kahip_graph`):
+
+| Method | Description |
+|:-------|:-----------|
+| `set_num_nodes(n)` | Set the number of nodes |
+| `add_undirected_edge(u, v, weight)` | Add an undirected edge with weight |
+| `get_csr_arrays()` | Returns `(vwgt, xadj, adjcwgt, adjncy)` ready for `vieclus.cluster()` |
+
+---
+
+## C/C++ Library
+
+Link against `libvieclus_static.a` and include `vieclus_interface.h`:
+
+```cpp
+#include "vieclus_interface.h"
+
+int n = 5;
+int xadj[]   = {0, 2, 5, 7, 9, 12};
+int adjncy[] = {1, 4, 0, 2, 4, 1, 3, 2, 4, 0, 1, 3};
+
+int clustering[5];
+double modularity;
+int num_clusters;
+
+vieclus_clustering(&n, NULL, xadj, NULL, adjncy,
+                   true,   // suppress_output
+                   0,      // seed
+                   10.0,   // time_limit
+                   0,      // cluster_upperbound (0 = no limit)
+                   &modularity, &num_clusters, &clustering[0]);
+```
+
+Build with:
+```bash
+./compile_withcmake.sh NOMPI
+g++ -std=c++11 my_program.cpp -I interface/ -L build/ -lvieclus_static -lpthread -fopenmp -o my_program
+```
+
+---
+
+## Graph Format
+
+VieClus uses the **METIS graph format**, the same format used by [KaHIP](https://github.com/KaHIP/KaHIP), Metis, Chaco, and the 10th DIMACS Implementation Challenge.
+
+### Input format
+
+A plain text file with `n + 1` lines (excluding comments). Lines starting with `%` are comments and are skipped.
+
+**Header line:**
+```
+n m [f]
+```
+- `n` = number of vertices, `m` = number of undirected edges
+- `f` = format flag (optional): `0` = unweighted, `1` = edge weights, `10` = node weights, `11` = both
+
+**Vertex lines (one per vertex):**
+Each of the following `n` lines describes one vertex's adjacency list. For `f=1` (edge weights):
+```
+v1 w1 v2 w2 ...
+```
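+where `v1, v2, ...` are neighbor IDs (**1-indexed**) and `w1, w2, ...` are the weights of the edges to those neighbors. For unweighted graphs (`f=0`), the `w` entries are omitted.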
+
+**Example** (4 vertices, 5 edges, unweighted):
+```
+4 5
+2 3
+1 3 4
+1 2 4
+2 3
+```
+
+### Output format
+
+The clustering output file contains `n` lines. Line `i` (counting from 0) contains the cluster ID of the `i`-th vertex in input order, so the first line belongs to vertex `1` of the adjacency lists. Cluster IDs are numbered consecutively from 0.
+
+### Validating your graph
+
+```bash
+./deploy/graphchecker mygraph.graph
+```
+
+---
+
+## How It Works
+
+VieClus is a **memetic algorithm** that combines evolutionary search with multilevel graph clustering techniques:
+
+1. **Multilevel approach**: The graph is recursively coarsened, an initial clustering is computed on the smallest graph, and local search improves the clustering at each level during uncoarsening.
+2. **Evolutionary recombination**: A population of clusterings is maintained. Two parent clusterings are combined using an *ensemble clustering* overlay, where two vertices end up in the same cluster only if they are clustered together in both parents.
+3. **Parallel search** (with MPI): Multiple processes explore the solution space independently and exchange high-quality individuals, improving diversity and convergence.
+
+More time and more MPI processes generally yield better modularity values. For details, see the [paper](https://arxiv.org/abs/1802.07034).
+
+---
+
+## Related Projects
+
+| Project | Description |
+|:--------|:-----------|
+| [KaHIP](https://github.com/KaHIP/KaHIP) | Karlsruhe High Quality Graph Partitioning (flagship framework) |
+| [CluStRE](https://github.com/KaHIP/CluStRE) | Fast streaming graph clustering |
+| [KaMinPar](https://github.com/KaHIP/KaMinPar) | Shared-memory parallel graph partitioner |
+| [KaHyPar](https://github.com/kahypar) | Karlsruhe Hypergraph Partitioning |
+
+---
+
+## Release Notes
 
 ### v1.2
 - Added Python interface (`pip install vieclus`) with pybind11 bindings
@@ -169,8 +313,36 @@ Release Notes
 ### v1.0
 - Initial release of the memetic graph clustering algorithm
 
-Licence
-=====
+---
+
+## Building from Source
+
+### With MPI (recommended for best solution quality)
+
+MPI enables the parallel evolutionary algorithm, which typically yields better solutions.
+
+**Prerequisites:** OpenMPI or Intel MPI
+
+```bash
+git clone https://github.com/KaHIP/VieClus.git
+cd VieClus
+./compile_withcmake.sh
+```
+
+Binaries are placed in `./deploy/`.
+
+### Without MPI
+
+```bash
+./compile_withcmake.sh NOMPI
+```
+
+The NOMPI build has no additional dependencies beyond a C++11 compiler and CMake 3.10+.
+
+---
+
+## Licence
+
 The program is licenced under MIT licence.
 If you publish results using our algorithms, please acknowledge our work by quoting the following paper:
 
@@ -187,4 +359,3 @@ If you publish results using our algorithms, please acknowledge our work by quot
   doi       = {10.4230/LIPIcs.SEA.2018.3}
 }
 ```
-
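Note on the README examples in this diff: the CSR arrays and the modularity objective can be checked by hand with a few lines of plain Python. The sketch below is not part of VieClus. `build_csr` and `modularity` are hypothetical helper names, and the code assumes the CSR convention visible in the raw-array example (every undirected edge stored once per endpoint) together with the standard weighted modularity definition. It rebuilds the six-node quick-start graph and evaluates the two-triangle clustering.

```python
from collections import defaultdict


def build_csr(num_nodes, edges):
    """Build (vwgt, xadj, adjcwgt, adjncy) from undirected (u, v, weight) triples.

    Hypothetical helper for illustration; node IDs are 0-indexed.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))  # each undirected edge is stored in both lists

    vwgt = [1] * num_nodes  # unit node weights
    xadj, adjncy, adjcwgt = [0], [], []
    for adj in neighbors:
        for v, w in sorted(adj):
            adjncy.append(v)
            adjcwgt.append(w)
        xadj.append(len(adjncy))  # end offset of this node's adjacency list
    return vwgt, xadj, adjcwgt, adjncy


def modularity(xadj, adjcwgt, adjncy, clustering):
    """Weighted modularity: sum over clusters of W_in / W - (D / (2W))^2."""
    total = sum(adjcwgt)          # equals 2W because every edge appears twice
    internal = defaultdict(int)   # twice the internal edge weight per cluster
    degree = defaultdict(int)     # weighted degree sum per cluster
    for u in range(len(xadj) - 1):
        for idx in range(xadj[u], xadj[u + 1]):
            degree[clustering[u]] += adjcwgt[idx]
            if clustering[adjncy[idx]] == clustering[u]:
                internal[clustering[u]] += adjcwgt[idx]
    return sum(internal[c] / total - (degree[c] / total) ** 2 for c in degree)


# Quick-start graph: two triangles (edge weight 5) joined by a weak bridge (weight 1).
edges = [(0, 1, 5), (1, 2, 5), (0, 2, 5),
         (3, 4, 5), (4, 5, 5), (3, 5, 5),
         (2, 3, 1)]
vwgt, xadj, adjcwgt, adjncy = build_csr(6, edges)
score = modularity(xadj, adjcwgt, adjncy, [0, 0, 0, 1, 1, 1])
print(xadj)             # [0, 2, 4, 7, 10, 12, 14]
print(round(score, 4))  # 0.4677
```

With total edge weight 31, each triangle contributes 15/31 - (31/62)^2, which gives the value printed above. The same `modularity` function can be used to sanity-check any clustering returned for a small graph, provided the arrays follow the convention assumed here.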