Commit 2bebf59: Init Commit
0 parents, 33 files changed, +4741 -0 lines changed

.gitignore

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# docs
docs/build/

# env and caches

mdsenv/
**/__pycache__/
.pytest_cache/
.ruff_cache/

# IDE files
.idea

# compiled files
**.so
build/
**/auto_examples/
wheelhouse/
dist/
**.egg-info
docs/source/modules/generated/
/docs/source/sg_execution_times.rst

# cython files
**/emos.c
**/mds.cpp

# Reportings
reporting/

EXPERIMENTS.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
### Experimental results

The Radius Clustering package provides two algorithms to solve the MDS problem: an exact one and an approximate one. The approximate algorithm is based on a heuristic that iteratively selects the vertex dominating the most vertices in the graph. The exact algorithm is based on a branch-and-bound procedure that finds a minimum dominating set of the graph. Experiments have been conducted on real-world datasets to compare the performance of these two algorithms and to compare them with state-of-the-art clustering algorithms. The complete results are available in the paper [Clustering under radius constraint using minimum dominating sets](https://hal.science/hal-04533921/).

The algorithms selected for comparison are:

1. Equiwide clustering (EQW-LP), a state-of-the-art exact algorithm using an LP formulation of the problem [[3]](https://hal.science/hal-03356000)
2. ProtoClust [[4]](http://faculty.marshall.usc.edu/Jacob-Bien/papers/jasa2011minimax.pdf)

Here are some key results from the experiments:

Table 1: Average running time (in seconds) of the algorithms on real-world datasets.

| **Dataset**              | **MDS-APPROX** | **MDS-EXACT** | **EQW-LP**   | **PROTOCLUST** |
|--------------------------|----------------|---------------|--------------|----------------|
| **Iris**                 | 0.062 ± 0.01   | 0.009 ± 0.00  | 0.018 ± 0.01 | 0.026 ± 0.00   |
| **Wine**                 | 0.029 ± 0.00   | 0.010 ± 0.00  | 0.014 ± 0.00 | 0.034 ± 0.00   |
| **Glass Identification** | 0.015 ± 0.00   | 0.020 ± 0.00  | 0.026 ± 0.00 | 0.046 ± 0.00   |
| **Ionosphere**           | 0.078 ± 0.01   | 2.640 ± 0.05  | 0.104 ± 0.00 | 0.120 ± 0.00   |
| **WDBC**                 | 0.315 ± 0.01   | 0.138 ± 0.00  | 0.197 ± 0.01 | 0.402 ± 0.00   |
| **Synthetic Control**    | 0.350 ± 0.03   | 0.036 ± 0.00  | 0.143 ± 0.01 | 0.489 ± 0.00   |
| **Vehicle**              | 0.955 ± 0.04   | 0.185 ± 0.00  | 0.526 ± 0.01 | 0.830 ± 0.01   |
| **Yeast**                | 2.361 ± 0.03   | 738.8 ± 0.30  | 6.718 ± 0.02 | 2.374 ± 0.08   |
| **Ozone**                | 49.82 ± 1.18   | 1447 ± 0.54   | 26.86 ± 0.63 | 15.32 ± 0.15   |
| **Waveform**             | 48.01 ± 0.39   | 8813 ± 57.80  | 233.9 ± 1.45 | 61.27 ± 0.08   |

Table 2: Number of clusters obtained on real-world datasets.

| **Dataset**              | **MDS-APPROX** | **MDS-EXACT** | **EQW-LP** | **PROTOCLUST** |
|--------------------------|----------------|---------------|------------|----------------|
| **Iris**                 | 3              | 3             | 3          | 4              |
| **Wine**                 | 4              | 3             | 3          | 4              |
| **Glass Identification** | 7              | 6             | 6          | 7              |
| **Ionosphere**           | 2              | 2             | 2          | 5              |
| **WDBC**                 | 2              | 2             | 2          | 3              |
| **Synthetic Control**    | 8              | 6             | 6          | 8              |
| **Vehicle**              | 5              | 4             | 4          | 6              |
| **Yeast**                | 10             | 10            | 10         | 13             |
| **Ozone**                | 3              | 2             | 2          | 3              |
| **Waveform**             | 3              | 3             | 3          | 6              |

Table 3: Compactness of the clusters (maximal radius after clustering) obtained on real-world datasets.

| **Dataset**              | **MDS-APPROX** | **MDS-EXACT** | **EQW-LP** | **PROTOCLUST** |
|--------------------------|----------------|---------------|------------|----------------|
| **Iris**                 | 1.43           | 1.43          | 1.43       | 1.24           |
| **Wine**                 | 220.05         | 232.08        | 232.08     | 181.35         |
| **Glass Identification** | 3.94           | 3.94          | 3.94       | 3.31           |
| **Ionosphere**           | 4.45           | 5.45          | 5.45       | 5.35           |
| **WDBC**                 | 1197.42        | 1197.42       | 1197.42    | 907.10         |
| **Synthetic Control**    | 66.59          | 70.11         | 70.11      | 68.27          |
| **Vehicle**              | 150.87         | 155.05        | 155.05     | 120.97         |
| **Yeast**                | 0.42           | 0.42          | 0.42       | 0.42           |
| **Ozone**                | 235.77         | 245.58        | 245.58     | 194.89         |
| **Waveform**             | 10.73          | 10.73         | 10.73      | 10.47          |

#### Key insights:

- The approximate algorithm is significantly faster than the exact algorithm on the larger datasets, but it may not always provide the optimal solution.
- The exact algorithm is slower but provides the optimal solution. It does not scale well to large datasets, due to the NP-hard nature of the problem.
- The approximate algorithm is a good trade-off between speed and accuracy for most datasets.
- Both MDS-based approaches are more accurate than ProtoClust. However, ProtoClust is remarkably faster on most datasets.
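
To give a flavour of how such a comparison could be reproduced with the package itself, here is a minimal timing sketch. It assumes that the `manner` parameter accepts `"exact"` in addition to `"approx"` and that `threshold` is the radius constraint (the value below is the Iris radius reported in Table 3); it is an illustration, not the benchmarking protocol used in the paper.

```python
# Minimal sketch: time the approximate and exact solvers on Iris.
# Assumes `manner` accepts both "approx" and "exact", and that
# `threshold` is the radius constraint (here the Iris radius from Table 3).
import time

import numpy as np
from sklearn.datasets import load_iris

from radius_clustering import RadiusClustering

X = load_iris().data

for manner in ("approx", "exact"):
    model = RadiusClustering(manner=manner, threshold=1.43)
    start = time.perf_counter()
    model.fit(X)
    elapsed = time.perf_counter() - start
    n_clusters = len(np.unique(model.labels_))
    print(f"{manner:>6}: {elapsed:.3f} s, {n_clusters} clusters")
```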

> :memo: **Note**: The results show that MDS-based clustering algorithms might be a good alternative to state-of-the-art clustering algorithms for clustering under radius constraint problems.

> :memo: **Note**: Since the publication of the paper, the Radius Clustering package has been improved and optimized. The results presented here are based on the initial version of the package. For the latest results, please refer to the documentation or the source code.

## References

- [3] [Clustering to the fewest clusters under intra-cluster dissimilarity constraints](https://hal.science/hal-03356000)
- [4] [Hierarchical Clustering with Prototypes via Minimax Linkage](http://faculty.marshall.usc.edu/Jacob-Bien/papers/jasa2011minimax.pdf)

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Lias Laboratory

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

PRESENTATION.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
## How it works

### Clustering under radius constraint
Clustering tasks are broadly concerned with grouping data points into clusters based on some similarity measure. Clustering under radius constraint is a specific clustering task in which the goal is to group data points so that, within each cluster, some point (the cluster center) lies within a given radius of every other point of the cluster. Mathematically, given a set of data points $X = \{x_1, x_2, \ldots, x_n\}$ and a radius $r$, the goal is to find a partition $\mathcal{P}$ of $X$ into clusters $C_1, C_2, \ldots, C_k$ such that:
```math
\forall C \in \mathcal{P}, \quad \min_{x_i \in C}\max_{x_j \in C} d(x_i, x_j) \leq r
```
where $d(x_i, x_j)$ is the dissimilarity between $x_i$ and $x_j$.
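
As a concrete check of this definition, here is a minimal sketch (assuming Euclidean distances and clusters given as lists of point indices; these choices are illustrative, not imposed by the package) that computes the radius of each cluster and verifies the constraint:

```python
# Minimal sketch: verify the radius constraint for a given partition.
# Assumes Euclidean distance and clusters given as lists of point indices.
import numpy as np
from scipy.spatial.distance import cdist


def cluster_radius(X, cluster):
    """Smallest, over candidate centers x_i, of the max distance to the other points."""
    D = cdist(X[cluster], X[cluster])
    return D.max(axis=1).min()


def satisfies_radius_constraint(X, clusters, r):
    return all(cluster_radius(X, c) <= r for c in clusters)


X = np.random.rand(20, 2)
clusters = [list(range(10)), list(range(10, 20))]
print(satisfies_radius_constraint(X, clusters, r=0.8))
```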

### Minimum Dominating Set (MDS) problem

The Radius Clustering package implements a clustering algorithm based on the Minimum Dominating Set (MDS) problem. The MDS problem is a well-known NP-hard problem in graph theory, and it has been proven to be linked to the clustering under radius constraint problem. The MDS problem is defined as follows:

Given an undirected weighted graph $G = (V,E)$, where $V$ is a set of vertices and $E$ is a set of edges, a dominating set $D$ is a subset of $V$ such that every vertex in $V$ is either in $D$ or adjacent to a vertex in $D$. The goal is to find a dominating set $D$ such that the number of vertices in $D$ is minimized.
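
The domination property itself is easy to check. The following sketch (with the graph represented as an adjacency list, an assumption of this example) verifies whether a candidate set $D$ dominates a small graph:

```python
# Minimal sketch: check whether D is a dominating set of a graph
# given as an adjacency list {vertex: set of neighbours}.
def is_dominating_set(adjacency, D):
    D = set(D)
    return all(v in D or adjacency[v] & D for v in adjacency)


# Path graph 0 - 1 - 2 - 3: {1, 3} dominates it, {0} does not.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_dominating_set(adjacency, {1, 3}))  # True
print(is_dominating_set(adjacency, {0}))     # False
```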

Solving this problem can be useful in the context of a clustering task, but it requires some adaptations.

### Radius Clustering algorithm

To adapt the MDS problem to the clustering under radius constraint problem, we define a graph based on the data points: the vertices are the data points, the edges connect pairs of points, and the weight of an edge is the dissimilarity between the two points it connects. The algorithm then operates as follows (a minimal sketch of these steps is given after the list):

1. Construct a graph $G = (V,E)$ based on the data points $X$.
2. Prune the graph by removing every edge $e_{ij}$ such that $d(x_i, x_j) > r$.
3. Solve the MDS problem on the pruned graph.
4. Assign each vertex to the closest vertex in the dominating set. In case of a tie, assign it to the dominating vertex with the smallest index.
5. Return the cluster labels.
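
The sketch below makes these steps concrete. A simple greedy heuristic stands in for the package's optimized exact and approximate solvers in step 3 (an assumption of this illustration, not the algorithm shipped with the package):

```python
# Minimal sketch of the five steps above. A simple greedy heuristic
# replaces the package's optimized MDS solvers in step 3.
import numpy as np
from scipy.spatial.distance import cdist


def radius_clustering_sketch(X, r):
    D = cdist(X, X)                      # step 1: pairwise dissimilarities
    adjacency = D <= r                   # step 2: prune edges with d > r
    uncovered = np.ones(len(X), dtype=bool)
    dominating = []
    while uncovered.any():               # step 3: greedy dominating set
        gains = (adjacency & uncovered).sum(axis=1)
        best = int(gains.argmax())
        dominating.append(best)
        uncovered &= ~adjacency[best]
    centers = np.array(sorted(dominating))
    # step 4: assign each point to its closest center (ties go to the
    # smallest center index, since argmin returns the first occurrence)
    labels = centers[np.argmin(D[:, centers], axis=1)]
    return labels, centers               # step 5: return the cluster labels


X = np.random.rand(50, 2)
labels, centers = radius_clustering_sketch(X, r=0.4)
print(len(centers), "clusters")
```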

README.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
# Radius Clustering

Radius Clustering is a Python package that implements clustering under radius constraint based on the Minimum Dominating Set (MDS) problem. This problem is NP-hard but has been studied in the literature and proven to be linked to the clustering under radius constraint problem (see [references](#references) for more details).

## Features

- Implements both exact and approximate MDS-based clustering algorithms
- Compatible with scikit-learn's API for clustering algorithms
- Supports radius-constrained clustering
- Provides options for exact and approximate solutions

## Installation

You can install Radius Clustering using pip:

```bash
pip install radius-clustering
```

> Note: This package is not yet available on PyPI. You may need to install it from source.

## Usage

Here's a basic example of how to use Radius Clustering:

```python
import numpy as np
from radius_clustering import RadiusClustering

# Example usage
X = np.random.rand(100, 2)  # Generate random data

# Create an instance of RadiusClustering
rad_clustering = RadiusClustering(manner="approx", threshold=0.5)

# Fit the model to the data
rad_clustering.fit(X)

# Get cluster labels
labels = rad_clustering.labels_

print(labels)
```
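
Because the estimator follows scikit-learn's API, the resulting labels can be passed to the usual evaluation utilities. Continuing the example above (the silhouette score here is just one illustrative choice, not a metric prescribed by the package):

```python
# Continuing the example above: score the clustering with a standard metric.
from sklearn.metrics import silhouette_score

print(silhouette_score(X, rad_clustering.labels_))
```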

## Documentation

To build the documentation, you can run the following commands, assuming you have Sphinx installed:

```bash
cd docs
make html
```

Then you can open the `index.html` file in the `build` directory to view the full documentation.

## More information

For more information, please refer to the official documentation.

If you want insight into how the algorithm works, please refer to the [presentation](PRESENTATION.md).

If you want to know more about the experiments conducted with the package, please refer to the [experiments](EXPERIMENTS.md).

## Contributing

Contributions to Radius Clustering are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

### MDS Algorithms

The two MDS algorithms implemented here are forked and modified (or rewritten) from the work of the following authors:

- [Alejandra Casado](https://github.com/AlejandraCasado) for the minimum dominating set heuristic code [[1]](https://www.sciencedirect.com/science/article/pii/S0378475422005055), whose original code is available at [truc]. We rewrote the code in C++ to suit the needs of Python interfacing.
- [Hua Jiang](https://github.com/huajiang-ynu) for the minimum dominating set exact algorithm code [[2]](https://dl.acm.org/doi/abs/10.24963/ijcai.2023/622). The code has been adapted to the needs of Python interfacing.

### Funders

The Radius Clustering work has been funded by:

- [LIAS, ISAE-ENSMA](https://www.lias-lab.fr/)
- LabCom @lienor and the [French National Research Agency](https://anr.fr/)

### Contributors

- Mickaël Baron, LIAS, ISAE-ENSMA
- Brice Chardin, LIAS, ISAE-ENSMA
- Quentin Haenn, LIAS, ISAE-ENSMA

## References

- [1] [An iterated greedy algorithm for finding the minimum dominating set in graphs](https://www.sciencedirect.com/science/article/pii/S0378475422005055)
- [2] [An exact algorithm for the minimum dominating set problem](https://dl.acm.org/doi/abs/10.24963/ijcai.2023/622)

build_wheel.sh

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
#!/bin/bash
set -e -u -x

PLAT=manylinux_2_24_x86_64

function repair_wheel {
    wheel="$1"
    if ! auditwheel show "$wheel"; then
        echo "Skipping non-platform wheel $wheel"
    else
        auditwheel repair "$wheel" --plat "$PLAT" -w /io/wheelhouse/
    fi
}

# Compile wheels
for PYBIN in /opt/python/*/bin; do
    if [[ "$PYBIN" == *cp39* || "$PYBIN" == *cp310* || "$PYBIN" == *cp311* ]]; then
        "${PYBIN}/pip" wheel --no-deps /io/ -w wheelhouse/
    fi
done

# Bundle external shared libraries into the wheels
for whl in wheelhouse/*.whl; do
    repair_wheel "$whl"
done

docs/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS  ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR   = source
BUILDDIR    = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/make.bat

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd

docs/source/api.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
API Reference
=============

.. automodule:: radius_clustering
   :members:
   :undoc-members:
   :show-inheritance:
