Spectral clustering

Aeroplane from (b) subimage is taken from this link.

This is the accompanying code to our bachelor thesis. If you want to reproduce results from our bachelor thesis, run src/experiments.py. This package can also be freely used as spectral clustering implementation for cluster analysis purpose (see Usage for examples).

Abstract:

Spectral clustering is an important method of cluster analysis based on the spectral theory of matrices. In this thesis we deal with the relations of spectral clustering to several mathematical areas, for example to graph theory, perturbation theory and Markov chains. We prove the Rayleigh-Ritz theorem, which is important not only for the spectral clustering method, but can also be used in the method of principal component analysis and other methods that use eigenvalues and eigenvectors. Furthermore, in this thesis we deal with spectral clustering from the point of view of applications, where we present experimental results on non-trivial real data, and we also discuss the appropriate setting of hyperparameters.

Requirements

This code was written in python 3.8.8 with the following packages:

numpy 1.19.2
scipy 1.6.2
scikit-learn 0.24.1
pynndescent 0.5.2
tensorflow 2.3.0 (only for load MNIST and Fashion-MNIST dataset)
pandas 1.2.3 (only for store experimental results)

If you want to run src/experiments.py with REUTERS dataset, you schould download it by cd src/data; ./get_data.sh and preprocess it by src/data/make_reuters_dataset.py.

Usage

from spectral_methods import SpectralClustering
import numpy as np

# data
x = np.array([[0, 0], [0, 1], [1, 1],
              [5, 5], [5, 6], [6, 6]])

# define hyperparameters
params = {'n_clusters': 2, 
          'n_neighbors': 3,
          'sigma': 'mean', 
          'affinity': 'k-hNNG',
          'type_of_laplacian': 'sym', 
          'knn_aprox': False,
          'eigen_aprox': False}

sc = SpectralClustering(**params).fit(x)

print(sc.labels_) # output: [1 1 1 0 0 0]

from spectral_methods import SpectralClustering
import numpy as np
from util import load_data
from util import acc
from sklearn.metrics import normalized_mutual_info_score as nmi

# load MNIST dataset
x, y_true = load_data('mnist')

# define hyperparameters
params = {'n_clusters': 10, 
          'n_neighbors': int(np.log(x.shape[0])*10),
          'sigma': 'mean', 
          'affinity': 'k-hNNG',
          'type_of_laplacian': 'sym', 
          'knn_aprox': True,
          'eigen_aprox': True}

# time: aprox 24 seconds
sc = SpectralClustering(**params).fit(x)

# evaluate clustering results (only if true labels is known)
print(nmi(y_true, sc.labels_))  # output: 0.6855
print(acc(y_true, sc.labels_))  # output: 0.6520

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
src		src
README.md		README.md
spectral_clustering.png		spectral_clustering.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Spectral clustering

Abstract:

Requirements

Usage

About

Uh oh!

Releases

Packages

Languages

richardcepka/Spectral-Clustering

Folders and files

Latest commit

History

Repository files navigation

Spectral clustering

Abstract:

Requirements

Usage

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages