Repository Description

This repository contains datasets and code for reproducing the results in our papers:

He Yudong, An Equilibrium Approach to Clustering: Surpassing Fuzzy C-Means on Imbalanced Data, IEEE Transactions on Fuzzy Systems, 2025.

He Yudong, Imbalanced Data Clustering Using Equilibrium K-Means, arXiv, 2024.

A Python package, "sklekmeans", that implements equilibrium k-means with an API compatible with scikit-learn estimators has been released. See the repo, documentation, or PyPI project page for more details.

Equilibrium K-Means: A K-Means-type Clustering Algorithm for Imbalanced Data

The objective of EKM:

$$\min_{c_1,\cdots,c_K}\sum_{n=1}^N \left(\sum_{i=1}^K d^2_{in} e^{-\alpha d^2_{in}}\right)/\left(\sum_{i=1}^K e^{-\alpha d^2_{in}}\right),$$ where $c_1,\cdots,c_K$ are the $K$ cluster centroids, $d_{in}$ is the distance (usually the Euclidean distance) between the $n$-th data point and the $i$-th centroid, and $\alpha$ is a parameter to be tuned.
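To make the objective concrete, here is a minimal NumPy sketch that evaluates it for given centroids. The function name `ekm_objective` and the array layout (rows are data points) are assumptions for illustration, not the repository's API. The exponentials are shifted by the per-point minimum distance, which leaves the ratio unchanged but avoids underflow.

```python
import numpy as np

def ekm_objective(X, C, alpha):
    """Evaluate the EKM objective for data X (N x P) and centroids C (K x P).

    Hypothetical helper for illustration; not part of the repository.
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    # Squared Euclidean distances, shape (N, K): d2[n, i] = ||x_n - c_i||^2
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # exp(-alpha * d2), shifted by the row minimum for numerical stability
    # (the common factor cancels between numerator and denominator).
    e = np.exp(-alpha * (d2 - d2.min(axis=1, keepdims=True)))
    # Per-point weighted average of squared distances, summed over all points.
    return float(((d2 * e).sum(axis=1) / e.sum(axis=1)).sum())

X = np.array([[0.0, 0.0], [1.0, 0.0]])
C = np.array([[0.0, 0.0], [3.0, 0.0]])
value_alpha0 = ekm_objective(X, C, alpha=0.0)     # plain average over centroids
value_large = ekm_objective(X, C, alpha=50.0)     # dominated by the nearest centroid
```

With $\alpha=0$ each point contributes the plain average of its squared distances to all centroids. As $\alpha$ grows, the contribution approaches the squared distance to the nearest centroid, which is the k-means cost.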

Optimization by a two-step iteration algorithm:

Step 1. Computing weights:

$$w_{kn}^{(\tau)}=\frac{e^{-\alpha (d_{kn}^{(\tau)})^2}}{\sum_i e^{-\alpha (d_{in}^{(\tau)})^2}} [1-\alpha((d_{kn}^{(\tau)})^2-\frac{\sum_i (d_{in}^{(\tau)})^2e^{-\alpha (d_{in}^{(\tau)})^2}}{\sum_i e^{-\alpha (d_{in}^{(\tau)})^2}})]$$
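As a quick check on the weight formula, the following worked derivation may help. It is not taken from the paper: the shorthand $s_{kn}$ and $\bar{d}^2_n$ is introduced here for illustration only, and the iteration index $(\tau)$ is dropped for brevity. Write

$$s_{kn}=\frac{e^{-\alpha d_{kn}^2}}{\sum_i e^{-\alpha d_{in}^2}},\qquad \bar{d}^2_n=\sum_i s_{in} d_{in}^2,\qquad w_{kn}=s_{kn}\left[1-\alpha\left(d_{kn}^2-\bar{d}^2_n\right)\right].$$

Since $\sum_k s_{kn}=1$, the weights of each data point sum to one:

$$\sum_k w_{kn}=\sum_k s_{kn}-\alpha\left(\sum_k s_{kn}d_{kn}^2-\bar{d}^2_n\sum_k s_{kn}\right)=1-\alpha\left(\bar{d}^2_n-\bar{d}^2_n\right)=1.$$

For $\alpha>0$, $w_{kn}$ is negative exactly when $d_{kn}^2>\bar{d}^2_n+1/\alpha$, so a data point can carry a negative weight for centroids that are far from it.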

Step 2. Computing weighted centroids:

$$c_k^{(\tau+1)}=\frac{\sum_n w_{kn}^{(\tau)} x_n}{\sum_n w_{kn}^{(\tau)}},$$ where $x_n$ is the $n$-th data point.

The iteration stops when the centroids cease to change or the maximum number of iterations is reached. One iteration of the two steps above has time complexity $O(NK^2P)$, where $P$ is the data dimension.
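The two steps and the stopping rule can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the names `squared_distances`, `ekm_weights`, and `ekm_fit`, the tolerance `tol`, the default `max_iter`, and the nearest-centroid labeling at the end are all choices made here. The sketch does not guard against a column weight sum $\sum_n w_{kn}$ that is close to zero.

```python
import numpy as np

def squared_distances(X, C):
    """Squared Euclidean distances between rows of X and rows of C, shape (N, K)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def ekm_weights(d2, alpha):
    """Step 1: weights w[n, k] computed from squared distances d2[n, k]."""
    # Shifting by the row minimum cancels in the ratio and avoids underflow.
    e = np.exp(-alpha * (d2 - d2.min(axis=1, keepdims=True)))
    s = e / e.sum(axis=1, keepdims=True)           # normalized exponentials
    avg = (s * d2).sum(axis=1, keepdims=True)      # weighted mean squared distance
    return s * (1.0 - alpha * (d2 - avg))

def ekm_fit(X, C0, alpha, max_iter=300, tol=1e-6):
    """Alternate Step 1 and Step 2 until centroids stop moving or max_iter is hit."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(C0, dtype=float).copy()
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        w = ekm_weights(squared_distances(X, C), alpha)      # Step 1, shape (N, K)
        C_new = (w.T @ X) / w.sum(axis=0)[:, None]           # Step 2, shape (K, P)
        shift = np.abs(C_new - C).max()
        C = C_new
        if shift < tol:                                      # centroids ceased to change
            break
    # Assumption: assign each point to its nearest final centroid.
    labels = squared_distances(X, C).argmin(axis=1)
    return C, labels, n_iter

# Toy data: four points near the origin and two points far away.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [11, 10]], dtype=float)
C, labels, n_iter = ekm_fit(X, C0=[[0, 0], [10, 10]], alpha=1.0)
```

On this toy data the two groups are so far apart that the weights are nearly one-hot, so each update behaves like a k-means update and the centroids settle at the two group means.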

Examples

How To Use (Python)

You can find the Python version in the "python" folder. EKM's functionality is packaged in "ekm_sklearn.py" in the sklearn style, and examples and benchmarks are provided.

How To Use (Matlab)

Install MATLAB R2022a (or a later version) and download this repository to a local directory.

Clustering a Dataset

The MATLAB function "ekm.m", which implements EKM, is in the "algorithms" folder.

Below is an example of using EKM to cluster the iris dataset.

rng(0) % for reproducibility
addpath("./algorithms")
addpath("./metrics")
data = iris_dataset; % load the iris dataset
data = data(1:2,:); % only use the first two features for clustering the Iris dataset
data=data';

% normalization
for p=1:2
    data(:,p)=data(:,p)-mean(data(:,p));
    data(:,p)=data(:,p)/std(data(:,p));
end

% create labels of iris dataset
label_iris = ones(150,1);
label_iris(51:100)=2;
label_iris(101:150)=3;

% scatter plot of Iris
figure;
gscatter(data(:,1), data(:,2), label_iris);
title('Iris','FontSize',15)
xlabel('Normalized feature 1','FontSize',15)
ylabel('Normalized feature 2','FontSize',15)
legend off

% clustering by EKM
alpha=1; % EKM parameter; defined here but not passed to ekm in the call below
K=3; % # of clusters
[label_ekm,C]=ekm(data,K);

% scatter diagram of EKM clustering
figure;
gscatter(data(:,1), data(:,2), label_ekm);
hold on
plot(C(:,1),C(:,2),'k+','MarkerSize',15,'LineWidth',3)
title('EKM clustering for Iris','FontSize',15)
xlabel('Normalized feature 1','FontSize',15)
ylabel('Normalized feature 2','FontSize',15)
legend off
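For readers following along in Python, the normalization step in the MATLAB example above can be sketched as below. The helper name `zscore_columns` and the tiny input array are hypothetical stand-ins, not the Iris data. The one detail worth noting is `ddof=1`: MATLAB's `std` divides by $N-1$ by default, whereas NumPy's `std` divides by $N$ unless told otherwise.

```python
import numpy as np

def zscore_columns(data):
    """Center each column and scale it by its sample standard deviation."""
    data = np.asarray(data, dtype=float)
    # ddof=1 matches MATLAB's default std(), which normalizes by N - 1.
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Hypothetical 3 x 2 feature matrix (rows are samples, columns are features).
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
normalized = zscore_columns(features)
```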

Reproducing the Experiments

To replicate the experiments in the original paper, first put the "reproduction", "algorithms", and "metrics" folders in the same directory (e.g., D:/git-EKM). Open MATLAB R2022a (or a later version), set the working directory to "D:/git-EKM", then run:

addpath("./algorithms")
addpath("./metrics")

Next, set your MATLAB working directory to "D:/git-EKM/reproduction".

To replicate the clustering result on the "Ecoli" dataset, type the following in the command window and press Enter:

clustering_Ecoli

After the program finishes, you should see the generated result files and folders in "./ecoli".

To see the average clustering quality, run

log_avg_best

To see the average running time and the number of iterations, run

log_time

Citation Request

If you find this repo helpful, please cite our papers:

@article{he2025equilibrium, title={An Equilibrium Approach to Clustering: Surpassing Fuzzy C-Means on Imbalanced Data}, author={He, Yudong}, journal={IEEE Transactions on Fuzzy Systems}, year={2025}, publisher={IEEE} }

@article{he2024imbalanced, title={Imbalanced Data Clustering using Equilibrium K-Means}, author={He, Yudong}, journal={arXiv preprint arXiv:2402.14490}, year={2024} }
