This repository contains the datasets and code for reproducing the results in our papers:
He Yudong, An Equilibrium Approach to Clustering: Surpassing Fuzzy C-Means on Imbalanced Data, IEEE Transactions on Fuzzy Systems, 2025.
He Yudong, Imbalanced Data Clustering Using Equilibrium K-Means, arXiv, 2024.
A Python package, "sklekmeans", which implements equilibrium k-means with an API compatible with scikit-learn estimators, has been released. See the repository, documentation, or PyPI project page for more details.
Step 1. Computing weights:
Step 2. Computing weighted centroids:
EKM converges when centroids cease to change or the maximum number of iterations is reached. The time complexity of one iteration of the above two steps is
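The two steps and the stopping rule described above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation shipped in this repository. The weight formula used here (a softmax over the distances multiplied by a correction term that can turn negative) is our reading of the paper and should be treated as an assumption. The same goes for the distance being half the squared Euclidean distance, the function names `ekm_step` and `ekm_fit`, and the tolerance `tol`.

```python
import math


def ekm_step(X, C, alpha):
    """One EKM iteration: Step 1 (weights) followed by Step 2 (weighted centroids).

    X: list of N points, C: list of K centroids, each a list of P floats.
    The weight formula is an assumption based on our reading of the paper.
    """
    K, P = len(C), len(C[0])
    num = [[0.0] * P for _ in range(K)]  # weighted sums of points per centroid
    den = [0.0] * K                      # sums of weights per centroid
    for x in X:
        # Assumed distance: half the squared Euclidean distance to each centroid.
        d = [0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in C]
        d_min = min(d)
        # Softmax over -alpha * d, shifted by d_min for numerical stability.
        e = [math.exp(-alpha * (dk - d_min)) for dk in d]
        s = sum(e)
        p = [ek / s for ek in e]
        d_bar = sum(pk * dk for pk, dk in zip(p, d))  # softmax-averaged distance
        for k in range(K):
            # Step 1: weight of point x for centroid k (may be negative).
            w = p[k] * (1.0 - alpha * (d[k] - d_bar))
            den[k] += w
            for j in range(P):
                num[k][j] += w * x[j]
    # Step 2: weighted centroids. This sketch does not guard against den[k] == 0.
    return [[num[k][j] / den[k] for j in range(P)] for k in range(K)]


def ekm_fit(X, C0, alpha, max_iter=100, tol=1e-8):
    """Iterate until the centroids stop moving (within tol) or max_iter is reached."""
    C = [list(c) for c in C0]
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        C_new = ekm_step(X, C, alpha)
        shift = max(abs(a - b) for cn, c in zip(C_new, C) for a, b in zip(cn, c))
        C = C_new
        if shift < tol:
            break
    return C, n_iter


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
    centroids, n_iter = ekm_fit(points, [[1.0, 0.5], [9.0, 0.5]], alpha=1.0)
    print(centroids, n_iter)
```

Under this formula, setting `alpha` to 0 gives every point the same weight for every centroid, so all centroids collapse to the global mean of the data. This makes a convenient sanity check for the sketch.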
The Python version is in the "python" folder. EKM's functionality is packaged in "ekm_sklearn.py" in the scikit-learn style, and examples and benchmarks are provided.
Install MATLAB 2022a (or a later version), and download this repository to a local directory.
The MATLAB function "ekm.m", which implements EKM, is in the "algorithms" folder.
Below is an example of using EKM to cluster the iris dataset.
rng(0) % for reproducibility
addpath("./algorithms")
addpath("./metrics")
data = iris_dataset; % load the iris dataset
data = data(1:2,:); % only use the first two features for clustering the Iris dataset
data=data';
% normalization
for p=1:2
data(:,p)=data(:,p)-mean(data(:,p));
data(:,p)=data(:,p)/std(data(:,p));
end
% create labels of iris dataset
label_iris = ones(150,1);
label_iris(51:100)=2;
label_iris(101:150)=3;
% scatter plot of Iris
figure;
gscatter(data(:,1), data(:,2), label_iris);
title('Iris','FontSize',15)
xlabel('Normalized feature 1','FontSize',15)
ylabel('Normalized feature 2','FontSize',15)
legend off
% clustering by EKM
alpha=1;
K=3; % # of clusters
[label_ekm,C]=ekm(data,K);
% scatter diagram of EKM clustering
figure;
gscatter(data(:,1), data(:,2), label_ekm);
hold on
plot(C(:,1),C(:,2),'k+','MarkerSize',15,'LineWidth',3)
title('EKM clustering for Iris','FontSize',15)
xlabel('Normalized feature 1','FontSize',15)
ylabel('Normalized feature 2','FontSize',15)
legend off
To replicate the experiments in the original paper, first place the "reproduction", "algorithms", and "metrics" folders in the same directory (e.g., D:/git-EKM). Open MATLAB 2022a (or a later version), set the working directory to "D:/git-EKM", and run:
addpath("./algorithms")
addpath("./metrics")
Next, set the MATLAB working directory to "D:/git-EKM/reproduction".
To replicate the clustering result on the "Ecoli" dataset, enter the following in the Command Window:
clustering_Ecoli
When the program finishes, the generated result files and folders appear in "./ecoli".
To see the average clustering quality, run
log_avg_best
To see the average execution time and the number of iterations, run
log_time
If you find this repo helpful, please cite our papers:
@article{he2025equilibrium,
  title={An Equilibrium Approach to Clustering: Surpassing Fuzzy C-Means on Imbalanced Data},
  author={He, Yudong},
  journal={IEEE Transactions on Fuzzy Systems},
  year={2025},
  publisher={IEEE}
}
@article{he2024imbalanced,
  title={Imbalanced Data Clustering using Equilibrium K-Means},
  author={He, Yudong},
  journal={arXiv preprint arXiv:2402.14490},
  year={2024}
}



