BICL-SDP is an exact algorithm, based on the branch-and-cut technique, for solving the biclustering problem through the
Sudoso, A. M. (2024). A Semidefinite Programming-Based Branch-and-Cut Algorithm for Biclustering. INFORMS Journal on Computing, https://doi.org/10.1287/ijoc.2024.0683.
BICL-SDP calls the semidefinite programming solver SDPNAL+ by using the MATLAB Engine API for C++. It requires the MATLAB engine library libMatlabEngine and the Matlab Data Array library libMatlabDataArray. BICL-SDP calls the linear programming solver Gurobi and uses Armadillo to handle matrices and linear algebra operations efficiently.
Ubuntu and Debian instructions:
-
Install MATLAB (>= 2021b)
-
Install Gurobi (>= 10.0.2)
-
Install CMake, OpenBLAS, LAPACK and Armadillo:
sudo apt-get update
sudo apt-get install cmake libopenblas-dev liblapack-dev libarmadillo-dev
-
Open the makefile
biclustering_cpp/Makefile- Set the variable
matlab_pathwith your MATLAB folder.
- Set the variable
-
Compile the code:
cd biclustering_cpp/
make
- Download SDPNAL+, move the folder
biclustering_matlabcontaining the MATLAB source code of BICL-SDP in the SDPNAL+ main directory and set the parameterSDP_SOLVER_FOLDERof the configuration file accordingly. This folder and its subfolders will be automatically added to the MATLAB search path when BICL-SDP starts.
This code has been tested under Ubuntu 22.04 LTS with MATLAB R2021b, Gurobi 10.0.2 and Armadillo 12.6.
Various parameters used in BICL-SDP can be modified in the configuration file biclustering_cpp/config.txt:
BRANCH_AND_BOUND_TOL- optimality tolerance of the exact algorithmBRANCH_AND_BOUND_PARALLEL- thread pool size: single thread (1), multi-thread (> 1)BRANCH_AND_BOUND_MAX_NODES- maximum number of nodesBRANCH_AND_BOUND_VISITING_STRATEGY- best first (0), depth first (1), breadth first (2)MATLAB_SESSION_THREADS_ROOT- number of threads for the MATLAB session at the root noeeMATLAB_SESSION_THREADS_CHILD- number of threads for the MATLAB session for child nodesSDP_SOLVER_FOLDER- full path of SDPNAL+ folderSDP_SOLVER_TOL- accuracy of SDPNAL+ in the relative KKT residualSDP_SOLVER_VERBOSE- do not display log (0), display log (1)CP_MAX_ITER- maximum number of cutting-plane iterationsCP_TOL- tolerance between two consecutive cutting-plane iterationsCP_MAX_INEQ- maximum number of valid inequalities to separateCP_PERC_INEQ- fraction of the most violated inequalities to addCP_EPS_INEQ- tolerance for checking the violation of the inequalitiesCP_EPS_ACTIVE- tolerance for detecting active inequalitiesGUROBI_FOLDER- Gurobi solver pathGUROBI_VERBOSE- do not display log (0), display log (1)
cd biclustering_cpp/
./bb <W_PATH> <K> <LOG_PATH> <RESULT_PATH>
W_PATH- full path of the data matrixK- number of biclustersLOG_PATH- path of the log fileRESULT_PATH- path of the optimal bicluster assignment matrices
File W_PATH contains the weights w_ij and the must include an header line with the number of rows n and columns m:
n m
w_11 w_12 ... w_1m
w_21 w_22 ... w_2m
...
...
w_n1 w_n2 ... w_nm
The log file reports the progress of the algorithm:
N- number of rows at the current nodeM- number of columns at the current nodeID_PAR- id of the parent nodeID- id of the current nodeUB_PAR- upper bound of the parent nodeUB- upper bound of the current nodeTIME (s)- running time in seconds of the current nodeCP_ITER- number of cutting-plane iterationsCP_FLAG- termination flag of the cutting-plane procedure-2- maximum number of iterations-1- SDP not solved or partially solved successfully0- no violated inequalities1- node must be pruned2- upper bound greater than the previous one3- upper bound decrease is not sufficiently large
CP_INEQ- number of inequalities added in the last cutting-plane iterationLB- current lower boundBEST_LB- global lower boundSET- vertex set selection for branchingU- branch on the vertices in UV- branch on the vertices in V-1- branching is not needed
I J- indices of branching decisionNODE_GAP- gap at the current nodeGAP- overall gapOPEN- number of open nodes