
CellProfiler on HPC: A Complete Guide

This repository provides tools, scripts, and documentation for installing, configuring, and running CellProfiler in high-performance computing (HPC) environments. It is designed to support large-scale image analysis workflows, including Cell Painting and high-content screening, on SLURM-based clusters.

Repository Contents

  • scripts/ – Python scripts for executing CellProfiler in parallel
      • nf1_analysis.py – main script to orchestrate plate-wise analysis
      • cp_parallel.py – logic for multiprocessing CellProfiler runs
      • cp_sequential.py – post-processing script to rename output files
  • slurm_scripts/ – SLURM submission templates for different configurations
  • README.md – setup and usage instructions

Installation on Ubuntu 22.04 with Python 3.8

These instructions guide you through installing CellProfiler and its dependencies on Ubuntu 22.04.

Step 1: Update the system

sudo apt update
sudo apt -y upgrade

Step 2: Install required dependencies

sudo apt install -y build-essential python3-dev default-libmysqlclient-dev openjdk-11-jdk-headless libgtk-3-dev libnotify-dev libsdl2-dev

Step 3: Set environment variables for Java

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$PATH:/home/ubuntu/.local/bin

To make these permanent, add the above lines to your .bashrc or .zshrc:

echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> ~/.bashrc
echo 'export PATH=$PATH:/home/ubuntu/.local/bin' >> ~/.bashrc
source ~/.bashrc

For zsh:

echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> ~/.zshrc
echo 'export PATH=$PATH:/home/ubuntu/.local/bin' >> ~/.zshrc
source ~/.zshrc
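After sourcing your shell configuration, it is worth sanity-checking that JAVA_HOME points at a real directory, since the CellProfiler build (via python-javabridge) will fail without a usable JDK. A minimal check, assuming the default Ubuntu 22.04 path for openjdk-11-jdk-headless:

```shell
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
# Warn early if the JDK directory is missing (the path above is the
# Ubuntu default; adjust it if your JDK lives elsewhere)
if [ -d "$JAVA_HOME" ]; then
    echo "JAVA_HOME OK: $JAVA_HOME"
else
    echo "WARNING: $JAVA_HOME not found; check your JDK install" >&2
fi
```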

Step 4: Install wxPython

pip install https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-22.04/wxPython-4.2.0-cp38-cp38-linux_x86_64.whl

Step 5: Install CellProfiler

pip install cellprofiler
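To confirm the install succeeded, run CellProfiler's version check in the same environment (this assumes the `pip install` above completed without errors):

```shell
# Should print the installed version and exit without errors
cellprofiler --version
```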

Optional: Use a Python virtual environment

To avoid package conflicts:

pip install virtualenv
virtualenv --python=python3.8 cellprofiler_env
source cellprofiler_env/bin/activate

Then re-run the steps above within the virtual environment.
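For reference, the full sequence inside the virtual environment mirrors Steps 4 and 5 (the wheel URL is the one from Step 4, which matches Ubuntu 22.04 and Python 3.8):

```shell
# Activate the environment, then install wxPython and CellProfiler into it
source cellprofiler_env/bin/activate
pip install https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-22.04/wxPython-4.2.0-cp38-cp38-linux_x86_64.whl
pip install cellprofiler
```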


HPC Usage Instructions from Institutional Guides

BioHPC at UT Southwestern

Guide: CellProfiler on BioHPC

Key steps:

  • Load CellProfiler and Anaconda modules:

    module load anaconda/2020.11
    module load cellprofiler/4.2.1
  • Use the provided SLURM template for job submission.
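The SLURM template itself is not reproduced here, but a minimal submission script using those modules might look like the sketch below. The partition, resource values, and pipeline/image paths are placeholders, not part of the BioHPC guide:

```shell
#!/bin/bash
#SBATCH --job-name=cellprofiler
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00

module load anaconda/2020.11
module load cellprofiler/4.2.1

# Run headless on a pipeline and image directory of your choosing
cellprofiler -c -r -p pipeline.cppipe -i images/ -o output/
```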

Rivanna HPC at University of Virginia

Guide: CellProfiler on Rivanna

Key steps:

  • Load the Apptainer module:

    module load apptainer
  • Use CellProfiler in containerized form:

    apptainer exec $HOME/cellprofiler/cp-4.2.5.sif cellprofiler -c -r -p pipeline.cppipe -i images/ -o output/
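In the command above, `-c` runs CellProfiler without the GUI, `-r` executes the pipeline, `-p` selects the pipeline file, and `-i`/`-o` set the input and output directories. If you still need to obtain the image, a one-time pull from Docker Hub can produce the `.sif` file (the registry tag is an assumption; check which versions are published):

```shell
# One-time: pull the container image into $HOME/cellprofiler
apptainer pull $HOME/cellprofiler/cp-4.2.5.sif docker://cellprofiler/cellprofiler:4.2.5
# Then run the pipeline headless, as in the guide above
apptainer exec $HOME/cellprofiler/cp-4.2.5.sif \
    cellprofiler -c -r -p pipeline.cppipe -i images/ -o output/
```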

Hydra HPC at VUB

The example SLURM scripts below illustrate typical usage on Hydra.

Example SLURM Script for VUB HPC

#!/bin/bash
#SBATCH --job-name=cellprofiler
#SBATCH --partition=zen4
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=64G
#SBATCH --time=72:00:00
#SBATCH --output=logs/%x_%j.log

# Change this path to match your own project layout
cd /scratch/brussel/vo/000/bvo00026/vsc11013/Imaging/cytomining/nf1_schwann_cell_painting_data/2.cellprofiler_analysis

module purge
module load Mamba
source $EBROOTMAMBA/etc/profile.d/conda.sh
conda activate nf1_cellpainting_data

python scripts/nf1_analysis.py
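Once saved (the script filename below is an assumption), the job can be submitted and monitored with standard SLURM commands:

```shell
sbatch run_nf1_analysis.slurm                          # submit the job
squeue -u $USER                                        # queue status for your jobs
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,State   # resource usage after completion
```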

Another Example:

#!/bin/bash
#SBATCH --job-name=plate3
#SBATCH --partition=zen4
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=4:00:00
#SBATCH --output=logs/%x_%j.out


module purge
module load Mamba
source $EBROOTMAMBA/etc/profile.d/conda.sh
conda activate cytomining_a

cd /scratch/brussel/vo/000/bvo00026/vsc11013/Data/cytomining_test/Profiling_project/1.cellprofiler_ic/image_quality_control

mkdir -p scripts

jupyter nbconvert --to python --output-dir=scripts/ *.ipynb

# Let the script size its worker pool from the allocated cores;
# this must be exported before the script runs to take effect
export MAX_WORKERS=$SLURM_CPUS_PER_TASK

python scripts/0.whole_image_qc.py

conda deactivate

Alternate SLURM script ideas (Not tested on Hydra):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=6 # number of cores (plates)
#SBATCH --mem-per-cpu=4.5G
#SBATCH --partition=amilan
#SBATCH --qos=long
#SBATCH --account=amc-general
#SBATCH --time=5-00:00:00 # days-hh:mm:ss; estimated time for one plate
#SBATCH --output=run_CP-%j.out

module load miniforge
# Make "conda activate" available in this non-interactive shell
# ("conda init bash" only edits ~/.bashrc and does not help here)
source "$(conda info --base)/etc/profile.d/conda.sh"
# activate the CellProfiler environment
conda activate alsf_cp_env

jupyter nbconvert --to=script --FilesWriter.build_directory=nbconverted/ *.ipynb

cd nbconverted/ || exit

# Create the LoadData CSVs that will be used for running the CellProfiler pipeline
python 0.create_loaddata_csvs.py

# Perform CellProfiler in parallel for the plates
python 1.cp_analysis.py

cd ../ || exit
conda deactivate

echo "CP complete!"

Best Practices for HPC Deployment

  • Use job arrays for parallel plate-wise processing
  • Assign 8–16 cores per plate, depending on workload
  • Prefer local scratch storage for input/output
  • Monitor CPU and memory usage with sacct or htop
  • Consider limiting thread overuse with OMP_NUM_THREADS and OPENBLAS_NUM_THREADS
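The practices above can be combined in a single job-array sketch. This is a hedged example under stated assumptions: the plate list file (plates.txt, one plate name per line), directory layout, and array size are hypothetical, not part of this repository:

```shell
#!/bin/bash
#SBATCH --job-name=cp_plates
#SBATCH --array=1-6
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=24:00:00
#SBATCH --output=logs/%x_%A_%a.log

# Keep numeric libraries from oversubscribing the allocated cores
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OPENBLAS_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Each array task processes one plate from the list
PLATE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" plates.txt)
cellprofiler -c -r -p pipeline.cppipe -i "images/${PLATE}" -o "output/${PLATE}"
```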

License

This repository is distributed under the MIT License.


Contributing

Contributions are welcome. Please open an issue or submit a pull request to discuss changes or additions.


Contact

If you have questions or suggestions, please use the GitHub issues page or email the repository maintainer.


---