This repository provides tools, scripts, and documentation for installing, configuring, and running CellProfiler in high-performance computing (HPC) environments. It is designed to support large-scale image analysis workflows, including Cell Painting and high-content screening, on SLURM-based clusters.
Repository structure:
- scripts/ – Python scripts for executing CellProfiler in parallel
  - nf1_analysis.py – Main script to orchestrate plate-wise analysis
  - cp_parallel.py – Logic for multiprocessing CellProfiler runs
  - cp_sequential.py – Post-processing script to rename output files
- slurm_scripts/ – SLURM submission templates for different configurations
- README.md – Setup and usage instructions
These instructions guide you through installing CellProfiler and its dependencies on Ubuntu 22.04.
sudo apt update
sudo apt -y upgrade
sudo apt install -y build-essential python3-dev default-libmysqlclient-dev openjdk-11-jdk-headless libgtk-3-dev libnotify-dev libsdl2-dev
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$PATH:/home/ubuntu/.local/bin
To make these settings permanent, append the two export lines to your ~/.bashrc or ~/.zshrc (replace /home/ubuntu with your own home directory if your username is not ubuntu). For bash:
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> ~/.bashrc
echo 'export PATH=$PATH:/home/ubuntu/.local/bin' >> ~/.bashrc
source ~/.bashrc
For zsh:
echo 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64' >> ~/.zshrc
echo 'export PATH=$PATH:/home/ubuntu/.local/bin' >> ~/.zshrc
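source ~/.zshrc
Install the wxPython wheel built for Ubuntu 22.04, then CellProfiler:
pip install https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-22.04/wxPython-4.2.0-cp38-cp38-linux_x86_64.whl
pip install cellprofiler
This wheel is built for CPython 3.8 (the cp38 tag), whereas Ubuntu 22.04 ships Python 3.10 as python3, so run these commands with a Python 3.8 interpreter, such as the virtual environment below (which requires python3.8 to be installed).
To avoid package conflicts, install into a virtual environment: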
pip install virtualenv
virtualenv --python=python3.8 cellprofiler_env
source cellprofiler_env/bin/activate
Then re-run the two pip install commands above inside the activated environment.
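Running the echo ... >> ~/.bashrc lines more than once (for example, when repeating the setup) appends duplicate exports to the rc file. A minimal sketch of an idempotent alternative, assuming bash and grep; the helper name append_once is hypothetical and not part of this repository:

```shell
#!/usr/bin/env bash
# append_once FILE LINE
# Hypothetical helper: appends LINE to FILE only if an identical line is
# not already present, so repeated setup runs do not duplicate exports.
append_once() {
    local file="$1" line="$2"
    # Create the file if it does not exist yet
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example usage (single quotes keep $PATH unexpanded until the shell starts):
# append_once ~/.bashrc 'export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64'
# append_once ~/.bashrc 'export PATH=$PATH:/home/ubuntu/.local/bin'
```

Use ~/.zshrc as the first argument for zsh.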
Guide: CellProfiler on BioHPC
Key steps:
- Load CellProfiler and Anaconda modules:
  module load anaconda/2020.11
  module load cellprofiler/4.2.1
- Use the provided SLURM template for job submission.
Guide: CellProfiler on Rivanna
Key steps:
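- Load the Apptainer module:
  module load apptainer
- Run CellProfiler from the container image (here stored at $HOME/cellprofiler/cp-4.2.5.sif):
  apptainer exec $HOME/cellprofiler/cp-4.2.5.sif cellprofiler -c -r -p pipeline.cppipe -i images/ -o output/
  Here -c runs CellProfiler headless (no GUI), -r runs the pipeline at startup, -p sets the pipeline file, and -i and -o set the default input and output folders.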
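For large plates, one long CellProfiler run can be split into independent runs over ranges of image sets using CellProfiler's -f (first image set) and -l (last image set) command-line options. A minimal sketch, assuming the total number of image sets is known in advance and reusing the container path and pipeline from the command above; chunk_ranges and run_chunks are hypothetical helper names:

```shell
#!/usr/bin/env bash
# chunk_ranges TOTAL SIZE
# Prints one "FIRST LAST" pair per line covering image sets 1..TOTAL in
# blocks of SIZE (CellProfiler's -f/-l image-set indices are 1-based).
chunk_ranges() {
    local total="$1" size="$2" first last
    [ "$size" -gt 0 ] || return 1
    first=1
    while [ "$first" -le "$total" ]; do
        last=$((first + size - 1))
        if [ "$last" -gt "$total" ]; then
            last=$total
        fi
        echo "$first $last"
        first=$((first + size))
    done
}

# run_chunks TOTAL SIZE
# Starts one headless CellProfiler process per chunk in the background,
# each writing to its own output folder, then waits for all of them.
run_chunks() {
    chunk_ranges "$1" "$2" | {
        while read -r first last; do
            out="output/sets_${first}_${last}"
            mkdir -p "$out"
            apptainer exec "$HOME/cellprofiler/cp-4.2.5.sif" \
                cellprofiler -c -r -p pipeline.cppipe -i images/ -o "$out" \
                -f "$first" -l "$last" &
        done
        wait
    }
}

# Example with hypothetical numbers: 1200 image sets in 4 chunks of 300
# run_chunks 1200 300
```

Each chunk is a separate process, so keep the number of concurrent chunks at or below the cores you requested; the same ranges can instead be submitted as separate SLURM jobs.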
CellProfiler on Hydra
Key steps:
- Apptainer module: currently under development (LIVR).
- Run the complete pipeline from a conda environment built with these requirements: https://github.com/WayScience/nf1_schwann_cell_painting_data/blob/main/environments/nf1_cellpainting_env.yml
- Suggested resources on Hydra: a node in the zen4 partition with 2x AMD EPYC 9384X (Zen 4, 64 cores in total) and at least 64 GB RAM.
- For monitoring CellProfiler performance on Hydra, see https://github.com/arka2696/CellProfiler-HPC/blob/main/CellProfilerHPC-Performance-Check.md
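One way to build that environment on the cluster is to download the YAML file from its raw URL and pass it to conda env create. A minimal sketch, assuming curl and conda are available; the helper names are hypothetical, and the environment name nf1_cellpainting_data is chosen to match the conda activate line in the Hydra job script:

```shell
#!/usr/bin/env bash
# github_raw_url URL
# Turns a GitHub "blob" page URL into the raw.githubusercontent.com URL
# that serves the plain file, so it can be downloaded with curl.
github_raw_url() {
    printf '%s\n' "$1" |
        sed -e 's#^https://github\.com/#https://raw.githubusercontent.com/#' \
            -e 's#/blob/#/#'
}

# create_env_from_github BLOB_URL ENV_NAME
# Downloads the environment file and creates a named conda environment.
# Assumes conda is already on PATH (on Hydra: module load Mamba, then
# source "$EBROOTMAMBA/etc/profile.d/conda.sh", as in the SLURM scripts).
create_env_from_github() {
    local dir
    dir="$(mktemp -d)" || return 1
    # Keep a .yml extension so conda recognizes the file as an environment spec
    curl -fsSL "$(github_raw_url "$1")" -o "$dir/environment.yml" || return 1
    # -n overrides any name given inside the YAML file
    conda env create -n "$2" -f "$dir/environment.yml"
}

# Example:
# create_env_from_github \
#   https://github.com/WayScience/nf1_schwann_cell_painting_data/blob/main/environments/nf1_cellpainting_env.yml \
#   nf1_cellpainting_data
```

Environment creation can take a while and download many packages, so running it on a login node or in a short interactive job, rather than inside the analysis job, is usually simpler.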
Example SLURM job script for the full analysis (scripts/nf1_analysis.py) on Hydra:
#!/bin/bash
#SBATCH --job-name=cellprofiler
#SBATCH --partition=zen4
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=64G
#SBATCH --time=72:00:00
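#SBATCH --output=logs/%x_%j.log
# SLURM does not create logs/: run "mkdir -p logs" in the submission directory before sbatch
# Change this path to your own working directory
cd /scratch/brussel/vo/000/bvo00026/vsc11013/Imaging/cytomining/nf1_schwann_cell_painting_data/2.cellprofiler_analysis || exit 1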
module purge
module load Mamba
source $EBROOTMAMBA/etc/profile.d/conda.sh
conda activate nf1_cellpainting_data
python scripts/nf1_analysis.py
Example SLURM job script for whole-image quality control:
#!/bin/bash
#SBATCH --job-name=plate3
#SBATCH --partition=zen4
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=4:00:00
#SBATCH --output=logs/%x_%j.out
module purge
module load Mamba
source $EBROOTMAMBA/etc/profile.d/conda.sh
conda activate cytomining_a
cd /scratch/brussel/vo/000/bvo00026/vsc11013/Data/cytomining_test/Profiling_project/1.cellprofiler_ic/image_quality_control || exit 1
mkdir -p scripts
jupyter nbconvert --to python --output-dir=scripts/ *.ipynb
python scripts/0.whole_image_qc.py
conda deactivate
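If your analysis script reads MAX_WORKERS, set it from the SLURM allocation before the Python script starts (exporting it after the script has finished has no effect):
export MAX_WORKERS=$SLURM_CPUS_PER_TASK
Alternate SLURM script ideas (not tested on Hydra):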
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=6 # number of cores (plates)
#SBATCH --mem-per-cpu=4.5G
#SBATCH --partition=amilan
#SBATCH --qos=long
#SBATCH --account=amc-general
#SBATCH --time=5-00:00:00 # format days-hours:min:sec; estimated from testing one plate
#SBATCH --output=run_CP-%j.out
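module load miniforge
# Make "conda activate" work in this non-interactive batch shell
# ("conda init bash" only edits ~/.bashrc and does not affect the running job)
eval "$(conda shell.bash hook)"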
# activate the CellProfiler environment
conda activate alsf_cp_env
jupyter nbconvert --to=script --FilesWriter.build_directory=nbconverted/ *.ipynb
cd nbconverted/ || exit
# Create the LoadData CSVs that will be used for running the CellProfiler pipeline
python 0.create_loaddata_csvs.py
# Perform CellProfiler in parallel for the plates
python 1.cp_analysis.py
cd ../ || exit
conda deactivate
echo "CP complete!"
Recommendations for HPC runs:
- Use job arrays for parallel plate-wise processing.
- Assign 8–16 cores per plate, depending on workload.
- Prefer local scratch storage for input/output.
- Monitor CPU and memory usage with sacct or htop.
- Limit thread oversubscription by setting OMP_NUM_THREADS and OPENBLAS_NUM_THREADS (for example, to 1 when each CellProfiler worker process gets its own core).
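The job-array and thread-limiting recommendations above can be combined into one per-plate array job. A minimal sketch, assuming a hypothetical text file plates.txt in the submission directory with one plate name per line; the helper names are illustrative, and exporting MAX_WORKERS only helps if your analysis script reads that variable:

```shell
#!/usr/bin/env bash
#SBATCH --job-name=cp_plates
#SBATCH --partition=zen4             # Hydra partition used above; change for other clusters
#SBATCH --array=1-6                  # one task per line of plates.txt (6 plates assumed)
#SBATCH --cpus-per-task=8            # 8-16 cores per plate, per the recommendations
#SBATCH --mem=16G
#SBATCH --output=logs/%x_%A_%a.out   # %A = array job ID, %a = task index

# plate_for_task FILE INDEX
# Prints the plate name on line INDEX (1-based) of FILE; fails if that line is missing or empty.
plate_for_task() {
    local plate
    plate="$(sed -n "${2}p" "$1")"
    [ -n "$plate" ] || return 1
    printf '%s\n' "$plate"
}

# worker_count
# CPUs allocated to this task by SLURM, or all local cores when run outside SLURM.
worker_count() {
    echo "${SLURM_CPUS_PER_TASK:-$(nproc)}"
}

# One thread per worker process, so N workers use about N cores in total
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

# Only runs inside an array task (SLURM sets SLURM_ARRAY_TASK_ID)
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    PLATE="$(plate_for_task plates.txt "$SLURM_ARRAY_TASK_ID")" || exit 1
    MAX_WORKERS="$(worker_count)"
    export PLATE MAX_WORKERS
    echo "Array task $SLURM_ARRAY_TASK_ID: plate $PLATE, $MAX_WORKERS workers"
    # Hypothetical: call your per-plate analysis here, reading $PLATE and $MAX_WORKERS
fi
```

Create logs/ before submitting with sbatch; sacct -j followed by the array job ID then lists each plate's task separately, which makes per-plate CPU and memory usage easy to compare.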
This repository is distributed under the MIT License.
Contributions are welcome. Please open an issue or submit a pull request to discuss changes or additions.
If you have questions or suggestions, please use the GitHub issues page or email the repository maintainer.