
Monitoring CellProfiler on Hydra-HPC (Core Status and Parallel Processing)


1. While the Job is Running: Use sstat or htop

Option A: Monitor Live Job Stats with sstat

sstat -j <jobid> --format=AveCPU,AveRSS,MaxRSS,Elapsed
  • <jobid> is the numeric SLURM job ID (you get this after sbatch submission).
  • AveCPU shows average CPU usage.
  • AveRSS and MaxRSS show average and peak memory usage.
  • Elapsed shows how long the job has been running.

Example:

sstat -j 12345678 --format=AveCPU,AveRSS,MaxRSS,Elapsed

Option B: Use htop on the Node (if you have SSH access to the compute node)

  1. Find the node your job is running on:
squeue -u $USER
  2. SSH into that node (if allowed):
ssh <nodename>
  3. Run:
htop

Then press:

  • F2 to configure
  • F6 to sort by CPU usage
  • Look for your job’s Python processes using up the cores

If you don’t have direct access to compute nodes, stick to sstat.


2. Inside Your Python Script: Print or Log CPU Info

Add this at the top of nf1_analysis.py:

import os

print(f"Total CPUs visible to job: {os.cpu_count()}")
print(f"MAX_WORKERS from env: {os.environ.get('MAX_WORKERS')}")
# Place this line after plate_info_dictionary has been defined in the script:
print(f"Number of plates to process: {len(plate_info_dictionary)}")

And right before launching ProcessPoolExecutor, log:

from datetime import datetime

# num_processes is the worker count your script passes to ProcessPoolExecutor
print(f"Launching ProcessPoolExecutor with {num_processes} workers at {datetime.now()}")
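Putting the pieces together, here is a minimal runnable sketch of how those log lines fit around the executor launch. Note that process_plate and the plate names are placeholders for illustration, not code from the real nf1_analysis.py:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from datetime import datetime


def process_plate(plate_name):
    # Placeholder for the per-plate work; the real script would run
    # the CellProfiler analysis for this plate here.
    return f"{plate_name}: done"


if __name__ == "__main__":
    # Fall back to all CPUs visible to the job when MAX_WORKERS is not set.
    num_processes = int(os.environ.get("MAX_WORKERS", os.cpu_count() or 1))
    plates = ["Plate_1", "Plate_2", "Plate_3"]  # placeholder plate names

    print(f"Total CPUs visible to job: {os.cpu_count()}")
    print(f"Launching ProcessPoolExecutor with {num_processes} workers at {datetime.now()}")

    with ProcessPoolExecutor(max_workers=num_processes) as executor:
        for result in executor.map(process_plate, plates):
            print(result)
```

All of these print() lines land in the SLURM .out file, so you can later correlate the launch timestamp with the sstat/sacct numbers.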

3. Log CPU Allocation to Your SLURM Output

Everything printed with print() in Python will go into your SLURM log file:

#SBATCH --output=logs/%x_%j.out

Check it with:

less logs/plate3_<jobid>.out

4. After the Job Finishes: Use sacct

sacct -j <jobid> --format=JobID,JobName%20,Elapsed,TotalCPU,MaxRSS,AveRSS

Example:

sacct -j 12345678 --format=JobID,JobName%20,Elapsed,TotalCPU,MaxRSS,AveRSS

This gives:

  • Total CPU time used
  • Max and average memory
  • Wall-clock time
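The column layout above is easy to read but awkward to parse in a script; sacct's --parsable2 flag emits pipe-delimited rows with a header line instead. A small parser sketch (the field names are simply whatever you passed to --format):

```python
def parse_sacct(output):
    """Parse pipe-delimited `sacct --parsable2` output into a list of dicts.

    Assumes the first non-empty line is the header row, which is what
    --parsable2 produces.
    """
    lines = [ln for ln in output.strip().splitlines() if ln]
    header = lines[0].split("|")
    return [dict(zip(header, row.split("|"))) for row in lines[1:]]
```

You would feed it something like subprocess.check_output(["sacct", "-j", jobid, "--parsable2", "--format=JobID,Elapsed,TotalCPU,MaxRSS"], text=True).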

Summary Table

Tool      Use                            Command Example
sstat     Live job stats                 sstat -j <jobid>
squeue    See which node the job uses    squeue -u $USER
htop      Live CPU usage (if on node)    htop
sacct     Stats after the job finishes   sacct -j <jobid>
print()   Logging from within the script view in the SLURM log file

Command for checking SLURM job output:

To watch the real-time output of a SLURM job's .out file (such as plate3_10792107.out), use the tail command with the -f (follow) flag:

tail -f plate3_10792107.out

Explanation:

  • tail -f continuously displays new lines as they are added to the file.
  • Press Ctrl+C to stop watching the file.

Optional enhancement:

If you want to see the last 50 lines first, and then follow new output:

tail -n 50 -f plate3_10792107.out