
Monitoring CellProfiler on Hydra-HPC (Core Status and Parallel Processing)


1. While the Job is Running: Use sstat or htop

Option A: Monitor Live Job Stats with sstat

sstat -j <jobid> --format=AveCPU,AveRSS,MaxRSS,Elapsed
  • <jobid> is the numeric SLURM job ID (you get this after sbatch submission).
  • AveCPU shows average CPU usage.
  • AveRSS and MaxRSS show average and peak memory usage.
  • Elapsed shows how long the job has been running.

Example:

sstat -j 12345678 --format=AveCPU,AveRSS,MaxRSS,Elapsed

Option B: Use htop on the Node (if you have SSH access to the compute node)

  1. Find the node your job is running on:
squeue -u $USER
  2. SSH into that node (if allowed):
ssh <nodename>
  3. Run:
htop

Then press:

  • F2 to configure
  • F6 to sort by CPU usage
  • Look for your job’s Python processes using up the cores

If you don’t have direct access to compute nodes, stick to sstat.


2. Inside Your Python Script: Print or Log CPU Info

Add this at the top of nf1_analysis.py:

import os

print(f"Total CPUs visible to job: {os.cpu_count()}")
print(f"MAX_WORKERS from env: {os.environ.get('MAX_WORKERS')}")
# Place this line after plate_info_dictionary has been defined in the script:
print(f"Number of plates to process: {len(plate_info_dictionary)}")

And right before launching ProcessPoolExecutor, log:

from datetime import datetime

# num_processes is the worker count your script passes to ProcessPoolExecutor
print(f"Launching ProcessPoolExecutor with {num_processes} workers at {datetime.now()}")
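Putting the pieces together, here is a minimal runnable sketch of how those log lines fit around the executor launch. Note that process_plate and the plate names are placeholders for illustration, not code from the real nf1_analysis.py:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from datetime import datetime


def process_plate(plate_name):
    # Placeholder for the per-plate work; the real script would run
    # the CellProfiler analysis for this plate here.
    return f"{plate_name}: done"


if __name__ == "__main__":
    # Fall back to all CPUs visible to the job when MAX_WORKERS is not set.
    num_processes = int(os.environ.get("MAX_WORKERS", os.cpu_count() or 1))
    plates = ["Plate_1", "Plate_2", "Plate_3"]  # placeholder plate names

    print(f"Total CPUs visible to job: {os.cpu_count()}")
    print(f"Launching ProcessPoolExecutor with {num_processes} workers at {datetime.now()}")

    with ProcessPoolExecutor(max_workers=num_processes) as executor:
        for result in executor.map(process_plate, plates):
            print(result)
```

All of these print() lines land in the SLURM .out file, so you can later correlate the launch timestamp with the sstat/sacct numbers.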

3. Log CPU Allocation to Your SLURM Output

Everything printed with print() in Python will go into your SLURM log file:

#SBATCH --output=logs/%x_%j.out

Check it with:

less logs/plate3_<jobid>.out

4. After the Job Finishes: Use sacct

sacct -j <jobid> --format=JobID,JobName%20,Elapsed,TotalCPU,MaxRSS,AveRSS

Example:

sacct -j 12345678 --format=JobID,JobName%20,Elapsed,TotalCPU,MaxRSS,AveRSS

This gives:

  • Total CPU time used
  • Max and average memory
  • Wall-clock time
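The column layout above is easy to read but awkward to parse in a script; sacct's --parsable2 flag emits pipe-delimited rows with a header line instead. A small parser sketch (the field names are simply whatever you passed to --format):

```python
def parse_sacct(output):
    """Parse pipe-delimited `sacct --parsable2` output into a list of dicts.

    Assumes the first non-empty line is the header row, which is what
    --parsable2 produces.
    """
    lines = [ln for ln in output.strip().splitlines() if ln]
    header = lines[0].split("|")
    return [dict(zip(header, row.split("|"))) for row in lines[1:]]
```

You would feed it something like subprocess.check_output(["sacct", "-j", jobid, "--parsable2", "--format=JobID,Elapsed,TotalCPU,MaxRSS"], text=True).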

Summary Table

Tool      Use                            Command Example
sstat     Live job stats                 sstat -j <jobid>
squeue    See which node the job uses    squeue -u $USER
htop      Live CPU usage (if on node)    htop
sacct     Stats after the job finishes   sacct -j <jobid>
print()   Logging from within the script view in the SLURM log file

Command for checking SLURM job output:

To watch the real-time output of a SLURM job's .out file (such as plate3_10792107.out), use the tail command with the -f (follow) flag:

tail -f plate3_10792107.out

Explanation:

  • tail -f continuously displays new lines as they are added to the file.
  • Press Ctrl+C to stop watching the file.

Optional enhancement:

If you want to see the last 50 lines first, and then follow new output:

tail -n 50 -f plate3_10792107.out