    sstat -j <jobid> --format=AveCPU,AveRSS,MaxRSS,Elapsed

- `<jobid>` is the numeric SLURM job ID (printed by `sbatch` on submission).
- `AveCPU` shows the average CPU time across the job's tasks.
- `AveRSS` and `MaxRSS` show average and peak memory usage.
- `Elapsed` shows how long the job has been running. If `sstat` rejects `Elapsed` as an invalid field, run `sstat --helpformat` to list the fields your version accepts, and read the elapsed time from `squeue` instead.

Example:

    sstat -j 12345678 --format=AveCPU,AveRSS,MaxRSS,Elapsed

To watch live CPU usage with `htop`:

- Find the node your job is running on: `squeue -u $USER`
- SSH into that node (if allowed): `ssh <nodename>`
- Run `htop`, then press `F2` to configure and `F6` to sort by CPU usage.
- Look for your job's Python processes using up the cores.

If you don't have direct access to compute nodes, stick to `sstat`.
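If you want to record these numbers over time rather than read them by eye, the `sstat` call above can be wrapped in a small script. The following is a minimal sketch, not a definitive tool: the function names (`parse_mem`, `parse_sstat`, `poll_sstat`) are hypothetical, it assumes `sstat` supports `--parsable2` and `--noheader` on your cluster, and it assumes memory values carry `K`/`M`/`G`/`T` suffixes interpreted as powers of 1024.

```python
import subprocess

# Fields requested from sstat, in the order they appear in each output line.
FIELDS = ["JobID", "AveCPU", "AveRSS", "MaxRSS"]

# Assumed multipliers for the size suffixes on memory values (powers of 1024).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_mem(value):
    """Convert a memory string such as '2048K' to bytes (None if empty)."""
    value = value.strip()
    if not value:
        return None
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(float(value[:-1]) * _UNITS[suffix])
    return int(float(value))


def parse_sstat(text, fields=FIELDS):
    """Turn pipe-delimited sstat output (one line per job step) into dicts."""
    rows = []
    for line in text.strip().splitlines():
        row = dict(zip(fields, line.split("|")))
        for key in ("AveRSS", "MaxRSS"):
            if key in row:
                row[key] = parse_mem(row[key])
        rows.append(row)
    return rows


def poll_sstat(jobid):
    """Run sstat once for a job and return the parsed rows."""
    cmd = ["sstat", "-j", str(jobid), "--format=" + ",".join(FIELDS),
           "--parsable2", "--noheader"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_sstat(out)
```

Calling `poll_sstat(12345678)` in a loop with a `time.sleep` between calls would give a rough memory timeline for the running job. Keeping the parsing separate from the subprocess call means it can be checked without access to a cluster.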
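Add this near the top of nf1_analysis.py, after `plate_info_dictionary` has been defined:

    import os

    # os.cpu_count() counts every CPU on the node, not only the job's allocation.
    print(f"Total CPUs on node: {os.cpu_count()}")
    # CPUs this process may run on (Linux only); typically the allocation when Slurm enforces CPU binding.
    print(f"CPUs usable by this job: {len(os.sched_getaffinity(0))}")
    print(f"MAX_WORKERS from env: {os.environ.get('MAX_WORKERS')}")
    print(f"Number of plates to process: {len(plate_info_dictionary)}")

And right before launching ProcessPoolExecutor, log: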

    from datetime import datetime

    print(f"Launching ProcessPoolExecutor with {num_processes} workers at {datetime.now()}")

Everything printed with `print()` in Python goes into your SLURM log file:

    #SBATCH --output=logs/%x_%j.out

Here `%x` expands to the job name and `%j` to the job ID. Check the log with:

    less logs/plate3_<jobid>.out

After the job finishes, use `sacct` for a summary:

    sacct -j <jobid> --format=JobID,JobName%20,Elapsed,TotalCPU,MaxRSS,AveRSS

Example:

    sacct -j 12345678 --format=JobID,JobName%20,Elapsed,TotalCPU,MaxRSS,AveRSS

This gives:
- Total CPU time used
- Max and average memory
- Wall-clock time
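
One way to judge whether the workers actually kept the cores busy is to compare total CPU time against wall-clock time multiplied by the number of allocated CPUs. The sketch below is illustrative only: the function names are hypothetical, the CPU count has to be supplied separately (for example from the `AllocCPUS` field of `sacct`), and the parser assumes durations of the form `D-HH:MM:SS`, `HH:MM:SS`, or `MM:SS.mmm`.

```python
def slurm_time_to_seconds(value):
    """Parse durations such as '1-02:03:04', '02:03:04' or '03:04.500'."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    while len(parts) < 3:  # pad missing hours (and minutes) with zeros
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu, elapsed, alloc_cpus):
    """Fraction of the allocated CPU time that was actually used."""
    wall = slurm_time_to_seconds(elapsed)
    if wall == 0 or alloc_cpus == 0:
        return 0.0
    return slurm_time_to_seconds(total_cpu) / (wall * alloc_cpus)
```

For instance, `cpu_efficiency("08:00:00", "01:00:00", 16)` evaluates to 0.5, meaning half of the allocated CPU time was used. A value close to `1 / alloc_cpus` suggests that only one core was doing work.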
| Tool | Use | Command Example |
|---|---|---|
| `sstat` | Live job stats | `sstat -j <jobid>` |
| `squeue` | See node used | `squeue -u $USER` |
| `htop` | Live CPU usage (if on node) | `htop` |
| `sacct` | After job finishes | `sacct -j <jobid>` |
| `print()` | From within script | View in log file |

To watch the output of a Slurm job's .out file in real time (for example plate3_10792107.out), use the `tail` command with the `-f` (follow) flag:

    tail -f plate3_10792107.out

- `tail -f` continuously displays new lines as they are added to the file.
- Press `Ctrl+C` to stop watching the file.

If you want to see the last 50 lines first, and then follow new output:

    tail -n 50 -f plate3_10792107.out
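
If lines show up late when following the file, one likely cause is that Python buffers standard output when it is redirected to a file, as it is under Slurm. A minimal sketch of a workaround, assuming a hypothetical helper named `log` that you would call in place of bare `print()`:

```python
import sys
from datetime import datetime


def log(message, stream=None):
    """Print a timestamped line and flush it so it reaches the log file at once."""
    if stream is None:
        stream = sys.stdout
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{stamp}] {message}", file=stream, flush=True)
```

Usage would look like `log(f"Launching ProcessPoolExecutor with {num_processes} workers")`. Running the script with `python -u` is an alternative that disables the buffering without code changes.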