
Reference: Alpine cluster SLURM config #2

@jayhesselberth

Description

Purpose

This issue captures the full SLURM configuration from CU Boulder's Alpine cluster as a working reference for Bodhi's SLURM setup. Alpine is a large production cluster (460 nodes, SLURM 24.11.5) with HA controllers, and job submission from compute nodes works correctly there.

See #1 for the specific compute-node job submission investigation this supports.
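The dump below was produced with standard scontrol subcommands, so the same capture can be repeated on Bodhi for a side-by-side comparison:

    # Dump the running configuration (the source of the listing below)
    scontrol show config

    # Report primary/backup controller status (the UP/UP lines at the end of the dump)
    scontrol ping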

Alpine scontrol show config (2026-02-23)

Configuration data as of 2026-02-23T06:37:42
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits,qos,safe
AccountingStorageHost   = slurmdb1
AccountingStorageExternalHost = (null)
AccountingStorageParameters = (null)
AccountingStoragePort   = 6819
AccountingStorageTRES   = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu,gres/gpu:a100,gres/gpu:a100_3g.20gb,gres/gpu:gh200,gres/gpu:l40,gres/gpu:mi100,gres/gpumem,gres/gpuutil
AccountingStorageType   = accounting_storage/slurmdbd
AccountingStorageUser   = N/A
AccountingStoreFlags    = job_script
AcctGatherEnergyType    = (null)
AcctGatherFilesystemType = (null)
AcctGatherInterconnectType = (null)
AcctGatherNodeFreq      = 0 sec
AcctGatherProfileType   = (null)
AllowSpecResourcesUsage = no
AuthAltTypes            = auth/jwt
AuthAltParameters       = jwt_key=/etc/jwt_hs256.key
AuthInfo                = (null)
AuthType                = auth/munge
BatchStartTimeout       = 10 sec
BcastExclude            = /lib,/usr/lib,/lib64,/usr/lib64
BcastParameters         = (null)
BOOT_TIME               = 2026-02-16T14:17:24
BurstBufferType         = (null)
CertmgrParameters       = (null)
CertmgrType             = (null)
CliFilterPlugins        = (null)
ClusterName             = alpine
CommunicationParameters = (null)
CompleteWait            = 0 sec
CpuFreqDef              = Unknown
CpuFreqGovernors        = OnDemand,Performance,UserSpace
CredType                = cred/munge
DataParserParameters    = (null)
DebugFlags              = NO_CONF_HASH
DefMemPerNode           = UNLIMITED
DependencyParameters    = (null)
DisableRootJobs         = no
EioTimeout              = 60
EnforcePartLimits       = NO
EpilogMsgTime           = 2000 usec
FairShareDampeningFactor = 1
FederationParameters    = (null)
FirstJobId              = 1
GetEnvTimeout           = 2 sec
GresTypes               = gpu
GpuFreqDef              = (null)
GroupUpdateForce        = 1
GroupUpdateTime         = 600 sec
HASH_VAL                = Match
HashPlugin              = hash/k12
HealthCheckInterval     = 300 sec
HealthCheckNodeState    = ANY
HealthCheckProgram      = /usr/sbin/nhc
InactiveLimit           = 0 sec
InteractiveStepOptions  = --interactive --preserve-env --pty $SHELL
JobAcctGatherFrequency  = 15
JobAcctGatherType       = jobacct_gather/cgroup
JobAcctGatherParams     = (null)
JobCompHost             = localhost
JobCompLoc              = (null)
JobCompParams           = (null)
JobCompPort             = 0
JobCompType             = (null)
JobCompUser             = root
JobContainerType        = (null)
JobDefaults             = (null)
JobFileAppend           = 0
JobRequeue              = 1
JobSubmitPlugins        = lua
KillOnBadExit           = 0
KillWait                = 30 sec
LaunchParameters        = (null)
Licenses                = (null)
LogTimeFormat           = iso8601_ms
MailDomain              = (null)
MailProg                = /usr/bin/smail
MaxArraySize            = 4000001
MaxBatchRequeue         = 5
MaxDBDMsgs              = 101840
MaxJobCount             = 50000
MaxJobId                = 67043328
MaxMemPerNode           = UNLIMITED
MaxNodeCount            = 460
MaxStepCount            = 40000
MaxTasksPerNode         = 512
MCSPlugin               = (null)
MCSParameters           = (null)
MessageTimeout          = 90 sec
MinJobAge               = 300 sec
MpiDefault              = (null)
MpiParams               = (null)
NodeFeaturesPlugins     = (null)
OverTimeLimit           = 0 min
PluginDir               = /usr/lib64/slurm
PlugStackConfig         = (null)
PreemptMode             = REQUEUE
PreemptParameters       = (null)
PreemptType             = preempt/qos
PreemptExemptTime       = 00:00:00
PrEpParameters          = (null)
PrEpPlugins             = prep/script
PriorityParameters      = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife   = 14-00:00:00
PriorityCalcPeriod      = 00:05:00
PriorityFavorSmall      = no
PriorityFlags           =
PriorityMaxAge          = 14-00:00:00
PriorityType            = priority/multifactor
PriorityUsageResetPeriod = NONE
PriorityWeightAge       = 20160
PriorityWeightAssoc     = 0
PriorityWeightFairShare = 20160
PriorityWeightJobSize   = 40320
PriorityWeightPartition = 0
PriorityWeightQOS       = 30240
PriorityWeightTRES      = (null)
PrivateData             = none
ProctrackType           = proctrack/cgroup
PrologEpilogTimeout     = 65534
PrologFlags             = Alloc,Contain,X11
PropagatePrioProcess    = 0
PropagateResourceLimits = NONE
PropagateResourceLimitsExcept = (null)
RebootProgram           = /usr/sbin/reboot
ReconfigFlags           = (null)
RequeueExit             = (null)
RequeueExitHold         = (null)
ResumeFailProgram       = (null)
ResumeProgram           = /curc/slurm/alpine/scripts/resume_node
ResumeRate              = 60 nodes/min
ResumeTimeout           = 60 sec
ResvEpilog              = (null)
ResvOverRun             = 0 min
ResvProlog              = (null)
ReturnToService         = 2
SchedulerParameters     = bf_max_job_test=12000,bf_max_job_user_part=200,bf_window=10080,bf_resolution=120,bf_continue,kill_invalid_depend,default_queue_depth=1000,max_switch_wait=604800,max_array_tasks=1000,bf_job_part_count_reserve=10
SchedulerTimeSlice      = 30 sec
SchedulerType           = sched/backfill
ScronParameters         = enable
SelectType              = select/cons_tres
SelectTypeParameters    = CR_CORE_MEMORY
SlurmUser               = slurm(515)
SlurmctldAddr           = (null)
SlurmctldDebug          = debug
SlurmctldHost[0]        = alpine-slurmctl1
SlurmctldHost[1]        = alpine-slurmctl2
SlurmctldLogFile        = (null)
SlurmctldPort           = 6817
SlurmctldSyslogDebug    = debug
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg  = (null)
SlurmctldTimeout        = 120 sec
SlurmctldParameters     = idle_on_node_suspend
SlurmdDebug             = error
SlurmdLogFile           = (null)
SlurmdParameters        = (null)
SlurmdPidFile           = /var/run/slurmd.pid
SlurmdPort              = 6818
SlurmdSpoolDir          = /var/spool/slurmd
SlurmdSyslogDebug       = error
SlurmdTimeout           = 600 sec
SlurmdUser              = root(0)
SlurmSchedLogFile       = (null)
SlurmSchedLogLevel      = 0
SlurmctldPidFile        = /var/run/slurmctld.pid
SLURM_CONF              = /etc/slurm/slurm.conf
SLURM_VERSION           = 24.11.5
SrunEpilog              = (null)
SrunPortRange           = 0-0
SrunProlog              = (null)
StateSaveLocation       = /curc/slurm/alpine/state
SuspendExcNodes         = (null)
SuspendExcParts         = acompile,atesting,ahub,amilan,amilan128c,aa100,ami100,amem,amc,csu,rmacc,atesting_a100,atesting_mi100,gh200,al40,dtn
SuspendExcStates        = (null)
SuspendProgram          = /curc/slurm/alpine/scripts/suspend_node
SuspendRate             = 60 nodes/min
SuspendTime             = 3600 sec
SuspendTimeout          = 60 sec
SwitchParameters        = (null)
SwitchType              = (null)
TaskEpilog              = /etc/slurm/taskepilog
TaskPlugin              = task/cgroup
TaskPluginParam         = (null type)
TaskProlog              = /etc/slurm/taskprolog
TCPTimeout              = 2 sec
TLSParameters           = (null)
TLSType                 = tls/none
TmpFS                   = /tmp
TopologyParam           = TopoOptional
TopologyPlugin          = topology/tree
TrackWCKey              = no
TreeWidth               = 16
UsePam                  = yes
UnkillableStepProgram   = /curc/slurm/alpine/scripts/unkillable_step_program
UnkillableStepTimeout   = 500 sec
VSizeFactor             = 0 percent
WaitTime                = 0 sec
X11Parameters           = (null)

Cgroup Support Configuration:
AllowedRAMSpace         = 100.0%
AllowedSwapSpace        = 0.0%
CgroupMountpoint        = /sys/fs/cgroup
CgroupPlugin            = autodetect
ConstrainCores          = yes
ConstrainDevices        = yes
ConstrainRAMSpace       = yes
ConstrainSwapSpace      = yes
EnableControllers       = no
IgnoreSystemd           = no
IgnoreSystemdOnFailure  = no
MaxRAMPercent           = 100.0%
MaxSwapPercent          = 100.0%
MemorySwappiness        = (null)
MinRAMSpace             = 30MB
SystemdTimeout          = 1000 ms

MPI Plugins Configuration:
PMIxCliTmpDirBase       = (null)
PMIxCollFence           = (null)
PMIxDebug               = 0
PMIxDirectConn          = yes
PMIxDirectConnEarly     = no
PMIxDirectConnUCX       = no
PMIxDirectSameArch      = no
PMIxEnv                 = (null)
PMIxFenceBarrier        = no
PMIxNetDevicesUCX       = (null)
PMIxTimeout             = 300
PMIxTlsUCX              = (null)

Slurmctld(primary) at alpine-slurmctl1 is UP
Slurmctld(backup) at alpine-slurmctl2 is UP

Key Settings Relevant to Bodhi

Parameter           Alpine Value                         Why It Matters
AuthType            auth/munge                           Primary auth; munge must be running on all nodes
AuthAltTypes        auth/jwt                             Fallback auth; helps ride out transient munge issues
SlurmctldPort       6817                                 Must be reachable from compute nodes
SlurmdPort          6818                                 Daemon port on compute nodes
MessageTimeout      90 sec                               9x the 10 sec default; important under load
SlurmdTimeout       600 sec                              Generous window before nodes are marked down
SlurmctldTimeout    120 sec                              HA failover window
ReturnToService     2                                    Down nodes return to service automatically once healthy
SrunPortRange       0-0                                  Not explicitly set; srun uses ephemeral ports
SelectType          select/cons_tres                     Tracks CPU, memory, and GPU resources
JobAcctGatherType   jobacct_gather/cgroup                Cgroup-based resource accounting
TaskPlugin          task/cgroup                          Cgroup-based task containment
PrologFlags         Alloc,Contain,X11                    Prolog at allocation, job containment, X11 support
SchedulerType       sched/backfill                       Standard backfill scheduler
SLURM_VERSION       24.11.5                              Version running on Alpine
HA Controllers      alpine-slurmctl1, alpine-slurmctl2   Dual controllers for high availability
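As a starting point, here is a minimal slurm.conf sketch that mirrors the Alpine settings above. The cluster name and controller hostname are placeholders for Bodhi, not values from any real config, and carrying these values over wholesale is an assumption to be validated:

    # Sketch only: mirrors Alpine's key settings; hostnames are placeholders
    ClusterName=bodhi
    SlurmctldHost=bodhi-ctl1            # add a second SlurmctldHost line for HA, as Alpine does
    AuthType=auth/munge
    AuthAltTypes=auth/jwt
    SlurmctldPort=6817
    SlurmdPort=6818
    MessageTimeout=90                   # seconds; Alpine runs 9x the 10 s default
    SlurmdTimeout=600
    SlurmctldTimeout=120
    ReturnToService=2
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory
    ProctrackType=proctrack/cgroup
    JobAcctGatherType=jobacct_gather/cgroup
    TaskPlugin=task/cgroup
    PrologFlags=Alloc,Contain,X11
    SchedulerType=sched/backfill

A working config also needs node and partition definitions plus a matching cgroup.conf; the sketch covers only the settings discussed in the table.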

Notes

  • Alpine sets DebugFlags = NO_CONF_HASH, which disables config-hash checking. That is handy during rolling updates, but Bodhi should keep hash checking enabled so config drift between nodes is caught.
  • HealthCheckProgram = /usr/sbin/nhc (Node Health Check) runs every 300 seconds regardless of node state (HealthCheckNodeState = ANY).
  • The cgroup configuration constrains cores, devices, RAM, and swap (ConstrainCores/Devices/RAMSpace/SwapSpace are all yes).
  • MaxArraySize = 4000001 allows very large job arrays (array task IDs up to 4,000,000).
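
Quick verification from a compute node

A minimal smoke test for the compute-node submission behavior that #1 investigates. The partition name below is a placeholder; substitute whatever partition Bodhi defines:

    # Run these from a compute node (e.g. inside an interactive job)
    scontrol ping                                         # both controllers should report UP
    sbatch --partition=debug --time=1 --wrap="hostname"   # trivial batch job; --partition=debug is a placeholder
    squeue --me                                           # the job should appear, then run and complete

If sbatch hangs or fails only from compute nodes, the first things to compare against Alpine are AuthType/AuthAltTypes, SlurmctldPort reachability, and MessageTimeout.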
