Commit d7f6f7b

Add instance id and instance type to slurmd
The information will be shown on `scontrol show nodes`:

```
NodeName=queue-on-demand-dy-compute-resource-2-2 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=4 CPUEfctv=4 CPUTot=4 CPULoad=0.95
   AvailableFeatures=dynamic,c5.xlarge,compute-resource-2
   ActiveFeatures=dynamic,c5.xlarge,compute-resource-2
   Gres=(null)
   NodeAddr=192.168.127.110 NodeHostName=queue-on-demand-dy-compute-resource-2-2 Version=24.05.2
   OS=Linux 5.10.233-224.894.amzn2.x86_64 #1 SMP Mon Jan 27 16:52:48 UTC 2025
   RealMemory=7782 AllocMem=0 FreeMem=6431 Sockets=4 Boards=1
   State=ALLOCATED+CLOUD ThreadsPerCore=1 TmpDisk=0 Weight=1000 Owner=N/A MCS_label=N/A
   Partitions=queue-on-demand
   BootTime=2025-02-10T21:22:00 SlurmdStartTime=2025-02-10T21:25:05
   LastBusyTime=2025-02-10T21:25:05 ResumeAfterTime=None
   CfgTRES=cpu=4,mem=7782M,billing=4
   AllocTRES=cpu=4
   CurrentWatts=0 AveWatts=0
   InstanceId=i-0eb8d995282xxxx11 InstanceType=c5.xlarge
```

reference: https://slurm.schedmd.com/slurmd.html

Signed-off-by: Hanwen <[email protected]>
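As an illustration of how the new fields might be consumed, the following is a minimal sketch that pulls `InstanceId` and `InstanceType` out of `scontrol show nodes` text. The helper `node_fields` is hypothetical and not part of the cookbook, and the sample text is an abridged copy of the output quoted above. It assumes values contain no whitespace, which holds for these two fields but not for every field (for example `OS=`).

```ruby
# Abridged copy of the `scontrol show nodes` output from the commit message.
SAMPLE = <<~OUT
  NodeName=queue-on-demand-dy-compute-resource-2-2 Arch=x86_64 CoresPerSocket=1
     State=ALLOCATED+CLOUD ThreadsPerCore=1 TmpDisk=0 Weight=1000 Owner=N/A MCS_label=N/A
     CurrentWatts=0 AveWatts=0
     InstanceId=i-0eb8d995282xxxx11 InstanceType=c5.xlarge
OUT

# Hypothetical helper (not part of the cookbook): returns a hash holding the
# requested keys that appear as key=value tokens in the text.
def node_fields(text, keys)
  keys.each_with_object({}) do |key, found|
    # Require the key to start a line or follow whitespace so that a key
    # cannot match as the suffix of a longer field name.
    match = text.match(/(?:^|\s)#{Regexp.escape(key)}=(\S+)/)
    found[key] = match[1] if match
  end
end

INSTANCE_INFO = node_fields(SAMPLE, %w[InstanceId InstanceType])
puts INSTANCE_INFO
```

On a live cluster the text would presumably come from running `scontrol show nodes <nodename>`; that call is left out here so the sketch stays self-contained.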
1 parent 3a6db7c commit d7f6f7b

3 files changed: +7 -2 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions

@@ -20,6 +20,7 @@ This file is used to list changes made in each version of the AWS ParallelCluste
 - Upgrade Pmix to 5.0.6 (from 5.0.3).
 - Upgrade ARM PL to version 24.10 (from 23.10).
 - Remove generation of DSA keys for login nodes as DSA, which became unsupported in OpenSSH 9.7+.
+- Set instance ID and instance type information in Slurm upon compute nodes launch.
 
 3.12.0
 ------

cookbooks/aws-parallelcluster-slurm/spec/unit/recipes/config_slurmd_systemd_service_spec.rb

Lines changed: 5 additions & 1 deletion

@@ -21,7 +21,11 @@
   for_all_oses do |platform, version|
     context "on #{platform}#{version}" do
       cached(:chef_run) do
-        runner(platform: platform, version: version).converge(described_recipe)
+        runner = runner(platform: platform, version: version) do |node|
+          node.override['ec2']['instance_id'] = "i-xxx"
+          node.override['ec2']['instance_type'] = "fake-instance-type"
+        end
+        runner.converge(described_recipe)
       end
 
       it 'creates the service definition for slurmd' do

cookbooks/aws-parallelcluster-slurm/templates/default/slurm/compute/slurmd.service.erb

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ ConditionPathExists=<%= node['cluster']['slurm']['install_dir'] %>/etc/slurm.con
 [Service]
 Type=simple
 EnvironmentFile=-/etc/sysconfig/slurmd
-ExecStart=<%= node['cluster']['slurm']['install_dir'] %>/sbin/slurmd -D -s $SLURMD_OPTIONS
+ExecStart=<%= node['cluster']['slurm']['install_dir'] %>/sbin/slurmd -D -s $SLURMD_OPTIONS --instance-id <%= node['ec2']['instance_id'] %> --instance-type <%= node['ec2']['instance_type'] %>
 ExecReload=/bin/kill -HUP $MAINPID
 KillMode=process
 LimitNOFILE=131072
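To see what the updated `ExecStart` line expands to, the template line can be rendered outside Chef with Ruby's standard `ERB` class. This is only a sketch: `NODE` is a plain hash standing in for Chef's node object, the install directory `/opt/slurm` is an assumed placeholder, and the instance values are the fake ones used in the spec above.

```ruby
require 'erb'

# Stand-in for Chef's node object. '/opt/slurm' is an assumed placeholder;
# the ec2 values mirror the fakes set in the unit spec.
NODE = {
  'cluster' => { 'slurm' => { 'install_dir' => '/opt/slurm' } },
  'ec2' => { 'instance_id' => 'i-xxx', 'instance_type' => 'fake-instance-type' }
}.freeze

# The ExecStart line as it appears in the updated slurmd.service.erb.
TEMPLATE = "ExecStart=<%= node['cluster']['slurm']['install_dir'] %>/sbin/slurmd -D -s $SLURMD_OPTIONS " \
           "--instance-id <%= node['ec2']['instance_id'] %> " \
           "--instance-type <%= node['ec2']['instance_type'] %>"

# Render the line with `node` bound to the given hash.
def render_exec_start(node)
  ERB.new(TEMPLATE).result_with_hash(node: node)
end

puts render_exec_start(NODE)
```

ERB leaves `$SLURMD_OPTIONS` untouched in the rendered unit file; systemd substitutes it when the service starts, using the variables loaded through `EnvironmentFile`.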
