Commit 042a15c

[Fault Tolerance] Add retry with delay to the block to copy Munge key and the blocks to start Chronyd and Munge services.
Signed-off-by: Giacomo Marciani <[email protected]>
1 parent 937f4b2 commit 042a15c
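
The retry behavior this commit relies on can be sketched in plain Ruby. This is an illustration only, not code from the cookbook: `run_with_retries` and its `sleeper` parameter are hypothetical names, and the sketch assumes Chef's documented semantics, where `retries` counts the extra attempts made after the first failure and `retry_delay` is the pause in seconds before each of them. Under that assumption, `retries 5` with `retry_delay 10` allows up to 6 attempts and up to 50 seconds of waiting per resource.

```ruby
# Hypothetical helper (not part of Chef or the cookbook) that mimics how a
# resource declared with `retries` and `retry_delay` is assumed to behave.
def run_with_retries(retries:, retry_delay:, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    # Give up once the initial attempt plus all retries have failed.
    raise if attempts > retries
    sleeper.call(retry_delay)
    retry
  end
end

# Usage: record the delays instead of sleeping so the example runs instantly.
delays = []
calls = 0
result = run_with_retries(retries: 5, retry_delay: 10, sleeper: ->(s) { delays << s }) do |attempt|
  calls += 1
  # Simulate a service that only starts successfully on the third attempt.
  raise "service not ready" if attempt < 3
  :started
end

# Usage: a block that never succeeds exhausts every attempt and re-raises.
failed_delays = []
failed_calls = 0
exhausted = false
begin
  run_with_retries(retries: 5, retry_delay: 10, sleeper: ->(s) { failed_delays << s }) do
    failed_calls += 1
    raise "service never starts"
  end
rescue StandardError
  exhausted = true
end
```

Injecting the sleeper keeps the sketch testable; a real Chef run sleeps between attempts, so a resource that keeps failing delays the run by the full retry window before the error surfaces.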

2 files changed: 6 additions, 0 deletions
cookbooks/aws-parallelcluster-platform/resources/chrony/partial/_chrony_common.rb

Lines changed: 2 additions & 0 deletions

@@ -39,6 +39,8 @@
     supports restart: false
     reload_command chrony_reload_command
     action %i(enable start)
+    retries 5
+    retry_delay 10
   end unless redhat_on_docker?
 end

cookbooks/aws-parallelcluster-slurm/libraries/helpers.rb

Lines changed: 4 additions & 0 deletions

@@ -60,6 +60,8 @@ def enable_munge_service
   service "munge" do
     supports restart: true
     action %i(enable start)
+    retries 5
+    retry_delay 10
   end
 end
 
@@ -111,6 +113,8 @@ def setup_munge_compute_node
       # Enforce correct permission on the key
       chmod 0600 /etc/munge/munge.key
     COMPUTE_MUNGE_KEY
+    retries 5
+    retry_delay 10
   end
 
   enable_munge_service