4 files changed (+9 -7 lines), including files under aws-parallelcluster-entrypoints/recipes and aws-parallelcluster-environment

@@ -34,7 +34,6 @@ This file is used to list changes made in each version of the AWS ParallelCluste
 **BUG FIXES**
 - Fix issue making job fail when submitted as active directory user from login nodes.
   The issue was caused by an incomplete configuration of the integration with the external Active Directory on the head node.
-  This fix comes with a breaking change: now cluster creation/update would fail if the integration with the Active Directory does not work.

 3.8.0
 ------

@@ -21,6 +21,7 @@
 end

 include_recipe "aws-parallelcluster-platform::finalize"
-include_recipe "aws-parallelcluster-environment::finalize"

 include_recipe 'aws-parallelcluster-slurm::finalize' if node['cluster']['scheduler'] == 'slurm'
+
+include_recipe "aws-parallelcluster-environment::finalize"

@@ -22,10 +22,13 @@
   read_only_user = domain_service_read_only_user_name(node['cluster']['directory_service']['domain_read_only_user'])

   execute 'Fetch user data from remote directory service' do
-    # The switch-user (sudo -u) is necessary to trigger the fetching of AD data
+    # The switch-user (sudo -u) is necessary to trigger the fetching of AD data.
+    # Failures are ignored because we experimentally verified that a MsAD backend
+    # may take long time to become available.
+    # So, we prefer to execute this step in best effort mode.
+    # Once we will reintroduce the failures, we should consider 30 retries with 10 seconds delay.
     command "sudo -u #{default_user} getent passwd #{read_only_user}"
     user 'root'
-    retries 10 # Retries are just a safe guard in case the node is still fetching data from the AD
-    retry_delay 3
+    ignore_failure true
   end
 end
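
The change above replaces a retry loop (10 retries, 3 s apart, so at most 30 s of waiting) with best-effort execution, and the new comment suggests 30 retries with a 10 s delay (up to 300 s of waiting) if failures are ever enforced again. The plain-Ruby sketch that follows is not Chef code: `run_with_policy` and `flaky_lookup` are hypothetical helpers that only loosely model the `retries`, `retry_delay` and `ignore_failure` properties of Chef's `execute` resource, assuming Chef's convention that `retries N` allows up to N + 1 attempts. `flaky_lookup` stands in for `getent passwd` against a directory backend that needs 20 calls before it answers; that number is invented for illustration.

```ruby
# Hypothetical helper (not a Chef API): runs the given block, loosely mimicking
# the `retries`, `retry_delay` and `ignore_failure` properties of Chef's
# `execute` resource. `retries: n` allows up to n + 1 attempts in total.
def run_with_policy(retries: 0, retry_delay: 0, ignore_failure: false,
                    sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError => e
    if attempts <= retries
      sleeper.call(retry_delay) # wait before the next attempt
      retry                     # re-run the begin block
    end
    raise unless ignore_failure # retries exhausted: propagate the error
    warn "ignoring failure after #{attempts} attempt(s): #{e.message}"
    :ignored                    # best effort: log and carry on
  end
end

# Hypothetical stand-in for `getent passwd <user>` against a slow directory
# backend: every call fails until the `ready_after`-th call succeeds.
def flaky_lookup(ready_after)
  calls = 0
  lambda do
    calls += 1
    raise "user not found (call #{calls})" if calls < ready_after
    'ad-user:*:10001:10001::/home/ad-user:/bin/bash'
  end
end

waited = []
fake_sleep = ->(seconds) { waited << seconds } # record delays instead of sleeping

# Old policy: 10 retries, 3 s apart. A backend needing 20 calls makes it fail.
old = flaky_lookup(20)
begin
  run_with_policy(retries: 10, retry_delay: 3, sleeper: fake_sleep) { old.call }
rescue RuntimeError => e
  puts "old policy: #{e.message} after waiting #{waited.sum} s"
end

# New policy: a single best-effort attempt; the failure is logged, not raised.
current = flaky_lookup(20)
outcome = run_with_policy(ignore_failure: true, sleeper: fake_sleep) { current.call }
puts "new policy: #{outcome}"

# Suggested future policy: 30 retries, 10 s apart (up to 300 s of waiting).
waited.clear
future = flaky_lookup(20)
result = run_with_policy(retries: 30, retry_delay: 10, sleeper: fake_sleep) { future.call }
puts "future policy: #{result} after waiting #{waited.sum} s"
```

Injecting `sleeper` keeps the sketch deterministic, so the total wait of each policy can be read off `waited` rather than measured in real time.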

@@ -38,8 +38,7 @@
           is_expected.to run_execute('Fetch user data from remote directory service').with(
             command: "sudo -u #{cluster_user} getent passwd #{domain_read_only_user}",
             user: 'root',
-            retries: 10,
-            retry_delay: 3
+            ignore_failure: true
           )
         end
       else