Commit 1e07cc2

[AD] Fix the sync of AD data by moving the sync step to the end of the finalize phase and ignoring failures.
Signed-off-by: Giacomo Marciani <[email protected]>
1 parent 5ae6249 commit 1e07cc2

4 files changed: 9 additions & 7 deletions

CHANGELOG.md

Lines changed: 0 additions & 1 deletion
@@ -34,7 +34,6 @@ This file is used to list changes made in each version of the AWS ParallelCluste
 **BUG FIXES**
 - Fix issue making job fail when submitted as active directory user from login nodes.
   The issue was caused by an incomplete configuration of the integration with the external Active Directory on the head node.
-  This fix comes with a breaking change: now cluster creation/update would fail if the integration with the Active Directory does not work.
 
 3.8.0
 ------

cookbooks/aws-parallelcluster-entrypoints/recipes/finalize.rb

Lines changed: 2 additions & 1 deletion
@@ -21,6 +21,7 @@
 end
 
 include_recipe "aws-parallelcluster-platform::finalize"
-include_recipe "aws-parallelcluster-environment::finalize"
 
 include_recipe 'aws-parallelcluster-slurm::finalize' if node['cluster']['scheduler'] == 'slurm'
+
+include_recipe "aws-parallelcluster-environment::finalize"
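The effect of this reordering on the visible part of finalize.rb can be modeled in plain Ruby. This is a hypothetical sketch: `finalize_recipes` is not part of the cookbook, it lists only the recipes that appear in this hunk (anything included earlier in the file is omitted), and it assumes, from the file layout, that the directory service sync runs as part of `aws-parallelcluster-environment::finalize`.

```ruby
# Hypothetical model of the include order in finalize.rb after this commit.
# Only the recipes visible in the diff hunk are listed.
def finalize_recipes(scheduler)
  recipes = ["aws-parallelcluster-platform::finalize"]
  # Slurm-specific finalization is included only for Slurm clusters.
  recipes << "aws-parallelcluster-slurm::finalize" if scheduler == "slurm"
  # Environment finalization (assumed to contain the AD sync step) now comes last.
  recipes << "aws-parallelcluster-environment::finalize"
  recipes
end

p finalize_recipes("slurm")
# => ["aws-parallelcluster-platform::finalize", "aws-parallelcluster-slurm::finalize", "aws-parallelcluster-environment::finalize"]
```

Whatever the scheduler, the environment finalize recipe is the last entry, so the AD sync runs after the Slurm finalization rather than before it.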

cookbooks/aws-parallelcluster-environment/recipes/finalize/finalize_directory_service.rb

Lines changed: 6 additions & 3 deletions
@@ -22,10 +22,13 @@
   read_only_user = domain_service_read_only_user_name(node['cluster']['directory_service']['domain_read_only_user'])
 
   execute 'Fetch user data from remote directory service' do
-    # The switch-user (sudo -u) is necessary to trigger the fetching of AD data
+    # The switch-user (sudo -u) is necessary to trigger the fetching of AD data.
+    # Failures are ignored because we experimentally verified that a MsAD backend
+    # may take long time to become available.
+    # So, we prefer to execute this step in best effort mode.
+    # Once we will reintroduce the failures, we should consider 30 retries with 10 seconds delay.
     command "sudo -u #{default_user} getent passwd #{read_only_user}"
     user 'root'
-    retries 10 # Retries are just a safe guard in case the node is still fetching data from the AD
-    retry_delay 3
+    ignore_failure true
   end
 end
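Switching from `retries`/`retry_delay` to `ignore_failure` changes what happens when `getent` keeps failing: before, the resource was attempted up to 11 times (the first run plus 10 retries) and then failed the Chef run; now a failure is tolerated and the run continues. The plain-Ruby sketch below models that interaction. It is a simplification, not Chef's implementation; `best_effort_run` and its injectable `sleeper` are hypothetical names, and the block passed to it stands in for the `getent` call. Ignoring command runtime, the old settings bounded the extra wait at 10 × 3 = 30 seconds, while the 30 retries with a 10-second delay suggested in the code comment would allow up to 30 × 10 = 300 seconds.

```ruby
# Simplified model of Chef's `retries`, `retry_delay` and `ignore_failure`
# resource properties. Hypothetical helper, not Chef's implementation.
# The block stands in for the command and must return true on success.
def best_effort_run(retries: 0, retry_delay: 0, ignore_failure: false, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    raise "command failed on attempt #{attempts}" unless yield
    :success
  rescue RuntimeError => e
    # Up to `retries` additional attempts after the first one.
    if attempts <= retries
      sleeper.call(retry_delay)
      retry
    end
    # Out of retries: either propagate the failure or swallow it.
    raise e unless ignore_failure
    :ignored_failure
  end
end

no_sleep = ->(_seconds) {}

# Old behavior: a backend that answers on the 4th call succeeds thanks to retries.
calls = 0
p best_effort_run(retries: 10, retry_delay: 3, sleeper: no_sleep) { (calls += 1) >= 4 } # => :success

# New behavior: a backend that never answers does not abort the run.
p best_effort_run(ignore_failure: true, sleeper: no_sleep) { false } # => :ignored_failure
```

Passing a no-op `sleeper` keeps the demonstration instant; with the default, each retry would actually sleep for `retry_delay` seconds.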

cookbooks/aws-parallelcluster-environment/spec/unit/recipes/finalize_directory_service_spec.rb

Lines changed: 1 addition & 2 deletions
@@ -38,8 +38,7 @@
       is_expected.to run_execute('Fetch user data from remote directory service').with(
         command: "sudo -u #{cluster_user} getent passwd #{domain_read_only_user}",
         user: 'root',
-        retries: 10,
-        retry_delay: 3
+        ignore_failure: true
       )
     end
   else
