[Scaling] Remove usage of cfn-init in Compute Fleet #2875
    import logging
    from retrying import retry

    SHARED_LOCATION = "/opt/parallelcluster/"
The shared folder is defined by the cookbook attribute `default['cluster']['shared_dir'] = "#{node['cluster']['base_dir']}/shared"`, so the script should take it from the cookbook attribute rather than hardcode it.
I'm aware that elsewhere in the cookbook we did not comply with this best practice, but since we are here, let's do it the best way.
I can't without converting it into an ERB file, and if I do that I won't be able to write Python unit tests.
What about making it a script argument, passed when the script is invoked?
In future iterations I want to run the script for LoginNodes too, and I don't see the point in re-surfacing this value as an argument for both Login and Compute Nodes.
The reason for doing that is that cluster creation is supposed to succeed even if I change this path via a custom cookbook attribute. That said, I agree we can address this in a follow-up PR, because we have many other scripts where we hardwired this path rather than taking it from the attributes, so you're not introducing any regression.
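For reference, the script-argument option raised in this thread could look roughly like the sketch below. It is not the code in this PR: the `--shared-dir` flag and the `parse_args` helper are hypothetical names, and the default simply mirrors the constant currently hardcoded in the script.

```python
import argparse

# Default mirrors the SHARED_LOCATION constant hardcoded in the script under review.
DEFAULT_SHARED_LOCATION = "/opt/parallelcluster/"


def parse_args(argv=None):
    """Parse CLI arguments; --shared-dir is a hypothetical flag, not part of this PR."""
    parser = argparse.ArgumentParser(description="Example of an overridable shared location")
    parser.add_argument(
        "--shared-dir",
        default=DEFAULT_SHARED_LOCATION,
        help="Base location for shared files; the caller may pass an attribute-derived path",
    )
    return parser.parse_args(argv)
```

With this shape the script stays a plain Python file, so unit tests can call `parse_args([])` and get the default, while the cookbook could pass an attribute-derived path at invocation time.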
Force-pushed from 2ede568 to a89978c
* Separate cfn-hup update hook for ComputeFleet
* Add `get_compute_user_data.py` script to get the LaunchTemplates and parse them to write the relevant DNA files
* Add invocation of script `get_compute_user_data.py` by the HeadNode during an update
* Write dna.json files for each Launch Template
* Use the Launch Template logical ID for the update action script
* Update cfn-hup hook action script for Compute
* Change the owner, group and mode of dna and extra files in tmp
* Share extra.json with Compute nodes
* Add cleanup operation after an update
* Update config_cfn_hup to be streamlined for node-specific configuration files
…d cleaning up dna.json and extra.json during an update
* Renaming the files and folders to cfn_hup_configuration
* Deleting old recipe config_cfn_hup_spec.rb
Force-pushed from a89978c to a9bb8a6
Force-pushed from 64f6bba to 177c975
* Correcting Kitchen and Unit tests
* Adding share_compute_fleet_dna.py for tox checks
Force-pushed from 177c975 to 016d25d
Added Same reason for adding skip-* labels in this PR
Description of changes
This PR changes the Create and Update Paths of Compute Nodes in a cluster. We need to remove the usage of cfn-init because of CFN API throttling issues, and to improve fleet scaling time.
For the Create Path we revert to an approach we used in ParallelCluster 3.8.0:
* Create the shared sub-directory `/opt/parallelcluster/shared/dna`, which is used for storing dna.json.
* Create the script `/opt/parallelcluster/scripts/share_compute_fleet_dna.py`, which is executed only by the root user on the HeadNode.
* Create `/opt/parallelcluster/scripts/cfn-hup-update-action.sh`, which is executed only by the root user on the Compute node. This script monitors the shared /dna directory and runs the cookbook Update recipes on the node.

For the Update Path we will rely on the HeadNode to share dna.json and extra.json for each node as per their Launch Templates, using the EC2 DescribeLaunchTemplateVersions API:
* The HeadNode runs a new script to get the latest dna.json and store it in the shared directory (as part of the fetch_dna_files resource).
* cfn-hup invokes an update hook action script, which monitors the shared directory for the latest dna.json files and runs the cookbook update recipes.
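
As a rough illustration of the Update Path, the sketch below shows how a HeadNode-side script might read a Launch Template's user data through the EC2 DescribeLaunchTemplateVersions API. It is not the script from this PR: the function name, the retry settings, and the choice to pass the client in as a parameter are assumptions. Extracting dna.json and extra.json from the decoded user data is left out because that format is not described here.

```python
import base64
import time


def get_latest_user_data(ec2_client, launch_template_id, attempts=3, wait_seconds=1.0):
    """Return the decoded user data of the latest version of a Launch Template.

    ec2_client is expected to behave like boto3.client("ec2"). The helper name
    and the retry parameters are hypothetical; attempts must be at least 1.
    """
    for attempt in range(1, attempts + 1):
        try:
            response = ec2_client.describe_launch_template_versions(
                LaunchTemplateId=launch_template_id,
                Versions=["$Latest"],
            )
            break
        except Exception:  # broad on purpose in this sketch
            if attempt == attempts:
                raise
            # Back off briefly before retrying, e.g. after API throttling.
            time.sleep(wait_seconds)

    version = response["LaunchTemplateVersions"][0]
    # The API returns the user data base64-encoded.
    encoded = version["LaunchTemplateData"]["UserData"]
    return base64.b64decode(encoded).decode("utf-8")
```

Passing the client in keeps the function testable with a stub. In real use a caller would pass a boto3 EC2 client and then write the parsed result into the shared directory.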
Dependent on CLI aws/aws-parallelcluster#6655
Tests
Same as aws/aws-parallelcluster#6655
References
Checklist
* If the PR targets a branch other than `develop`, add the branch name as prefix in the PR title (e.g. `[release-3.6]`).

Please review the guidelines for contributing and Pull Request Instructions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.