Ansible-init compute node script #476
Tested by running (from deploy host):
Then in the compute node:
Check status of nodes:
Run tests (from deploy host):
| group: root | ||
| mode: 0644 | ||
| loop: | ||
| - ../../basic_users/library/terminate_user_sessions.py |
I don't think we need this: the only case where it would run is on boot, and at that point there can't be any sessions for local users we want to remove.
| loop: | ||
| - ../../basic_users/library/terminate_user_sessions.py | ||
| - ../../stackhpc.os-manila-mount/library/os_manila_share.py | ||
| - ../../stackhpc.openhpc/library/sacct_cluster.py |
Not needed, not used at all (as shown by grep). Some sleuthing found the taskfile which used it was removed in stackhpc.openhpc v0.22 as no longer required, so we should delete it from that role!
| mode: 0644 | ||
| loop: | ||
| - ../../basic_users/filter_plugins/filter_keys.py | ||
| - ../../stackhpc.openhpc/filter_plugins/slurm_conf.py |
So some grepping shows this provides:
- hostlist_expression: only used for control node templating slurm.conf and gres.conf, not relevant here
- dict2parameters: only used for control node templating slurm.conf, not relevant here
- error: can't find where this is used
So remove?
| - ../../basic_users/filter_plugins/filter_keys.py | ||
| - ../../stackhpc.openhpc/filter_plugins/slurm_conf.py | ||
|
|
||
| - name: Add filter_plugins ansible.cfg |
Ok as a workaround. Should we move to ansible-init's own cfg definition at some point?
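For reference, a rough sketch of what this kind of workaround amounts to, assuming the `community.general.ini_file` module is available. Both paths below are hypothetical placeholders, not the ones used in this PR:

```yaml
# Sketch only: the cfg path and plugin directory are assumed for illustration.
- name: Add filter_plugins to ansible.cfg
  community.general.ini_file:
    path: /path/to/ansible.cfg        # hypothetical location
    section: defaults
    option: filter_plugins
    value: /path/to/filter_plugins    # hypothetical location
    owner: root
    group: root
    mode: "0644"
```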
| state: directory | ||
| owner: root | ||
| group: root | ||
| mode: 0755 |
If we're putting secrets in here, is this OK?
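If secrets do end up in this directory, one option would be a tighter mode so that only root can read or traverse it. A minimal sketch, assuming a hypothetical path and assuming nothing other than root needs access:

```yaml
# Sketch only: the path is a placeholder, not the directory from this PR.
- name: Ensure secrets directory is readable by root only
  ansible.builtin.file:
    path: /path/to/secrets            # hypothetical location
    state: directory
    owner: root
    group: root
    mode: "0700"                      # quoted so YAML keeps it as an octal string
```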
| @@ -0,0 +1,150 @@ | |||
| --- | |||
|
|
|||
| - name: Ensure directories exist | |||
This runs on the compute_init group, which via the everything layout defaults to cluster.
So you're creating the directories etc. on EVERY node, whereas we only want to do that on the control node.
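One way to limit this to the control node could be a `when` condition on the task. A minimal sketch, assuming the `control` inventory group (as referenced elsewhere in this PR) and a hypothetical directory path:

```yaml
# Sketch only: the path is a placeholder; the group name is taken from groups['control'] in this PR.
- name: Ensure directories exist
  ansible.builtin.file:
    path: /path/to/dir                # hypothetical location
    state: directory
    owner: root
    group: root
    mode: "0755"
  when: inventory_hostname in groups['control']
```

Moving the task into a play that targets only the control group would be an alternative to the per-task condition.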
| cluster | ||
|
|
||
| [compute_init:children] | ||
| # Hosts to deploy compute initialisation ansible-init script to. |
I don't think this description is right.
Deploying the actual compute init script is/will be done in the image build.
This should control which hosts get info templated out (and eventually, metadata set to turn on the feature, I think).
|
|
||
| [compute_init:children] | ||
| # Hosts to deploy compute initialisation ansible-init script to. | ||
| cluster |
I don't think this is right, it needs to either be builder or compute. I need to discuss.
| nfs_enable: | ||
| server: "{{ inventory_hostname in groups['control'] }}" | ||
| clients: false | ||
| nfs_export: "/exports/cluster" # control node has to copy in /etc/hosts to here |
| nfs_export: "/exports/cluster" # control node has to copy in /etc/hosts to here | |
| nfs_export: "/exports/cluster" |
| [ansible_init] | ||
| # Hosts to run linux-anisble-init | ||
|
|
||
| [compute_init] |
See comments on everything.