Ansible-init compute node script #476
2fb64d5 to 998ebf1
Tested by running (from deploy host)
Then in the compute node:
Check status of nodes:
Run tests (from deploy host):
1e08903 to a32e309
fd4ee65 to 61392ed
b3514e6 to 134515d
3f97454 to e3ce492
group: root
mode: 0644
loop:
- ../../basic_users/library/terminate_user_sessions.py
I don't think we need this; there's no way sessions for local users we want to remove can be running at boot, which is the only case this needs to handle.
loop:
- ../../basic_users/library/terminate_user_sessions.py
- ../../stackhpc.os-manila-mount/library/os_manila_share.py
- ../../stackhpc.openhpc/library/sacct_cluster.py
Not needed, not used at all (as shown by grep). Some sleuthing found the taskfile which used it was removed in stackhpc.openhpc v0.22 as no longer required, so we should delete it from that role!
mode: 0644
loop:
- ../../basic_users/filter_plugins/filter_keys.py
- ../../stackhpc.openhpc/filter_plugins/slurm_conf.py
So some grepping shows this provides:
- hostlist_expression: only used for control node templating slurm.conf and gres.conf, not relevant here
- dict2parameters: only used for control node templating slurm.conf, not relevant here
- error: can't find where this is used
So remove?
- ../../basic_users/filter_plugins/filter_keys.py
- ../../stackhpc.openhpc/filter_plugins/slurm_conf.py

- name: Add filter_plugins ansible.cfg
Ok as a workaround. Should we move to ansible-init's own cfg definition at some point?
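For reference, the workaround under discussion amounts to pointing Ansible at the copied filter plugins through `ansible.cfg`. A minimal sketch of one way to do that, assuming the `community.general` collection is installed; both paths are hypothetical placeholders and are not taken from this PR:

```yaml
# Sketch only: the two paths below are hypothetical, not the PR's values.
- name: Add filter_plugins to ansible.cfg
  community.general.ini_file:
    path: /path/to/ansible.cfg          # hypothetical config location
    section: defaults
    option: filter_plugins
    value: /path/to/filter_plugins      # hypothetical plugin directory
    owner: root
    group: root
    mode: "0644"
```

Managing a single option this way leaves the rest of the config file untouched, which may matter if ansible-init later ships its own cfg definition.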
state: directory
owner: root
group: root
mode: 0755
If we're putting secrets in here, is this OK?
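If the directory does end up holding secrets, one option would be to restrict it to root. A sketch under that assumption; the path is a hypothetical placeholder, and whether other users need read access here is not settled in this thread:

```yaml
# Sketch only: path is hypothetical.
- name: Ensure directory exists, accessible by root only
  ansible.builtin.file:
    path: /path/to/secrets_dir   # hypothetical directory
    state: directory
    owner: root
    group: root
    mode: "0700"                 # quoted, as the Ansible docs recommend for modes
```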
@@ -0,0 +1,150 @@
---

- name: Ensure directories exist
This is running on group `compute_init`, which via the `everything` layout defaults to `cluster`.
So you're creating the directories etc. on EVERY node, whereas we only want to do that on the control node.
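One way to limit the task to the control node would be a `when` condition on group membership. A sketch, assuming a `control` inventory group exists (the `nfs_enable` snippet later in this review references one); the path is a hypothetical placeholder:

```yaml
# Sketch only: path is hypothetical; assumes a 'control' inventory group.
- name: Ensure directories exist
  ansible.builtin.file:
    path: /path/to/export_dir    # hypothetical directory
    state: directory
    owner: root
    group: root
    mode: "0755"
  when: inventory_hostname in groups['control']
```

The play would still target every host in `compute_init`, but the task is skipped everywhere except the control node.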
cluster

[compute_init:children]
# Hosts to deploy compute initialisation ansible-init script to.
I don't think this description is right.
Deploying the actual compute init script is/will be done in the image build.
This should control which hosts get info templated out (and eventually, metadata set to turn on the feature, I think).
[compute_init:children]
# Hosts to deploy compute initialisation ansible-init script to.
cluster
I don't think this is right; it needs to be either `builder` or `compute`. I need to discuss.
nfs_enable:
server: "{{ inventory_hostname in groups['control'] }}"
clients: false
nfs_export: "/exports/cluster" # control node has to copy in /etc/hosts to here
nfs_export: "/exports/cluster" # control node has to copy in /etc/hosts to here
nfs_export: "/exports/cluster"
[ansible_init]
# Hosts to run linux-anisble-init

[compute_init]
See comments on everything.