Closed
ansible/bootstrap.yml: 11 changes (10 additions, 1 deletion)
@@ -8,7 +8,16 @@
```yaml
  stat:
    path: /etc/systemd/system/ansible-init.service
  register: _stat_ansible_init_unitfile

- name: Check ansible-init status
  command: systemctl is-failed ansible-init
  register: _ansible_init_failed
  failed_when: false # rc != 0 for non-failure!
  changed_when: false
- name: Check ansible-init hasn't failed (yet)
  # NB: only allows early exit if it has, does not catch future failures!
  assert:
    that: "'failed' not in _ansible_init_failed.stdout"
```
Collaborator:
```console
[rocky@cclr-dev-svn3-dr08-u14 ~]$ sudo systemctl status ansible-init
● ansible-init.service
   Loaded: loaded (/etc/systemd/system/ansible-init.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2025-03-07 17:37:16 UTC; 1s ago
  Process: 24805 ExecStart=/usr/bin/ansible-init (code=exited, status=1/FAILURE)
 Main PID: 24805 (code=exited, status=1/FAILURE)
[rocky@cclr-dev-svn3-dr08-u14 ~]$ systemctl is-failed ansible-init
activating
```

Mine looks like that. Does it eventually give up?
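The output above shows why the `is-failed` check in the hunk does not fire here: between restart attempts systemd reports the unit as `activating (auto-restart)`, so `systemctl is-failed` prints `activating` rather than `failed`. One possible way to detect a unit stuck in this loop is the `NRestarts` service property, which counts automatic restarts triggered by the unit's `Restart=` setting (available since systemd 235). This is a minimal sketch, not part of this PR; it assumes ansible-init has a `Restart=` policy, as the output suggests, and the register name `_ansible_init_restarts` is hypothetical:

```yaml
# Sketch only: assumes the ansible-init unit has a Restart= policy (as the
# "activating (auto-restart)" state suggests) and systemd >= 235.
# _ansible_init_restarts is a hypothetical register name.
- name: Get ansible-init automatic restart count
  # --value prints only the property value, e.g. "0" or "3"
  command: systemctl show ansible-init --property=NRestarts --value
  register: _ansible_init_restarts
  changed_when: false
  when: _stat_ansible_init_unitfile.stat.exists

- name: Check ansible-init is not in a restart loop
  assert:
    # Any automatic restart means at least one run exited with a failure
    that: _ansible_init_restarts.stdout | int == 0
    fail_msg: "ansible-init has restarted {{ _ansible_init_restarts.stdout }} time(s) - check journalctl -xeu ansible-init"
  when: _stat_ansible_init_unitfile.stat.exists
```

One design caveat: this treats any automatic restart as a failure, even if a later attempt succeeds, which may be stricter than wanted.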

Collaborator Author:
What situation did this occur in? If it's during dev with a broken compute-init, I'm not too fussed and think we should just close this PR; Bertie is going to add some notes to the docs on how to recover from this situation. If it occurs in production, it's broken anyway, and Ansible can't really recover it, IMO.

Collaborator:
Correct, this was one of the issues we hit when adding new nodes to the cluster. I guess it should be resolved by Bertie's work on making it exit early if the hostvars do not exist.

For other issues, though, it is a large timeout, so it would be nice if it failed a bit quicker, IMO.
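A shorter, overridable wait could look like the minimal sketch below, which is not taken from this PR or the compute-init update. `wait_for` takes a `timeout` in seconds (default 300) and a `msg` that replaces its default error message. The variable `ansible_init_wait_timeout` and its 600-second fallback are hypothetical placeholders, not the appliance's current values:

```yaml
# Sketch only: ansible_init_wait_timeout and the 600 s fallback are
# hypothetical; _stat_ansible_init_unitfile is registered earlier in the hunk.
- name: Wait for ansible-init to finish
  wait_for:
    path: /var/lib/ansible-init.done
    # Seconds to wait for the file to appear before failing the task
    timeout: "{{ ansible_init_wait_timeout | default(600) }}"
    msg: "ansible-init did not finish in time - check journalctl -xeu ansible-init"
  when: _stat_ansible_init_unitfile.stat.exists
```

Making the value a variable would let dev environments fail fast while production keeps a longer wait.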

Collaborator Author:
Yeah, maybe we should knock it down considerably, but that's not for this PR. I'm going to close this, and we can address it during review of the compute-init update.

```yaml
    fail_msg: "ansible-init has failed - check journalctl -xeu ansible-init"
- name: Wait for ansible-init to finish
  wait_for:
    path: /var/lib/ansible-init.done
```
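The `# rc != 0 for non-failure!` comment in the hunk refers to the exit status of `systemctl is-failed`, which is 0 when the unit is in the `failed` state and non-zero otherwise. As a minimal sketch (not part of this PR), the same check could test that exit status directly instead of matching `failed` in stdout:

```yaml
# Sketch only: an exit-status variant of the stdout-based check in the hunk.
- name: Check ansible-init status
  # systemctl is-failed exits 0 if the unit IS failed, non-zero otherwise;
  # --quiet suppresses the printed state string
  command: systemctl is-failed --quiet ansible-init
  register: _ansible_init_failed
  failed_when: false
  changed_when: false

- name: Check ansible-init hasn't failed (yet)
  assert:
    that: _ansible_init_failed.rc != 0
    fail_msg: "ansible-init has failed - check journalctl -xeu ansible-init"
```

Like the stdout-based version, this only detects the `failed` state, so it would not catch a unit sitting in `activating (auto-restart)`.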