Commit 378e3bb

fix: health check incorrectly load inventory sometimes (#21864)
Summary: The health check sometimes loads the wrong inventory username/password.

Fixes # (issue) 36307349

Investigation shows that the issue is intermittent: sometimes it happens, sometimes it does not. It depends heavily on how Ansible manages memory internally, and it only occurs when there are 2 fanout hosts, one running SONiC and one running a non-SONiC OS.

In the happy scenario, comparing the fanouthost.vm.extra_vars of the 2 fanouts shows that they have different memory addresses:

memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE
2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}
memory id 140619740737472 host YYYY <----- DIFFERENT ID HERE
2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}

In some scenarios, however, Ansible reuses the same memory address when initialising its VariableManager, and the issue occurs:

memory id 139728659566400 host XXXX <---- SAME ID HERE
2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
memory id 139728659566400 host YYYY <---- SAME ID HERE
2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}

For a SONiC fanout, we overwrite ansible_ssh_user and ansible_ssh_password in extra_vars:

fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})

When the two memory addresses are the same, this update also overwrites ansible_ssh_user and ansible_ssh_password for the other fanout. Everything in extra_vars takes priority over inventory-defined variables, so the health check ends up using the wrong username and password.

Signed-off-by: Austin Pham <austinpham@microsoft.com>
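The failure mode described above can be reproduced in isolation with plain dictionaries. This is only a minimal sketch: `FakeHost`, `effective_vars`, and the credential values are hypothetical stand-ins, not the real fanout host class or Ansible's VariableManager, and the precedence model (extra_vars merged over inventory variables) is a simplification. It assumes both hosts end up holding the same dict object, which is what an identical `id()` on two live references indicates.

```python
class FakeHost:
    """Hypothetical stand-in for a fanout host; not the real class."""

    def __init__(self, name, inventory_vars, extra_vars):
        self.name = name
        self.inventory_vars = inventory_vars
        self.extra_vars = extra_vars

    def effective_vars(self):
        # Simplified precedence model: extra_vars override inventory variables.
        return {**self.inventory_vars, **self.extra_vars}


# Assumption: both hosts receive the same dict object as their extra_vars.
shared_extra_vars = {}

sonic_fanout = FakeHost("XXXX", {"ansible_ssh_user": "inv_sonic"}, shared_extra_vars)
other_fanout = FakeHost("YYYY", {"ansible_ssh_user": "inv_other"}, shared_extra_vars)

# The same id() on two live references means there is one underlying object.
same_object = id(sonic_fanout.extra_vars) == id(other_fanout.extra_vars)

# The SONiC branch overrides the credentials on "its" extra_vars...
sonic_fanout.extra_vars.update({"ansible_ssh_user": "sonic_admin"})

# ...and the non-SONiC host now resolves the SONiC user as well.
leaked_user = other_fanout.effective_vars()["ansible_ssh_user"]
print(same_object, leaked_user)
```

In this model the non-SONiC host never asked for an override, yet its inventory user is shadowed because the override lives in a dictionary it shares with the SONiC host.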
1 parent 20d428d commit 378e3bb

1 file changed

+3
-0
lines changed

.azure-pipelines/testbed_health_check.py

Lines changed: 3 additions & 0 deletions
@@ -170,6 +170,9 @@ def pre_check(self):
                         "fanout_sonic_password")
                     fanouthost.vm.extra_vars.update(
                         {"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})
+                else:
+                    fanouthost.vm.extra_vars.pop("ansible_ssh_user", None)
+                    fanouthost.vm.extra_vars.pop("ansible_ssh_password", None)
 
                 is_reachable, result = fanouthost.reachable()
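The effect of the added else branch can be sketched as a small standalone helper. The function name `set_fanout_credentials`, its parameters, and the sample values are hypothetical and only illustrate the update-or-pop pattern from the diff; the real code operates on fanouthost.vm.extra_vars inside pre_check.

```python
def set_fanout_credentials(extra_vars, is_sonic, user=None, password=None):
    """Hypothetical helper mirroring the if/else added in the diff."""
    if is_sonic:
        # SONiC fanout: override the inventory credentials via extra_vars.
        extra_vars.update(
            {"ansible_ssh_user": user, "ansible_ssh_password": password})
    else:
        # Non-SONiC fanout: drop any override left behind by an earlier host.
        # The None default makes pop() a no-op when the key is absent.
        extra_vars.pop("ansible_ssh_user", None)
        extra_vars.pop("ansible_ssh_password", None)
    return extra_vars


# Assumption: one dict object is shared by both fanouts.
shared = {"other_setting": "kept"}

set_fanout_credentials(shared, True, "sonic_admin", "sonic_pw")
after_sonic = dict(shared)

set_fanout_credentials(shared, False)
after_non_sonic = dict(shared)
print(after_sonic, after_non_sonic)
```

Using `pop(key, None)` rather than `del` keeps the non-SONiC path safe when the dictionaries are not shared and the keys were never set, so the branch behaves the same in both the happy and the failing scenario.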
