fix: health check incorrectly load inventory sometimes #21864
Merged
wangxin
merged 1 commit into
sonic-net:master
from
auspham:austinpham/36307349-fix-health-check-sometimes-not-loading-right-inventory
Jan 9, 2026
Signed-off-by: Austin Pham <austinpham@microsoft.com>
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
lizhijianrd
approved these changes
Jan 8, 2026
wangxin
approved these changes
Jan 9, 2026
venu-nexthop
pushed a commit
to venu-nexthop/sonic-mgmt
that referenced
this pull request
Jan 13, 2026
Summary: Health check sometimes loads the wrong inventory admin/password
Fixes # (issue) 36307349
The issue is intermittent: sometimes it happens and sometimes it doesn't. Digging deeper, I can see that it depends heavily on how Ansible allocates and reuses memory internally.
It only happens when there are 2 fanout hosts, one running SONiC and one running a non-SONiC OS.
In the happy scenario, comparing the fanouthost.vm.extra_vars of the 2 fanouts, we can see that they have different memory ids:
memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE
2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}
memory id 140619740737472 host YYYY <----- DIFFERENT ID HERE
2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
In some scenarios, however, Ansible reuses the same memory address when initialising its VariableManager, and the issue occurs:
memory id 139728659566400 host XXXX <---- SAME ID HERE
2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
memory id 139728659566400 host YYYY <---- SAME ID HERE
2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}
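The "memory id" values in the logs above are the kind of number Python's built-in id() returns. As a rough sketch of how such a comparison could be automated, the snippet below groups hosts by the id() of their extra_vars dict and reports any group with more than one host. FakeVM and find_shared_extra_vars are hypothetical names used only for illustration; they are not part of sonic-mgmt or Ansible.

```python
class FakeVM:
    """Hypothetical stand-in for fanouthost.vm; it only holds extra_vars."""

    def __init__(self, extra_vars):
        self.extra_vars = extra_vars


def find_shared_extra_vars(vms_by_host):
    """Return sorted host-name groups whose extra_vars have the same id()."""
    hosts_by_id = {}
    for host, vm in vms_by_host.items():
        # Objects that are alive at the same time and report the same id()
        # are the same object.
        hosts_by_id.setdefault(id(vm.extra_vars), []).append(host)
    return [sorted(hosts) for hosts in hosts_by_id.values() if len(hosts) > 1]


# Failing case from the logs: both hosts report the same memory id.
shared = {}
aliased = find_shared_extra_vars({"XXXX": FakeVM(shared), "YYYY": FakeVM(shared)})

# Happy case: each host has its own dict, so nothing is reported.
separate = find_shared_extra_vars({"XXXX": FakeVM({}), "YYYY": FakeVM({})})

print(aliased)
print(separate)
```

A check like this only detects the condition; it does not change which credentials are used.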
We overwrite ansible_ssh_user and ansible_ssh_password in extra_vars:
fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})
When the two memory addresses are the same, this update also overwrites ansible_ssh_user and ansible_ssh_password for the other fanout. Because everything in extra_vars takes priority over inventory-defined variables, the health check ends up using the wrong username and password.
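The effect described above can be reproduced without Ansible. The following is a minimal sketch, assuming the two fanouts end up holding the same dict object; FakeVM, set_sonic_credentials, and the credential strings are hypothetical placeholders, and only the update() call mirrors the line quoted in the description.

```python
class FakeVM:
    """Hypothetical stand-in for fanouthost.vm; it only holds extra_vars."""

    def __init__(self, extra_vars):
        self.extra_vars = extra_vars


def set_sonic_credentials(vm, user, password):
    # Same in-place update as in the description: it mutates whatever dict
    # vm.extra_vars currently points to.
    vm.extra_vars.update({"ansible_ssh_user": user, "ansible_ssh_password": password})


# Failing case: both fanouts hold the very same dict object.
shared = {}
sonic_fanout = FakeVM(shared)
other_fanout = FakeVM(shared)
set_sonic_credentials(sonic_fanout, "sonic_user", "sonic_pass")
leaked = dict(other_fanout.extra_vars)  # the other fanout now sees SONiC credentials

# Happy case: each fanout has its own dict, so the update stays local.
sonic_fanout_ok = FakeVM({})
other_fanout_ok = FakeVM({})
set_sonic_credentials(sonic_fanout_ok, "sonic_user", "sonic_pass")
untouched = dict(other_fanout_ok.extra_vars)

print(leaked)
print(untouched)
```

This only illustrates the aliasing; it makes no claim about how the merged change resolves it.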
Signed-off-by: Austin Pham <austinpham@microsoft.com>
yifan-nexthop
pushed a commit
to nexthop-ai/sonic-mgmt
that referenced
this pull request
Jan 14, 2026
Signed-off-by: YiFan Wang <yifan@nexthop.ai>
PriyanshTratiya
pushed a commit
to PriyanshTratiya/sonic-mgmt
that referenced
this pull request
Jan 21, 2026
Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
AndoniSanguesa
pushed a commit
to AndoniSanguesa/sonic-mgmt
that referenced
this pull request
Jan 21, 2026
Signed-off-by: Andoni Sanguesa <andoniesanguesa@gmail.com>
AndoniSanguesa
pushed a commit
to AndoniSanguesa/sonic-mgmt
that referenced
this pull request
Jan 21, 2026
Signed-off-by: Andoni Sanguesa <andoniesanguesa@gmail.com>
Description of PR
Summary: Health check sometimes loads the wrong inventory admin/password
Fixes # (issue) 36307349
The issue is intermittent: sometimes it happens and sometimes it doesn't. Digging deeper, I can see that it depends heavily on how Ansible allocates and reuses memory internally.
It only happens when there are 2 fanout hosts, one running SONiC and one running a non-SONiC OS.
In the happy scenario, comparing the fanouthost.vm.extra_vars of the 2 fanouts, we can see that they have different memory addresses.
In some scenarios, however, Ansible reuses the same memory address when initialising its VariableManager, and the issue occurs.
We overwrite ansible_ssh_user and ansible_ssh_password in extra_vars. When the two memory addresses are the same, this also overwrites ansible_ssh_user and ansible_ssh_password for the other fanout. Because everything in extra_vars takes priority over inventory-defined variables, the health check ends up using the wrong username and password.
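To see why leaked extra_vars lead to the wrong login, it may help to model variable precedence as a two-layer merge in which extra vars are applied last. This is a deliberately simplified sketch, not Ansible's actual resolution code; resolve_connection_vars and all credential values are hypothetical.

```python
def resolve_connection_vars(inventory_vars, extra_vars):
    """Simplified precedence model: extra_vars override inventory variables."""
    resolved = dict(inventory_vars)  # start from inventory-defined values
    resolved.update(extra_vars)      # extra vars are applied last, so they win
    return resolved


# Inventory credentials for the non-SONiC fanout (placeholder values).
inventory_vars = {
    "ansible_ssh_user": "inventory_user",
    "ansible_ssh_password": "inventory_pass",
}

# With clean extra_vars, the inventory credentials are used.
clean = resolve_connection_vars(inventory_vars, {})

# With SONiC credentials leaked into extra_vars, they replace the inventory ones.
polluted = resolve_connection_vars(
    inventory_vars,
    {"ansible_ssh_user": "sonic_user", "ansible_ssh_password": "sonic_pass"},
)

print(clean["ansible_ssh_user"])
print(polluted["ansible_ssh_user"])
```

In this model the non-SONiC fanout would attempt to log in as sonic_user, which is consistent with the "Invalid/incorrect password" failure reported for the affected host.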
Type of change
Back port request
Approach
What is the motivation for this PR?
How did you do it?
How did you verify/test it?
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation