Skip to content

Conversation

@auspham
Copy link
Contributor

@auspham auspham commented Jan 8, 2026

Description of PR

Summary: Health check sometimes load wrong inventory admin/password
Fixes # (issue) 36307349

From investigating I can see that this issue sometimes happen, sometimes doesn't happen. Diving deeper, I can see that this is heavily dependent on how Ansible process and use memory internally.

This would only happen if there are 2 fanout hosts. One is using sonic and one is using non-sonic

In a happy scenarios, comparing the fanouthost.vm.extra_vars of 2 fanouts, we can see that they have different memory address


memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE

2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}

memory id 140619740737472 host YYYY  <----- DIFFERENT ID HERE

2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}

In some scenarios, however, if ansible decided to re-use the memory address when initialising its VariableManager, we have the issue happen

memory id 139728659566400 host XXXX <---- SAME ID HERE

2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}

memory id 139728659566400 host YYYY <---- SAME ID HERE

2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}

Since we're overwriting the ansible_ssh_user and ansible_ssh_password in the extra_vars

fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})

If in the scenario that the two memory addresses are the same, it will overwrite the ansible_ssh_user, and ansible_ssh_password as well. And everything in extra_vars takes top priority over inventory defined variables.

Therefore it leads to using wrong username and password.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Signed-off-by: Austin Pham <austinpham@microsoft.com>
@auspham auspham requested a review from wangxin as a code owner January 8, 2026 11:39
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@wangxin wangxin merged commit 378e3bb into sonic-net:master Jan 9, 2026
21 checks passed
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Jan 13, 2026
Summary: Health check sometimes load wrong inventory admin/password
Fixes # (issue) 36307349

From investigating I can see that this issue sometimes happen, sometimes doesn't happen. Diving deeper, I can see that this is heavily dependent on how Ansible process and use memory internally.

This would only happen if there are 2 fanout hosts. One is using sonic and one is using non-sonic

In a happy scenarios, comparing the fanouthost.vm.extra_vars of 2 fanouts, we can see that they have different memory address


memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE

2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}

memory id 140619740737472 host YYYY  <----- DIFFERENT ID HERE

2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
In some scenarios, however, if ansible decided to re-use the memory address when initialising its VariableManager, we have the issue happen

memory id 139728659566400 host XXXX <---- SAME ID HERE

2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}

memory id 139728659566400 host YYYY <---- SAME ID HERE

2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}

Since we're overwriting the ansible_ssh_user and ansible_ssh_password in the extra_vars

fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})
If in the scenario that the two memory addresses are the same, it will overwrite the ansible_ssh_user, and ansible_ssh_password as well. And everything in extra_vars takes top priority over inventory defined variables.

Therefore it leads to using wrong username and password.

Signed-off-by: Austin Pham <austinpham@microsoft.com>
yifan-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Jan 14, 2026
Summary: Health check sometimes load wrong inventory admin/password
Fixes # (issue) 36307349

From investigating I can see that this issue sometimes happen, sometimes doesn't happen. Diving deeper, I can see that this is heavily dependent on how Ansible process and use memory internally.

This would only happen if there are 2 fanout hosts. One is using sonic and one is using non-sonic

In a happy scenarios, comparing the fanouthost.vm.extra_vars of 2 fanouts, we can see that they have different memory address

memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE

2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}

memory id 140619740737472 host YYYY  <----- DIFFERENT ID HERE

2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
In some scenarios, however, if ansible decided to re-use the memory address when initialising its VariableManager, we have the issue happen

memory id 139728659566400 host XXXX <---- SAME ID HERE

2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}

memory id 139728659566400 host YYYY <---- SAME ID HERE

2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}

Since we're overwriting the ansible_ssh_user and ansible_ssh_password in the extra_vars

fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})
If in the scenario that the two memory addresses are the same, it will overwrite the ansible_ssh_user, and ansible_ssh_password as well. And everything in extra_vars takes top priority over inventory defined variables.

Therefore it leads to using wrong username and password.

Signed-off-by: Austin Pham <austinpham@microsoft.com>
Signed-off-by: YiFan Wang <yifan@nexthop.ai>
PriyanshTratiya pushed a commit to PriyanshTratiya/sonic-mgmt that referenced this pull request Jan 21, 2026
Summary: Health check sometimes load wrong inventory admin/password
Fixes # (issue) 36307349

From investigating I can see that this issue sometimes happen, sometimes doesn't happen. Diving deeper, I can see that this is heavily dependent on how Ansible process and use memory internally.

This would only happen if there are 2 fanout hosts. One is using sonic and one is using non-sonic

In a happy scenarios, comparing the fanouthost.vm.extra_vars of 2 fanouts, we can see that they have different memory address

memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE

2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}

memory id 140619740737472 host YYYY  <----- DIFFERENT ID HERE

2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
In some scenarios, however, if ansible decided to re-use the memory address when initialising its VariableManager, we have the issue happen

memory id 139728659566400 host XXXX <---- SAME ID HERE

2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}

memory id 139728659566400 host YYYY <---- SAME ID HERE

2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}

Since we're overwriting the ansible_ssh_user and ansible_ssh_password in the extra_vars

fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})
If in the scenario that the two memory addresses are the same, it will overwrite the ansible_ssh_user, and ansible_ssh_password as well. And everything in extra_vars takes top priority over inventory defined variables.

Therefore it leads to using wrong username and password.

Signed-off-by: Austin Pham <austinpham@microsoft.com>
Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
AndoniSanguesa pushed a commit to AndoniSanguesa/sonic-mgmt that referenced this pull request Jan 21, 2026
Summary: Health check sometimes load wrong inventory admin/password
Fixes # (issue) 36307349

From investigating I can see that this issue sometimes happen, sometimes doesn't happen. Diving deeper, I can see that this is heavily dependent on how Ansible process and use memory internally.

This would only happen if there are 2 fanout hosts. One is using sonic and one is using non-sonic

In a happy scenarios, comparing the fanouthost.vm.extra_vars of 2 fanouts, we can see that they have different memory address

memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE

2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}

memory id 140619740737472 host YYYY  <----- DIFFERENT ID HERE

2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
In some scenarios, however, if ansible decided to re-use the memory address when initialising its VariableManager, we have the issue happen

memory id 139728659566400 host XXXX <---- SAME ID HERE

2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}

memory id 139728659566400 host YYYY <---- SAME ID HERE

2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}

Since we're overwriting the ansible_ssh_user and ansible_ssh_password in the extra_vars

fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})
If in the scenario that the two memory addresses are the same, it will overwrite the ansible_ssh_user, and ansible_ssh_password as well. And everything in extra_vars takes top priority over inventory defined variables.

Therefore it leads to using wrong username and password.

Signed-off-by: Austin Pham <austinpham@microsoft.com>
Signed-off-by: Andoni Sanguesa <andoniesanguesa@gmail.com>
AndoniSanguesa pushed a commit to AndoniSanguesa/sonic-mgmt that referenced this pull request Jan 21, 2026
Summary: Health check sometimes load wrong inventory admin/password
Fixes # (issue) 36307349

From investigating I can see that this issue sometimes happen, sometimes doesn't happen. Diving deeper, I can see that this is heavily dependent on how Ansible process and use memory internally.

This would only happen if there are 2 fanout hosts. One is using sonic and one is using non-sonic

In a happy scenarios, comparing the fanouthost.vm.extra_vars of 2 fanouts, we can see that they have different memory address

memory id 140619746693120 host XXXX <----- DIFFERENT ID HERE

2026-01-08 11:31:14,402 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': True, 'module_stdout': '', 'module_stderr': '/bin/sh: /usr/bin/python3: No such file or directory\n', 'msg': 'The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error', 'rc': 127, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3'}, '_ansible_no_log': False, 'changed': False}

memory id 140619740737472 host YYYY  <----- DIFFERENT ID HERE

2026-01-08 11:31:15,404 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}
In some scenarios, however, if ansible decided to re-use the memory address when initialising its VariableManager, we have the issue happen

memory id 139728659566400 host XXXX <---- SAME ID HERE

2026-01-08 11:31:43,750 testbed_health_check.py#185 INFO - {'hostname': 'XXXX', 'reachable': True, 'failed': False, 'ping': 'pong', 'invocation': {'module_args': {'data': 'pong'}}, 'ansible_facts': {'discovered_interpreter_python': '/usr/bin/python3.9'}, '_ansible_no_log': False, 'changed': False}

memory id 139728659566400 host YYYY <---- SAME ID HERE

2026-01-08 11:31:44,384 testbed_health_check.py#185 INFO - {'hostname': 'YYYY', 'reachable': False, 'failed': True, 'unreachable': True, 'msg': "Invalid/incorrect password: Warning: Permanently added '10.150.22.30' (ED25519) to the list of known hosts.\r\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\n\nUnauthorized access and/or use prohibited. All access and/or use subject to monitoring.\n\nNOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE\nPermission denied, please try again.", 'changed': False}

Since we're overwriting the ansible_ssh_user and ansible_ssh_password in the extra_vars

fanouthost.vm.extra_vars.update({"ansible_ssh_user": fanout_sonic_user, "ansible_ssh_password": fanout_sonic_password})
If in the scenario that the two memory addresses are the same, it will overwrite the ansible_ssh_user, and ansible_ssh_password as well. And everything in extra_vars takes top priority over inventory defined variables.

Therefore it leads to using wrong username and password.

Signed-off-by: Austin Pham <austinpham@microsoft.com>
Signed-off-by: Andoni Sanguesa <andoniesanguesa@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants