@weizhouapache
Member

Description

Currently, when a KVM host has no NFS storage, the KVMInvestigator reports it as Disconnected during agent/VM investigation, so the other investigators are never run.

This PR fixes the issue so that the remaining investigators are still run.
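A minimal sketch of the behaviour change, assuming (as the "unable to determine the state of the host. Moving on." log lines suggest) that each investigator returns null when it cannot decide and that the first non-null status ends the investigation. `InvestigatorChainSketch`, `Status`, `Simple`, and `Kvm` are hypothetical names for illustration, not CloudStack's actual classes, and the real NFS-based check for hosts that do have NFS storage is stubbed out.

```java
import java.util.List;

// Hypothetical, simplified model of the HA investigator chain; not CloudStack's real classes.
public class InvestigatorChainSketch {

    enum Status { UP, DOWN, DISCONNECTED }

    // An investigator returns a Status, or null when it cannot determine the host state.
    interface Investigator {
        String name();
        Status isAgentAlive(boolean hostHasNfs);
    }

    // Investigator that always returns a fixed answer (null = "unknown").
    record Simple(String name, Status result) implements Investigator {
        public Status isAgentAlive(boolean hostHasNfs) { return result; }
    }

    // Models the KVM investigator as described in this PR:
    // before the fix a host without NFS storage was reported as DISCONNECTED,
    // after the fix the investigator returns null so the chain moves on.
    record Kvm(boolean fixed) implements Investigator {
        public String name() { return "KVMInvestigator"; }
        public Status isAgentAlive(boolean hostHasNfs) {
            if (!hostHasNfs) {
                return fixed ? null : Status.DISCONNECTED;
            }
            return null; // NFS heartbeat check omitted in this sketch
        }
    }

    // The first investigator returning a non-null status wins; null means "move on".
    static String investigate(List<Investigator> chain, boolean hostHasNfs) {
        for (Investigator inv : chain) {
            Status s = inv.isAgentAlive(hostHasNfs);
            if (s != null) {
                return inv.name() + " -> " + s;
            }
        }
        return "unknown";
    }

    // Same investigator order as in the management-server logs of this PR.
    static List<Investigator> chain(boolean fixed) {
        return List.of(
                new Simple("SimpleInvestigator", null),
                new Simple("XenServerInvestigator", null),
                new Kvm(fixed),
                new Simple("HypervInvestigator", null),
                new Simple("VMwareInvestigator", null),
                new Simple("PingInvestigator", Status.DISCONNECTED));
    }

    public static void main(String[] args) {
        System.out.println(investigate(chain(false), false)); // KVMInvestigator -> DISCONNECTED
        System.out.println(investigate(chain(true), false));  // PingInvestigator -> DISCONNECTED
    }
}
```

With `fixed = false` the chain stops at KVMInvestigator, as in the "before" log; with `fixed = true` it falls through to the ping check, as in the "after" log.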

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Below is an example of the investigation process with this PR applied.

(On the KVM host, I added a firewall rule to drop packets sent to port 8250 of the management server.)

How did you try to break this feature and the system with this change?

Contributor

@sureshanaparti sureshanaparti left a comment

clgtm

@sureshanaparti
Contributor

@blueorangutan package

@codecov

codecov bot commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 15.16%. Comparing base (b41acf2) to head (f9fd642).
Report is 2 commits behind head on 4.19.

Files with missing lines | Patch % | Lines
...vm/src/main/java/com/cloud/ha/KVMInvestigator.java | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19   #10515      +/-   ##
============================================
- Coverage     15.17%   15.16%   -0.01%     
+ Complexity    11332    11328       -4     
============================================
  Files          5414     5414              
  Lines        474802   474802              
  Branches      57909    57909              
============================================
- Hits          72028    72008      -20     
- Misses       394718   394742      +24     
+ Partials       8056     8052       -4     
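The -0.01% change can be checked by hand. Assuming coverage is hits divided by tracked lines, where tracked lines = hits + misses + partials (both columns of the diff add up to 474802):

```latex
\begin{aligned}
\text{base: } & \frac{72028}{72028 + 394718 + 8056} = \frac{72028}{474802} \approx 0.151701 \;\Rightarrow\; 15.17\% \\
\text{head: } & \frac{72008}{72008 + 394742 + 8052} = \frac{72008}{474802} \approx 0.151659 \;\Rightarrow\; 15.16\%
\end{aligned}
```

The reported 15.16% for the head commit is consistent with rounding down to two decimal places rather than to the nearest value (15.1659% would otherwise round to 15.17%).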
Flag | Coverage Δ
uitests | 4.28% <ø> (ø)
unittests | 15.89% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@weizhouapache weizhouapache added this to the 4.19.3 milestone Mar 6, 2025
@weizhouapache weizhouapache marked this pull request as ready for review March 6, 2025 14:36
@weizhouapache
Member Author

@blueorangutan package

@blueorangutan

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12686

@rohityadavcloud rohityadavcloud modified the milestones: 4.19.3, 4.20.1 Mar 7, 2025
@rohityadavcloud
Member

@blueorangutan test

@blueorangutan

@rohityadavcloud a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-12603)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 48986 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10515-t12603-kvm-ol8.zip
Smoke tests completed. 133 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File

Contributor

@kiranchavala kiranchavala left a comment

LGTM. Verified the issue manually with the following steps:

  1. Create a CloudStack environment with 2 hosts and no NFS primary storage.
  2. On one of the KVM hosts, configure and enable HA.
  3. Add a firewall rule that drops outgoing packets to port 8250:

iptables -I OUTPUT -p tcp -m tcp --dport 8250 -j DROP

  4. Check the management server logs.

Before the fix:

CloudStack does not run the HypervInvestigator, VMwareInvestigator, or PingInvestigator.

2025-03-06 13:36:30,022 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Investigating why host Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"} has disconnected with event PingTimeout
2025-03-06 13:36:30,023 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) checking if agent (Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"}) is alive
2025-03-06 13:36:30,025 DEBUG [c.c.a.t.Request] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Sending  { Cmd , MgmtId: 32986892337576, via: 1(ref-trl-8094-k-mol8-kiran-chavala-kvm1), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":"50","bypassHostMaintenance":"false"}}] }
2025-03-06 13:37:10,041 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Waiting some more time because this is the current command
2025-03-06 13:37:10,041 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Waiting some more time because this is the current command
2025-03-06 13:37:10,042 WARN  [c.c.a.m.AgentAttache] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Timed out on Seq 1-8864491441548689460:  { Cmd , MgmtId: 32986892337576, via: 1(ref-trl-8094-k-mol8-kiran-chavala-kvm1), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":"50","bypassHostMaintenance":"false"}}] }
2025-03-06 13:37:10,047 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Seq 1-8864491441548689460: Cancelling.
2025-03-06 13:37:10,047 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Operation timed out: Commands 8864491441548689460 to Host 1 timed out after 100
2025-03-06 13:37:10,067 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) SimpleInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:37:10,067 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) XenServerInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:37:10,083 WARN  [c.c.h.KVMInvestigator] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Agent investigation was requested on host Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"}, but host does not support investigation because it has no NFS storage. Skipping investigation.
2025-03-06 13:37:10,083 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) KVMInvestigator was able to determine host Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"} is in Disconnected
2025-03-06 13:37:10,083 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) The agent from host Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"} state determined is Disconnected
2025-03-06 13:37:10,083 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-10:ctx-c383007c) (logid:363305d1) Agent is disconnected but the host is still up: Host {"id":1,"name":"ref-trl-8094-k-mol8-kiran-chavala-kvm1","type":"Routing","uuid":"40f96f30-2b3d-47bd-86ab-cea4c4a5dd4f"} state: Enabled

After the fix:

CloudStack runs the HypervInvestigator, VMwareInvestigator, and PingInvestigator.

 [root@ol8 ~]# cat /var/log/cloudstack/management/management-server.log |grep -i "logid:b39c7f05"
2025-03-06 13:08:59,485 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Investigating why host Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"} has disconnected with event PingTimeout
2025-03-06 13:08:59,485 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) checking if agent (Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"}) is alive
2025-03-06 13:08:59,487 DEBUG [c.c.a.t.Request] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Sending  { Cmd , MgmtId: 32987949302884, via: 2(ref-trl-8087-k-mol8-kiran-chavala-kvm2), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":"50","bypassHostMaintenance":"false"}}] }
2025-03-06 13:09:49,487 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Waiting some more time because this is the current command
2025-03-06 13:10:39,487 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Waiting some more time because this is the current command
2025-03-06 13:10:39,488 WARN  [c.c.a.m.AgentAttache] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Timed out on Seq 2-5748563449361727501:  { Cmd , MgmtId: 32987949302884, via: 2(ref-trl-8087-k-mol8-kiran-chavala-kvm2), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckHealthCommand":{"wait":"50","bypassHostMaintenance":"false"}}] }
2025-03-06 13:10:39,488 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 2-5748563449361727501: Cancelling.
2025-03-06 13:10:39,489 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Operation timed out: Commands 5748563449361727501 to Host 2 timed out after 100
2025-03-06 13:10:39,491 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) SimpleInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,491 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) XenServerInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,494 WARN  [c.c.h.KVMInvestigator] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Agent investigation was requested on host Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"}, but host does not support investigation because it has no NFS storage. Skipping investigation.
2025-03-06 13:10:39,494 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) KVMInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,494 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) HypervInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,494 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) VMwareInvestigator unable to determine the state of the host.  Moving on.
2025-03-06 13:10:39,495 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) checking if agent (Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"}) is alive
2025-03-06 13:10:39,496 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) sending ping from (Host {"id":1,"name":"ol8.localdomain","type":"Routing","uuid":"c0fd498b-e0ff-433c-a68d-698a982a5f6f"}) to agent's host ip address (10.0.35.136)
2025-03-06 13:10:39,497 DEBUG [c.c.a.t.Request] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 1-728457239727181052: Sending  { Cmd , MgmtId: 32987949302884, via: 1(ol8.localdomain), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.PingTestCommand":{"_computingHostIp":"10.0.35.136","wait":"20","bypassHostMaintenance":"false"}}] }
2025-03-06 13:10:39,511 DEBUG [c.c.a.t.Request] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) Seq 1-728457239727181052: Received:  { Ans: , MgmtId: 32987949302884, via: 1(ol8.localdomain), Ver: v1, Flags: 10, { Answer } }
2025-03-06 13:10:39,512 DEBUG [c.c.h.AbstractInvestigatorImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) host (10.0.35.136) has been successfully pinged, returning that host is up
2025-03-06 13:10:39,512 DEBUG [c.c.h.UserVmDomRInvestigator] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) ping from (Host {"id":1,"name":"ol8.localdomain","type":"Routing","uuid":"c0fd498b-e0ff-433c-a68d-698a982a5f6f"}) to agent's host ip address (10.0.35.136) successful, returning that agent is disconnected
2025-03-06 13:10:39,512 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-8:ctx-9220e781) (logid:b39c7f05) PingInvestigator was able to determine host Host {"id":2,"name":"ref-trl-8087-k-mol8-kiran-chavala-kvm2","type":"Routing","uuid":"ec2fdf6c-809d-42b9-96e0-1ff6abde5f89"} is in Disconnected
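The last steps of the "after" log read as a ping-based fallback: another host pings the suspect host's IP, and a successful ping means the host is up but its agent is unreachable, so the result is Disconnected. The sketch below is a hypothetical simplification, not CloudStack's UserVmDomRInvestigator; it additionally assumes (unverified) that a definite failed ping maps to Down and an inconclusive one moves on to the next neighbour host. `PingInvestigatorSketch`, `pingFrom`, and the `Status` values are illustrative names.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical model of the ping-based investigator seen in the log; names are illustrative.
public class PingInvestigatorSketch {

    enum Status { UP, DOWN, DISCONNECTED, UNKNOWN }

    // Asks each neighbour host in turn to ping hostIp.
    // pingFrom.apply(neighbour, ip) returns UP, DOWN, or UNKNOWN for one attempt.
    // Returns null when no neighbour reached a verdict, so the HA chain can move on.
    static Status isAgentAlive(String hostIp, List<String> neighbours,
                               BiFunction<String, String, Status> pingFrom) {
        for (String neighbour : neighbours) {
            Status ping = pingFrom.apply(neighbour, hostIp);
            if (ping == Status.UP) {
                // Host answers pings, so only the agent is unreachable.
                return Status.DISCONNECTED;
            }
            if (ping == Status.DOWN) {
                // Assumption: a definite ping failure means the host itself is down.
                return Status.DOWN;
            }
            // UNKNOWN: try the next neighbour.
        }
        return null;
    }

    public static void main(String[] args) {
        // Mirrors the test above: ol8.localdomain pings 10.0.35.136 successfully.
        Status s = isAgentAlive("10.0.35.136", List.of("ol8.localdomain"),
                (from, ip) -> Status.UP);
        System.out.println(s); // DISCONNECTED
    }
}
```

Returning null instead of guessing keeps this investigator consistent with the fixed KVMInvestigator: an inconclusive check defers to whatever comes next.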

@weizhouapache
Member Author


Great, thanks @kiranchavala for testing!

@DaanHoogland DaanHoogland merged commit cd6d1a2 into apache:4.19 Mar 10, 2025
24 of 25 checks passed
@DaanHoogland DaanHoogland deleted the 4.19-fix-kvm-investigator branch March 10, 2025 08:06
@Pearl1594 Pearl1594 moved this to Done in ACS 4.20.1 Mar 17, 2025
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jun 19, 2025

Projects

Status: Done

6 participants