Commit a6ef418

dann1 authored and tinova committed
M #-: Document SSH timeouts on VM HA (#3138)
* M #: Document SSH timeouts on VM HA
* M #: spellcheck

(cherry picked from commit 23af6c0)
1 parent daa086f commit a6ef418

1 file changed: +24 -0 lines changed

source/installation_and_configuration/ha/vm_ha.rst

Lines changed: 24 additions & 0 deletions
@@ -56,6 +56,30 @@ More information on hooks :ref:`here <hooks>`.

.. warning:: Note that spurious network errors may lead to a VM being started twice on different hosts and possibly clashing on shared resources. The previous script needs to fence the error host to prevent split brain VMs. You may use any fencing mechanism for the host and invoke it within the error hook.

Tuning HA responsiveness
================================================================================

This HA mechanism is based on host state monitoring. How long a host takes to be reported in the ``ERROR`` state determines how quickly its VMs can be made available again.

You can tune several timers in ``/etc/one/monitord.conf`` to control this. ``BEACON_HOST`` dictates how often the host is checked to make sure it is responding. If the host does not respond within ``MONITORING_INTERVAL_HOST``, the frontend attempts to restart monitoring on it.

This process tries to connect to the host via SSH, synchronize the probes and start their execution. The SSH connection may hang if the host is unresponsive. While it hangs, the VM workloads running on that host remain unavailable and HA does not take effect. You can limit how long you are willing to wait for this SSH connection to fail by setting the ``ConnectTimeout`` parameter in the oneadmin SSH configuration at ``/var/lib/one/.ssh/config``.

The following is an example configuration:

.. code-block:: text

    Host *
      ServerAliveInterval 10
      ControlMaster no
      ControlPersist 70s
      ControlPath /run/one/ssh-socks/ctl-M-%C.sock
      StrictHostKeyChecking no
      UserKnownHostsFile /dev/null
      ConnectTimeout 15

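
To check that a configuration like the one above actually sets ``ConnectTimeout``, a small script can scan the file contents. The following is a minimal sketch, not an OpenNebula tool: the function name ``first_connect_timeout`` is hypothetical, it assumes the value is an integer number of seconds, and it ignores ``Host``/``Match`` scoping, so it only reports the first occurrence of the keyword.

.. code-block:: python

    def first_connect_timeout(config_text):
        """Return the first ConnectTimeout value found in ssh_config-style
        text, or None if the keyword is absent.

        Simplification: Host/Match scoping is ignored.
        """
        for raw_line in config_text.splitlines():
            line = raw_line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Keyword and argument may be separated by whitespace or "=".
            parts = line.replace("=", " ", 1).split()
            # ssh_config keywords are case-insensitive.
            if len(parts) >= 2 and parts[0].lower() == "connecttimeout":
                return int(parts[1])
        return None


    example = """\
    Host *
      ServerAliveInterval 10
      ConnectTimeout 15
    """
    print(first_connect_timeout(example))

In practice you would read the text from ``/var/lib/one/.ssh/config`` instead of the inline ``example`` string.
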
.. warning:: Keep in mind that a temporary network or host problem, combined with short timers, can trigger the HA hook prematurely when waiting a few more seconds would have been enough. This is a trade-off you need to be aware of when implementing HA.

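
As a rough illustration of this trade-off, the sketch below adds the two delays described in this section: the time the frontend waits before restarting monitoring (``MONITORING_INTERVAL_HOST``) and the time the SSH attempt may hang (``ConnectTimeout``). This is a simplified model, not OpenNebula's actual logic. The function name and the value ``180`` are assumptions chosen for illustration, and the model ignores the ``BEACON_HOST`` period, any retries and the hook execution time.

.. code-block:: python

    def seconds_until_ssh_gives_up(monitoring_interval_host, connect_timeout):
        """Rough estimate, in seconds, of the delay between the host going
        silent and the monitoring restart over SSH failing.

        Assumption: the frontend waits the full monitoring interval and then
        makes a single SSH attempt that hangs for the whole ConnectTimeout.
        """
        return monitoring_interval_host + connect_timeout


    # Illustrative values only; check your own monitord.conf and ssh config.
    for connect_timeout in (15, 60, 120):
        total = seconds_until_ssh_gives_up(180, connect_timeout)
        print(f"ConnectTimeout={connect_timeout}s: about {total}s before the SSH attempt fails")

Lower values shorten the window in which VMs are unavailable, at the cost of reacting to hiccups that would have resolved on their own.
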
Enabling Fencing
================================================================================
