You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: source/installation_and_configuration/ha/vm_ha.rst
+24Lines changed: 24 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -56,6 +56,30 @@ More information on hooks :ref:`here <hooks>`.
56
56
57
57
.. warning:: Note that spurious network errors may lead to a VM being started twice on different hosts and possibly clashing on shared resources. The previous script needs to fence the error host to prevent split brain VMs. You may use any fencing mechanism for the host and invoke it within the error hook.
This HA mechanism is based on the host state monitoring. How long the host the host takes to be reported in ``ERROR`` is crucial for how quickly you want the VMs to be available.
63
+
64
+
There are multiple timers that you can adjust on ``/etc/one/monitord.conf`` to adjust this. ``BEACON_HOST`` dictates how often the host is checked to make sure it is responding. If it doesn't respond past ``MONITORING_INTERVAL_HOST`` then the frontend will attempt to restart the monitoring on the host.
65
+
66
+
This process tries to connect to the host via SSH, synchronize the probes and start their execution. It might be possible that this SSH connection hangs if the host is not responsive. This can lead to a situation where the VM workloads running on said host will be unavailable and the HA will not be present during this process. You can adjust how much are you comfortable with waiting for this ssh to fail by setting the parameter ``ConnectTimeout`` on the oneadmin ssh configuration at ``/var/lib/one/.ssh/config``.
67
+
68
+
The following is a an example configuration
69
+
70
+
.. code-block:: language
71
+
72
+
Host *
73
+
ServerAliveInterval 10
74
+
ControlMaster no
75
+
ControlPersist 70s
76
+
ControlPath /run/one/ssh-socks/ctl-M-%C.sock
77
+
StrictHostKeyChecking no
78
+
UserKnownHostsFile /dev/null
79
+
ConnectTimeout 15
80
+
81
+
.. warning:: Consider that a temporary network/host problem or a small hiccup combined with short timers can lead to an overkill situation where the HA hook gets triggered too fast when waiting a few more seconds could have been fine. This is a trade-off you'll have to be aware of when implementing HA.
0 commit comments