Rework retry/timeout defaults to ensure fast service failover
When the galera pod that receives database traffic becomes
unresponsive, the galera library reacts by running a script
in one of the surviving pods to elect a new endpoint. This
script uses curl to call the API server to update the selector
object responsible for balancing database traffic.
If during the API call the API server becomes unresponsive or unreachable
(e.g. the API VIP fails over to another master node), the curl call
might get stuck for an unbounded period of time, which delays the
traffic failover and can cause a long database service disruption.
Add a default connect timeout and update default retry parameters
so that curl is never blocked for too long, and the endpoint
configuration can be retried until the API server becomes available.
This commit only improves the default parameters; the ability to override
those parameters will be addressed in a subsequent commit.
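As an illustration of the approach described above, here is a minimal sketch of a curl wrapper with a connect timeout and bounded retries. The function name `api_call`, the variable names, and all numeric values are assumptions made for this example; they are not the defaults chosen by this commit. Only standard curl options are used.

```shell
#!/bin/sh
# Hypothetical defaults for illustration only; not the values set by this commit.
CONNECT_TIMEOUT="${CONNECT_TIMEOUT:-5}"   # seconds allowed to establish the connection
MAX_TIME="${MAX_TIME:-10}"                # seconds allowed for one whole attempt
RETRIES="${RETRIES:-6}"                   # retries after the first failed attempt
RETRY_DELAY="${RETRY_DELAY:-2}"           # fixed pause between attempts, in seconds

# Call curl with bounded timeouts and retries.
# Extra arguments (method, headers, body, URL) are passed through unchanged.
api_call() {
    curl --silent --show-error --fail \
        --connect-timeout "$CONNECT_TIMEOUT" \
        --max-time "$MAX_TIME" \
        --retry "$RETRIES" \
        --retry-delay "$RETRY_DELAY" \
        --retry-connrefused \
        "$@"
}
```

With these options, each attempt is capped by `--max-time`, and curl retries on timeouts, refused connections, and transient HTTP errors, so a call made while the API VIP is failing over returns or retries instead of hanging. A caller would pass the usual request arguments, for example `api_call -X PATCH -H "Content-Type: application/merge-patch+json" -d "$BODY" "$URL"`, where `$BODY` and `$URL` are placeholders.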
Jira: OSPRH-17604