# Proposal: VIP-Management

- Author(s): [djshow832](https://github.com/djshow832)
- Tracking Issue: https://github.com/pingcap/tiproxy/issues/583
## Abstract

This document proposes a design for managing a VIP on TiProxy clusters to achieve high availability of TiProxy without deploying third-party tools.
## Terms

- VIP: Virtual IP
- NIC: Network Interface Card
- ARP: Address Resolution Protocol
- VRRP: Virtual Router Redundancy Protocol
- MMM: Multi-Master Replication Manager for MySQL
- MHA: Master High Availability
## Background

In a self-hosted TiDB cluster with TiProxy, TiProxy is typically the endpoint for clients. To achieve high availability, users may deploy multiple TiProxy instances, of which only one serves requests, so that clients only need to configure a single database address. When the active TiProxy goes down, the cluster should elect another TiProxy automatically and clients don't need to update the database address.

Thus we need a VIP solution: the VIP is always bound to an available TiProxy node, and when the active node goes down, the VIP floats to another node.

<img src="./imgs/vip-arch.png" alt="vip architecture" width="600">

Currently, typical solutions include:

- Deploy Keepalived together with TiProxy. Keepalived is capable of both health checks and VIP management.
- Deploy a crontab job that checks the health of TiProxy and sets the VIP.

Neither approach is easy to use. This design proposes a solution that enables the TiProxy cluster to manage the VIP by itself.
## Goals

- Bind a VIP to an available TiProxy node and switch the VIP when the node becomes unavailable
- Support VIP management on self-hosted TiDB clusters that run on bare metal with Linux

## Non-Goals

- Support configuring weights of TiProxy nodes
- Support configuring multiple VIPs for a TiProxy cluster
- Support VIP management on Docker, Kubernetes, or cloud
- Support VIP management on non-Linux operating systems
## Proposal

### Active Node Election

First, the TiProxy cluster needs to elect an available instance to be the active node. Etcd is built into PD and is capable of leader election, so we can simply use the Etcd election. The first instance that boots becomes the leader in the first election round.

When an instance is chosen to be active, it binds the VIP to itself. It unbinds the VIP when:

- It finds that it's no longer the leader, possibly because its network is unstable and Etcd evicts it
- It's shutting down, for example during a rolling upgrade of TiProxy instances
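A minimal sketch of the election loop, assuming the `go.etcd.io/etcd/client/v3/concurrency` package; the key prefix and the callbacks are illustrative only, not the final implementation:

```go
package vip

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

// campaign blocks until this instance becomes the leader, binds the VIP via
// onElected, and unbinds it via onRetired when leadership is lost.
func campaign(ctx context.Context, cli *clientv3.Client, nodeAddr string, onElected, onRetired func()) error {
	// The session TTL bounds how long a dead leader can keep holding the VIP.
	sess, err := concurrency.NewSession(cli, concurrency.WithTTL(3))
	if err != nil {
		return err
	}
	defer sess.Close()

	election := concurrency.NewElection(sess, "/tiproxy/vip-owner")
	if err := election.Campaign(ctx, nodeAddr); err != nil {
		return err
	}
	onElected() // bind the VIP

	select {
	case <-sess.Done(): // no longer the leader, e.g. evicted after a network hiccup
		onRetired() // unbind the VIP
	case <-ctx.Done(): // shutting down, e.g. a rolling upgrade
		if err := election.Resign(context.Background()); err != nil {
			log.Printf("resign failed: %v", err)
		}
	}
	return nil
}
```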
### Failover

The Etcd session TTL determines the RTO. A longer TTL increases the RTO, while a shorter TTL makes the leader switch frequently on an unstable network. We set it to 3 seconds, so the RTO should be close to 3 seconds.

During the shutdown of the active node, the active node's unbinding and the standby node's binding happen concurrently. If the unbinding comes first, clients may fail to connect. To ensure the binding comes first, the active node resigns its leadership before the graceful wait and only unbinds the VIP after the graceful wait, as sketched below.

When the PD leader is down and before a new PD leader is elected, no TiProxy node can connect to the Etcd server. If the current owner unbound the VIP, clients couldn't connect to TiProxy through the VIP. Thus, the owner doesn't unbind the VIP until the next active node is elected.
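A rough sketch of the shutdown ordering; the helper names and the graceful-wait mechanism below are stubs for illustration, not the actual TiProxy hooks:

```go
package vip

import (
	"context"
	"log"
	"time"
)

// Stub helpers for illustration only.
func resignLeadership(ctx context.Context) { log.Println("resigned etcd leadership") }
func unbindVIP()                           { log.Println("removed VIP from the local NIC") }

// shutdownActiveNode sketches the ordering described above: resign first so a
// standby node wins the election and binds the VIP, keep serving existing
// connections during the graceful wait, and only then remove the VIP locally.
func shutdownActiveNode(ctx context.Context, gracefulWait time.Duration) {
	resignLeadership(ctx)    // the standby node can now bind the VIP
	time.Sleep(gracefulWait) // drain existing client connections
	unbindVIP()              // the new binding has already taken effect
}
```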
### Adding and Deleting VIP

Once a node is chosen to be active, it binds the VIP to itself in two steps:

1. Attach a secondary IP to the specified NIC through netlink
2. Notify the whole subnet through ARP about the IP and MAC address so that the clients update their ARP caches

There may be a window in which the previous active node hasn't unbound the VIP yet while the new active node has already bound it. The second step ensures that clients connect to the new node because the ARP cache is updated. Existing connections to the previous node continue until the clients disconnect them.

These steps are equivalent to the following Linux commands:
```shell
ip addr add 192.168.148.100/24 dev eth0
arping -q -c 1 -U -I eth0 192.168.148.100
```
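Inside TiProxy, the same two steps could be performed programmatically. A minimal sketch, assuming the `github.com/vishvananda/netlink` package for step 1 and delegating the gratuitous ARP of step 2 to the `arping` binary; the function name and error handling are illustrative:

```go
package main

import (
	"log"
	"os/exec"

	"github.com/vishvananda/netlink"
)

func bindVIP(vip, cidr, iface string) error {
	link, err := netlink.LinkByName(iface) // e.g. "eth0"
	if err != nil {
		return err
	}
	addr, err := netlink.ParseAddr(cidr) // e.g. "192.168.148.100/24"
	if err != nil {
		return err
	}
	// Step 1: attach the VIP as a secondary IP on the NIC.
	if err := netlink.AddrAdd(link, addr); err != nil {
		return err
	}
	// Step 2: announce the new IP-to-MAC mapping so clients refresh their ARP caches.
	return exec.Command("arping", "-q", "-c", "1", "-U", "-I", iface, vip).Run()
}

func main() {
	if err := bindVIP("192.168.148.100", "192.168.148.100/24", "eth0"); err != nil {
		log.Fatal(err)
	}
}
```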
The secondary-IP approach is also used in MySQL HA tools such as MMM and MHA. The limitation is that the secondary IP must be reserved in the subnet, and it only works within a single subnet.

The TiProxy user must be privileged to run `ip addr add`, `ip addr del`, and `arping`, which normally means running as `root`. However, TiProxy is typically deployed by TiUP, and TiUP only requires the `sudo` permission, so TiProxy should retry with `sudo` if permission is denied. This requires `ip` and `arping` to be installed.
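A hedged sketch of that retry, assuming the permission failure can be detected from the command output (the helper name and the string check are assumptions):

```go
package main

import (
	"log"
	"os/exec"
	"strings"
)

// runPrivileged runs a command and retries it with sudo when the failure
// looks like a permission problem.
func runPrivileged(name string, args ...string) error {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err == nil {
		return nil
	}
	if strings.Contains(strings.ToLower(string(out)), "permission") {
		return exec.Command("sudo", append([]string{name}, args...)...).Run()
	}
	return err
}

func main() {
	// Equivalent to: sudo ip addr add 192.168.148.100/24 dev eth0 (when needed).
	if err := runPrivileged("ip", "addr", "add", "192.168.148.100/24", "dev", "eth0"); err != nil {
		log.Fatal(err)
	}
}
```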
## Configuration

All TiProxy instances share the same configuration:

```toml
[ha]
vip="192.168.148.100"
interface="eth0"
```
`vip` declares the VIP and `interface` declares the network interface (or NIC device) to which the VIP is bound. If either of them is not configured, the instance won't preempt the VIP.

It's possible to update these configurations online, but it's unnecessary: clients would need to update the database address anyway, which interrupts the business, so online configuration updates are not supported.

## Observability

Besides logs, we can show the current active node on Grafana.
## Alternatives

### Consensus Algorithms

Some products use consensus algorithms such as Raft and Paxos to elect the active node. It's straightforward but has some disadvantages:

- Consensus algorithms need at least 3 nodes, while users usually only need 2.
- If there's a network partition, the elected node must be able to connect to the PD leader, but the active node elected by a consensus algorithm may end up in a different partition from the PD leader. If so, the node will route to TiDB instances that are unable to connect to the PD leader either.

### VRRP

VRRP is another VIP solution and is implemented by Keepalived, a tool widely used with proxies such as HAProxy.

The problem is that VRRP is too complicated to troubleshoot.
## Future Works

### Weight Configuration

Node weights may be useful when users have preferences for which node is active. If the node with the highest weight is available, it holds the VIP until it goes down. On top of this, some products also have a preempt mode: when the preferred node recovers, it takes back the VIP even if the current active node is still available.

Although some products support configuring node weights, it's not so straightforward to implement on etcd and may not be necessary. We'll consider it in the future if users require it. Currently, all nodes have the same chance of becoming active.

### Multiple VIPs

Some MySQL clusters use one VIP for write nodes and multiple VIPs for read nodes. Similarly, TiProxy can have multiple VIPs to expose multiple endpoints for resource isolation. This requires partitioning TiProxy and TiDB instances into node groups, where each TiProxy only routes to the TiDB instances in the same group.

This changes the election procedure and the TiProxy configuration. We'll consider it if users require it.
