Description
Steps to Reproduce
- two servers (serverA and serverB) each with patroni and vip-manager installed and configured
- dcs-type is set to patroni. All other trigger-related options are set to default
- Currently serverA is Leader and has the VIP
- Stop patroni on serverA (systemctl stop patroni)
Expected Behaviour
- serverB becomes db leader
- vip-manager on serverB takes VIP
- vip-manager on serverA releases VIP
Current Behaviour (vip-manager 4.0.0)
- serverB becomes the leader
- vip-manager on serverB activates the VIP
- vip-manager on serverA does not release the VIP and even tries to take it back, although its DCS backend (patroni) is not reachable
- The VIP flips back and forth between serverA and serverB because both think they must hold it, which makes database connections unreliable
Logs
vip-manager on serverA:
Sep 30 13:22:18 serverA vip-manager[803251]: 2025-09-30T13:22:18.668+0200 ERROR patroni REST API error:Get "http://127.0.0.1:8008//leader": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Sep 30 13:22:18 serverA vip-manager[803251]: github.com/cybertec-postgresql/vip-manager/checker.(*PatroniLeaderChecker).GetChangeNotificationStream
Sep 30 13:22:18 serverA vip-manager[803251]: /home/runner/work/vip-manager/vip-manager/checker/patroni_leader_checker.go:52
Sep 30 13:22:18 serverA vip-manager[803251]: main.main.func3
Sep 30 13:22:18 serverA vip-manager[803251]: /home/runner/work/vip-manager/vip-manager/main.go:65
Sep 30 13:22:19 serverA vip-manager[803251]: 2025-09-30T13:22:19.669+0200 ERROR patroni REST API error:Get "http://127.0.0.1:8008//leader": dial tcp 127.0.0.1:8008: connect: connection refused
[...]
Sep 30 13:22:29 serverA vip-manager[803251]: 2025-09-30T13:22:29.681+0200 ERROR patroni REST API error:Get "http://127.0.0.1:8008//leader": dial tcp 127.0.0.1:8008: connect: connection refused
Sep 30 13:22:29 serverA vip-manager[803251]: github.com/cybertec-postgresql/vip-manager/checker.(*PatroniLeaderChecker).GetChangeNotificationStream
Sep 30 13:22:29 serverA vip-manager[803251]: /home/runner/work/vip-manager/vip-manager/checker/patroni_leader_checker.go:52
Sep 30 13:22:29 serverA vip-manager[803251]: main.main.func3
Sep 30 13:22:29 serverA vip-manager[803251]: /home/runner/work/vip-manager/vip-manager/main.go:65
Sep 30 13:22:29 serverA vip-manager[803251]: 2025-09-30T13:22:29.967+0200 INFO IP address 10.0.99.64/24 is up, must be up
vip-manager on serverB:
Sep 30 13:21:49 serverB vip-manager[501796]: 2025-09-30T13:21:49.685+0200 INFO IP address 10.0.99.64/24 is down, must be down
Sep 30 13:21:59 serverB vip-manager[501796]: 2025-09-30T13:21:59.685+0200 INFO IP address 10.0.99.64/24 is down, must be down
Sep 30 13:22:09 serverB vip-manager[501796]: 2025-09-30T13:22:09.686+0200 INFO IP address 10.0.99.64/24 is down, must be down
Sep 30 13:22:19 serverB vip-manager[501796]: 2025-09-30T13:22:19.592+0200 INFO IP address 10.0.99.64/24 is down, must be up
Sep 30 13:22:19 serverB vip-manager[501796]: 2025-09-30T13:22:19.592+0200 INFO Configuring address 10.0.99.64/24 on enp3s0
Sep 30 13:22:29 serverB vip-manager[501796]: 2025-09-30T13:22:29.603+0200 INFO IP address 10.0.99.64/24 is up, must be up
Sep 30 13:22:39 serverB vip-manager[501796]: 2025-09-30T13:22:39.604+0200 INFO IP address 10.0.99.64/24 is up, must be up
Possible Solution
One possible workaround would be to amend the systemd unit of vip-manager so that it starts and stops together with patroni:
[Unit]
Description=Manages Virtual IP for Patroni
After=network-online.target
Before=patroni.service
PartOf=patroni.service
[Service]
Type=simple
ExecStart=/usr/bin/vip-manager --config=/etc/default/vip-manager.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
WantedBy=patroni.service
However, this workaround only takes effect when the systemd unit is stopped (either by a user or by systemd itself if the main process crashes). It would not trigger if the patroni process hangs for some reason.
A better solution would be to release the VIP whenever the DCS endpoint is not reachable, since the leader role will probably not be on a server where patroni is not running.