Hey, I followed your great instructions to the letter, but I'm left with a situation that has me stumped.
I have a setup with two Supermicro servers, each connected to two 12-disk SAS JBODs, but without a loop, so no multipath (and multipath is not installed). Both JBODs are used in mirrored vdevs, so I can lose an entire JBOD without much issue.
OS: CentOS Linux release 7.6.1810 (Core)
ZFS: 0.7.13, from the zfs-kmod repo
This setup works fine until Pacemaker decides a failover is needed. It doesn't matter whether that is because the active node is put into standby, because the hardware is switched off, etc.
When Pacemaker fails over, the second node tries to import the pool, which fails because something on the first node has placed SCSI persistent reservations on the disks:
[root@nas01 /]# sg_persist -r /dev/sdh
NETAPP X412_HVIPC560A15 NA02
Peripheral device type: disk
PR generation=0x1, Reservation follows:
Key=0x666e0001
scope: LU_SCOPE, type: Write Exclusive, registrants only
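For reference, the full reservation state can be inspected with sg_persist from sg3_utils; the device name /dev/sdh here is just the disk from my example, and these commands only read state, they don't change anything:

```shell
# List all registered reservation keys on the disk (read-only)
sg_persist --in --read-keys /dev/sdh

# Show the current reservation holder and type (read-only,
# same as the short form "sg_persist -r" above)
sg_persist --in --read-reservation /dev/sdh
```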
As soon as the failover happens, the second node starts to log:
[ 5834.890588] sd 0:0:7:0: reservation conflict
[ 5834.890674] sd 0:0:7:0: reservation conflict
[ 5834.890693] sd 0:0:7:0: [sdh] Test Unit Ready failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 5834.891274] sd 0:0:7:0: reservation conflict
[ 5834.891369] sd 0:0:7:0: reservation conflict
[ 5834.891466] sd 0:0:7:0: reservation conflict
[ 5834.891560] sd 0:0:7:0: reservation conflict
[ 5834.921402] sdh: sdh1 sdh9
[ 5834.922452] sd 0:0:7:0: reservation conflict
[ 5834.957331] sd 0:0:7:0: reservation conflict
[ 5834.958157] sd 0:0:7:0: reservation conflict
[ 5835.052881] sd 0:0:7:0: reservation conflict
This either causes the entire import to fail or, if the import succeeds, leaves disks offline due to excessive errors.
I've been pulling my hair out for about two weeks now, but I have no clue what sets these reservations, or how I can have them released on a cluster start or a cluster failover. Judging by the discussions I found, there seem to be lots of people building Linux HA clusters with ZFS, but no one mentions this issue...
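In case it helps anyone debugging the same thing: a stale reservation can be cleared manually with sg_persist by registering a key of our own and then preempting the holder. This is a sketch, not a fix for the root cause; 0x1 is an arbitrary example key I chose, 0x666e0001 is the stale key from the output above, and prout-type 5 matches the "Write Exclusive, registrants only" type shown by sg_persist -r. Only run this when you are certain the other node is no longer using the disk, otherwise you risk corrupting the pool:

```shell
# Register our own key (0x1 is an arbitrary example) so we are
# allowed to issue a PREEMPT on this disk
sg_persist --out --register --param-sark=0x1 /dev/sdh

# Preempt the stale reservation held under key 0x666e0001;
# --prout-type=5 is "Write Exclusive, registrants only"
sg_persist --out --preempt --param-rk=0x1 --param-sark=0x666e0001 --prout-type=5 /dev/sdh

# Unregister our temporary key again
sg_persist --out --register --param-rk=0x1 --param-sark=0 /dev/sdh
```

This would have to run on every affected disk before the import, e.g. from a Pacemaker resource agent ordered before the ZFS resource.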