
scsi reservations issue on failover #25

@WanWizard

Description


Hey, I followed your great instructions to the letter, but I'm left with a situation that has me stumped.

I have a setup with two Supermicro servers, each connected to two 12-disk JBODs with SAS disks, but without a loop, so no multipath (and multipath is not installed). Both JBODs are used in mirrored vdevs, so I can lose an entire JBOD without much trouble.

OS: CentOS Linux release 7.6.1810 (Core)
ZFS: 0.7.13, from the zfs-kmod repo

This setup works fine until Pacemaker decides it needs to fail over. It doesn't matter whether that's because the active node is put into standby, because the hardware is switched off, etc.

When Pacemaker fails over, the second node tries to import the pool, which fails because something on the first node has placed SCSI reservations on the disks:

[root@nas01 /]# sg_persist -r /dev/sdh
  NETAPP    X412_HVIPC560A15  NA02
  Peripheral device type: disk
  PR generation=0x1, Reservation follows:
    Key=0x666e0001
    scope: LU_SCOPE,  type: Write Exclusive, registrants only

As soon as the failover happens, the second node starts logging:

[ 5834.890588] sd 0:0:7:0: reservation conflict
[ 5834.890674] sd 0:0:7:0: reservation conflict
[ 5834.890693] sd 0:0:7:0: [sdh] Test Unit Ready failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 5834.891274] sd 0:0:7:0: reservation conflict
[ 5834.891369] sd 0:0:7:0: reservation conflict
[ 5834.891466] sd 0:0:7:0: reservation conflict
[ 5834.891560] sd 0:0:7:0: reservation conflict
[ 5834.921402]  sdh: sdh1 sdh9
[ 5834.922452] sd 0:0:7:0: reservation conflict
[ 5834.957331] sd 0:0:7:0: reservation conflict
[ 5834.958157] sd 0:0:7:0: reservation conflict
[ 5835.052881] sd 0:0:7:0: reservation conflict

This either causes the entire import to fail, or, if the import succeeds, leaves disks offlined due to excessive errors.

I've been pulling my hair out for about two weeks now, but I have no clue what sets these reservations, or how I can have them released on a cluster start or a cluster failover. Judging by the discussions I found, there seem to be lots of people building Linux HA clusters with ZFS, but no one mentions this issue...
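In case it helps the debugging: here's a sketch of how I think the stale reservation could be inspected and released manually with sg3_utils, using the key 0x666e0001 and /dev/sdh from the output above. This is an assumption on my part, not a confirmed fix, and the DRY_RUN guard only prints the commands until you flip it:

```shell
#!/bin/sh
# Sketch, untested against real hardware: print (or, with DRY_RUN=0,
# actually run) the sg_persist commands to read and then clear a stale
# persistent reservation. Requires sg3_utils when actually executed.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, don't run them
KEY=0x666e0001          # reservation key reported by `sg_persist -r` above

clear_reservation() {
    dev="$1"
    # First read the active reservation, then clear all registrations
    # (and with them the reservation itself) using the observed key.
    for cmd in \
        "sg_persist --in --read-reservation $dev" \
        "sg_persist --out --clear --param-rk=$KEY $dev"
    do
        if [ "$DRY_RUN" = 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

clear_reservation /dev/sdh
```

The same would presumably have to be repeated for every disk in the pool, which is why I'd much rather find out what places the reservations in the first place.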
