
scsi reservations issue on failover #25

@WanWizard

Description


Hey, I followed your great instructions to the letter, but I'm left with a situation that has me stumped.

I have a setup with two Supermicro servers, each connected to two 12-disk JBODs with SAS disks, but without a loop, so no multipath (and multipath is not installed). Both JBODs are used in mirrored vdevs, so I can lose an entire JBOD without much trouble.

OS: CentOS Linux release 7.6.1810 (Core)
ZFS: 0.7.13, from the zfs-kmod repo

This setup works fine until Pacemaker decides it needs to fail over. It doesn't matter whether that's because the active node is put into standby, because the hardware is switched off, etc.

When Pacemaker fails over, the second node tries to import the pool, which fails because something on the first node has placed SCSI reservations on the disks:

[root@nas01 /]# sg_persist -r /dev/sdh
  NETAPP    X412_HVIPC560A15  NA02
  Peripheral device type: disk
  PR generation=0x1, Reservation follows:
    Key=0x666e0001
    scope: LU_SCOPE,  type: Write Exclusive, registrants only

As soon as the failover happens, the second node starts logging:

[ 5834.890588] sd 0:0:7:0: reservation conflict
[ 5834.890674] sd 0:0:7:0: reservation conflict
[ 5834.890693] sd 0:0:7:0: [sdh] Test Unit Ready failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 5834.891274] sd 0:0:7:0: reservation conflict
[ 5834.891369] sd 0:0:7:0: reservation conflict
[ 5834.891466] sd 0:0:7:0: reservation conflict
[ 5834.891560] sd 0:0:7:0: reservation conflict
[ 5834.921402]  sdh: sdh1 sdh9
[ 5834.922452] sd 0:0:7:0: reservation conflict
[ 5834.957331] sd 0:0:7:0: reservation conflict
[ 5834.958157] sd 0:0:7:0: reservation conflict
[ 5835.052881] sd 0:0:7:0: reservation conflict

This either causes the entire import to fail, or, if the import succeeds, leaves disks offlined due to excessive errors.

I've been pulling my hair out for about two weeks now, but I have no clue what sets these reservations, or how I can have them released on a cluster start or a cluster failover. Judging by the discussions I found, there seem to be lots of people building Linux HA clusters with ZFS, but no one mentions this issue...
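In case it helps the debugging: here's a sketch of how I think the stale reservation could be inspected and released manually with sg3_utils, using the key 0x666e0001 and /dev/sdh from the output above. This is an assumption on my part, not a confirmed fix, and the DRY_RUN guard only prints the commands until you flip it:

```shell
#!/bin/sh
# Sketch, untested against real hardware: print (or, with DRY_RUN=0,
# actually run) the sg_persist commands to read and then clear a stale
# persistent reservation. Requires sg3_utils when actually executed.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, don't run them
KEY=0x666e0001          # reservation key reported by `sg_persist -r` above

clear_reservation() {
    dev="$1"
    # First read the active reservation, then clear all registrations
    # (and with them the reservation itself) using the observed key.
    for cmd in \
        "sg_persist --in --read-reservation $dev" \
        "sg_persist --out --clear --param-rk=$KEY $dev"
    do
        if [ "$DRY_RUN" = 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

clear_reservation /dev/sdh
```

The same would presumably have to be repeated for every disk in the pool, which is why I'd much rather find out what places the reservations in the first place.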
