Skip to content

Commit 4edd8cd

Browse files
mikechristiemartinkpetersen
authored andcommitted
scsi: core: sysfs: Fix hang when device state is set via sysfs
This fixes a regression added with: commit f0f82e2 ("scsi: core: Fix capacity set to zero after offlinining device") The problem is that after iSCSI recovery, iscsid will call into the kernel to set the dev's state to running, and with that patch we now call scsi_rescan_device() with the state_mutex held. If the SCSI error handler thread is just starting to test the device in scsi_send_eh_cmnd() then it's going to try to grab the state_mutex. We are then stuck, because when scsi_rescan_device() tries to send its I/O scsi_queue_rq() calls -> scsi_host_queue_ready() -> scsi_host_in_recovery() which will return true (the host state is still in recovery) and I/O will just be requeued. scsi_send_eh_cmnd() will then never be able to grab the state_mutex to finish error handling. To prevent the deadlock move the rescan-related code to after we drop the state_mutex. This also adds a check for if we are already in the running state. This prevents extra scans and helps the iscsid case where if the transport class has already onlined the device during its recovery process then we don't need userspace to do it again plus possibly block that daemon. Link: https://lore.kernel.org/r/[email protected] Fixes: f0f82e2 ("scsi: core: Fix capacity set to zero after offlinining device") Cc: Bart Van Assche <[email protected]> Cc: lijinlin <[email protected]> Cc: Wu Bo <[email protected]> Reviewed-by: Lee Duncan <[email protected]> Reviewed-by: Wu Bo <[email protected]> Signed-off-by: Mike Christie <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
1 parent a0c2f8b commit 4edd8cd

File tree

1 file changed

+19
-11
lines changed

1 file changed

+19
-11
lines changed

drivers/scsi/scsi_sysfs.c

Lines changed: 19 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -792,6 +792,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
792792
int i, ret;
793793
struct scsi_device *sdev = to_scsi_device(dev);
794794
enum scsi_device_state state = 0;
795+
bool rescan_dev = false;
795796

796797
for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
797798
const int len = strlen(sdev_states[i].name);
@@ -810,20 +811,27 @@ store_state_field(struct device *dev, struct device_attribute *attr,
810811
}
811812

812813
mutex_lock(&sdev->state_mutex);
813-
ret = scsi_device_set_state(sdev, state);
814-
/*
815-
* If the device state changes to SDEV_RUNNING, we need to
816-
* run the queue to avoid I/O hang, and rescan the device
817-
* to revalidate it. Running the queue first is necessary
818-
* because another thread may be waiting inside
819-
* blk_mq_freeze_queue_wait() and because that call may be
820-
* waiting for pending I/O to finish.
821-
*/
822-
if (ret == 0 && state == SDEV_RUNNING) {
814+
if (sdev->sdev_state == SDEV_RUNNING && state == SDEV_RUNNING) {
815+
ret = count;
816+
} else {
817+
ret = scsi_device_set_state(sdev, state);
818+
if (ret == 0 && state == SDEV_RUNNING)
819+
rescan_dev = true;
820+
}
821+
mutex_unlock(&sdev->state_mutex);
822+
823+
if (rescan_dev) {
824+
/*
825+
* If the device state changes to SDEV_RUNNING, we need to
826+
* run the queue to avoid I/O hang, and rescan the device
827+
* to revalidate it. Running the queue first is necessary
828+
* because another thread may be waiting inside
829+
* blk_mq_freeze_queue_wait() and because that call may be
830+
* waiting for pending I/O to finish.
831+
*/
823832
blk_mq_run_hw_queues(sdev->request_queue, true);
824833
scsi_rescan_device(dev);
825834
}
826-
mutex_unlock(&sdev->state_mutex);
827835

828836
return ret == 0 ? count : -EINVAL;
829837
}

0 commit comments

Comments
 (0)