
Commit 7649072

bjking1 authored and martinkpetersen committed
scsi: ibmvfc: Set default timeout to avoid crash during migration
While testing live partition mobility, we have observed occasional crashes of the Linux partition. What we've seen is that during live migration, for specific configurations with large amounts of memory, slow network links, and workloads that change memory a lot, the partition can end up suspended for 30 seconds or longer. This resulted in the following scenario:

CPU 0                                 CPU 1
------------------------------------  ------------------------------------
scsi_queue_rq                         migration_store
 -> blk_mq_start_request               -> rtas_ibm_suspend_me
  -> blk_add_timer                      -> on_each_cpu(rtas_percpu_suspend_me
              _________________________________________V
             |
             V
 -> IPI from CPU 1
  -> rtas_percpu_suspend_me
                                      -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --

                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
 -> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                      -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against a race between command completion and the timeout, but that doesn't work here, since we don't check that bit at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout for disk devices to 120 seconds to avoid this race condition. This is also the timeout value that nearly all IBM SAN storage products recommend setting as the default.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Brian King <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
1 parent 780e138 commit 7649072


1 file changed: +3 -1 lines changed


drivers/scsi/ibmvscsi/ibmvfc.c

Lines changed: 3 additions & 1 deletion
@@ -3007,8 +3007,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
