Commit 9f98772

sagigrimberg authored and Christoph Hellwig committed
nvme-rdma: fix controller reset hang during traffic
Commit fe35ec5 ("block: update hctx map when use multiple maps") exposed an issue where we may hang trying to wait for queue freeze during I/O. We call blk_mq_update_nr_hw_queues which, in case of multiple queue maps (which we have now for default/read/poll), attempts to freeze the queue. However, we never started a queue freeze when starting the reset, which means that we have inflight pending requests that entered the queue and that we will not complete once the queue is quiesced.

So start a freeze before we quiesce the queue, and unfreeze the queue after we successfully connected the I/O queues (and make sure to call blk_mq_update_nr_hw_queues only after we are sure that the queue was already frozen).

This follows how the pci driver handles resets.

Fixes: fe35ec5 ("block: update hctx map when use multiple maps")
Signed-off-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
1 parent 2875b0a commit 9f98772

1 file changed

+9
-3
lines changed

drivers/nvme/host/rdma.c

Lines changed: 9 additions & 3 deletions
@@ -967,15 +967,20 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 			ret = PTR_ERR(ctrl->ctrl.connect_q);
 			goto out_free_tag_set;
 		}
-	} else {
-		blk_mq_update_nr_hw_queues(&ctrl->tag_set,
-			ctrl->ctrl.queue_count - 1);
 	}
 
 	ret = nvme_rdma_start_io_queues(ctrl);
 	if (ret)
 		goto out_cleanup_connect_q;
 
+	if (!new) {
+		nvme_start_queues(&ctrl->ctrl);
+		nvme_wait_freeze(&ctrl->ctrl);
+		blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
+			ctrl->ctrl.queue_count - 1);
+		nvme_unfreeze(&ctrl->ctrl);
+	}
+
 	return 0;
 
 out_cleanup_connect_q:
@@ -1008,6 +1013,7 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
 		bool remove)
 {
 	if (ctrl->ctrl.queue_count > 1) {
+		nvme_start_freeze(&ctrl->ctrl);
 		nvme_stop_queues(&ctrl->ctrl);
 		nvme_rdma_stop_io_queues(ctrl);
 		if (ctrl->ctrl.tagset) {
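
The ordering argument in the commit message can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions, not kernel code: `struct toy_queue` and every `toy_*` and `reset_*` function below are hypothetical names invented for illustration, a "hang" is modelled as a `false` return value, and the model assumes (as the commit message describes) that the queue is still quiesced when the old code calls blk_mq_update_nr_hw_queues.

```c
#include <stdbool.h>

/* Toy model of a single request queue. NOT the kernel's blk-mq code. */
struct toy_queue {
	int  freeze_depth; /* > 0: new requests may not enter the queue */
	bool quiesced;     /* true: entered requests are not dispatched */
	int  entered;      /* requests that entered and have not completed */
};

/* Try to enter the queue; a real submitter would block while frozen. */
static bool toy_submit(struct toy_queue *q)
{
	if (q->freeze_depth > 0)
		return false;
	q->entered++;
	return true;
}

static void toy_start_freeze(struct toy_queue *q) { q->freeze_depth++; }
static void toy_unfreeze(struct toy_queue *q)     { q->freeze_depth--; }
static void toy_quiesce(struct toy_queue *q)      { q->quiesced = true; }

/* Unquiesce and let every entered request dispatch and complete. */
static void toy_unquiesce(struct toy_queue *q)
{
	q->quiesced = false;
	q->entered = 0;
}

/* A freeze wait can finish only once no entered request is outstanding. */
static bool toy_wait_freeze(const struct toy_queue *q)
{
	return q->entered == 0;
}

/* Stand-in for an update that freezes the queue internally.
 * Returns false where the real code would wait forever. */
static bool toy_update_nr_hw_queues(struct toy_queue *q)
{
	bool ok;

	toy_start_freeze(q);
	ok = toy_wait_freeze(q);
	toy_unfreeze(q);
	return ok;
}

/* Ordering before the fix: quiesce, then update while still quiesced. */
static bool reset_old(struct toy_queue *q)
{
	toy_submit(q);                     /* traffic during the reset */
	toy_quiesce(q);                    /* teardown */
	return toy_update_nr_hw_queues(q); /* entered request never completes */
}

/* Ordering after the fix: freeze, quiesce, reconnect, unquiesce,
 * wait for the freeze, update, unfreeze. */
static bool reset_new(struct toy_queue *q)
{
	bool ok;

	toy_submit(q);                     /* traffic during the reset */
	toy_start_freeze(q);               /* models nvme_start_freeze() */
	toy_quiesce(q);                    /* models nvme_stop_queues() */
	/* ... I/O queues are reconnected here ... */
	toy_unquiesce(q);                  /* models nvme_start_queues() */
	if (!toy_wait_freeze(q))           /* models nvme_wait_freeze() */
		return false;
	ok = toy_update_nr_hw_queues(q);
	toy_unfreeze(q);                   /* models nvme_unfreeze() */
	return ok;
}
```

In this model `reset_old()` reports a hang because the request that entered before the quiesce can never drain, while `reset_new()` drains it before the update and leaves the freeze depth balanced at zero. Real blk-mq freezing uses per-queue reference counting and blocking waits, which the sketch deliberately leaves out.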
