nvme: fix deadlock caused by ANA update wrong locking
The deadlock combines 4 flows in parallel:
- ns scanning (triggered from reconnect)
- request timeout
- ANA update (triggered from reconnect)
- I/O coming into the mpath device
(1) ns scanning triggers disk revalidation -> update disk info ->
freeze queue -> but blocked, due to (2)
(2) timeout handler references the q_usage_counter -> but blocks in
the transport .timeout() handler, due to (3)
(3) the transport timeout handler (indirectly) calls nvme_stop_queues() ->
which takes the (down_read) namespaces_rwsem -> but blocks, due to (4)
(4) ANA update takes the (down_write) namespaces_rwsem -> calls
nvme_mpath_set_live() -> which synchronizes the ns_head srcu
(see commit 504db08) -> but blocks, due to (5)
(5) I/O came into nvme_mpath_make_request -> took srcu_read_lock ->
direct_make_request -> blk_queue_enter -> but blocked, due to (1)
==> the request queue is under freeze -> deadlock.
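The circular wait above can be sketched as a tiny wait-for graph. This is
only an illustrative user-space model, not kernel code: the array layout,
NOT_WAITING and has_wait_cycle() are hypothetical names, and index i
stands for flow (i+1) from the list.

```c
#include <stdbool.h>

#define NOT_WAITING (-1)

/*
 * waits_on[i] is the index of the flow that flow i is blocked on,
 * or NOT_WAITING if flow i can make progress. Each flow waits on at
 * most one other flow, so following the chain from every start node
 * for at most n hops is enough to find a cycle.
 */
static bool has_wait_cycle(const int waits_on[], int n)
{
    for (int start = 0; start < n; start++) {
        int cur = waits_on[start];

        for (int steps = 0; steps < n && cur != NOT_WAITING; steps++) {
            if (cur == start)
                return true;    /* chain led back to where it began */
            cur = waits_on[cur];
        }
    }
    return false;
}
```

With the chain (1) -> (2) -> (3) -> (4) -> (5) -> (1), i.e. the array
{1, 2, 3, 4, 0}, the function reports a cycle. Clearing the edge from
flow (3) to flow (4), i.e. {1, 2, NOT_WAITING, 4, 0}, breaks it.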
The fix is to make the ANA update take namespaces_rwsem for read
(down_read) instead of for write: the namespaces list itself is not
manipulated, only the ns and ns->head are updated, and those are
protected by the ns->head lock.
Fixes: 0d0b660 ("nvme: add ANA support")
Signed-off-by: Sagi Grimberg <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>