Commit f0433ee

net: don't mix device locking in dev_close_many() calls
Lockdep found the following dependency:

  &dev_instance_lock_key#3 -->
     &rdev->wiphy.mtx -->
        &net->xdp.lock -->
           &xs->mutex -->
              &dev_instance_lock_key#3

The first dependency is the problem. wiphy mutex should be outside
the instance locks. The problem happens in notifiers (as always)
for CLOSE. We only hold the instance lock for ops locked devices
during CLOSE, and WiFi netdevs are not ops locked. Unfortunately,
when we dev_close_many() during netns dismantle we may be holding
the instance lock of _another_ netdev when issuing a CLOSE for
a WiFi device.

Lockdep's "Possible unsafe locking scenario" only prints 3 locks
and we have 4, plus I think we'd need 3 CPUs, like this:

       CPU0                 CPU1              CPU2
       ----                 ----              ----
  lock(&xs->mutex);
                       lock(&dev_instance_lock_key#3);
                                         lock(&rdev->wiphy.mtx);
                                         lock(&net->xdp.lock);
                                         lock(&xs->mutex);
                       lock(&rdev->wiphy.mtx);
  lock(&dev_instance_lock_key#3);

Tho, I don't think that's possible as CPU1 and CPU2 would
be under rtnl_lock. Even if we have per-netns rtnl_lock and
wiphy can span network namespaces - CPU0 and CPU1 must be
in the same netns to see dev_instance_lock, so CPU0 can't
be installing a socket as CPU1 is tearing the netns down.

Regardless, our expected lock ordering is that wiphy lock
is taken before instance locks, so let's fix this.

Go over the ops locked and non-locked devices separately.
Note that calling dev_close_many() on an empty list is perfectly
fine. All processing (including RCU syncs) are conditional
on the list not being empty, already.

Fixes: 7e4d784 ("net: hold netdev instance lock during rtnetlink operations")
Reported-by: [email protected]
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
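The dependency chain in the commit message can be checked mechanically. The following is a toy sketch in userspace C, not how lockdep itself is implemented: it encodes the four reported lock classes as nodes of a small graph and runs a depth-first search for a cycle. The names `lock_id`, `reported_chain()` and `has_cycle()` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the four lock classes named in the report. */
enum lock_id { LOCK_INSTANCE, LOCK_WIPHY, LOCK_XDP, LOCK_XS, LOCK_COUNT };

/* edge[a][b] == true means "b was taken while a was held". */
static void reported_chain(bool edge[LOCK_COUNT][LOCK_COUNT])
{
	memset(edge, 0, sizeof(bool) * LOCK_COUNT * LOCK_COUNT);
	edge[LOCK_INSTANCE][LOCK_WIPHY] = true;	/* the problematic edge */
	edge[LOCK_WIPHY][LOCK_XDP] = true;
	edge[LOCK_XDP][LOCK_XS] = true;
	edge[LOCK_XS][LOCK_INSTANCE] = true;
}

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = finished. */
static bool dfs(bool edge[LOCK_COUNT][LOCK_COUNT], int node,
		int state[LOCK_COUNT])
{
	state[node] = 1;
	for (int next = 0; next < LOCK_COUNT; next++) {
		if (!edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: ordering cycle */
		if (state[next] == 0 && dfs(edge, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

/* Returns true if the recorded lock ordering contains a cycle. */
static bool has_cycle(bool edge[LOCK_COUNT][LOCK_COUNT])
{
	int state[LOCK_COUNT] = { 0 };

	for (int n = 0; n < LOCK_COUNT; n++)
		if (state[n] == 0 && dfs(edge, n, state))
			return true;
	return false;
}
```

With the four reported edges the graph is cyclic. Dropping the instance lock to wiphy edge, which is the one the commit message calls the problem, leaves the remaining three edges acyclic.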
1 parent 8c941f1 commit f0433ee

1 file changed, 13 insertions(+), 4 deletions(-)

net/core/dev.c

@@ -11932,15 +11932,24 @@ void unregister_netdevice_many_notify(struct list_head *head,
 		BUG_ON(dev->reg_state != NETREG_REGISTERED);
 	}
 
-	/* If device is running, close it first. */
+	/* If device is running, close it first. Start with ops locked... */
 	list_for_each_entry(dev, head, unreg_list) {
-		list_add_tail(&dev->close_list, &close_head);
-		netdev_lock_ops(dev);
+		if (netdev_need_ops_lock(dev)) {
+			list_add_tail(&dev->close_list, &close_head);
+			netdev_lock(dev);
+		}
+	}
+	dev_close_many(&close_head, true);
+	/* ... now unlock them and go over the rest. */
+	list_for_each_entry(dev, head, unreg_list) {
+		if (netdev_need_ops_lock(dev))
+			netdev_unlock(dev);
+		else
+			list_add_tail(&dev->close_list, &close_head);
 	}
 	dev_close_many(&close_head, true);
 
 	list_for_each_entry(dev, head, unreg_list) {
-		netdev_unlock_ops(dev);
 		/* And unlink it from device chain. */
 		unlist_netdevice(dev);
 		netdev_lock(dev);
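
The two-pass structure of the change can be modelled outside the kernel. The sketch below is a simplified userspace illustration under stated assumptions, not the kernel code: `struct fake_dev`, `close_batch()`, `close_all_single_pass()` and `close_all_two_pass()` are hypothetical names, the instance lock is reduced to a boolean flag, and `close_batch()` only stands in for dev_close_many() by flagging the case where a device without ops locking is closed while some instance lock is still held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical stand-in for a net_device; not the kernel structure. */
struct fake_dev {
	const char *name;
	bool ops_locked;	/* what netdev_need_ops_lock() would report */
	bool lock_held;		/* models the per-device instance lock */
	bool closed;
};

/*
 * Stand-in for dev_close_many(). Returns false if a device without ops
 * locking is closed while any instance lock is held. An empty batch is
 * fine and does nothing.
 */
static bool close_batch(struct fake_dev **batch, size_t n,
			const struct fake_dev *all, size_t total)
{
	bool any_held = false;

	for (size_t i = 0; i < total; i++)
		any_held |= all[i].lock_held;
	for (size_t i = 0; i < n; i++) {
		if (!batch[i]->ops_locked && any_held)
			return false;
		batch[i]->closed = true;
	}
	return true;
}

/* Old shape: one batch, with instance locks held for the ops locked ones. */
static bool close_all_single_pass(struct fake_dev *devs, size_t n)
{
	struct fake_dev *batch[MAX_DEVS];
	size_t cnt = 0;
	bool ok;

	assert(n <= MAX_DEVS);
	for (size_t i = 0; i < n; i++) {
		batch[cnt++] = &devs[i];
		if (devs[i].ops_locked)
			devs[i].lock_held = true;
	}
	ok = close_batch(batch, cnt, devs, n);
	for (size_t i = 0; i < n; i++)
		devs[i].lock_held = false;
	return ok;
}

/* New shape: ops locked devices first, then unlock and close the rest. */
static bool close_all_two_pass(struct fake_dev *devs, size_t n)
{
	struct fake_dev *batch[MAX_DEVS];
	size_t cnt = 0;
	bool ok;

	assert(n <= MAX_DEVS);
	for (size_t i = 0; i < n; i++) {
		if (devs[i].ops_locked) {
			batch[cnt++] = &devs[i];
			devs[i].lock_held = true;
		}
	}
	ok = close_batch(batch, cnt, devs, n);

	cnt = 0;
	for (size_t i = 0; i < n; i++) {
		if (devs[i].ops_locked)
			devs[i].lock_held = false;
		else
			batch[cnt++] = &devs[i];
	}
	return close_batch(batch, cnt, devs, n) && ok;
}
```

In this model, a list containing one ops locked device and one WiFi-like device trips the check in the single-pass version, while the two-pass version closes both with no instance lock held during the second batch.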
