Skip to content

Commit c6124f6

Browse files
jacob-kellerkuba-moo
authored andcommitted
iavf: fix circular lock dependency with netdev_lock
We have recently seen reports of lockdep circular lock dependency warnings when loading the iAVF driver: [ 1504.790308] ====================================================== [ 1504.790309] WARNING: possible circular locking dependency detected [ 1504.790310] 6.13.0 #net_next_rt.c2933b2befe2.el9 Not tainted [ 1504.790311] ------------------------------------------------------ [ 1504.790312] kworker/u128:0/13566 is trying to acquire lock: [ 1504.790313] ffff97d0e4738f18 (&dev->lock){+.+.}-{4:4}, at: register_netdevice+0x52c/0x710 [ 1504.790320] [ 1504.790320] but task is already holding lock: [ 1504.790321] ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf] [ 1504.790330] [ 1504.790330] which lock already depends on the new lock. [ 1504.790330] [ 1504.790330] [ 1504.790330] the existing dependency chain (in reverse order) is: [ 1504.790331] [ 1504.790331] -> #1 (&adapter->crit_lock){+.+.}-{4:4}: [ 1504.790333] __lock_acquire+0x52d/0xbb0 [ 1504.790337] lock_acquire+0xd9/0x330 [ 1504.790338] mutex_lock_nested+0x4b/0xb0 [ 1504.790341] iavf_finish_config+0x37/0x240 [iavf] [ 1504.790347] process_one_work+0x248/0x6d0 [ 1504.790350] worker_thread+0x18d/0x330 [ 1504.790352] kthread+0x10e/0x250 [ 1504.790354] ret_from_fork+0x30/0x50 [ 1504.790357] ret_from_fork_asm+0x1a/0x30 [ 1504.790361] [ 1504.790361] -> #0 (&dev->lock){+.+.}-{4:4}: [ 1504.790364] check_prev_add+0xf1/0xce0 [ 1504.790366] validate_chain+0x46a/0x570 [ 1504.790368] __lock_acquire+0x52d/0xbb0 [ 1504.790370] lock_acquire+0xd9/0x330 [ 1504.790371] mutex_lock_nested+0x4b/0xb0 [ 1504.790372] register_netdevice+0x52c/0x710 [ 1504.790374] iavf_finish_config+0xfa/0x240 [iavf] [ 1504.790379] process_one_work+0x248/0x6d0 [ 1504.790381] worker_thread+0x18d/0x330 [ 1504.790383] kthread+0x10e/0x250 [ 1504.790385] ret_from_fork+0x30/0x50 [ 1504.790387] ret_from_fork_asm+0x1a/0x30 [ 1504.790389] [ 1504.790389] other info that might help us debug this: [ 1504.790389] [ 1504.790389] Possible unsafe locking scenario: [ 1504.790389] [ 1504.790390] CPU0 CPU1 [ 1504.790391] ---- ---- [ 1504.790391] lock(&adapter->crit_lock); [ 1504.790393] lock(&dev->lock); [ 1504.790394] lock(&adapter->crit_lock); [ 1504.790395] lock(&dev->lock); [ 1504.790397] [ 1504.790397] *** DEADLOCK *** This appears to be caused by the change in commit 5fda3f3 ("net: make netdev_lock() protect netdev->reg_state"), which added a netdev_lock() in register_netdevice. The iAVF driver calls register_netdevice() from iavf_finish_config(), as a final stage of its state machine post-probe. It currently takes the RTNL lock, then the netdev lock, and then the device critical lock. This pattern is used throughout the driver. Thus there is a strong dependency that the crit_lock should not be acquired before the net device lock. The change to register_netdevice creates an ABBA lock order violation because the iAVF driver is holding the crit_lock while calling register_netdevice, which then takes the netdev_lock. It seems likely that future refactors could result in netdev APIs which hold the netdev_lock while calling into the driver. This means that we should not re-order the locks so that netdev_lock is acquired after the device private crit_lock. Instead, notice that we already release the netdev_lock prior to calling the register_netdevice. This flow only happens during the early driver initialization as we transition through the __IAVF_STARTUP, __IAVF_INIT_VERSION_CHECK, __IAVF_INIT_GET_RESOURCES, etc. Analyzing the places where we take crit_lock in the driver there are two sources: a) several of the work queue tasks including adminq_task, watchdog_task, reset_task, and the finish_config task. b) various callbacks which ultimately stem back to .ndo operations or ethtool operations. The latter cannot be triggered until after the netdevice registration is completed successfully. The iAVF driver uses alloc_ordered_workqueue, which is an unbound workqueue that has a max limit of 1, and thus guarantees that only a single work item on the queue is executing at any given time, so none of the other work threads could be executing due to the ordered workqueue guarantees. The iavf_finish_config() function also does not do anything else after register_netdevice, unless it fails. It seems unlikely that the driver private crit_lock is protecting anything that register_netdevice() itself touches. Thus, to fix this ABBA lock violation, lets simply release the adapter->crit_lock as well as netdev_lock prior to calling register_netdevice(). We do still keep holding the RTNL lock as required by the function. If we do fail to register the netdevice, then we re-acquire the adapter critical lock to finish the transition back to __IAVF_INIT_CONFIG_ADAPTER. This ensures every call where both netdev_lock and the adapter->crit_lock are acquired under the same ordering. Fixes: afc6649 ("eth: iavf: extend the netdev_lock usage") Signed-off-by: Jacob Keller <[email protected]> Tested-by: Przemek Kitszel <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Tested-by: Rafal Romanowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
1 parent 5c07be9 commit c6124f6

File tree

1 file changed

+8
-4
lines changed

1 file changed

+8
-4
lines changed

drivers/net/ethernet/intel/iavf/iavf_main.c

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1983,7 +1983,7 @@ static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter, bool runni
19831983
static void iavf_finish_config(struct work_struct *work)
19841984
{
19851985
struct iavf_adapter *adapter;
1986-
bool netdev_released = false;
1986+
bool locks_released = false;
19871987
int pairs, err;
19881988

19891989
adapter = container_of(work, struct iavf_adapter, finish_config);
@@ -2012,19 +2012,22 @@ static void iavf_finish_config(struct work_struct *work)
20122012
netif_set_real_num_tx_queues(adapter->netdev, pairs);
20132013

20142014
if (adapter->netdev->reg_state != NETREG_REGISTERED) {
2015+
mutex_unlock(&adapter->crit_lock);
20152016
netdev_unlock(adapter->netdev);
2016-
netdev_released = true;
2017+
locks_released = true;
20172018
err = register_netdevice(adapter->netdev);
20182019
if (err) {
20192020
dev_err(&adapter->pdev->dev, "Unable to register netdev (%d)\n",
20202021
err);
20212022

20222023
/* go back and try again.*/
2024+
mutex_lock(&adapter->crit_lock);
20232025
iavf_free_rss(adapter);
20242026
iavf_free_misc_irq(adapter);
20252027
iavf_reset_interrupt_capability(adapter);
20262028
iavf_change_state(adapter,
20272029
__IAVF_INIT_CONFIG_ADAPTER);
2030+
mutex_unlock(&adapter->crit_lock);
20282031
goto out;
20292032
}
20302033
}
@@ -2040,9 +2043,10 @@ static void iavf_finish_config(struct work_struct *work)
20402043
}
20412044

20422045
out:
2043-
mutex_unlock(&adapter->crit_lock);
2044-
if (!netdev_released)
2046+
if (!locks_released) {
2047+
mutex_unlock(&adapter->crit_lock);
20452048
netdev_unlock(adapter->netdev);
2049+
}
20462050
rtnl_unlock();
20472051
}
20482052

0 commit comments

Comments
 (0)