Skip to content

Commit 55f0bfc

Browse files
Yun Ludavem330
authored andcommitted
af_packet: fix soft lockup issue caused by tpacket_snd()
When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for pending_refcnt to decrement to zero before returning. The pending_refcnt is decremented by 1 when the skb->destructor function is called, indicating that the skb has been successfully sent and needs to be destroyed. If an error occurs during this process, the tpacket_snd() function will exit and return error, but pending_refcnt may not yet have decremented to zero. Assuming the next send operation is executed immediately, but there are no available frames to be sent in tx_ring (i.e., packet_current_frame returns NULL), and skb is also NULL, the function will not execute wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it will enter a do-while loop, waiting for pending_refcnt to be zero. Even if the previous skb has completed transmission, the skb->destructor function can only be invoked in the ksoftirqd thread (assuming NAPI threading is enabled). When both the ksoftirqd thread and the tpacket_snd operation happen to run on the same CPU, and the CPU trapped in the do-while loop without yielding, the ksoftirqd thread will not get scheduled to run. As a result, pending_refcnt will never be reduced to zero, and the do-while loop cannot exit, eventually leading to a CPU soft lockup issue. In fact, skb is true for all but the first iterations of that loop, and as long as pending_refcnt is not zero, even if incremented by a previous call, wait_for_completion_interruptible_timeout() should be executed to yield the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, the execution condition of this function should be modified to check if pending_refcnt is not zero, instead of check skb. - if (need_wait && skb) { + if (need_wait && packet_read_pending(&po->tx_ring)) { As a result, the judgment conditions are duplicated with the end code of the while loop, and packet_read_pending() is a very expensive function. Actually, this loop can only exit when ph is NULL, so the loop condition can be changed to while (1), and in the "ph = NULL" branch, if the subsequent condition of if is not met, the loop can break directly. Now, the loop logic remains the same as origin but is clearer and more obvious. Fixes: 89ed5b5 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET") Cc: [email protected] Suggested-by: LongJun Tang <[email protected]> Signed-off-by: Yun Lu <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
1 parent c1ba3c0 commit 55f0bfc

File tree

1 file changed

+11
-12
lines changed

1 file changed

+11
-12
lines changed

net/packet/af_packet.c

Lines changed: 11 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -2846,15 +2846,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
28462846
ph = packet_current_frame(po, &po->tx_ring,
28472847
TP_STATUS_SEND_REQUEST);
28482848
if (unlikely(ph == NULL)) {
2849-
if (need_wait && skb) {
2849+
/* Note: packet_read_pending() might be slow if we
2850+
* have to call it as it's per_cpu variable, but in
2851+
* fast-path we don't have to call it, only when ph
2852+
* is NULL, we need to check the pending_refcnt.
2853+
*/
2854+
if (need_wait && packet_read_pending(&po->tx_ring)) {
28502855
timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
28512856
if (timeo <= 0) {
28522857
err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
28532858
goto out_put;
28542859
}
2855-
}
2856-
/* check for additional frames */
2857-
continue;
2860+
/* check for additional frames */
2861+
continue;
2862+
} else
2863+
break;
28582864
}
28592865

28602866
skb = NULL;
@@ -2943,14 +2949,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
29432949
}
29442950
packet_increment_head(&po->tx_ring);
29452951
len_sum += tp_len;
2946-
} while (likely((ph != NULL) ||
2947-
/* Note: packet_read_pending() might be slow if we have
2948-
* to call it as it's per_cpu variable, but in fast-path
2949-
* we already short-circuit the loop with the first
2950-
* condition, and luckily don't have to go that path
2951-
* anyway.
2952-
*/
2953-
(need_wait && packet_read_pending(&po->tx_ring))));
2952+
} while (1);
29542953

29552954
err = len_sum;
29562955
goto out_put;

0 commit comments

Comments
 (0)