Skip to content

Commit 89ed5b5

Browse files
nhormandavem330
authored andcommitted
af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
When an application is run that: a) Sets its scheduler to be SCHED_FIFO and b) Opens a memory mapped AF_PACKET socket, and sends frames with the MSG_DONTWAIT flag cleared, its possible for the application to hang forever in the kernel. This occurs because when waiting, the code in tpacket_snd calls schedule, which under normal circumstances allows other tasks to run, including ksoftirqd, which in some cases is responsible for freeing the transmitted skb (which in AF_PACKET calls a destructor that flips the status bit of the transmitted frame back to available, allowing the transmitting task to complete). However, when the calling application is SCHED_FIFO, its priority is such that the schedule call immediately places the task back on the cpu, preventing ksoftirqd from freeing the skb, which in turn prevents the transmitting task from detecting that the transmission is complete. We can fix this by converting the schedule call to a completion mechanism. By using a completion queue, we force the calling task, when it detects there are no more frames to send, to schedule itself off the cpu until such time as the last transmitted skb is freed, allowing forward progress to be made. Tested by myself and the reporter, with good results Change Notes: V1->V2: Enhance the sleep logic to support being interruptible and allowing for honoring to SK_SNDTIMEO (Willem de Bruijn) V2->V3: Rearrage the point at which we wait for the completion queue, to avoid needing to check for ph/skb being null at the end of the loop. Also move the complete call to the skb destructor to avoid needing to modify __packet_set_status. Also gate calling complete on packet_read_pending returning zero to avoid multiple calls to complete. (Willem de Bruijn) Move timeo computation within loop, to re-fetch the socket timeout since we also use the timeo variable to record the return code from the wait_for_complete call (Neil Horman) V3->V4: Willem has requested that the control flow be restored to the previous state. Doing so lets us eliminate the need for the po->wait_on_complete flag variable, and lets us get rid of the packet_next_frame function, but introduces another complexity. Specifically, but using the packet pending count, we can, if an applications calls sendmsg multiple times with MSG_DONTWAIT set, each set of transmitted frames, when complete, will cause tpacket_destruct_skb to issue a complete call, for which there will never be a wait_on_completion call. This imbalance will lead to any future call to wait_for_completion here to return early, when the frames they sent may not have completed. To correct this, we need to re-init the completion queue on every call to tpacket_snd before we enter the loop so as to ensure we wait properly for the frames we send in this iteration. Change the timeout and interrupted gotos to out_put rather than out_status so that we don't try to free a non-existant skb Clean up some extra newlines (Willem de Bruijn) Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Neil Horman <[email protected]> Reported-by: Matteo Croce <[email protected]> Signed-off-by: David S. Miller <[email protected]>
1 parent 25bff6d commit 89ed5b5

File tree

2 files changed

+18
-3
lines changed

2 files changed

+18
-3
lines changed

net/packet/af_packet.c

Lines changed: 17 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2401,6 +2401,9 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
24012401

24022402
ts = __packet_set_timestamp(po, ph, skb);
24032403
__packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts);
2404+
2405+
if (!packet_read_pending(&po->tx_ring))
2406+
complete(&po->skb_completion);
24042407
}
24052408

24062409
sock_wfree(skb);
@@ -2585,7 +2588,7 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
25852588

25862589
static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
25872590
{
2588-
struct sk_buff *skb;
2591+
struct sk_buff *skb = NULL;
25892592
struct net_device *dev;
25902593
struct virtio_net_hdr *vnet_hdr = NULL;
25912594
struct sockcm_cookie sockc;
@@ -2600,6 +2603,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
26002603
int len_sum = 0;
26012604
int status = TP_STATUS_AVAILABLE;
26022605
int hlen, tlen, copylen = 0;
2606+
long timeo = 0;
26032607

26042608
mutex_lock(&po->pg_vec_lock);
26052609

@@ -2646,12 +2650,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
26462650
if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !po->has_vnet_hdr)
26472651
size_max = dev->mtu + reserve + VLAN_HLEN;
26482652

2653+
reinit_completion(&po->skb_completion);
2654+
26492655
do {
26502656
ph = packet_current_frame(po, &po->tx_ring,
26512657
TP_STATUS_SEND_REQUEST);
26522658
if (unlikely(ph == NULL)) {
2653-
if (need_wait && need_resched())
2654-
schedule();
2659+
if (need_wait && skb) {
2660+
timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
2661+
timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
2662+
if (timeo <= 0) {
2663+
err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
2664+
goto out_put;
2665+
}
2666+
}
2667+
/* check for additional frames */
26552668
continue;
26562669
}
26572670

@@ -3207,6 +3220,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
32073220
sock_init_data(sock, sk);
32083221

32093222
po = pkt_sk(sk);
3223+
init_completion(&po->skb_completion);
32103224
sk->sk_family = PF_PACKET;
32113225
po->num = proto;
32123226
po->xmit = dev_queue_xmit;

net/packet/internal.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -128,6 +128,7 @@ struct packet_sock {
128128
unsigned int tp_hdrlen;
129129
unsigned int tp_reserve;
130130
unsigned int tp_tstamp;
131+
struct completion skb_completion;
131132
struct net_device __rcu *cached_dev;
132133
int (*xmit)(struct sk_buff *skb);
133134
struct packet_type prot_hook ____cacheline_aligned_in_smp;

0 commit comments

Comments
 (0)