Skip to content

Commit f4ce91c

Browse files
nealcardwelldavem330
authored andcommitted
tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited
This commit fixes a bug in the tracking of max_packets_out and is_cwnd_limited. This bug can cause the connection to fail to remember that is_cwnd_limited is true, causing the connection to fail to grow cwnd when it should, causing throughput to be lower than it should be. The following event sequence is an example that triggers the bug: (a) The connection is cwnd_limited, but packets_out is not at its peak due to TSO deferral deciding not to send another skb yet. In such cases the connection can advance max_packets_seq and set tp->is_cwnd_limited to true and max_packets_out to a small number. (b) Then later in the round trip the connection is pacing-limited (not cwnd-limited), and packets_out is larger. In such cases the connection would raise max_packets_out to a bigger number but (unexpectedly) flip tp->is_cwnd_limited from true to false. This commit fixes that bug. One straightforward fix would be to separately track (a) the next window after max_packets_out reaches a maximum, and (b) the next window after tp->is_cwnd_limited is set to true. But this would require consuming an extra u32 sequence number. Instead, to save space we track only the most important information. Specifically, we track the strongest available signal of the degree to which the cwnd is fully utilized: (1) If the connection is cwnd-limited then we remember that fact for the current window. (2) If the connection not cwnd-limited then we track the maximum number of outstanding packets in the current window. In particular, note that the new logic cannot trigger the buggy (a)/(b) sequence above because with the new logic a condition where tp->packets_out > tp->max_packets_out can only trigger an update of tp->is_cwnd_limited if tp->is_cwnd_limited is false. This first showed up in a testing of a BBRv2 dev branch, but this buggy behavior highlighted a general issue with the tcp_cwnd_validate() logic that can cause cwnd to fail to increase at the proper rate for any TCP congestion control, including Reno or CUBIC. Fixes: ca8a226 ("tcp: make cwnd-limited checks measurement-based, and gentler") Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Kevin(Yudong) Yang <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
1 parent 022152a commit f4ce91c

File tree

4 files changed

+19
-9
lines changed

4 files changed

+19
-9
lines changed

include/linux/tcp.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -295,7 +295,7 @@ struct tcp_sock {
295295
u32 packets_out; /* Packets which are "in flight" */
296296
u32 retrans_out; /* Retransmitted packets out */
297297
u32 max_packets_out; /* max packets_out in last window */
298-
u32 max_packets_seq; /* right edge of max_packets_out flight */
298+
u32 cwnd_usage_seq; /* right edge of cwnd usage tracking flight */
299299

300300
u16 urg_data; /* Saved octet of OOB data and control flags */
301301
u8 ecn_flags; /* ECN status bits. */

include/net/tcp.h

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1295,11 +1295,14 @@ static inline bool tcp_is_cwnd_limited(const struct sock *sk)
12951295
{
12961296
const struct tcp_sock *tp = tcp_sk(sk);
12971297

1298+
if (tp->is_cwnd_limited)
1299+
return true;
1300+
12981301
/* If in slow start, ensure cwnd grows to twice what was ACKed. */
12991302
if (tcp_in_slow_start(tp))
13001303
return tcp_snd_cwnd(tp) < 2 * tp->max_packets_out;
13011304

1302-
return tp->is_cwnd_limited;
1305+
return false;
13031306
}
13041307

13051308
/* BBR congestion control needs pacing.

net/ipv4/tcp.c

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3137,6 +3137,8 @@ int tcp_disconnect(struct sock *sk, int flags)
31373137
tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
31383138
tcp_snd_cwnd_set(tp, TCP_INIT_CWND);
31393139
tp->snd_cwnd_cnt = 0;
3140+
tp->is_cwnd_limited = 0;
3141+
tp->max_packets_out = 0;
31403142
tp->window_clamp = 0;
31413143
tp->delivered = 0;
31423144
tp->delivered_ce = 0;

net/ipv4/tcp_output.c

Lines changed: 12 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1875,15 +1875,20 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
18751875
const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
18761876
struct tcp_sock *tp = tcp_sk(sk);
18771877

1878-
/* Track the maximum number of outstanding packets in each
1879-
* window, and remember whether we were cwnd-limited then.
1878+
/* Track the strongest available signal of the degree to which the cwnd
1879+
* is fully utilized. If cwnd-limited then remember that fact for the
1880+
* current window. If not cwnd-limited then track the maximum number of
1881+
* outstanding packets in the current window. (If cwnd-limited then we
1882+
* chose to not update tp->max_packets_out to avoid an extra else
1883+
* clause with no functional impact.)
18801884
*/
1881-
if (!before(tp->snd_una, tp->max_packets_seq) ||
1882-
tp->packets_out > tp->max_packets_out ||
1883-
is_cwnd_limited) {
1884-
tp->max_packets_out = tp->packets_out;
1885-
tp->max_packets_seq = tp->snd_nxt;
1885+
if (!before(tp->snd_una, tp->cwnd_usage_seq) ||
1886+
is_cwnd_limited ||
1887+
(!tp->is_cwnd_limited &&
1888+
tp->packets_out > tp->max_packets_out)) {
18861889
tp->is_cwnd_limited = is_cwnd_limited;
1890+
tp->max_packets_out = tp->packets_out;
1891+
tp->cwnd_usage_seq = tp->snd_nxt;
18871892
}
18881893

18891894
if (tcp_is_cwnd_limited(sk)) {

0 commit comments

Comments
 (0)