
Commit 8f15a8e

post: new article: update and v6.18
Various updates from the last 9 months. Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
1 parent fc36936 commit 8f15a8e


2 files changed

+220
-0
lines changed

@@ -0,0 +1,220 @@
---
layout: post
title: "Summer update and MPTCP features in Linux v6.18"
---

*Long time no see* (or *read*?) as we could say! The [last update]({% post_url
2025-01-27-one-year-NGI0-Core %}) was in January. Since then, we have been very
busy! Read on to find out what happened around MPTCP during the last few months,
and which new features will be present in the future v6.18.

<!--more-->

## Activities

In March, I was at [Netdev 0x19](https://netdevconf.info/0x19/) in Zagreb, and I
had a BoF session: [MPTCP: present, future, and its development workflow
(CI)](https://netdevconf.info/0x19/sessions/bof/mptcp-present-future-and-its-development-workflow-ci.html).
Do not hesitate to check the [video](https://youtu.be/lo8biurYw5s) or the
[slides](https://netdevconf.info/0x19/docs/netdev-0x19-paper41-talk-slides/NetdevConf%200x19%20-%20MPTCP.pdf).
This session covered different aspects of MPTCP: what MPTCP is, its use-cases,
and its different components, then how easy it is to use MPTCP today with a
recent and up-to-date Linux environment. There were some words about the
current status, what was planned, and some discussions. There was also a second
part about the development workflow, and how a CI with a specific setup can
greatly help!

Soon after, I started a temporary part-time contract at
[UCLouvain](https://www.uclouvain.be/en), as a research assistant in the [IP
Networking Lab](https://inl.info.ucl.ac.be). That was a great opportunity to
work with excellent colleagues, learn more about current research in the
academic world, and contribute to
[different](https://dial.uclouvain.be/pr/boreal/object/boreal:303829)
[scientific](https://datatracker.ietf.org/meeting/123/materials/slides-123-tcpm-optimistic-ack-attack-00)
[research](https://datatracker.ietf.org/doc/draft-baerts-tcpm-mptcpext/). It was
also a way to get financial support for the MPTCP maintenance, plus access to
some servers to run [SyzKaller](https://github.com/google/syzkaller/), an
excellent kernel fuzzer, to continue finding bugs in the current implementation.

A few months ago, I got a mission to find a solution for middleboxes
intercepting TCP connections, and thus forcing MPTCP to fall back to "plain"
TCP. This resulted in the
[TCP-in-UDP](https://github.com/multipath-tcp/tcp-in-udp) eBPF program. Please
check the [dedicated blog post]({% post_url 2025-07-14-TCP-in-UDP %}) for more
details about that.

In July, I
[presented](https://datatracker.ietf.org/meeting/123/materials/slides-123-tcpm-mptcp-extensions-00)
two unrelated and independent extensions to the MPTCP protocol. The first one
extends the Data-Level Length (DLL) size to allow MPTCP packets of more than 64
KB, mainly to allow internal egress packets of more than 64 KB, and improve
performance in a data centre. It can also be helpful when [IPv6
jumbograms](https://datatracker.ietf.org/doc/html/rfc2675) are used. See
this [draft](https://datatracker.ietf.org/doc/draft-baerts-tcpm-mptcpdss/) for
more details. The second extension suggests using application-level keys to
better secure MPTCP when establishing new subflows, announcing addresses, and
resetting connections. See this other
[draft](https://datatracker.ietf.org/doc/draft-baerts-tcpm-mptcpext/) for more
explanations about this idea. If you know a company, or another actor present on
the Internet, that is interested in these extensions and could help to get them
accepted, feel free to contact me.

Finally, it is important to note that more funding around MPTCP recently got
[accepted](https://nlnet.nl/project/MPTCP-C-Flag/)! 🎉 Thanks again
[NLnet](https://nlnet.nl) for your invaluable support!


## New features

### Better `MPCapable`'s C-flag support on the client side

The MPTCP protocol and its implementation in the Linux kernel support
deployments behind load-balancers. This is typically used by CDNs. When a
layer-4 load-balancer is in place, it means a connection will be handled by one
server out of many placed behind it. In other words, it means multiple servers
are accepting connections to the same IP address and port. An MPTCP connection
can be composed of multiple TCP subflows (paths), and it is important to make
sure new path requests (`MPJoin`) reach the right end-server. If such a path
request is sent to the original IP address and port, there is a high chance the
load-balancer will route it to a different end-server. To cope with that, the
[MPTCP protocol](https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6)
allows a host to set a flag (C-flag) in the connection request (`MPCapable`) to
tell the receiver it must not try to open any additional subflows toward this
address and port. Instead, the same host will announce a unique IP address and
port that can be used to reach the right end-server. For more details about this
case, please see this page: [Deployment behind a load
balancer](https://www.mptcp.dev/load-balancer.html).

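To make the rule above more concrete, here is a minimal sketch of the decision a
client has to take before sending an `MPJoin`. It is only a toy model in Python,
not the kernel's code: the `PeerState` class, the `join_targets` function and the
example addresses are all made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Address = Tuple[str, int]  # (IP address, port)


@dataclass
class PeerState:
    """What a client knows about the server of one MPTCP connection (toy model)."""
    initial_addr: Address                 # address and port of the first subflow
    c_flag: bool                          # C-flag seen in the peer's MPCapable
    announced: List[Address] = field(default_factory=list)  # received via ADD_ADDR


def join_targets(peer: PeerState) -> List[Address]:
    """Return the remote addresses an MPJoin may be sent to."""
    targets = list(peer.announced)
    if not peer.c_flag:
        # Without the C-flag, the initial address and port may be reused.
        targets.insert(0, peer.initial_addr)
    return targets
```

With the C-flag set, only the announced address remains a valid target, which is
exactly what lets the new path reach the right end-server behind the
load-balancer.
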
The implementation on the server side has been supported for a few years now on
Linux, and is already widely used. A server simply has to set the
[`net.mptcp.allow_join_initial_addr_port`](https://docs.kernel.org/networking/mptcp-sysctl.html)
sysctl knob to `0`, and add a `signal` MPTCP endpoint with a dedicated IP
address and an optional port.

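As an illustration, the two server-side steps above could be scripted like this.
This is a sketch under assumptions: it relies on the `sysctl -w` and `ip mptcp
endpoint add` command-line syntax from procps and iproute2, the helper name
`server_setup_commands` is hypothetical, and the function only builds the
commands without running them.

```python
from typing import List, Optional


def server_setup_commands(addr: str, port: Optional[int] = None) -> List[List[str]]:
    """Build the commands for a server behind a load-balancer (not executed here)."""
    # Dedicated address (and optional port) announced to the clients.
    endpoint = ["ip", "mptcp", "endpoint", "add", addr]
    if port is not None:
        endpoint += ["port", str(port)]
    endpoint.append("signal")
    return [
        # Set the C-flag: no joins to the initial address and port.
        ["sysctl", "-w", "net.mptcp.allow_join_initial_addr_port=0"],
        endpoint,
    ]
```

Each returned list can then be passed to `subprocess.run(cmd, check=True)` with
the appropriate privileges.
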
So far, it looks like this setup was mainly used when interacting with iOS
devices, so without the Linux kernel on the client side. On the Linux client
side, the in-kernel path-manager respected the C-flag by not establishing new
paths to the initial address, but that was it. By default, in such situations
with the C-flag and the in-kernel path-manager, if a client had multiple
interfaces, the non-primary ones were *not* used to establish extra paths. The
reason is that the extra interfaces are by default only used to create new paths
to the initial address of the server, which is not allowed in this case. This
was not a good behaviour. A [fix](https://git.kernel.org/torvalds/c/4b1ff850e0c1)
has recently been sent to improve this situation. Now, in this particular case,
the in-kernel path-manager considers using the other [MPTCP
endpoints](https://www.mptcp.dev/pm.html) to establish new paths to the
announced address.

With the userspace path-manager, the userspace daemon did not know when the
other peer had set this C-flag. That means it was not able to respect the
protocol when the flag was set. The kernel
[now](https://git.kernel.org/torvalds/c/2293c57484ae) announces this info, and
the "official" userspace daemon (`mptcpd`) will support it
[soon](https://github.com/multipath-tcp/mptcpd/pull/323).

### New `laminar` endpoints

Before Linux v6.18, upon the reception of an `ADD_ADDR` (and when the `fullmesh`
flag was not used), the in-kernel PM only created new subflows using the local
address picked by the routing configuration. That works well when the announced
addresses can be predicted, but not on the Internet with servers controlled by
someone else. Instead, it is easier to pick local addresses from a selected list
of endpoints, and use them only once, than to rely on routing rules. `laminar`
endpoints have been [added](https://git.kernel.org/torvalds/c/539f6b9de39e) in
v6.18 for this purpose.

In other words, on the client side, it is now recommended to set both `subflow`
and `laminar` flags by default. If both the client and the server sides have
multiple network interfaces they want to use, it might be interesting to use
only the `laminar` flag on all client-side MPTCP endpoints, and only the
`signal` one on all server-side MPTCP endpoints.

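The "pick from a list, and use only once" idea can be sketched as follows. This
is a toy model written for this post, not the in-kernel path-manager logic: the
`LaminarPicker` class is hypothetical, it tracks a single connection, and it
ignores details such as address families or subflow limits.

```python
from typing import List, Optional, Set, Tuple


class LaminarPicker:
    """Toy model: each laminar endpoint serves one announced address only."""

    def __init__(self, laminar_endpoints: List[str]) -> None:
        self.endpoints = list(laminar_endpoints)
        self.used: Set[str] = set()

    def on_add_addr(self, remote: str) -> Optional[Tuple[str, str]]:
        """Return the (local, remote) pair for a new subflow, or None."""
        for local in self.endpoints:
            if local not in self.used:
                self.used.add(local)  # never reuse this local address
                return (local, remote)
        return None  # no unused laminar endpoint left
```

With two laminar endpoints on the client and two announced server addresses,
each announced address ends up paired with a different local address, without
depending on the routing rules.
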
### mptcpd: security report & improvements

Thanks to the NLnet funding, [Radically Open Security
B.V.](https://www.radicallyopensecurity.com) did a security review of
[mptcpd](https://mptcpd.mptcp.dev). Thank you, Tim and Marcus, for this great
work! No security issues have been found 🎉

The report mentioned one attention point: the plugin directory should not be
world-writable, so that other pieces of code cannot be executed with extra
permissions (`CAP_NET_ADMIN`). The full report is available
[here](assets/202503-mptcpd-security-report.pdf).

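For those who want to verify this attention point on their own system, a check
could look like the sketch below. The function name is hypothetical, and the
path of the plugin directory depends on how mptcpd was built and packaged, so it
is left as a parameter.

```python
import os
import stat


def is_world_writable(path: str) -> bool:
    """Return True if 'others' have write permission on the given path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)
```

If this returns `True` for the plugin directory, any local user could drop a
plugin there, and it would then be loaded by a daemon holding `CAP_NET_ADMIN`.
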
In terms of improvements, it is good to note that mptcpd is now available in
more Linux distributions: OpenWrt, Alpine Linux, NixOS, etc. A v0.14 release is
planned, and it will include some new features around `mptcpize`: setting the
`GODEBUG=multipathtcp=1` environment variable, and also appending to
`LD_PRELOAD` if previously set, instead of overriding it. This version should
also support new `laminar` endpoints, and the new `deny_join_id0` parameter.

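The environment handling described above can be illustrated with a few lines of
Python. This is not how `mptcpize` is implemented, only a sketch of the idea:
the `mptcpize_env` helper is hypothetical, the wrapper library path is supplied
by the caller, and the sketch chooses to put the wrapper after the existing
`LD_PRELOAD` entries and to also preserve an existing `GODEBUG` value.

```python
from typing import Dict, Mapping


def mptcpize_env(env: Mapping[str, str], wrapper_lib: str) -> Dict[str, str]:
    """Return a copy of env prepared to run an application with MPTCP."""
    new_env = dict(env)

    # Append to an existing LD_PRELOAD list rather than replacing it.
    previous = new_env.get("LD_PRELOAD", "")
    new_env["LD_PRELOAD"] = f"{previous}:{wrapper_lib}" if previous else wrapper_lib

    # GODEBUG is a comma-separated list: keep what was already there.
    godebug = new_env.get("GODEBUG", "")
    new_env["GODEBUG"] = f"{godebug},multipathtcp=1" if godebug else "multipathtcp=1"
    return new_env
```

The result can be passed as the `env` argument of `subprocess.run()` to start
the target application.
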
### User applications

Quite a few more applications now have a dedicated option to enable MPTCP
support: [iperf3](https://github.com/esnet/iperf/pull/1661),
[sing-box](https://github.com/SagerNet/sing-box/commit/1019ecfdcfb7),
[Valkey](https://github.com/valkey-io/valkey/pull/1811),
[FreeNginx](https://freenginx.org/hg/nginx/rev/cb20978439c8), etc. Please also
note that since Go [1.24](https://go-review.googlesource.com/c/go/+/607715),
all applications written in Go have MPTCP enabled by default on the server side!
This includes Caddy, Traefik, Shadowsocks Go, and many more!

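For application developers wondering what such an option usually boils down to,
here is a minimal sketch in Python. It assumes a Linux system, where
`IPPROTO_MPTCP` is 262 (Python exposes it as `socket.IPPROTO_MPTCP` since 3.10).
The `stream_socket` helper is a hypothetical name, and it is not taken from any
of the applications listed above.

```python
import socket

# Exposed by Python 3.10+ on Linux; 262 is the value used by the Linux kernel.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def stream_socket(family: int = socket.AF_INET, mptcp: bool = True) -> socket.socket:
    """Create an MPTCP socket when requested and possible, else a plain TCP one."""
    if mptcp:
        try:
            return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
        except OSError:
            pass  # kernel without MPTCP support, or MPTCP disabled
    return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)
```

The returned socket is used like any other TCP socket (`bind()`, `listen()`,
`connect()`, etc.), which is why adding such an option is usually a small change.
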
### Miscellaneous

When working on current and future features around the **path-manager**, a lot
of **clean-ups** have been done by Geliang and me. Some were required to allow
new features, but others have also been added to improve the code itself by
renaming variables, splitting large functions, regrouping code per purpose, etc.
This might require a bit more attention during the backports, but it will help
with the maintenance in the long term.

To help with the debugging, new **MIB counters** for the rejected `MPJoin` and
for fallbacks to TCP have been added by Paolo and me. Some of them have been
validated by Gang when working on improving the code coverage when running the
whole test suite.

The [**MPTCP CI**](https://ci-results.mptcp.dev) was taking more and more time
due to the addition of new tests. To accelerate the whole process, more builders
are used in parallel: now the `mptcp_join` selftest is executed in a dedicated
job for the *normal* and *debug* modes. Results can now be shared after ~1h15
instead of 2h.

**Performance** is being improved thanks to the work from Paolo and Christoph!
More work is still ongoing, and a proper perf regression lab should be put in
place soon. More explanations will be shared in a later blog post.

Regarding the **socket options**, `TCP_MAXSEG` has been added by Geliang, and an
MPTCP version of `SO_MAX_PACING_RATE` from Christoph is in discussion. More work
will be done around the socket options to simplify the code and improve the
maintenance in the long term.

When an address is announced by a peer via an **`ADD_ADDR`**, the signalling
packet carried in a TCP ACK can be lost. Before v6.18, the **retransmissions**
were done after a timeout controlled by the
[`net.mptcp.add_addr_timeout`](https://docs.kernel.org/networking/mptcp-sysctl.html)
sysctl knob. The default value is set to 2 minutes, which is a safe choice, but
certainly too high for most use-cases. Geliang changed this knob to act as a
maximum value for the timeout. The timeout itself now depends on the
connection's round-trip-time (RTT) to better adapt to the situation.

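The new behaviour can be summarised as "an RTT-based timeout, capped by the
sysctl value". The sketch below only illustrates this capping: the function name
and the `rtt_factor` multiplier are assumptions made for the example, and the
kernel derives its value from the connection's own timers, not from this
formula.

```python
def add_addr_retrans_timeout(rtt_s: float,
                             max_timeout_s: float = 120.0,
                             rtt_factor: float = 4.0) -> float:
    """Return an illustrative ADD_ADDR retransmission timeout, in seconds.

    max_timeout_s plays the role of net.mptcp.add_addr_timeout (2 minutes by
    default); rtt_factor is a hypothetical multiplier used for this example.
    """
    if rtt_s <= 0:
        return max_timeout_s  # no RTT estimate available yet
    return min(rtt_s * rtt_factor, max_timeout_s)
```

With a 50 ms RTT, a lost `ADD_ADDR` would be retransmitted after a fraction of a
second in this model, instead of waiting for 2 minutes.
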
Last but not least, thanks to Paolo for helping with some fixes, to Mat for the
code review, and to everybody who has reported issues, sent fixes and promoted
MPTCP! A great community!


## Conclusion

Quite a lot of new features and improvements will be present in the future Linux
kernel LTS version (v6.18)! Looking forward to even more of them in the coming
months!

<br/>

--------------------------------------------------------------------------------

If you like my work and wish me to continue doing so, you can become a sponsor
via [LiberaPay](https://liberapay.com/matttbe),
[GitHub](https://github.com/sponsors/matttbe) or
[Patreon](https://patreon.com/matttbe).

Please [contact me](mailto:[email protected]) for professional collaborations,
short or long missions, or for financial support for my contributions to the
maintenance of MPTCP and various apps around it.
345 KB
Binary file not shown.
