| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Summer update and MPTCP features in Linux v6.18" |
| 4 | +--- |
| 5 | + |
| 6 | +*Long time no see* (or *read*?) as we could say! The [last update]({% post_url |
| 7 | +2025-01-27-one-year-NGI0-Core %}) was in January. Since then, we have been very |
| 8 | +busy! Read on to find out what happened around MPTCP during the last few months, |
| 9 | +and which new features will be present in the future v6.18. |
| 10 | + |
| 11 | +<!--more--> |
| 12 | + |
| 13 | +## Activities |
| 14 | + |
| 15 | +In March, I was at [Netdev 0x19](https://netdevconf.info/0x19/) in Zagreb, and I |
| 16 | +had a BoF session: [MPTCP: present, future, and its development workflow |
| 17 | +(CI)](https://netdevconf.info/0x19/sessions/bof/mptcp-present-future-and-its-development-workflow-ci.html). |
| 18 | +Do not hesitate to check the [video](https://youtu.be/lo8biurYw5s) or the |
| 19 | +[slides](https://netdevconf.info/0x19/docs/netdev-0x19-paper41-talk-slides/NetdevConf%200x19%20-%20MPTCP.pdf). |
| 20 | +This session covered different aspects of MPTCP: what it is, its use-cases,
| 21 | +and its different components, then how easy it is to use MPTCP today with a
| 22 | +recent and up-to-date Linux environment. There were some words about the
| 23 | +current status, what was planned, and some discussions. There was also a second
| 24 | +part about the development workflow, and how a CI with a specific setup can
| 25 | +greatly help!
| 26 | + |
| 27 | +Soon after, I started a temporary part-time contract at
| 28 | +[UCLouvain](https://www.uclouvain.be/en), as a research assistant in the [IP
| 29 | +Networking Lab](https://inl.info.ucl.ac.be). That was a great opportunity to
| 30 | +work with excellent colleagues, learn more about current research in the
| 31 | +academic world, and contribute to
| 32 | +[different](https://dial.uclouvain.be/pr/boreal/object/boreal:303829) |
| 33 | +[scientific](https://datatracker.ietf.org/meeting/123/materials/slides-123-tcpm-optimistic-ack-attack-00) |
| 34 | +[research](https://datatracker.ietf.org/doc/draft-baerts-tcpm-mptcpext/). It was |
| 35 | +also a way to get financial support for the MPTCP maintenance, plus access to |
| 36 | +some servers to run [syzkaller](https://github.com/google/syzkaller/), an
| 37 | +excellent kernel fuzzer, to continue finding bugs in the current implementation. |
| 38 | + |
| 39 | +A few months ago, I got a mission to find a solution for middleboxes |
| 40 | +intercepting TCP connections, and thus forcing MPTCP to fall back to "plain" TCP.
| 41 | +This resulted in the [TCP-in-UDP](https://github.com/multipath-tcp/tcp-in-udp) |
| 42 | +eBPF program. Please check the [dedicated blog post]({% post_url |
| 43 | +2025-07-14-TCP-in-UDP %}) for more details about that. |
| 44 | + |
| 45 | +In July, I |
| 46 | +[presented](https://datatracker.ietf.org/meeting/123/materials/slides-123-tcpm-mptcp-extensions-00) |
| 47 | +two unrelated and independent extensions to the MPTCP protocol. The first one |
| 48 | +extends the Data-Level Length (DLL) size to allow MPTCP packets of more than 64 |
| 49 | +KB, mainly to allow internal egress packets of more than 64 KB, and improve |
| 50 | +performance in a data centre. It can also be helpful when [IPv6
| 51 | +jumbograms](https://datatracker.ietf.org/doc/html/rfc2675) are used. See
| 52 | +this [draft](https://datatracker.ietf.org/doc/draft-baerts-tcpm-mptcpdss/) for |
| 53 | +more details. The second extension suggests using application-level keys to |
| 54 | +better secure MPTCP when establishing new subflows, announcing addresses, and |
| 55 | +resetting connections. See this other |
| 56 | +[draft](https://datatracker.ietf.org/doc/draft-baerts-tcpm-mptcpext/) for more |
| 57 | +explanations about this idea. If you know a company or another actor on the
| 58 | +Internet that is interested in these extensions and can help get them
| 59 | +accepted, feel free to contact me.
| 60 | + |
| 61 | +Finally, it is important to note that more funding around MPTCP recently got |
| 62 | +[accepted](https://nlnet.nl/project/MPTCP-C-Flag/)! 🎉 Thanks again |
| 63 | +[NLnet](https://nlnet.nl) for your invaluable support!
| 64 | + |
| 65 | + |
| 66 | +## New features |
| 67 | + |
| 68 | +### Better `MPCapable`'s C-flag support on the client side |
| 69 | + |
| 70 | +The MPTCP protocol and its implementation in the Linux kernel support |
| 71 | +deployments behind load-balancers. This is typically used by CDNs. When a |
| 72 | +layer-4 load-balancer is in place, it means a connection will be handled by one |
| 73 | +server out of many placed behind it. In other words, it means multiple servers |
| 74 | +are accepting connections to the same IP address and port. An MPTCP connection |
| 75 | +can be composed of ... multiple TCP subflows (paths), and it is important to make
| 76 | +sure new path requests (`MPJoin`) reach the right end-server. If such a path
| 77 | +request is sent to the original IP address and port, there is a high chance the
| 78 | +load-balancer will route it to a different end-server. To cope with that, the
| 79 | +[MPTCP protocol](https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6) |
| 80 | +allows a host to set a flag (C-flag) in the connection request (`MPCapable`) to |
| 81 | +tell the receiver it cannot try to open any additional subflows toward this |
| 82 | +address and port. Instead, the same host will announce a unique IP address and |
| 83 | +port that can be used to reach the right end-server. For more details about this |
| 84 | +case, please see this page: [Deployment behind a load |
| 85 | +balancer](https://www.mptcp.dev/load-balancer.html).
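
As a rough illustration of the rule described above, here is a minimal Python sketch. The position of the C-flag in the `MP_CAPABLE` flags byte (`0x20`) follows RFC 8684; the function names and the example addresses are hypothetical, and this is not how the Linux kernel implements it.

```python
# Bit "C" in the MP_CAPABLE flags byte (RFC 8684, section 3.1): flags are
# labelled A to H from the most significant bit, so C is 0x20.
MP_CAPABLE_FLAG_C = 0x20


def deny_join_initial_addr(flags: int) -> bool:
    """Return True if the peer forbids new subflows to its initial address."""
    return bool(flags & MP_CAPABLE_FLAG_C)


def allowed_join_targets(flags, initial_addr, announced_addrs):
    """List the remote (address, port) pairs that MPJoin requests may target.

    Hypothetical helper: 'announced_addrs' stands for the addresses received
    later via ADD_ADDR.
    """
    targets = list(announced_addrs)
    if not deny_join_initial_addr(flags):
        # Without the C-flag, the initial address and port stay valid targets.
        targets.insert(0, initial_addr)
    return targets


vip = ("192.0.2.1", 443)        # example address shared by the load-balancer
server = ("198.51.100.7", 443)  # example unique address of one end-server

# 0x21: C-flag plus the HMAC-SHA256 bit (H); 0x01: HMAC-SHA256 bit only.
print(allowed_join_targets(0x21, vip, [server]))
print(allowed_join_targets(0x01, vip, [server]))
```

With the C-flag set, only the announced address remains a valid target, which is why the client needs a way to use its other interfaces toward that address.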
| 86 | + |
| 87 | +The implementation on the server side has been supported for a few years now on |
| 88 | +Linux, and is already well-used. A server simply has to set the |
| 89 | +[`net.mptcp.allow_join_initial_addr_port`](https://docs.kernel.org/networking/mptcp-sysctl.html) |
| 90 | +sysctl knob to `0`, and add a `signal` MPTCP endpoint with a dedicated IP |
| 91 | +address and an optional port. |
| 92 | + |
| 93 | +So far, it looks like this setup was mainly used when interacting with iOS
| 94 | +devices, so without the Linux kernel on the client side. On this side, the
| 95 | +in-kernel path-manager respected the C-flag by not establishing new paths to
| 96 | +the initial address, but that was it. By default, in such situations with the
| 97 | +C-flag and the in-kernel path-manager, if a client had multiple interfaces, the
| 98 | +non-primary ones were *not* being used to establish extra paths. The reason is
| 99 | +that the extra interfaces are by default only used to create new paths to the
| 100 | +initial address of the server, which is not allowed in this case. This was not
| 101 | +good behaviour. A [fix](https://git.kernel.org/torvalds/c/4b1ff850e0c1) has been
| 102 | +recently sent to improve this situation. Now, in this particular case, the
| 103 | +in-kernel path-manager considers using the other [MPTCP |
| 104 | +endpoints](https://www.mptcp.dev/pm.html) to establish new paths to the |
| 105 | +announced address. |
| 106 | + |
| 107 | +With the userspace path-manager, the userspace daemon didn't know when the other |
| 108 | +peer had set this C-flag. That means it was not able to respect the protocol
| 109 | +when it was set. The kernel [now](https://git.kernel.org/torvalds/c/2293c57484ae)
| 110 | +announces this info, and the "official" userspace daemon (`mptcpd`) will support |
| 111 | +it [soon](https://github.com/multipath-tcp/mptcpd/pull/323). |
| 112 | + |
| 113 | +### New `laminar` endpoints |
| 114 | + |
| 115 | +Before Linux v6.18, upon the reception of an `ADD_ADDR` (and when the `fullmesh`
| 116 | +flag was not used), the in-kernel PM was only creating new subflows using the |
| 117 | +local address picked by the routing configuration. That works well when the |
| 118 | +announced addresses can be predicted, but not on the Internet with servers |
| 119 | +controlled by someone else. Instead, it is easier to pick local addresses from a |
| 120 | +selected list of endpoints, and use them only once, than to rely on routing
| 121 | +rules. For that purpose, `laminar` endpoints have been
| 122 | +[added](https://git.kernel.org/torvalds/c/539f6b9de39e) in v6.18. |
| 123 | + |
| 124 | +In other words, on the client side, it is now recommended to set both `subflow` |
| 125 | +and `laminar` flags by default. If both the client and the server sides have |
| 126 | +multiple network interfaces they want to use, it might be interesting to use |
| 127 | +only the `laminar` flag on all client-side MPTCP endpoints, and only the
| 128 | +`signal` one on all server-side MPTCP endpoints.
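
To make the idea more concrete, here is a toy Python model of the selection described above: for each announced address, one not-yet-used `laminar` endpoint is picked as the local address. The `Endpoint` and `LaminarPicker` names and the addresses are hypothetical and only illustrate the principle; the real in-kernel path-manager has more constraints to deal with, such as the configured limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    address: str
    flags: frozenset


class LaminarPicker:
    """Toy model: use each 'laminar' endpoint at most once per connection."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.used = set()

    def on_add_addr(self, remote_addr):
        """Return the (local, remote) pair to connect, or None if none is left."""
        for ep in self.endpoints:
            if "laminar" in ep.flags and ep.address not in self.used:
                self.used.add(ep.address)
                return (ep.address, remote_addr)
        return None


client = LaminarPicker([
    Endpoint("10.0.1.2", frozenset({"subflow", "laminar"})),
    Endpoint("10.0.2.2", frozenset({"subflow", "laminar"})),
    Endpoint("10.0.3.2", frozenset({"subflow"})),  # skipped here: not laminar
])

# Three addresses announced by the server, one after the other.
for announced in ("203.0.113.11", "203.0.113.12", "203.0.113.13"):
    print(announced, "->", client.on_add_addr(announced))
```

In this model, the third announcement gets no new subflow because both `laminar` endpoints have already been used once.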
| 129 | + |
| 130 | +### mptcpd: security report & improvements |
| 131 | + |
| 132 | +Thanks to the NLnet funding, [Radically Open Security |
| 133 | +B.V.](https://www.radicallyopensecurity.com) did a security review of |
| 134 | +[mptcpd](https://mptcpd.mptcp.dev). Thank you, Tim and Marcus, for this great |
| 135 | +work! No security issues have been found 🎉 |
| 136 | + |
| 137 | +The report mentioned one attention point: the plugin directory should not be
| 138 | +world-writable, to prevent other pieces of code from being executed with extra
| 139 | +permissions (`CAP_NET_ADMIN`). The full report is available
| 140 | +[here](assets/202503-mptcpd-security-report.pdf). |
| 141 | + |
| 142 | +In terms of improvements, it is good to note that mptcpd is now available in |
| 143 | +more Linux distributions: OpenWrt, Alpine Linux, NixOS, etc. A future v0.14 |
| 144 | +version is planned, and it will include some new features around `mptcpize`: |
| 145 | +setting the `GODEBUG=multipathtcp=1` environment variable, and also appending |
| 146 | +`LD_PRELOAD` if previously set, instead of overriding it. This version should |
| 147 | +also support new `laminar` endpoints, and the new `deny_join_id0` parameter. |
| 148 | + |
| 149 | +### User applications |
| 150 | + |
| 151 | +Quite a few new applications now have a dedicated option to enable MPTCP |
| 152 | +support: [iperf3](https://github.com/esnet/iperf/pull/1661),
| 153 | +[sing-box](https://github.com/SagerNet/sing-box/commit/1019ecfdcfb7), |
| 154 | +[Valkey](https://github.com/valkey-io/valkey/pull/1811), |
| 155 | +[FreeNginx](https://freenginx.org/hg/nginx/rev/cb20978439c8), etc. Please also |
| 156 | +note that since Go [1.24](https://go-review.googlesource.com/c/go/+/607715),
| 157 | +all applications written in Go have MPTCP enabled by default on the server side! |
| 158 | +This includes Caddy, Traefik, Shadowsocks Go, and many more! |
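
For applications that do not have such an option yet, the required change is typically small: the stream socket is created with the MPTCP protocol instead of the TCP one. Here is a minimal Python sketch, assuming Linux and Python 3.10+ (where `socket.IPPROTO_MPTCP` is defined). The `open_stream_socket` helper is a hypothetical name, and falling back to plain TCP is only one possible policy.

```python
import socket

# socket.IPPROTO_MPTCP exists since Python 3.10 on Linux; 262 is the value
# used by the Linux kernel, kept here as a fallback for older versions.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET, use_mptcp=True):
    """Return (sock, is_mptcp), falling back to TCP if MPTCP is unavailable."""
    if use_mptcp:
        try:
            sock = socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
            return sock, True
        except OSError:
            # Kernel without MPTCP support, or MPTCP disabled: use plain TCP.
            pass
    return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP), False


sock, is_mptcp = open_stream_socket()
print("MPTCP socket" if is_mptcp else "plain TCP socket")
sock.close()
```

The returned socket is then used like any other TCP socket (`bind()`, `listen()`, `connect()`, etc.).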
| 159 | + |
| 160 | +### Miscellaneous |
| 161 | + |
| 162 | +When working on current and future features around the **path-manager**, a lot |
| 163 | +of **clean-ups** have been done by Geliang and me. Some were required to allow |
| 164 | +new features, but others have also been added to improve the code itself by
| 165 | +renaming variables, splitting large functions, regrouping code per purpose, etc.
| 166 | +This might require a bit more attention during the backports, but it will help
| 167 | +with the maintenance in the long term. |
| 168 | + |
| 169 | +To help with the debugging, new **MIB counters** for the rejected `MPJoin` and |
| 170 | +for fallbacks to TCP have been added by Paolo and me. Some of them have been |
| 171 | +validated by Gang when working on improving the code coverage when running the |
| 172 | +whole test suite. |
| 173 | + |
| 174 | +The [**MPTCP CI**](https://ci-results.mptcp.dev) was taking more and more time |
| 175 | +due to the addition of new tests. To accelerate the whole process, more builders |
| 176 | +are used in parallel: now the `mptcp_join` selftest is executed in a dedicated |
| 177 | +job for the *normal* and *debug* modes. Results can now be shared after ~1h15 |
| 178 | +instead of 2h. |
| 179 | + |
| 180 | +**Performance** is being improved thanks to the work from Paolo and Christoph!
| 181 | +More work is still ongoing, and a proper perf regression lab should be put in |
| 182 | +place soon. More explanations will be shared in a later blog post. |
| 183 | + |
| 184 | +Regarding the **socket options**, `TCP_MAXSEG` has been added by Geliang, and an |
| 185 | +MPTCP version of `SO_MAX_PACING_RATE` from Christoph is in discussion. More work |
| 186 | +will be done around the socket options to simplify the code and improve the |
| 187 | +maintenance in the long term. |
| 188 | + |
| 189 | +When an address is announced by a peer via an **`ADD_ADDR`**, the signalling |
| 190 | +packet carried in a TCP ACK can be lost. Before v6.18, the **retransmissions**
| 191 | +were done after a timeout controlled by the |
| 192 | +[`net.mptcp.add_addr_timeout`](https://docs.kernel.org/networking/mptcp-sysctl.html) |
| 193 | +sysctl knob. The default value is set to 2 minutes, which is a safe choice, but |
| 194 | +certainly too high for most use-cases. Geliang changed its behaviour: it is now
| 195 | +used as a maximum value, and the timeout instead depends on the
| 196 | +connection's round-trip-time (RTT) to better adapt to the situation. |
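
The new behaviour can be pictured with a small Python sketch: an RTT-based estimate, capped by the sysctl value. The function name, the `rtt_factor` multiplier, and the fallback used when no RTT sample is known are assumptions made for this illustration, not the kernel's actual computation; only the 2-minute default comes from the text above.

```python
ADD_ADDR_TIMEOUT_MAX_S = 120.0  # default net.mptcp.add_addr_timeout (2 minutes)


def add_addr_retrans_timeout(rtt_s, max_timeout_s=ADD_ADDR_TIMEOUT_MAX_S,
                             rtt_factor=4.0):
    """Return the delay (in seconds) before retransmitting an ADD_ADDR.

    'rtt_factor' is a hypothetical multiplier, only for illustration.
    """
    if rtt_s is None or rtt_s <= 0:
        # Assumption: without any RTT sample, use the configured maximum.
        return max_timeout_s
    # The sysctl value now acts as an upper bound, not as the timeout itself.
    return min(rtt_s * rtt_factor, max_timeout_s)


for rtt in (0.02, 0.3, 60.0, None):
    print(rtt, "->", add_addr_retrans_timeout(rtt))
```

On a low-latency path, the retransmission now happens much sooner than the previous fixed 2 minutes, while slow paths remain bounded by the sysctl value.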
| 197 | + |
| 198 | +Last but not least, thanks to Paolo for helping with some fixes, to Mat for the |
| 199 | +code review, and to everybody who has reported issues, sent fixes and promoted
| 200 | +MPTCP! A great community! |
| 201 | + |
| 202 | + |
| 203 | +## Conclusion |
| 204 | + |
| 205 | +Quite a lot of new features and improvements will be present in the future Linux |
| 206 | +kernel LTS version (v6.18)! Looking forward to even more of them in the coming
| 207 | +months! |
| 208 | + |
| 209 | +<br/> |
| 210 | + |
| 211 | +-------------------------------------------------------------------------------- |
| 212 | + |
| 213 | +If you like my work and wish me to continue doing so, you can become a sponsor |
| 214 | +via [LiberaPay](https://liberapay.com/matttbe), |
| 215 | +[GitHub](https://github.com/sponsors/matttbe) or |
| 216 | +[Patreon](https://patreon.com/matttbe). |
| 217 | + |
| 218 | +Please [contact me](mailto:[email protected]) for professional collaborations,
| 219 | +short or long missions, or for financial support for my contributions to the |
| 220 | +maintenance of MPTCP and various apps around it. |