Commit 803381f

Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'

Jason A. Donenfeld says:

====================
icmp: account for NAT when sending icmps from ndo layer

The ICMP routines use the source address for two reasons:

1. Rate-limiting ICMP transmissions based on source address, so
   that one source address cannot provoke a flood of replies. If
   the source address is wrong, the rate limiting will be
   incorrectly applied.

2. Choosing the interface and hence new source address of the
   generated ICMP packet. If the original packet source address is
   wrong, ICMP replies will be sent from the wrong source address,
   resulting in either a misdelivery, infoleak, or just general
   network admin confusion.

Most of the time, the icmp_send and icmpv6_send routines can just reach
down into the skb's IP header to determine the saddr. However, if
icmp_send or icmpv6_send is being called from a network device driver --
there are a few in the tree -- then it's possible that by the time
icmp_send or icmpv6_send looks at the packet, the packet's source
address has already been transformed by SNAT or MASQUERADE or some other
transformation that CONNTRACK knows about. In this case, the packet's
source address is most certainly the *wrong* source address to be used
for the purpose of ICMP replies.

Rather, the source address we want to use for ICMP replies is the
original one, from before the transformation occurred.

Fortunately, it's very easy to just ask CONNTRACK if it knows about this
packet, and if so, how to fix it up. The saddr is the only field in the
header we need to fix up, for the purposes of the subsequent processing
in the icmp_send and icmpv6_send functions, so we do the lookup very
early on, so that the rest of the ICMP machinery can progress as usual.

Changes v3->v4:
- Add back the skb_shared checking, since the previous assumption isn't
  actually true [Eric]. This implies dropping the additional patches v3
  had for removing skb_share_check from various drivers. We can revisit
  that general set of ideas later, but that's probably better suited as
  a net-next patchset rather than this stable one which is geared at
  fixing bugs. So, this implements things in the safe conservative way.

Changes v2->v3:
- Add selftest to ensure this actually does what we want and never
  regresses.
- Check the size of the skb header before operating on it.
- Use skb_ensure_writable to ensure we can modify the cloned skb
  [Florian].
- Conditionalize this on IPS_SRC_NAT so we don't do anything
  unnecessarily [Florian].
- It turns out that since we're calling these from the xmit path,
  skb_share_check isn't required, so remove that [Florian]. This
  simplifies the code a bit too. **The supposition here is that skbs
  passed to ndo_start_xmit are _never_ shared. If this is not correct
  NOW IS THE TIME TO PIPE UP, for doom awaits us later.**
- While investigating the shared skb business, several drivers appeared
  to be calling it incorrectly in the xmit path, so this series also
  removes those unnecessary calls, based on the supposition mentioned in
  the previous point.

Changes v1->v2:
- icmpv6 takes subtly different types than icmpv4, like u32 instead of
  be32, u8 instead of int.
- Since we're technically writing to the skb, we need to make sure it's
  not a shared one [Dave].
- Restore the original skb data after icmp_send returns. All current
  users are freeing the packet right after, so it doesn't matter, but
  future users might not.
- Remove superfluous route lookup in sunvnet [Dave].
- Use NF_NAT instead of NF_CONNTRACK for condition [Florian].
- Include this cover letter [Dave].
====================

Signed-off-by: David S. Miller <[email protected]>
2 parents 07134cf + 45942ba commit 803381f
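
Note on the cover letter: the fix it describes amounts to "if conntrack says the flow was source-NATed, temporarily put the pre-NAT address back, send the ICMP error, then restore the header". The following is a minimal userspace sketch of that sequence, not kernel code. Every `toy_*` and `TOY_*` name is hypothetical and merely stands in for the corresponding kernel object (`struct nf_conn`, `IPS_SRC_NAT`, `icmp_send`).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel structures; not the real API. */
#define TOY_SRC_NAT 0x1u               /* plays the role of IPS_SRC_NAT */

struct toy_conn {
	uint32_t status;               /* NAT flags recorded for the flow */
	uint32_t orig_saddr;           /* source address before translation */
};

struct toy_packet {
	uint32_t saddr;                /* source address currently in the header */
	const struct toy_conn *ct;     /* NULL when the flow is untracked */
};

/* Records which address the error path replied to, for inspection. */
uint32_t toy_last_reply_dst;

/* Like icmp_send(), this toy replies to whatever saddr the header holds. */
void toy_icmp_send(const struct toy_packet *pkt)
{
	toy_last_reply_dst = pkt->saddr;
}

/* Same shape as the new helper: swap in the pre-NAT address, send,
 * then put the header back the way it was found. */
void toy_icmp_ndo_send(struct toy_packet *pkt)
{
	uint32_t saved;

	if (pkt->ct == NULL || !(pkt->ct->status & TOY_SRC_NAT)) {
		toy_icmp_send(pkt);    /* nothing to fix up */
		return;
	}

	saved = pkt->saddr;
	pkt->saddr = pkt->ct->orig_saddr;
	toy_icmp_send(pkt);
	pkt->saddr = saved;
}
```

With a packet whose header carries the post-SNAT address and whose `ct` carries the original one, `toy_icmp_ndo_send()` replies to the original address and leaves the header unchanged afterwards. An untracked packet takes the plain `toy_icmp_send()` path.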

9 files changed: +101 -26 lines changed

drivers/net/ethernet/sun/sunvnet_common.c

Lines changed: 4 additions & 19 deletions
@@ -1350,27 +1350,12 @@ sunvnet_start_xmit_common(struct sk_buff *skb, struct net_device *dev,
 		if (vio_version_after_eq(&port->vio, 1, 3))
 			localmtu -= VLAN_HLEN;

-		if (skb->protocol == htons(ETH_P_IP)) {
-			struct flowi4 fl4;
-			struct rtable *rt = NULL;
-
-			memset(&fl4, 0, sizeof(fl4));
-			fl4.flowi4_oif = dev->ifindex;
-			fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
-			fl4.daddr = ip_hdr(skb)->daddr;
-			fl4.saddr = ip_hdr(skb)->saddr;
-
-			rt = ip_route_output_key(dev_net(dev), &fl4);
-			if (!IS_ERR(rt)) {
-				skb_dst_set(skb, &rt->dst);
-				icmp_send(skb, ICMP_DEST_UNREACH,
-					  ICMP_FRAG_NEEDED,
-					  htonl(localmtu));
-			}
-		}
+		if (skb->protocol == htons(ETH_P_IP))
+			icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+				      htonl(localmtu));
 #if IS_ENABLED(CONFIG_IPV6)
 		else if (skb->protocol == htons(ETH_P_IPV6))
-			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, localmtu);
+			icmpv6_ndo_send(skb, ICMPV6_PKT_TOOBIG, 0, localmtu);
 #endif
 		goto out_dropped;
 	}

drivers/net/gtp.c

Lines changed: 2 additions & 2 deletions
@@ -546,8 +546,8 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 	    mtu < ntohs(iph->tot_len)) {
 		netdev_dbg(dev, "packet too big, fragmentation needed\n");
 		memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
-			  htonl(mtu));
+		icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+			      htonl(mtu));
 		goto err_rt;
 	}

drivers/net/wireguard/device.c

Lines changed: 2 additions & 2 deletions
@@ -203,9 +203,9 @@ static netdev_tx_t wg_xmit(struct sk_buff *skb, struct net_device *dev)
 err:
 	++dev->stats.tx_errors;
 	if (skb->protocol == htons(ETH_P_IP))
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
+		icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
 	else if (skb->protocol == htons(ETH_P_IPV6))
-		icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0);
+		icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0);
 	kfree_skb(skb);
 	return ret;
 }

include/linux/icmpv6.h

Lines changed: 6 additions & 0 deletions
@@ -31,6 +31,12 @@ static inline void icmpv6_send(struct sk_buff *skb,
 }
 #endif

+#if IS_ENABLED(CONFIG_NF_NAT)
+void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info);
+#else
+#define icmpv6_ndo_send icmpv6_send
+#endif
+
 extern int icmpv6_init(void);
 extern int icmpv6_err_convert(u8 type, u8 code,
 			      int *err);

include/net/icmp.h

Lines changed: 6 additions & 0 deletions
@@ -43,6 +43,12 @@ static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32
 	__icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
 }

+#if IS_ENABLED(CONFIG_NF_NAT)
+void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info);
+#else
+#define icmp_ndo_send icmp_send
+#endif
+
 int icmp_rcv(struct sk_buff *skb);
 int icmp_err(struct sk_buff *skb, u32 info);
 int icmp_init(void);
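
Note on the two header hunks: when NAT support is compiled out, the new name is a preprocessor alias for the old function, so drivers can call it unconditionally at no cost. A small standalone sketch of that pattern follows. `TOY_HAVE_NAT` is a hypothetical switch standing in for `CONFIG_NF_NAT`, and the `toy_*` functions are illustrative only.

```c
/* TOY_HAVE_NAT is a hypothetical build switch standing in for CONFIG_NF_NAT. */
#ifndef TOY_HAVE_NAT
#define TOY_HAVE_NAT 0
#endif

/* Counts how often the NAT-aware path ran; stays 0 when it is compiled out. */
int toy_fixups;

/* The plain sender: here it simply echoes its argument. */
int toy_send(int code)
{
	return code;
}

#if TOY_HAVE_NAT
/* NAT-aware wrapper: does extra work, then defers to the plain sender. */
int toy_ndo_send(int code)
{
	toy_fixups++;
	return toy_send(code);
}
#else
/* No NAT support: the wrapper name resolves to the plain sender. */
#define toy_ndo_send toy_send
#endif
```

Callers write `toy_ndo_send(x)` either way. In the default build shown here the call compiles to `toy_send(x)` and `toy_fixups` never changes.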

net/ipv4/icmp.c

Lines changed: 33 additions & 0 deletions
@@ -748,6 +748,39 @@ out:;
 }
 EXPORT_SYMBOL(__icmp_send);

+#if IS_ENABLED(CONFIG_NF_NAT)
+#include <net/netfilter/nf_conntrack.h>
+void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
+{
+	struct sk_buff *cloned_skb = NULL;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+	__be32 orig_ip;
+
+	ct = nf_ct_get(skb_in, &ctinfo);
+	if (!ct || !(ct->status & IPS_SRC_NAT)) {
+		icmp_send(skb_in, type, code, info);
+		return;
+	}
+
+	if (skb_shared(skb_in))
+		skb_in = cloned_skb = skb_clone(skb_in, GFP_ATOMIC);
+
+	if (unlikely(!skb_in || skb_network_header(skb_in) < skb_in->head ||
+	    (skb_network_header(skb_in) + sizeof(struct iphdr)) >
+	    skb_tail_pointer(skb_in) || skb_ensure_writable(skb_in,
+	    skb_network_offset(skb_in) + sizeof(struct iphdr))))
+		goto out;
+
+	orig_ip = ip_hdr(skb_in)->saddr;
+	ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
+	icmp_send(skb_in, type, code, info);
+	ip_hdr(skb_in)->saddr = orig_ip;
+out:
+	consume_skb(cloned_skb);
+}
+EXPORT_SYMBOL(icmp_ndo_send);
+#endif

 static void icmp_socket_deliver(struct sk_buff *skb, u32 info)
 {
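
Note on the `unlikely(...)` condition above: before touching `saddr`, the function checks that the network header starts inside the buffer and that a full IP header fits before the tail. The sketch below restates that check in userspace C. `struct toy_buf` and `toy_header_in_bounds` are hypothetical names. Unlike the kernel expression, the sketch compares remaining length rather than forming `nh + hdr_len`, which is an assumption made here to avoid creating an out-of-range pointer in portable C.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat view of a packet buffer; all three pointers must
 * point into (or one past the end of) the same allocation. */
struct toy_buf {
	const unsigned char *head;   /* start of the allocation */
	const unsigned char *nh;     /* where the network header is believed to start */
	const unsigned char *tail;   /* one past the last valid byte */
};

/* Returns true when hdr_len bytes starting at nh lie within [head, tail). */
bool toy_header_in_bounds(const struct toy_buf *b, size_t hdr_len)
{
	if (b->nh < b->head || b->nh > b->tail)
		return false;
	return (size_t)(b->tail - b->nh) >= hdr_len;
}
```

For a 64-byte buffer with the header at offset 14, a 20-byte header fits. With the header at offset 50 it does not, and a header pointer that precedes `head` is rejected outright.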

net/ipv6/ip6_icmp.c

Lines changed: 34 additions & 0 deletions
@@ -45,4 +45,38 @@ void icmpv6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(icmpv6_send);
+
+#if IS_ENABLED(CONFIG_NF_NAT)
+#include <net/netfilter/nf_conntrack.h>
+void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info)
+{
+	struct sk_buff *cloned_skb = NULL;
+	enum ip_conntrack_info ctinfo;
+	struct in6_addr orig_ip;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb_in, &ctinfo);
+	if (!ct || !(ct->status & IPS_SRC_NAT)) {
+		icmpv6_send(skb_in, type, code, info);
+		return;
+	}
+
+	if (skb_shared(skb_in))
+		skb_in = cloned_skb = skb_clone(skb_in, GFP_ATOMIC);
+
+	if (unlikely(!skb_in || skb_network_header(skb_in) < skb_in->head ||
+	    (skb_network_header(skb_in) + sizeof(struct ipv6hdr)) >
+	    skb_tail_pointer(skb_in) || skb_ensure_writable(skb_in,
+	    skb_network_offset(skb_in) + sizeof(struct ipv6hdr))))
+		goto out;
+
+	orig_ip = ipv6_hdr(skb_in)->saddr;
+	ipv6_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.in6;
+	icmpv6_send(skb_in, type, code, info);
+	ipv6_hdr(skb_in)->saddr = orig_ip;
+out:
+	consume_skb(cloned_skb);
+}
+EXPORT_SYMBOL(icmpv6_ndo_send);
+#endif
 #endif
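
Note on the `skb_shared()` / `skb_clone()` / `consume_skb()` trio used by both helpers: a buffer that someone else also holds must not be modified in place, so the helper works on a private clone and drops it at the end. A rough userspace sketch of that discipline follows. All `toy_*` names are hypothetical. Unlike `skb_clone()`, which shares the packet data and leaves the actual copy to `skb_ensure_writable()`, this toy copies the bytes outright.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer; not the kernel's struct sk_buff. */
struct toy_skb {
	int users;                   /* number of holders, like skb->users */
	unsigned char data[32];
};

int toy_shared(const struct toy_skb *s)
{
	return s->users > 1;
}

/* Returns a private copy with a single holder, or NULL on allocation failure. */
struct toy_skb *toy_clone(const struct toy_skb *s)
{
	struct toy_skb *c = malloc(sizeof(*c));

	if (c == NULL)
		return NULL;
	memcpy(c->data, s->data, sizeof(c->data));
	c->users = 1;
	return c;
}

/* Temporarily writes tmp into the first byte, reports what a consumer
 * would see through *seen, then restores the byte. A shared buffer is
 * never modified: the write goes to a clone that is freed on exit.
 * Returns 0 on success, -1 if the clone could not be allocated. */
int toy_with_private_copy(struct toy_skb *s, unsigned char tmp,
			  unsigned char *seen)
{
	struct toy_skb *clone = NULL;
	unsigned char saved;

	if (toy_shared(s)) {
		s = clone = toy_clone(s);
		if (s == NULL)
			return -1;
	}

	saved = s->data[0];
	s->data[0] = tmp;
	*seen = s->data[0];
	s->data[0] = saved;

	free(clone);                 /* free(NULL) is a no-op, like consume_skb(NULL) */
	return 0;
}
```

Calling `toy_with_private_copy()` on a buffer with `users == 2` leaves the caller's bytes untouched throughout, while a buffer with a single holder is edited in place and restored before returning.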

net/xfrm/xfrm_interface.c

Lines changed: 3 additions & 3 deletions
@@ -300,10 +300,10 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 			if (mtu < IPV6_MIN_MTU)
 				mtu = IPV6_MIN_MTU;

-			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+			icmpv6_ndo_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		} else {
-			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
-				  htonl(mtu));
+			icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+				      htonl(mtu));
 		}

 		dst_release(dst);

tools/testing/selftests/wireguard/netns.sh

Lines changed: 11 additions & 0 deletions
@@ -24,6 +24,7 @@
 set -e

 exec 3>&1
+export LANG=C
 export WG_HIDE_KEYS=never
 netns0="wg-test-$$-0"
 netns1="wg-test-$$-1"
@@ -297,7 +298,17 @@ ip1 -4 rule add table main suppress_prefixlength 0
 n1 ping -W 1 -c 100 -f 192.168.99.7
 n1 ping -W 1 -c 100 -f abab::1111

+# Have ns2 NAT into wg0 packets from ns0, but return an icmp error along the right route.
+n2 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 192.168.241.0/24 -j SNAT --to 192.168.241.2
+n0 iptables -t filter -A INPUT \! -s 10.0.0.0/24 -i vethrs -j DROP # Manual rpfilter just to be explicit.
+n2 bash -c 'printf 1 > /proc/sys/net/ipv4/ip_forward'
+ip0 -4 route add 192.168.241.1 via 10.0.0.100
+n2 wg set wg0 peer "$pub1" remove
+[[ $(! n0 ping -W 1 -c 1 192.168.241.1 || false) == *"From 10.0.0.100 icmp_seq=1 Destination Host Unreachable"* ]]
+
 n0 iptables -t nat -F
+n0 iptables -t filter -F
+n2 iptables -t nat -F
 ip0 link del vethrc
 ip0 link del vethrs
 ip1 link del wg0
