Commit 75dd73f

Isolate default network services from UDN pods
Currently, advertised UDN pods are able to talk to any service in the default network in local gateway mode. The expected behaviour is that advertised UDN pods can reach only the kapi and dns services in the default network; the rest of the default network services should not be reachable.

The solution adopted is to change all the relevant flows on br-ex that were using the masquerade subnet for services isolation to use the pod subnets instead. Caveat: if a user advertises overlapping podSubnets, things will fall apart; the assumption is that users can't do that.

Flow details (post fix; I don't want to call out the current behaviour, to avoid confusion. The current behaviour was using the default network flows for UDN pods, which was wrong):

Onward packet

1)
    cookie=0xdeff105, duration=1472.742s, table=0, n_packets=9, n_bytes=666, priority=550,ip,in_port=LOCAL,nw_src=103.103.0.0/16,nw_dst=10.96.0.0/16 actions=ct(commit,table=2,zone=64001)

   This is the same flow as the one for UDNs that has the masquerade subnet as the src.

2)
    cookie=0xdeff105, duration=1472.742s, table=2, n_packets=9, n_bytes=666, priority=200,ip,nw_src=103.103.0.0/16 actions=mod_dl_dst:02:42:ac:12:00:03,output:3

   This is also the same flow as the one for UDNs that has the masquerade IP as the src.

In this way, any traffic from an advertised UDN towards the clusterIP range is always taken into the patch port of that respective UDN. This guarantees isolation from services belonging to other UDNs, since the given UDN won't have any LBs for those services.

For the KAPI/DNS services specifically, we add:

    cookie=0xdeff105, duration=2319.685s, table=2, n_packets=496, n_bytes=67111, priority=300, ip,nw_dst=10.96.0.1 actions=mod_dl_dst:02:42:ac:12:00:03,output:"patch-breth0_ov"
    cookie=0xdeff105, duration=2319.685s, table=2, n_packets=0, n_bytes=0, priority=300, ip,nw_dst=10.96.0.10 actions=mod_dl_dst:02:42:ac:12:00:03,output:"patch-breth0_ov"

These two generic flows work for both advertised and non-advertised networks and send the packet into the default network patch port.

Return packet

3) Return packets only need to be handled correctly when they are for kapi and/or dns. For UDNs, I see these flows:

    cookie=0xdeff105, duration=684.087s, table=0, n_packets=0, n_bytes=0, idle_age=684, priority=500,ip,in_port=2,nw_src=10.96.0.0/16,nw_dst=169.254.0.0/17 actions=ct(table=3,zone=64001,nat)
    cookie=0xdeff105, duration=684.050s, table=0, n_packets=0, n_bytes=0, idle_age=684, priority=500,ip,in_port=4,nw_src=10.96.0.0/16,nw_dst=169.254.0.0/17 actions=ct(table=3,zone=64001,nat)

   but for BGP I add these two flows, which are generic enough to match all BGP networks:

    cookie=0xdeff105, duration=264.196s, table=0, n_packets=0, n_bytes=0, priority=490, in_port="patch-breth0_ov",nw_src=10.96.0.10,actions=ct(table=3,zone=64001,nat)
    cookie=0xdeff105, duration=264.196s, table=0, n_packets=0, n_bytes=0, priority=490, in_port="patch-breth0_ov",nw_src=10.96.0.1,actions=ct(table=3,zone=64001,nat)

There is scope for scale improvement by combining the onward packet flows at tables 0 and 2 into one entity for onward traffic, but maybe that could also be done for UDNs?

Isolation is now established between UDN pods and the default network: except for kapi and dns, all traffic matching the clusterIP range is sent to the respective UDN patch port, where it's black-holed (in L3 it goes back and forth from breth0 to the GR and back, where it's dropped using the 105 flow).

However, in L2 we do see the expected bad behaviour where the packet loops from management port -> breth0 -> GR -> management port -> breth0 and so on, which is a never-ending loop. If we decide to ban the circular looping and optimize the flow with a proper drop rule, that can be considered as a future improvement:

    inport=LOCAL, srcIP=UDNSubnet, dstIP=serviceSubnet -> "DROP" (1 per advertised UDN)

Here is the dump:

tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
20:13:18.881036 33dfd1008b804_3 P ifindex 15 0a:58:67:67:00:05 ethertype IPv4 (0x0800), length 80: (tos 0x0, ttl 64, id 25988, offset 0, flags [DF], proto TCP (6), length 60)
    103.103.0.5.36363 > 10.96.164.25.80: Flags [S], cksum 0x1614 (incorrect -> 0x5b5d), seq 2541324161, win 65280, options [mss 1360,sackOK,TS val 984084514 ecr 0,nop,wscale 7], length 0
20:13:18.881626 ovn-k8s-mp1 In ifindex 8 0a:58:64:41:00:03 ethertype IPv4 (0x0800), length 80: (tos 0x0, ttl 63, id 25988, offset 0, flags [DF], proto TCP (6), length 60)
    103.103.0.5.36363 > 10.96.164.25.80: Flags [S], cksum 0x5b5d (correct), seq 2541324161, win 65280, options [mss 1360,sackOK,TS val 984084514 ecr 0,nop,wscale 7], length 0
20:13:18.881651 breth0 Out ifindex 4 02:42:ac:12:00:02 ethertype IPv4 (0x0800), length 80: (tos 0x0, ttl 62, id 25988, offset 0, flags [DF], proto TCP (6), length 60)
    103.103.0.5.36363 > 10.96.164.25.80: Flags [S], cksum 0x5b5d (correct), seq 2541324161, win 65280, options [mss 1360,sackOK,TS val 984084514 ecr 0,nop,wscale 7], length 0
20:13:18.882283 ovn-k8s-mp1 In ifindex 8 0a:58:64:41:00:03 ethertype IPv4 (0x0800), length 80: (tos 0x0, ttl 61, id 25988, offset 0, flags [DF], proto TCP (6), length 60)
    103.103.0.5.36363 > 10.96.164.25.80: Flags [S], cksum 0x5b5d (correct), seq 2541324161, win 65280, options [mss 1360,sackOK,TS val 984084514 ecr 0,nop,wscale 7], length 0
20:13:18.882298 breth0 Out ifindex 4 02:42:ac:12:00:02 ethertype IPv4 (0x0800), length 80: (tos 0x0, ttl 60, id 25988, offset 0, flags [DF], proto TCP (6), length 60)
    103.103.0.5.36363 > 10.96.164.25.80: Flags [S], cksum 0x5b5d (correct), seq 2541324161, win 65280, options [mss 1360,sackOK,TS val 984084514 ecr 0,nop,wscale 7], length 0
20:13:18.882402 ovn-k8s-mp1 In ifindex 8 0a:58:64:41:00:03 ethertype IPv4 (0x0800), length 80: (tos 0x0, ttl 59, id 25988, offset 0, flags [DF], proto TCP (6), length 60)
    103.103.0.5.36363 > 10.96.164.25.80: Flags [S], cksum 0x5b5d (correct), seq 2541324161, win 65280, options [mss 1360,sackOK,TS val 984084514 ecr 0,nop,wscale 7], length 0
20:13:18.882414 breth0 Out ifindex 4 02:42:ac:12:00:02 ethertype IPv4 (0x0800), length 80: (tos 0x0, ttl 58, id 25988, offset 0, flags [DF], proto TCP (6), length 60)
    103.103.0.5.36363 > 10.96.164.25.80: Flags [S], cksum 0x5b5d (correct), seq 2541324161, win 65280, options [mss 1360,sackOK,TS val 984084514 ecr 0,nop,wscale 7], length

$ oc rsh -n blue blue3
/ # curl --local-port 36363 10.96.164.25:80
curl: (7) Failed to connect to 10.96.164.25 port 80 after 3 ms: Host is unreachable

Signed-off-by: Surya Seetharaman <[email protected]>
1 parent 14237a8 commit 75dd73f
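
The two onward-packet flows from the commit message can be sketched in Go as below. This is a minimal illustration under stated assumptions, not the code from the commit: the helper name `advertisedUDNOnwardFlows`, its parameters, and the hard-coded cookie and `LOCAL` port are hypothetical, with values copied from the sample flows, and the sketch handles IPv4 only.

```go
package main

import (
	"fmt"
	"net"
)

// Values copied from the sample flows in the commit message; hard-coding
// them here is an assumption made only for this illustration.
const (
	cookie   = "0xdeff105"
	hostPort = "LOCAL"
)

// advertisedUDNOnwardFlows is a hypothetical helper. It returns the table 0
// (priority 550) and table 2 (priority 200) flows that match on the UDN pod
// subnet rather than on the masquerade subnet.
func advertisedUDNOnwardFlows(udnSubnet, svcCIDR, bridgeMAC, udnPatchPort string, ctZone int) ([]string, error) {
	for _, cidr := range []string{udnSubnet, svcCIDR} {
		ip, _, err := net.ParseCIDR(cidr)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", cidr, err)
		}
		if ip.To4() == nil {
			return nil, fmt.Errorf("this sketch handles IPv4 only, got %q", cidr)
		}
	}
	return []string{
		// Host (LOCAL port) -> service CIDR: commit to conntrack, continue in table 2.
		fmt.Sprintf("cookie=%s, priority=550, in_port=%s, ip, ip_src=%s, ip_dst=%s, "+
			"actions=ct(commit,zone=%d,table=2)",
			cookie, hostPort, udnSubnet, svcCIDR, ctZone),
		// Table 2: steer traffic sourced from the UDN subnet into that UDN's patch port.
		fmt.Sprintf("cookie=%s, priority=200, table=2, ip, ip_src=%s, "+
			"actions=set_field:%s->eth_dst,output:%s",
			cookie, udnSubnet, bridgeMAC, udnPatchPort),
	}, nil
}

func main() {
	flows, err := advertisedUDNOnwardFlows("103.103.0.0/16", "10.96.0.0/16", "02:42:ac:12:00:03", "3", 64001)
	if err != nil {
		panic(err)
	}
	for _, f := range flows {
		fmt.Println(f)
	}
}
```

Because the match is on the pod subnet, overlapping advertised subnets would produce ambiguous flows, which is the caveat the commit message calls out.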

3 files changed

+370
-11
lines changed

go-controller/pkg/node/gateway_shared_intf.go

Lines changed: 87 additions & 8 deletions
@@ -411,18 +411,33 @@ func (npw *nodePortWatcher) updateServiceFlowCache(service *corev1.Service, netI
 			}
 
 			ipPrefix := "ip"
-			masqueradeSubnet := config.Gateway.V4MasqueradeSubnet
 			if !utilnet.IsIPv4String(service.Spec.ClusterIP) {
 				ipPrefix = "ipv6"
-				masqueradeSubnet = config.Gateway.V6MasqueradeSubnet
 			}
 			// table 2, user-defined network host -> OVN towards default cluster network services
 			defaultNetConfig := npw.ofm.defaultBridge.getActiveNetworkBridgeConfig(types.DefaultNetworkName)
-
-			npw.ofm.updateFlowCacheEntry(key, []string{fmt.Sprintf("cookie=%s, priority=300, table=2, %s, %s_src=%s, %s_dst=%s, "+
+			// sample flow: cookie=0xdeff105, duration=2319.685s, table=2, n_packets=496, n_bytes=67111, priority=300,
+			// ip,nw_dst=10.96.0.1 actions=mod_dl_dst:02:42:ac:12:00:03,output:"patch-breth0_ov"
+			// This flow is used for UDNs and advertised UDNs to be able to reach kapi and dns services alone on default network
+			flows := []string{fmt.Sprintf("cookie=%s, priority=300, table=2, %s, %s_dst=%s, "+
 				"actions=set_field:%s->eth_dst,output:%s",
-				defaultOpenFlowCookie, ipPrefix, ipPrefix, masqueradeSubnet, ipPrefix, service.Spec.ClusterIP,
-				npw.ofm.getDefaultBridgeMAC().String(), defaultNetConfig.ofPortPatch)})
+				defaultOpenFlowCookie, ipPrefix, ipPrefix, service.Spec.ClusterIP,
+				npw.ofm.getDefaultBridgeMAC().String(), defaultNetConfig.ofPortPatch)}
+			if util.IsRouteAdvertisementsEnabled() {
+				// if the network is advertised, then for the reply from kapi and dns services to go back
+				// into the UDN's VRF we need flows that statically send this to the local port
+				// sample flow: cookie=0xdeff105, duration=264.196s, table=0, n_packets=0, n_bytes=0, priority=490,ip,
+				// in_port="patch-breth0_ov",nw_src=10.96.0.10,actions=ct(table=3,zone=64001,nat)
+				// this flow is meant to match all advertised UDNs and then the ip rules on the host will take
+				// this packet into the corresponding UDNs
+				// NOTE: We chose priority 490 to differentiate this flow from the flow at priority 500 added for the
+				// non-advertised UDNs response for debugging purposes:
+				// sample flow for non-advertised UDNs: cookie=0xdeff105, duration=684.087s, table=0, n_packets=0, n_bytes=0,
+				// idle_age=684, priority=500,ip,in_port=2,nw_src=10.96.0.0/16,nw_dst=169.254.0.0/17 actions=ct(table=3,zone=64001,nat)
+				flows = append(flows, fmt.Sprintf("cookie=%s, priority=490, in_port=%s, ip, ip_src=%s,actions=ct(zone=%d,nat,table=3)",
+					defaultOpenFlowCookie, defaultNetConfig.ofPortPatch, service.Spec.ClusterIP, config.Default.HostMasqConntrackZone))
+			}
+			npw.ofm.updateFlowCacheEntry(key, flows)
 		}
 	}
 	return utilerrors.Join(errors...)
@@ -1593,6 +1608,37 @@ func flowsForDefaultBridge(bridge *bridgeConfiguration, extraIPs []net.IP) ([]st
 					"actions=ct(commit,zone=%d,table=2)",
 					defaultOpenFlowCookie, ofPortHost, protoPrefix, protoPrefix,
 					masqSubnet, protoPrefix, svcCIDR, config.Default.HostMasqConntrackZone))
+			if util.IsRouteAdvertisementsEnabled() {
+				// If the UDN is advertised then instead of matching on the masqSubnet
+				// we match on the UDNPodSubnet itself and we also don't SNAT to 169.254.0.2
+				// sample flow: cookie=0xdeff105, duration=1472.742s, table=0, n_packets=9, n_bytes=666, priority=550
+				// ip,in_port=LOCAL,nw_src=103.103.0.0/16,nw_dst=10.96.0.0/16 actions=ct(commit,table=2,zone=64001)
+				for _, netConfig := range bridge.patchedNetConfigs() {
+					if netConfig.isDefaultNetwork() {
+						continue
+					}
+					if netConfig.advertised.Load() {
+						var udnAdvertisedSubnets []*net.IPNet
+						for _, clusterEntry := range netConfig.subnets {
+							udnAdvertisedSubnets = append(udnAdvertisedSubnets, clusterEntry.CIDR)
+						}
+						// Filter subnets based on the clusterIP service family
+						// NOTE: We don't support more than 1 subnet CIDR of same family type; we only pick the first one
+						matchingIPFamilySubnet, err := util.MatchFirstIPNetFamily(utilnet.IsIPv6CIDR(svcCIDR), udnAdvertisedSubnets)
+						if err != nil {
+							klog.Infof("Unable to determine UDN subnet for the provided family isIPV6: %t, %v", utilnet.IsIPv6CIDR(svcCIDR), err)
+							continue
+						}
+
+						// Use the filtered subnet for the flow compute instead of the masqueradeIP
+						dftFlows = append(dftFlows,
+							fmt.Sprintf("cookie=%s, priority=550, in_port=%s, %s, %s_src=%s, %s_dst=%s, "+
+								"actions=ct(commit,zone=%d,table=2)",
+								defaultOpenFlowCookie, ofPortHost, protoPrefix, protoPrefix,
+								matchingIPFamilySubnet.String(), protoPrefix, svcCIDR, config.Default.HostMasqConntrackZone))
+					}
+				}
+			}
 		}
 
 		masqDst := masqIP
@@ -1706,10 +1752,27 @@ func flowsForDefaultBridge(bridge *bridgeConfiguration, extraIPs []net.IP) ([]st
 			if netConfig.isDefaultNetwork() {
 				continue
 			}
+			srcIPOrSubnet := netConfig.v4MasqIPs.ManagementPort.IP.String()
+			if util.IsRouteAdvertisementsEnabled() && netConfig.advertised.Load() {
+				var udnAdvertisedSubnets []*net.IPNet
+				for _, clusterEntry := range netConfig.subnets {
+					udnAdvertisedSubnets = append(udnAdvertisedSubnets, clusterEntry.CIDR)
+				}
+				// Filter subnets based on the clusterIP service family
+				// NOTE: We don't support more than 1 subnet CIDR of same family type; we only pick the first one
+				matchingIPFamilySubnet, err := util.MatchFirstIPNetFamily(false, udnAdvertisedSubnets)
+				if err != nil {
+					klog.Infof("Unable to determine IPV4 UDN subnet for the provided family isIPV6: %v", err)
+					continue
+				}
+
+				// Use the filtered subnets for the flow compute instead of the masqueradeIP
+				srcIPOrSubnet = matchingIPFamilySubnet.String()
+			}
 			dftFlows = append(dftFlows,
 				fmt.Sprintf("cookie=%s, priority=200, table=2, ip, ip_src=%s, "+
 					"actions=set_field:%s->eth_dst,output:%s",
-					defaultOpenFlowCookie, netConfig.v4MasqIPs.ManagementPort.IP,
+					defaultOpenFlowCookie, srcIPOrSubnet,
 					bridgeMacAddress, netConfig.ofPortPatch))
 			dftFlows = append(dftFlows,
 				fmt.Sprintf("cookie=%s, priority=200, table=2, ip, pkt_mark=%s, "+
@@ -1724,11 +1787,27 @@ func flowsForDefaultBridge(bridge *bridgeConfiguration, extraIPs []net.IP) ([]st
 			if netConfig.isDefaultNetwork() {
 				continue
 			}
+			srcIPOrSubnet := netConfig.v6MasqIPs.ManagementPort.IP.String()
+			if util.IsRouteAdvertisementsEnabled() && netConfig.advertised.Load() {
+				var udnAdvertisedSubnets []*net.IPNet
+				for _, clusterEntry := range netConfig.subnets {
+					udnAdvertisedSubnets = append(udnAdvertisedSubnets, clusterEntry.CIDR)
+				}
+				// Filter subnets based on the clusterIP service family
+				// NOTE: We don't support more than 1 subnet CIDR of same family type; we only pick the first one
+				matchingIPFamilySubnet, err := util.MatchFirstIPNetFamily(true, udnAdvertisedSubnets)
+				if err != nil {
+					klog.Infof("Unable to determine IPV6 UDN subnet for the provided family isIPV6: %v", err)
+					continue
+				}
 
+				// Use the filtered subnets for the flow compute instead of the masqueradeIP
+				srcIPOrSubnet = matchingIPFamilySubnet.String()
+			}
 			dftFlows = append(dftFlows,
 				fmt.Sprintf("cookie=%s, priority=200, table=2, ip6, ipv6_src=%s, "+
 					"actions=set_field:%s->eth_dst,output:%s",
-					defaultOpenFlowCookie, netConfig.v6MasqIPs.ManagementPort.IP,
+					defaultOpenFlowCookie, srcIPOrSubnet,
 					bridgeMacAddress, netConfig.ofPortPatch))
 			dftFlows = append(dftFlows,
 				fmt.Sprintf("cookie=%s, priority=200, table=2, ip6, pkt_mark=%s, "+
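
The diff repeatedly picks the first UDN subnet whose IP family matches the service CIDR. The sketch below illustrates that selection step using only the standard library. `firstSubnetOfFamily` and `parseAll` are hypothetical stand-ins written for this example; the behaviour of the real `util.MatchFirstIPNetFamily` is assumed from how the diff calls it and has not been checked against its implementation.

```go
package main

import (
	"fmt"
	"net"
)

// firstSubnetOfFamily is a hypothetical stand-in for the family filter used
// in the diff: it returns the first subnet whose family matches isIPv6 and
// an error if none does.
func firstSubnetOfFamily(isIPv6 bool, subnets []*net.IPNet) (*net.IPNet, error) {
	for _, s := range subnets {
		if s == nil {
			continue
		}
		// To4 returns nil for addresses that are not IPv4.
		if (s.IP.To4() == nil) == isIPv6 {
			return s, nil
		}
	}
	return nil, fmt.Errorf("no subnet found for family isIPv6=%t", isIPv6)
}

// parseAll parses CIDR strings and panics on malformed input; it exists only
// to keep the example short.
func parseAll(cidrs ...string) []*net.IPNet {
	out := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		out = append(out, n)
	}
	return out
}

func main() {
	// Example subnets; the IPv6 prefix is made up for this illustration.
	subnets := parseAll("103.103.0.0/16", "fd00:10:244::/48", "104.104.0.0/16")

	v4, err := firstSubnetOfFamily(false, subnets)
	if err != nil {
		panic(err)
	}
	v6, err := firstSubnetOfFamily(true, subnets)
	if err != nil {
		panic(err)
	}
	fmt.Println("IPv4 match:", v4)
	fmt.Println("IPv6 match:", v6)
}
```

Picking only the first match mirrors the NOTE in the diff that more than one subnet CIDR of the same family is not supported; in this sketch a second IPv4 subnet such as `104.104.0.0/16` is silently ignored.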
