Pods from secondary IPPool cannot reach pods/services in primary IPPool when K8s NetworkPolicy with podSelector is applied
Description
When using multiple Calico IPPools where the secondary pool is outside the Kubernetes --cluster-cidr, pods from the secondary pool cannot communicate with pods/services in the primary pool when standard Kubernetes NetworkPolicy with podSelector or namespaceSelector is applied.
This appears to be related to issue #4004, but persists even after expanding --cluster-cidr to cover both pools.
Environment
- Calico version: 3.24.3
- Kubernetes version: 1.26.5
- Orchestrator: kubeadm / kubespray
- Datastore: Kubernetes (kdd)
- IPIP Mode: CrossSubnet (required due to 4 physical subnets in production)
- VXLAN Mode: Never
Note: Switching to ipipMode: Always is not acceptable due to encapsulation overhead for intra-subnet traffic.
IPPool Configuration
```yaml
# Primary pool (matches original --cluster-cidr)
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/18
  ipipMode: CrossSubnet
  vxlanMode: Never
  natOutgoing: true
  nodeSelector: all()
---
# Secondary pool (added later, outside original --cluster-cidr)
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: second-ipv4-ippool
spec:
  cidr: 192.168.192.0/18
  ipipMode: CrossSubnet
  vxlanMode: Never
  natOutgoing: true
  nodeSelector: all()
```
Expected Behavior
Pods from second-ipv4-ippool (192.168.192.0/18) should be able to communicate with pods from default-ipv4-ippool (192.168.0.0/18) when NetworkPolicy allows traffic via podSelector or namespaceSelector.
Current Behavior
- Job pod gets IP from secondary pool (192.168.192.x)
- Target pod has IP from primary pool (192.168.0.x)
- NetworkPolicy with `podSelector: {}` is applied - Job pod cannot reach target pod via Service or direct Pod IP
- Removing NetworkPolicy - communication works
- Adding Calico NetworkPolicy with `nets: ["192.168.0.0/0"]` - works
- Adding Calico NetworkPolicy with `nets: ["192.168.0.0/18", "192.168.192.0/18"]` - does NOT work
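For reference, the standard Kubernetes NetworkPolicy that triggers the problem presumably looks roughly like this (a minimal sketch; the policy name and namespace are placeholders, not taken from the report):

```yaml
# Hypothetical reconstruction of the kind of policy in play;
# name and namespace are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: target-ns
spec:
  podSelector: {}            # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # allow traffic from any pod in the same namespace
```

With this applied, traffic from the Job pod (192.168.192.x) is dropped even though it originates from a pod the selector should match.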
Attempted Fixes
| Fix | Result |
|---|---|
| Verified routing works via tunl0 | OK - routing is functional |
| Modified `--cluster-cidr` in kube-controller-manager to 192.168.0.0/16 | No effect |
| Modified `clusterCIDR` in kube-proxy ConfigMap to include both CIDRs | No effect |
| Restarted kube-proxy DaemonSet | No effect |
| Created Calico NetworkPolicy with `nets: ["192.168.0.0/18", "192.168.192.0/18"]` | No effect |
| Created Calico NetworkPolicy with `nets: ["192.168.0.0/16"]` | No effect |
| Calico NetworkPolicy with `nets: ["192.168.0.0/0"]` | Works (but not acceptable - too permissive) |
| Removing all K8s NetworkPolicy | Works (but not acceptable - no segmentation) |
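The Calico policies listed above were presumably shaped roughly like this (a sketch; the metadata, namespace, and selector are assumptions, only the `nets` values come from the table):

```yaml
# Hypothetical reconstruction of the attempted Calico policy;
# name, namespace, and selector are placeholders.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-from-both-pools
  namespace: target-ns
spec:
  selector: all()
  types:
    - Ingress
  ingress:
    - action: Allow
      source:
        nets:
          - 192.168.0.0/18     # primary pool - no effect
          - 192.168.192.0/18   # secondary pool - no effect
          # only 192.168.0.0/0 (i.e. effectively any source) works
```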
Observations
- WorkloadEndpoints are created for pods from both pools
- Routing between pools works (verified via ping without NetworkPolicy)
- Issue only manifests when K8s NetworkPolicy is present
- Calico NetworkPolicy with specific CIDRs does NOT help
- Only overly permissive policies resolve the issue
Context
Production environment has 4 physical subnets, which requires ipipMode: CrossSubnet for inter-subnet communication. Switching to ipipMode: Always would introduce unnecessary encapsulation overhead for pods communicating within the same physical subnet.
The secondary IPPool was added to expand cluster capacity after initial deployment.
Possible Root Cause
When Calico translates K8s NetworkPolicy podSelector rules, it may not properly recognize pods from IPPools outside the original --cluster-cidr as valid workload endpoints, even after the CIDR is expanded.
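If that hypothesis holds, one way to probe it would be a Calico policy that matches on workload labels rather than CIDRs, since selector matching is evaluated against WorkloadEndpoints and does not involve `--cluster-cidr` (a sketch; all names and labels are placeholders):

```yaml
# Hypothetical diagnostic policy; name, namespace, and labels are placeholders.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-by-label
  namespace: target-ns
spec:
  selector: app == 'target'           # label on the target pods (placeholder)
  types:
    - Ingress
  ingress:
    - action: Allow
      source:
        selector: app == 'job-client' # label on the Job pods (placeholder)
```

If label-based matching also fails while `nets: ["192.168.0.0/0"]` works, that would suggest the source IP seen at the target is no longer the pod IP (e.g. it was masqueraded in transit), which would also explain why only a catch-all CIDR matches.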
Related Issues
- #4004 - Calico pods in calico pools outside of podCIDR in Kubernetes cannot access clusterIP services when networkpolicy is applied
Questions
- How does Calico Felix determine if a source IP belongs to a "known" workload when evaluating NetworkPolicy?
- Is there a configuration to make Calico trust all configured IPPools regardless of Kubernetes `--cluster-cidr`?
- Is there a race condition in workloadEndpoint registration for pods from secondary pools?
- Why does `nets: ["192.168.0.0/0"]` work but `nets: ["192.168.0.0/16"]` does not?