
HELP NEEDED with extra calico pool #11769

@ramili4aktobe-prog

Pods from secondary IPPool cannot reach pods/services in primary IPPool when K8s NetworkPolicy with podSelector is applied

Description

When using multiple Calico IPPools where the secondary pool is outside the Kubernetes --cluster-cidr, pods from the secondary pool cannot communicate with pods/services in the primary pool once a standard Kubernetes NetworkPolicy using podSelector or namespaceSelector is applied.

This appears to be related to issue #4004, but persists even after expanding --cluster-cidr to cover both pools.
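
For concreteness, the Kubernetes policies involved are of roughly the following shape (a minimal sketch; the name, namespace, and rule are placeholders, not the exact manifests from the cluster):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace   # placeholder name
  namespace: example-ns        # placeholder namespace
spec:
  # Applies to every pod in the namespace (the podSelector: {} case described below)
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Allow ingress from any pod in the same namespace
        - podSelector: {}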

Environment

  • Calico version: 3.24.3
  • Kubernetes version: 1.26.5
  • Orchestrator: kubeadm / kubespray
  • Datastore: Kubernetes (kdd)
  • IPIP Mode: CrossSubnet (required due to 4 physical subnets in production)
  • VXLAN Mode: Never

Note: Switching to ipipMode: Always is not acceptable due to encapsulation overhead for intra-subnet traffic.

IPPool Configuration

# Primary pool (matches original --cluster-cidr)
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/18
  ipipMode: CrossSubnet
  vxlanMode: Never
  natOutgoing: true
  nodeSelector: all()

---
# Secondary pool (added later, outside original --cluster-cidr)
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: second-ipv4-ippool
spec:
  cidr: 192.168.192.0/18
  ipipMode: CrossSubnet
  vxlanMode: Never
  natOutgoing: true
  nodeSelector: all()

Expected Behavior

Pods from second-ipv4-ippool (192.168.192.0/18) should be able to communicate with pods from default-ipv4-ippool (192.168.0.0/18) when NetworkPolicy allows traffic via podSelector or namespaceSelector.

Current Behavior

  1. A Job pod gets an IP from the secondary pool (192.168.192.x)
  2. The target pod has an IP from the primary pool (192.168.0.x)
  3. A NetworkPolicy with podSelector: {} is applied
  4. The Job pod cannot reach the target pod, either via the Service or via the Pod IP directly
  5. Removing the NetworkPolicy restores communication
  6. Adding a Calico NetworkPolicy with nets: ["192.168.0.0/0"] works
  7. Adding a Calico NetworkPolicy with nets: ["192.168.0.0/18", "192.168.192.0/18"] does NOT work (see the sketch after this list)
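
The Calico policy from steps 6 and 7 looked roughly like this (a sketch: the name, namespace, and selector are placeholders rather than the exact manifest used; only the nets list differs between the working and non-working variants):

apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: allow-from-both-pools   # placeholder name
  namespace: example-ns         # placeholder namespace
spec:
  selector: all()
  ingress:
    - action: Allow
      source:
        # Works when replaced with ["192.168.0.0/0"]; does NOT work with the two /18 pool CIDRs
        nets:
          - 192.168.0.0/18
          - 192.168.192.0/18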

Attempted Fixes

Fix | Result
Verified routing works via tunl0 | OK, routing is functional
Modified --cluster-cidr in kube-controller-manager to 192.168.0.0/16 | No effect
Modified clusterCIDR in kube-proxy ConfigMap to include both CIDRs (sketch below) | No effect
Restarted kube-proxy DaemonSet | No effect
Created Calico NetworkPolicy with nets: ["192.168.0.0/18", "192.168.192.0/18"] | No effect
Created Calico NetworkPolicy with nets: ["192.168.0.0/16"] | No effect
Created Calico NetworkPolicy with nets: ["192.168.0.0/0"] | Works (but not acceptable: too permissive)
Removed all K8s NetworkPolicy | Works (but not acceptable: no segmentation)
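
For reference, the kube-proxy change in the table above amounted to editing the clusterCIDR field in the kube-proxy ConfigMap, roughly as follows (a sketch assuming the kube-system/kube-proxy ConfigMap layout generated by kubeadm/kubespray; only the relevant field is shown):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    # Expanded from the original 192.168.0.0/18 to cover both IPPools
    clusterCIDR: 192.168.0.0/16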

Observations

  • WorkloadEndpoints are created for pods from both pools (an example endpoint is sketched after this list)
  • Routing between pools works (verified via ping without NetworkPolicy)
  • Issue only manifests when K8s NetworkPolicy is present
  • Calico NetworkPolicy with specific CIDRs does NOT help
  • Only overly permissive policies resolve the issue
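
To illustrate the first observation above, a WorkloadEndpoint for a pod from the secondary pool looks roughly like this as reported by calicoctl (a sketch with placeholder node, pod, interface, and namespace names; the relevant detail is that ipNetworks carries the 192.168.192.x address):

apiVersion: projectcalico.org/v3
kind: WorkloadEndpoint
metadata:
  name: node1-k8s-job--pod--abc12-eth0   # placeholder, following Calico's naming pattern
  namespace: example-ns                  # placeholder namespace
spec:
  node: node1
  orchestrator: k8s
  pod: job-pod-abc12
  endpoint: eth0
  interfaceName: calia1b2c3d4e5f         # placeholder host-side interface name
  ipNetworks:
    - 192.168.192.15/32                  # address from the secondary pool
  profiles:
    - kns.example-ns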

Context

Production environment has 4 physical subnets, which requires ipipMode: CrossSubnet for inter-subnet communication. Switching to ipipMode: Always would introduce unnecessary encapsulation overhead for pods communicating within the same physical subnet.

The secondary IPPool was added to expand cluster capacity after initial deployment.

Possible Root Cause

When Calico translates K8s NetworkPolicy podSelector rules, it may not properly recognize pods from IPPools outside the original --cluster-cidr as valid workload endpoints, even after the CIDR is expanded.

Related Issues

  • #4004

Questions

  1. How does Calico Felix determine if a source IP belongs to a "known" workload when evaluating NetworkPolicy?
  2. Is there a configuration to make Calico trust all configured IPPools regardless of Kubernetes --cluster-cidr?
  3. Is there a race condition in workloadEndpoint registration for pods from secondary pools?
  4. Why does nets: ["192.168.0.0/0"] work but nets: ["192.168.0.0/16"] does not?
