
NetworkPolicy that results in multiple PolicyEndpoint resources has unexpected periods of "Default Deny enabled" #201

@essh

Description


When a NetworkPolicy resource results in multiple PolicyEndpoint resources due to a large number of endpoints (i.e. when they have been chunked), we have observed log messages from https://github.com/aws/aws-network-policy-agent during pod startup with the content Default Deny enabled on Egress (emitted from https://github.com/aws/aws-network-policy-agent/blob/17489e56193ee53793831e7f07346ddd15173a6a/controllers/policyendpoints_controller.go#L422).

This is unexpected: the NetworkPolicy resources seeing this issue define both egress and ingress in the spec and list both Ingress and Egress in policyTypes, so no default deny should take place. Additionally, this occurs with NETWORK_POLICY_ENFORCING_MODE=standard, so we don't expect blocking like this during startup for a NetworkPolicy that contains the appropriate spec and policyTypes.

What we believe is happening: chunking produces PolicyEndpoint resources that all have podIsolation set to both Ingress and Egress (reflecting the underlying policy), but some of these PolicyEndpoint resources list only ingress rules in their spec. For those, the egress rule count is 0, so the default-deny logic applies for some period of time, until a later PolicyEndpoint carrying the egress rules is applied.
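The window we believe occurs can be sketched as follows. This is our own minimal simulation, not the agent's actual code: `default_deny_egress`, `chunk1`, and `chunk2` are hypothetical names, and the check is our reading of the logic around policyendpoints_controller.go#L422 (podIsolation includes Egress but zero egress rules have been applied).

```python
# Hypothetical simulation of the default-deny window we believe occurs.
# All names here are ours, not the agent's.

def default_deny_egress(applied_chunks):
    """Return True if egress would be default-denied given the
    PolicyEndpoint chunks applied so far (our assumed check)."""
    isolates_egress = any("Egress" in c["podIsolation"] for c in applied_chunks)
    egress_rules = sum(len(c.get("egress", [])) for c in applied_chunks)
    return isolates_egress and egress_rules == 0

# Chunk (1): podIsolation lists Egress but carries only ingress rules.
chunk1 = {"podIsolation": ["Ingress", "Egress"],
          "ingress": [{"cidr": "removed", "ports": [{"port": 4000}]}]}

# Chunk (2): carries the actual egress rule.
chunk2 = {"podIsolation": ["Ingress", "Egress"],
          "egress": [{"cidr": "0.0.0.0/0"}],
          "ingress": [{"cidr": "removed", "ports": [{"port": 3000}]}]}

print(default_deny_egress([chunk1]))          # only chunk (1) applied: default deny
print(default_deny_egress([chunk1, chunk2]))  # both applied: egress rules present
```

If chunk (1) is reconciled first, egress is default-denied until chunk (2) lands, which matches the startup log messages we see.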

To better demonstrate this, I've included some sample resources below, redacted and trimmed to focus on the relevant parts. I can provide additional logs and/or configuration privately if required, but only at info level for now, as that is what we currently log at to avoid policy decision logs at the ACCEPT level being emitted due to aws/aws-network-policy-agent#467. I've also only captured the contents of /var/log/aws-routed-eni, as the log collection script from https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md#collecting-node-level-tech-support-bundle-for-offline-troubleshooting does not currently work on Bottlerocket, which we run.

NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
...
spec:
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: example
      podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus
    ports:
    - port: 4000
      protocol: TCP
  - from:
    - ipBlock:
        cidr: removed
    - ipBlock:
        cidr: removed
    - ipBlock:
        cidr: removed
    ports:
    - port: 3000
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/name: example
  policyTypes:
  - Ingress
  - Egress

PolicyEndpoints

(1) - the one missing egress rules but with Egress listed in podIsolation

apiVersion: networking.k8s.aws/v1alpha1
kind: PolicyEndpoint
...
spec:
  ingress:
  - cidr: removed
    ports:
    - port: 4000
      protocol: TCP
  podIsolation:
  - Ingress
  - Egress
  podSelector:
    matchLabels:
      app.kubernetes.io/name: example
  podSelectorEndpoints:
  - removed (there's a lot)

(2)

apiVersion: networking.k8s.aws/v1alpha1
kind: PolicyEndpoint
...
spec:
  egress:
  - cidr: 0.0.0.0/0
  ingress:
  - cidr: removed
    ports:
    - port: 3000
      protocol: TCP
  - cidr: removed
    ports:
    - port: 3000
      protocol: TCP
  - cidr: removed
    ports:
    - port: 3000
      protocol: TCP
  podIsolation:
  - Ingress
  - Egress
  podSelector:
    matchLabels:
      app.kubernetes.io/name: example
  podSelectorEndpoints:
  - removed (there's a lot)
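To illustrate how the two PolicyEndpoints above could come about, here is a minimal sketch of a chunker that splits one policy's combined rules across PolicyEndpoints while every chunk inherits the full podIsolation. This is an assumption about the shape of the behavior, not the controller's actual implementation; `chunk_policy` and `max_rules_per_pe` are hypothetical names.

```python
def chunk_policy(policy, max_rules_per_pe):
    """Hypothetical chunker: split a policy's combined rules across
    PolicyEndpoint chunks; every chunk inherits the full podIsolation."""
    rules = [("ingress", r) for r in policy["ingress"]] + \
            [("egress", r) for r in policy["egress"]]
    chunks = []
    for i in range(0, len(rules), max_rules_per_pe):
        chunk = {"podIsolation": list(policy["policyTypes"]),
                 "ingress": [], "egress": []}
        for direction, rule in rules[i:i + max_rules_per_pe]:
            chunk[direction].append(rule)
        chunks.append(chunk)
    return chunks

policy = {"policyTypes": ["Ingress", "Egress"],
          "ingress": ["ing-a", "ing-b"],
          "egress": ["egr-a"]}

for pe in chunk_policy(policy, max_rules_per_pe=2):
    print(pe["podIsolation"], len(pe["ingress"]), len(pe["egress"]))
# The first chunk carries 2 ingress rules and 0 egress rules,
# yet its podIsolation still lists Egress - matching PolicyEndpoint (1).
```

Any chunking scheme with this property reproduces the issue: whichever chunk lacks egress rules still declares Egress isolation, so it triggers the default-deny path on its own.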
