Description
When a NetworkPolicy resource results in multiple PolicyEndpoint resources due to a large number of endpoints (i.e. when they have been chunked), we have observed log messages from https://github.com/aws/aws-network-policy-agent during pod startup with the following content: Default Deny enabled on Egress (from https://github.com/aws/aws-network-policy-agent/blob/17489e56193ee53793831e7f07346ddd15173a6a/controllers/policyendpoints_controller.go#L422).
This is unexpected, as the NetworkPolicy resources seeing this issue define both egress and ingress in the spec and list both Ingress and Egress in policyTypes, so no default deny should take place. Additionally, this occurs with NETWORK_POLICY_ENFORCING_MODE=standard, so we don't expect blocking like this during startup for a NetworkPolicy with the appropriate spec and policyTypes.
What we believe is happening: chunking produces PolicyEndpoint resources that all have podIsolation set to both Ingress and Egress (reflecting the underlying policy), but some of these PolicyEndpoint resources list only ingress rules in their spec. For those, the egress rule count is 0, so the default deny logic applies for some period of time, until a later PolicyEndpoint carrying the egress rules is applied.
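If our reading is right, the per-chunk evaluation can be sketched roughly as follows. This is a minimal Python model of the suspected logic only; the type and field names are made up for illustration and do not match the agent's actual Go code:

```python
# Hypothetical, simplified model of a PolicyEndpoint chunk; field names
# are illustrative, not the agent's actual structures.
from dataclasses import dataclass

@dataclass
class PolicyEndpointChunk:
    pod_isolation: list  # e.g. ["Ingress", "Egress"], copied to every chunk
    ingress_rules: int   # number of ingress rules carried by this chunk
    egress_rules: int    # number of egress rules carried by this chunk

def default_deny_egress(chunk: PolicyEndpointChunk) -> bool:
    """Approximates the suspected per-chunk check: Egress isolation is
    declared but the chunk carries zero egress rules, so the agent logs
    'Default Deny enabled on Egress'."""
    return "Egress" in chunk.pod_isolation and chunk.egress_rules == 0

# A chunk with ingress rules only, yet podIsolation still lists Egress.
chunk1 = PolicyEndpointChunk(["Ingress", "Egress"], ingress_rules=1, egress_rules=0)
# A chunk that carries the policy's single egress rule (0.0.0.0/0).
chunk2 = PolicyEndpointChunk(["Ingress", "Egress"], ingress_rules=3, egress_rules=1)

print(default_deny_egress(chunk1))  # True  -> transient default deny on egress
print(default_deny_egress(chunk2))  # False
```

Under this model, egress traffic from the affected pods is denied from the moment the first (egress-less) chunk is reconciled until the chunk containing the egress rules arrives, which matches the startup-time log messages we see.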
To better demonstrate this, I've included some sample resources below, redacted and trimmed to focus on the relevant parts. I can provide additional logs and/or configuration privately if required, but only at info level for now, as that is what we currently log at to avoid policy decision logs at ACCEPT level being logged due to aws/aws-network-policy-agent#467. I've also only captured the contents of /var/log/aws-routed-eni, as the log script from https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md#collecting-node-level-tech-support-bundle-for-offline-troubleshooting does not currently work on Bottlerocket, which we run.
NetworkPolicy
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
...
spec:
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: example
          podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
      ports:
        - port: 4000
          protocol: TCP
    - from:
        - ipBlock:
            cidr: removed
        - ipBlock:
            cidr: removed
        - ipBlock:
            cidr: removed
      ports:
        - port: 3000
          protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/name: example
  policyTypes:
    - Ingress
    - Egress
```
PolicyEndpoints
(1) - the chunk with no egress rules despite Egress being listed in podIsolation
```yaml
apiVersion: networking.k8s.aws/v1alpha1
kind: PolicyEndpoint
...
spec:
  ingress:
    - cidr: removed
      ports:
        - port: 4000
          protocol: TCP
  podIsolation:
    - Ingress
    - Egress
  podSelector:
    matchLabels:
      app.kubernetes.io/name: example
  podSelectorEndpoints:
    - removed (there's a lot)
```
(2)
```yaml
apiVersion: networking.k8s.aws/v1alpha1
kind: PolicyEndpoint
...
spec:
  egress:
    - cidr: 0.0.0.0/0
  ingress:
    - cidr: removed
      ports:
        - port: 3000
          protocol: TCP
    - cidr: removed
      ports:
        - port: 3000
          protocol: TCP
    - cidr: removed
      ports:
        - port: 3000
          protocol: TCP
  podIsolation:
    - Ingress
    - Egress
  podSelector:
    matchLabels:
      app.kubernetes.io/name: example
  podSelectorEndpoints:
    - removed (there's a lot)
```