Commit 5ebcf6c

Add AWS VPC CNI IP exhaustion (#102)
* feat(rules): add nginx ingress SSL certificate crisis detection

  Add new rule CRE-2025-0120 to detect critical SSL certificate failures in NGINX Ingress Controllers.

* feat(rules): add AWS VPC CNI IP exhaustion crisis rule and tags

  Add new rule for detecting and mitigating AWS VPC CNI IP address exhaustion scenarios. Includes related tags for IP exhaustion, ENI allocation, pod scheduling, and cluster scaling issues.

1 parent 573fbb0 commit 5ebcf6c

3 files changed

+120
-0
lines changed

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+rules:
+  - cre:
+      id: CRE-2025-0121
+      severity: 0
+      title: AWS VPC CNI IP Address Exhaustion Crisis
+      category: networking-problem
+      author: Prequel
+      description: |
+        Critical AWS VPC CNI IP address exhaustion detected. This pattern indicates cascading failures
+        where subnet IP exhaustion leads to ENI allocation failures, pod scheduling failures, and
+        complete service unavailability. The failure sequence shows IP allocation errors, ENI attachment
+        failures, and resulting pod startup failures that affect cluster scalability and workload deployment.
+      cause: |
+        - Subnet IP address pool exhaustion in VPC
+        - Maximum ENI limit reached per EC2 instance
+        - Secondary IP allocation failures on existing ENIs
+        - VPC CNI plugin configuration errors
+        - Insufficient subnet CIDR block size for cluster scale
+        - ENI warm pool depletion during traffic spikes
+        - AWS API rate limiting on EC2 ENI operations
+        - Security group or NACL blocking ENI operations
+        - IAM permissions missing for ENI management
+        - Cross-AZ networking constraints affecting IP allocation
+      impact: |
+        - CRITICAL: Complete inability to schedule new pods
+        - Existing pods fail to restart or scale
+        - Service degradation due to reduced pod capacity
+        - Cluster autoscaling failures and node provisioning issues
+        - Application deployment failures and rollback complications
+        - Load balancer health check failures due to unreachable pods
+        - Cascading failures across microservices architecture
+        - Data plane connectivity loss between pods
+        - Revenue loss from service unavailability
+        - Compliance violations for high-availability requirements
+      impactScore: 10
+      tags:
+        - aws
+        - vpc-cni
+        - kubernetes
+        - networking
+        - ip-exhaustion
+        - eni-allocation
+        - pod-scheduling
+        - cluster-scaling
+        - high-availability
+        - service-unavailability
+      mitigation: |
+        IMMEDIATE ACTIONS:
+        - Check available IPs in subnets (the `AvailableIpAddressCount` field): `aws ec2 describe-subnets --subnet-ids subnet-xxx`
+        - List the ENIs attached to an instance and compare the count with the instance type's ENI limit: `aws ec2 describe-network-interfaces --filters Name=attachment.instance-id,Values=i-xxx`
+        - Monitor VPC CNI logs: `kubectl logs -n kube-system -l k8s-app=aws-node`
+        - Check for pending pods: `kubectl get pods --all-namespaces --field-selector=status.phase=Pending`
+        - Verify CNI configuration (environment variables on the DaemonSet): `kubectl describe daemonset aws-node -n kube-system`
+
+        RECOVERY STEPS:
+        1. Add additional subnets with larger CIDR blocks
+        2. Increase ENI warm pool size: `kubectl set env daemonset aws-node -n kube-system WARM_ENI_TARGET=2`
+        3. Enable prefix delegation (requires Nitro-based instances and VPC CNI 1.9.0 or later): `kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true`
+        4. Scale down non-critical workloads to free IPs
+        5. Restart VPC CNI daemonset: `kubectl rollout restart daemonset/aws-node -n kube-system`
+        6. Monitor IP allocation recovery: `kubectl get pods -n kube-system -l k8s-app=aws-node`
+
+        PREVENTION:
+        - Implement IP address monitoring and alerting
+        - Plan larger subnet CIDR blocks, or add secondary VPC CIDR blocks, before free IPs run low
+        - Set up VPC CNI metrics monitoring in CloudWatch
+        - Implement pod density limits per node
+        - Use prefix delegation for improved IP efficiency
+        - Regular capacity planning for cluster growth
+        - Implement network policy optimization
+        - Set up automated subnet provisioning
+      references:
+        - https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
+        - https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md
+        - https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/
+        - https://docs.aws.amazon.com/eks/latest/userguide/cni-custom-network.html
+        - https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/
+      applications:
+        - name: amazon-vpc-cni-k8s
+          version: ">= 1.7.0"
+        - name: kubernetes
+          version: ">= 1.18.0"
+      mitigationScore: 6
+    metadata:
+      gen: 1
+      id: 6E7meYDEvC5c6yub5dVgkW
+      kind: prequel
+    rule:
+      set:
+        event:
+          source: cre.log.aws-vpc-cni
+        match:
+          - regex: "failed to allocate a private IP address.*no available IP addresses|ENI allocation failed.*insufficient IP addresses|failed to assign private IP.*AddressLimitExceeded|pod.*failed.*no available IP|insufficient IP addresses in subnet|failed to create ENI.*AddressLimitExceeded|unable to provision ENI.*IP address limit|failed to allocate IP.*subnet has no available addresses|pod scheduling failed.*insufficient IP addresses|CNI failed to allocate IP.*no free addresses"
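
The causes listed in the rule (subnet pool exhaustion, per-instance ENI limits, undersized CIDR blocks) come down to simple arithmetic, so a rough capacity estimate may help when sizing subnets. The sketch below is illustrative only and is not part of this commit. It assumes the commonly documented figures: AWS reserves 5 addresses per subnet, a node in secondary-IP mode supports `ENIs * (IPs per ENI - 1) + 2` pods, and prefix delegation assigns a /28 prefix (16 addresses) per slot. The function names are hypothetical, and the m5.large limits (3 ENIs, 10 IPv4 addresses per ENI) are used as an example instance type.

```python
import ipaddress

# Assumption: AWS reserves 5 addresses in every subnet
# (network, VPC router, DNS, future use, broadcast).
AWS_RESERVED_PER_SUBNET = 5

# Assumption: with prefix delegation each ENI slot holds a /28 prefix.
ADDRESSES_PER_PREFIX = 16


def usable_subnet_ips(cidr: str) -> int:
    """Addresses in the subnet that can actually be assigned."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def max_pods_secondary_ip(enis: int, ips_per_eni: int) -> int:
    """Pod limit in secondary-IP mode.

    One address per ENI is the ENI's own primary IP; the +2 covers
    host-network pods (aws-node, kube-proxy) that need no pod IP.
    """
    return enis * (ips_per_eni - 1) + 2


def pod_ip_capacity_with_prefixes(enis: int, ips_per_eni: int) -> int:
    """Pod IP capacity when every secondary slot holds a /28 prefix."""
    return enis * (ips_per_eni - 1) * ADDRESSES_PER_PREFIX


if __name__ == "__main__":
    print(usable_subnet_ips("10.0.0.0/24"))       # 251
    print(max_pods_secondary_ip(3, 10))           # 29 (m5.large)
    print(pod_ip_capacity_with_prefixes(3, 10))   # 432
```

Under these assumptions a /24 subnet holds fewer than nine fully packed m5.large nodes in secondary-IP mode (251 / 29), which is the kind of ceiling the rule's "insufficient subnet CIDR block size" cause describes. With prefix delegation the address capacity per node is much higher, though the kubelet's max-pods setting normally caps the pod count well below it.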

rules/cre-2025-0122/test.log

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+2025/07/02 08:29:03 [ERROR] aws-node-daemonset-xyz: ipamd.go:1234 failed to allocate ENI: AddressLimitExceeded: The maximum number of addresses has been reached.
+2025/07/02 08:29:03 [ERROR] aws-node-daemonset-xyz: ipamd.go:1235 no available IP addresses in subnet
+2025/07/02 08:29:03 [WARN] aws-node-daemonset-xyz: ipamd.go:1236 insufficient IP addresses available for new pods
+2025/07/02 08:29:03 [ERROR] kubelet: event.go:294 FailedScheduling: 0/3 nodes are available: 3 Insufficient IP addresses in subnet
+2025/07/02 08:29:03 [ERROR] kubelet: event.go:295 FailedScheduling: pod "test-app-deployment-abc123-xyz" failed to fit in any node
+2025/07/02 08:29:03 [ERROR] scheduler: scheduler.go:456 Failed to schedule pod test-app/test-pod-789: Insufficient IP
+2025/07/02 08:29:03 [ERROR] aws-node: cni.go:123 failed to assign an IP address to container: no available IP addresses in subnet
+2025/07/02 08:29:03 [ERROR] aws-node: eni.go:234 failed to allocate ENI for pod test-pod-456: NetworkInterfaceLimitExceeded
+2025/07/02 08:29:03 [ERROR] aws-node: ipam.go:345 IPAM: failed to get IP address from datastore: no available IP addresses
+2025/07/02 08:29:03 [ERROR] aws-node: ec2.go:567 EC2 API error: AddressLimitExceeded - The maximum number of addresses has been reached
+2025/07/02 08:29:03 [ERROR] aws-node: ec2.go:568 EC2 API error: NetworkInterfaceLimitExceeded - The maximum number of network interfaces has been reached
+2025/07/02 08:29:03 [ERROR] aws-node: vpc.go:789 VPC CNI error: insufficient IP addresses in subnet for pod allocation
+2025/07/02 08:29:03 [ERROR] cluster-autoscaler: scale_up.go:123 failed to scale up: nodes cannot accommodate new pods due to IP exhaustion in VPC
+2025/07/02 08:29:03 [ERROR] karpenter: provisioner.go:234 failed to provision new node: insufficient IP addresses in subnet
+2025/07/02 08:29:03 [ERROR] aws-load-balancer-controller: controller.go:345 failed to create target group: no available IP addresses
+2025/07/02 08:29:03 [ERROR] deployment-controller: deployment.go:456 Deployment "critical-app" failed: pods cannot be scheduled due to IP exhaustion
+2025/07/02 08:29:03 [ERROR] replicaset-controller: replicaset.go:567 ReplicaSet "web-app-rs" failed to create pods: Insufficient IP addresses
+2025/07/02 08:29:03 [ERROR] statefulset-controller: statefulset.go:678 StatefulSet "database" stuck: cannot allocate IP addresses for new pods
+2025/07/02 08:29:03 [ERROR] service-controller: service.go:789 Service "api-service" endpoints unavailable: pods failed to start due to IP exhaustion
+2025/07/02 08:29:03 [ERROR] ingress-controller: ingress.go:890 Ingress "web-ingress" backend unavailable: target pods cannot be scheduled
+2025/07/02 08:29:03 [ERROR] dns-controller: dns.go:901 DNS resolution failing: CoreDNS pods cannot be scheduled due to IP exhaustion
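
To see how the rule's regex behaves against this sample log, the alternation can be exercised directly. The sketch below is not part of the commit: it uses Python's `re` module as a stand-in, so it assumes the rule engine matches case-sensitively and anywhere in the line the way `re.search` does, which may not hold for the real engine. The pattern is copied verbatim from the rule, the four sample lines are the 4th, 7th, 12th and 14th entries of test.log, and `matching_lines` is a hypothetical helper name.

```python
import re

# Alternation copied verbatim from the rule's `match` entry.
RULE_REGEX = re.compile(
    "failed to allocate a private IP address.*no available IP addresses"
    "|ENI allocation failed.*insufficient IP addresses"
    "|failed to assign private IP.*AddressLimitExceeded"
    "|pod.*failed.*no available IP"
    "|insufficient IP addresses in subnet"
    "|failed to create ENI.*AddressLimitExceeded"
    "|unable to provision ENI.*IP address limit"
    "|failed to allocate IP.*subnet has no available addresses"
    "|pod scheduling failed.*insufficient IP addresses"
    "|CNI failed to allocate IP.*no free addresses"
)

# Four entries taken from test.log (the 4th, 7th, 12th and 14th lines).
SAMPLE_LINES = [
    "2025/07/02 08:29:03 [ERROR] kubelet: event.go:294 FailedScheduling: "
    "0/3 nodes are available: 3 Insufficient IP addresses in subnet",
    "2025/07/02 08:29:03 [ERROR] aws-node: cni.go:123 failed to assign an IP "
    "address to container: no available IP addresses in subnet",
    "2025/07/02 08:29:03 [ERROR] aws-node: vpc.go:789 VPC CNI error: "
    "insufficient IP addresses in subnet for pod allocation",
    "2025/07/02 08:29:03 [ERROR] karpenter: provisioner.go:234 failed to "
    "provision new node: insufficient IP addresses in subnet",
]


def matching_lines(lines, pattern=RULE_REGEX):
    """Return the lines on which the pattern matches anywhere in the line."""
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        verdict = "MATCH" if RULE_REGEX.search(line) else "no match"
        print(f"{verdict:8}  {line}")
```

Under Python's case-sensitive semantics only the vpc.go and karpenter lines match, both through the `insufficient IP addresses in subnet` alternative. The kubelet line is skipped because it capitalizes "Insufficient", and the cni.go line because no alternative covers "failed to assign an IP address". If the real engine behaves the same way, case-insensitive matching or extra alternatives might be worth considering.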

rules/tags/tags.yaml

Lines changed: 6 additions & 0 deletions
@@ -833,3 +833,9 @@ tags:
   - name: certificate-verification
     displayName: Certificate Verification
     description: Issues with SSL/TLS certificate verification including trust chain validation, certificate authority verification, and hostname matching
+  - name: pod-scheduling
+    displayName: Pod Scheduling
+    description: Issues with Kubernetes pod scheduling due to resource constraints or networking problems
+  - name: cluster-scaling
+    displayName: Cluster Scaling
+    description: Problems related to Kubernetes cluster scaling operations and capacity management
