Misconfigured PDB blocks draining node indefinitely #2802

@felipecruz91

Description

Observed Behavior: Karpenter cannot disrupt a node that runs a pod protected by a PDB (minAvailable: 100%), even though the NodeClaim sets both expireAfter (720h) and terminationGracePeriod (1h).
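For context, a NodePool producing such NodeClaims might look roughly like the sketch below. Only the pool name, the taint key, and the expireAfter and terminationGracePeriod values come from this report. The nodeClassRef (group, kind, name) and the single requirement are placeholders I am assuming for illustration, and the field layout assumes the karpenter.sh/v1 API.

```yaml
# Sketch only: nodeClassRef and requirements are hypothetical placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: test-huge-crowd-general        # pool name taken from the repro's nodeSelector
spec:
  template:
    spec:
      expireAfter: 720h                # node is considered expired after 30 days
      terminationGracePeriod: 1h       # upper bound on how long draining may take
      taints:
        - key: taint.nr-ops.net/general-pool   # matches the toleration in the repro Pod
          effect: NoSchedule
      nodeClassRef:                    # assumed values, not from the report
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:                    # assumed minimal requirement
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
```

With a configuration of this shape, the expectation described in this issue is that a node stuck draining behind a PDB is force-terminated once the 1h grace period has elapsed.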

Expected Behavior: Once the terminationGracePeriod elapses, the remaining pods should be forcibly deleted and the underlying instance terminated. Source: https://karpenter.sh/docs/concepts/disruption/#terminationgraceperiod

Reproduction Steps (Please include YAML):

  1. Deploy a Pod with a PDB like the following:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test-pdb-pod
  labels:
    app: nginx-test-pdb-pod
spec:
  nodeSelector:
    karpenter.sh/nodepool: test-huge-crowd-general
  tolerations:
  - key: taint.nr-ops.net/general-pool
    operator: Exists
    effect: NoSchedule
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-test-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: nginx-test-pdb-pod
  2. Manually drain the node with kubectl drain <node> ...

  3. Observe that the node becomes unschedulable and is never deleted, even after the 1h grace period has expired, which increases costs. (FWIW, if the pod is deleted manually, Karpenter eventually removes the node.)

Events:
  Type    Reason             Age                 From       Message
  ----    ------             ----                ----       -------
  Normal  Unconsolidatable   13m (x86 over 23h)  karpenter  Can't replace with a cheaper node
  Normal  DisruptionBlocked  26s                 karpenter  Pdb "default/nginx-test-pdb" prevents pod evictions

Versions:

  • Chart Version: 1.3.2
  • Kubernetes Version (kubectl version): v1.32.9-0
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Metadata

Assignees

No one assigned

Labels

  • kind/support: Categorizes issue or PR as a support question.
  • needs-priority
  • triage/solved: Indicates an issue that has been considered solved by the maintainers.
