This repository was archived by the owner on Nov 17, 2022. It is now read-only.

k8s-spot-rescheduler doesn't handle pod disruption budgets nicely, leaving nodes underutilized and tainted #54

@morganwalker

Description

We're using kops 1.10.0 and k8s 1.10.11. We're using two separate instance groups (IG), nodes (on-demand) and spots (spot), both spread across 3 availability zones. I've applied the appropriate nodeLabels and have defined the following in my k8s-spot-rescheduler deployment manifest:

- --on-demand-node-label=on-demand
- --spot-node-label=spot

The nodes IG has the spot=false:PreferNoSchedule taint so the spots IG is preferred. I'm using the cluster autoscaler to autodiscover both IGs via --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/kubernetes.metis.wtf, and these tags exist on both IGs. I've confirmed that pods on most nodes in the nodes IG can be drained and moved to nodes in the spots IG, with one exception:

  • k8s-spot-rescheduler picks a node and states

    moved. Will drain node.

    which isn't true

  • It then figures out it's unable to drain the node due to PDBs

    E0117 14:03:51.801764       1 rescheduler.go:302] Failed to drain node: Failed to drain node /ip-172-20-61-39.ec2.internal, due to following errors: [Failed to evict pod skafos-notebooks/hub-deployment-cf799d494-gp6z4 within allowed timeout (last error: Cannot evict pod as it would violate the pod's disruption budget.)]

    and aborts the drain.

Now we're left with an on-demand node that has had all of its pods evicted except those protected by PDBs, leaving it underutilized and tainted with ToBeDeletedByClusterAutoscaler. It seems like the rescheduler should first check whether it can drain all pods, taking PDBs into consideration, and if it can't, it shouldn't evict any pods or apply the ToBeDeletedByClusterAutoscaler taint.
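
The all-or-nothing check suggested above could look roughly like the following. This is a minimal sketch in plain Python, not the rescheduler's actual code and not a real Kubernetes client: Pod, Budget, blocked_pods and can_drain are hypothetical names, the selector is reduced to simple label equality, and the app=hub label on the hub-deployment pod is assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    namespace: str
    labels: dict = field(default_factory=dict)


@dataclass
class Budget:
    # Simplified stand-in for a PodDisruptionBudget: an equality-based
    # label selector plus the number of disruptions currently allowed.
    # Simplification: an empty selector matches every pod in the namespace.
    namespace: str
    selector: dict
    disruptions_allowed: int

    def matches(self, pod: Pod) -> bool:
        if pod.namespace != self.namespace:
            return False
        return all(pod.labels.get(k) == v for k, v in self.selector.items())


def blocked_pods(pods, budgets):
    """Return the pods that cannot be evicted without violating a budget.

    Each planned eviction consumes one allowed disruption from every
    budget that matches the pod, so two pods on the same node that share
    a budget are counted against it together.
    """
    remaining = [b.disruptions_allowed for b in budgets]
    blocked = []
    for pod in pods:
        matching = [i for i, b in enumerate(budgets) if b.matches(pod)]
        if any(remaining[i] <= 0 for i in matching):
            blocked.append(pod)
            continue
        for i in matching:
            remaining[i] -= 1
    return blocked


def can_drain(pods, budgets) -> bool:
    """True only if every pod on the node can be evicted."""
    return not blocked_pods(pods, budgets)


# Hypothetical node contents: one pod guarded by a budget that currently
# allows no disruptions, and one unguarded pod.
node_pods = [
    Pod("hub-deployment-cf799d494-gp6z4", "skafos-notebooks", {"app": "hub"}),
    Pod("some-other-pod", "default", {"app": "web"}),
]
budgets = [Budget("skafos-notebooks", {"app": "hub"}, disruptions_allowed=0)]

if can_drain(node_pods, budgets):
    print("taint node, then evict all pods")
else:
    names = [p.name for p in blocked_pods(node_pods, budgets)]
    print("skip node, blocked by PDBs:", names)
```

With a check like this run before anything is touched, a node with even one pod that can't be evicted would be skipped entirely, instead of being tainted and partially drained.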
