
Updating MachinePool, AWSMachinePool, and KubeadmConfig resources does not trigger an ASG instanceRefresh #4071


Description

@wmgroot

/kind bug

What steps did you take and what happened:
Updating the MachinePool, AWSMachinePool, or KubeadmConfig resources does not trigger an instance refresh on the AWS ASG. Currently, I must manually start an instance refresh using the AWS UI or CLI in order for instances to be replaced when my specs change.

What did you expect to happen:
With awsmachinepool.spec.refreshPreferences.disable left at its default value (or explicitly set to false), I expect changes to the MachinePool, AWSMachinePool, and KubeadmConfig to automatically trigger an instance refresh that rotates the nodes in the pool onto the updated settings.

These are the MachinePool, AWSMachinePool, and KubeadmConfig I'm working with.

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  annotations:
    cluster.x-k8s.io/replicas-managed-by: external-autoscaler
  name: worker-private-efficient
spec:
  clusterName: awscmhdev2
  replicas: 2
  template:
    metadata: {}
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: worker-private-efficient
      clusterName: awscmhdev2
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachinePool
        name: worker-private-efficient
      version: v1.24.7
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachinePool
metadata:
  annotations:
    cluster-api-provider-aws: "true"
  name: worker-private-efficient
spec:
  additionalTags:
    k8s.io/cluster-autoscaler/awscmhdev2: owned
    k8s.io/cluster-autoscaler/enabled: "true"
  awsLaunchTemplate:
    additionalSecurityGroups:
    - filters:
      - name: tag:Name
        values:
        - capi-hostports
      - name: tag:network-zone
        values:
        - qa
      - name: tag:region
        values:
        - us-east-2
    ami:
      id: ami-0f6b6efcd422b9d85
    iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
    instanceType: c5a.2xlarge
    rootVolume:
      deviceName: /dev/sda1
      encrypted: true
      iops: 16000
      size: 100
      throughput: 1000
      type: gp3
    sshKeyName: ""
  capacityRebalance: true
  defaultCoolDown: 5m0s
  maxSize: 5
  minSize: 0
  mixedInstancesPolicy:
    overrides:
    - instanceType: c5a.2xlarge
    - instanceType: m5a.2xlarge
  subnets:
  - filters:
    - name: availability-zone-id
      values:
      - use2-az1
    - name: tag:type
      values:
      - nodes
  - filters:
    - name: availability-zone-id
      values:
      - use2-az2
    - name: tag:type
      values:
      - nodes
  - filters:
    - name: availability-zone-id
      values:
      - use2-az3
    - name: tag:type
      values:
      - nodes
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfig
metadata:
  name: worker-private-efficient
spec:
  files:
  - content: |
      vm.max_map_count=262144
    path: /etc/sysctl.d/90-vm-max-map-count.conf
  - content: |
      fs.inotify.max_user_instances=256
    path: /etc/sysctl.d/91-fs-inotify.conf
  format: cloud-config
  joinConfiguration:
    nodeRegistration:
      kubeletExtraArgs:
        cloud-provider: aws
        eviction-hard: memory.available<500Mi,nodefs.available<10%
        kube-reserved: cpu=500m,memory=2Gi,ephemeral-storage=1Gi
        node-labels: role.node.kubernetes.io/worker=true
        protect-kernel-defaults: "true"
        system-reserved: cpu=500m,memory=1Gi,ephemeral-storage=1Gi
      name: '{{ ds.meta_data.local_hostname }}'
  preKubeadmCommands:
  - sudo systemctl restart systemd-sysctl

I have not set disable: true in my refreshPreferences in the AWSMachinePool spec.

$ kubectl explain awsmachinepool.spec.refreshPreferences.disable
KIND:     AWSMachinePool
VERSION:  infrastructure.cluster.x-k8s.io/v1beta2

FIELD:    disable <boolean>

DESCRIPTION:
     Disable, if true, disables instance refresh from triggering when new launch
     templates are detected. This is useful in scenarios where ASG nodes are
     externally managed.
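
For clarity, leaving the field unset should behave the same as setting it explicitly to false. A minimal sketch of what that would look like in the AWSMachinePool spec (I have not actually added this block):

spec:
  refreshPreferences:
    disable: false  # default behaviour: a new launch template version should trigger an instance refresh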

This is the current state of the runcmd in the LaunchTemplate in AWS, version 1566.

runcmd:
  - "sudo systemctl restart systemd-sysctl"
  - kubeadm join --config /run/kubeadm/kubeadm-join-config.yaml  && echo success > /run/cluster-api/bootstrap-success.complete

I apply a change to add a command to the KubeadmConfig, such as this.

  preKubeadmCommands:
  - sudo systemctl restart systemd-sysctl
  - echo "hello world"

I see that the LaunchTemplate has a new version, and wait for 10 minutes.
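
The new version is also visible from the AWS CLI (a sketch; the placeholder must be replaced with the launch template name CAPA created for this pool):

$ aws ec2 describe-launch-template-versions \
    --launch-template-name <capa-launch-template-name> \
    --versions '$Latest' \
    --region us-east-2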

I notice that there is no active instance refresh started for my ASG in the instance refresh tab, and my instances are still using the old LaunchTemplate version.
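
The same can be confirmed from the CLI, and the second command is the manual workaround I'm currently relying on (a sketch; replace the placeholder with the ASG name CAPA created for this pool):

# Lists in-progress/recent refreshes for the ASG; nothing CAPA-initiated shows up in my case.
$ aws autoscaling describe-instance-refreshes \
    --auto-scaling-group-name <capa-asg-name> \
    --region us-east-2

# Workaround: start the refresh by hand so instances roll onto the new launch template version.
$ aws autoscaling start-instance-refresh \
    --auto-scaling-group-name <capa-asg-name> \
    --region us-east-2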

Environment:

  • Cluster-api-provider-aws version: v2.0.2
  • Kubernetes version: (use kubectl version):
$ kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.25.1
Kustomize Version: v4.5.7
Server Version: v1.24.7
