[Help] AL2023 nodes are not joining the EKS cluster #8468

@cshiv

Description

What help do you need?

I am trying to create an EKS cluster with AL2023 nodes so that I can test the Kubernetes swap feature.

Here is my config file:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: swap-al2023         # Name of the EKS cluster
  region: us-east-1            # Change to your AWS region
  version: "1.32"              # Change if needed

vpc:
  subnets:
    private:
      us-east-1a: { id: <redacted> }
      us-east-1b: { id: <redacted> }
      us-east-1c: { id: <redacted> }

# nodeGroups:
managedNodeGroups:
  - name: my-node-group
    amiFamily: AmazonLinux2023
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

    ssh:
      allow: true
      publicKeyName: test
    preBootstrapCommands: 
      - dd if=/dev/zero of=/swapfile bs=128M count=32
      - chmod 600 /swapfile
      - mkswap /swapfile
      - swapon /swapfile
      - swapon -s
      - "echo '/swapfile swap swap defaults 0 0' >> /etc/fstab"
  
    instanceType: m5.xlarge
    minSize: 1
    desiredCapacity: 3
    maxSize: 3
    privateNetworking: true
    tags:
      Name: swap-testing-nodes

    volumeSize: 20 # Adjust volume size as needed
    subnets:
      - <redacted>
      - <redacted>
      - <redacted>

    overrideBootstrapCommand: |
      apiVersion: node.eks.aws/v1alpha1
      kind: NodeConfig
      spec:
        kubelet:
          config:
            failSwapOn: false
            memorySwap:
              swapBehavior: LimitedSwap  

iam:
  withOIDC: true  # Enables OIDC provider for IAM roles

addons:
  - name: aws-ebs-csi-driver
    wellKnownPolicies:
      ebsCSIController: true  # Automatically attaches IAM policies for EBS

  - name: vpc-cni
    version: latest  # AWS VPC CNI plugin for networking

  - name: coredns
    version: latest  # CoreDNS for service discovery

  - name: kube-proxy
    version: latest  # Manages network rules for Kubernetes services
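
For reference, this is how I plan to verify the swap setup once a node joins. A rough sketch only; <node-name> is a placeholder, and I'm assuming shell access to the node (SSM or serial console):

#!/bin/bash
# On the node: confirm the swap file created by preBootstrapCommands is active.
swapon --show
free -h

# From my workstation: confirm kubelet picked up the swap settings from the
# NodeConfig override. configz returns the node's effective kubelet config.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" \
  | grep -o '"failSwapOn":[^,]*'
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" \
  | grep -o '"swapBehavior":[^,}]*'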

With this config I am getting the following issues:

  1. SSM is not working; I get the error "The SSM Agent was unable to connect to a Systems Manager endpoint to register itself with the service." I assumed the SSM agent is installed and started by default on AL2023. (See the endpoint-check sketch at the end of this post.)

  2. Nodes are not joining the cluster. I did not get much information from the CloudFormation stacks or the system logs. When I checked the user data, I couldn't assess whether it was right or wrong, as there is no clear documentation on the bootstrap commands for AL2023. This was the user data (the checks I ran on a node are sketched right after it):

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    apiServerEndpoint: <redacted>
    certificateAuthority: <redacted>
    cidr: 172.20.0.0/16
    name: swap-al2023
  kubelet:
    config:
      maxPods: 58
      clusterDNS:
      - 172.20.0.10
    flags:
    - "--node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,alpha.eksctl.io/cluster-name=swap-al2023,alpha.eksctl.io/nodegroup-name=my-node-group,eks.amazonaws.com/nodegroup-image=ami-0c208a4dab792bb18,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=my-node-group,eks.amazonaws.com/sourceLaunchTemplateId=lt-0b11f5b7bae039f92"

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
dd if=/dev/zero of=/swapfile bs=128M count=32
--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
chmod 600 /swapfile
--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
mkswap /swapfile
--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
swapon /swapfile
--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
swapon -s
--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
echo '/swapfile swap swap defaults 0 0' >> /etc/fstab
--//
Content-Type: application/node.eks.aws

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
metadata:
  creationTimestamp: null
spec:
  cluster: {}
  containerd: {}
  instance:
    localStorage: {}
  kubelet:
    config:
      failSwapOn: false
      memorySwap:
        swapBehavior: LimitedSwap

--//--
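
These are the checks I ran on one of the stuck nodes over the serial console. A sketch of my debugging, not a definitive procedure; the nodeadm unit names are my assumption based on the AL2023 EKS AMI:

#!/bin/bash
# Dump the user data the instance actually received (IMDSv2 needs a token).
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/user-data

# See how nodeadm processed the NodeConfig and whether kubelet started.
# (Unit names assumed: nodeadm-config and nodeadm-run.)
journalctl -u nodeadm-config -u nodeadm-run --no-pager | tail -n 50
journalctl -u kubelet --no-pager | tail -n 50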

Is this expected? Should I add any specific instructions for the nodes to join the cluster? I assumed they join automatically. How can I fix this? The way the shell commands are merged into separate MIME parts also looks different from what I expected. Does the user data look good?
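
For issue 1: since the nodes sit in private subnets, I also checked whether the VPC has the interface endpoints the SSM agent needs to register (or a NAT route instead). A sketch of that check; the region and <vpc-id> are placeholders for my setup:

#!/bin/bash
# The agent needs reachable endpoints for ssm, ssmmessages, and ec2messages.
for svc in ssm ssmmessages ec2messages; do
  aws ec2 describe-vpc-endpoints \
    --filters Name=vpc-id,Values=<vpc-id> \
              Name=service-name,Values=com.amazonaws.us-east-1.$svc \
    --query 'VpcEndpoints[].State' --output text
done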
