Migrate AWS EKS AMI from Amazon Linux 2 to Amazon Linux 2023 #3166
AWS is deprecating Amazon Linux 2 (AL2) AMIs for EKS after November 26, 2025. Kubernetes 1.32 is the last version that will support AL2 AMIs. From version 1.33 onwards, only AL2023 and Bottlerocket AMIs will be available.

This change updates the default AMI types for EKS node groups:
- AL2_x86_64 → AL2023_x86_64_STANDARD
- AL2_x86_64_GPU → AL2023_x86_64_NVIDIA

References:
- https://docs.aws.amazon.com/eks/latest/userguide/eks-ami-deprecation-faqs.html

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
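To make the version cutoff in the description concrete, here is a minimal sketch of a check for whether AL2 AMIs are still an option for a given Kubernetes version. The names `LAST_AL2_K8S_VERSION`, `parse_major_minor`, and `al2_available` are hypothetical and not part of this PR; the only fact used is that 1.32 is the last version with AL2 AMIs.

```python
from typing import Tuple

# From the PR description: Kubernetes 1.32 is the last version with AL2 AMIs.
LAST_AL2_K8S_VERSION: Tuple[int, int] = (1, 32)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn '1.32' or '1.32.4' into (1, 32); the patch component is ignored."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def al2_available(kubernetes_version: str) -> bool:
    """Return True if AL2 AMIs exist for this Kubernetes version (hypothetical helper)."""
    return parse_major_minor(kubernetes_version) <= LAST_AL2_K8S_VERSION
```

Comparing integer tuples rather than strings avoids ordering mistakes such as treating "1.9" as newer than "1.32".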
I haven't tested this yet, but the code changes are minimal. I think we should deploy on AWS and do our standard release workflows to test.
This is a very welcome PR, thanks @Adam-D-Lewis. I can review this once Claude finishes. This one is important since AWS will be deprecating those AMIs soon.
@viniciusdc Claude is done editing. If you want to take over testing that is welcome. Thank you!
The Keycloak PostgreSQL pod can't start because the EBS CSI controller isn't provisioning a volume for it. Looks like we can either set a custom launch template with
Relevant docs:
The autoscaler is also not working for the same reason. Relevant docs
Working now, had to add a few recommended permissions. See https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md
"arn:${local.partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
"arn:${local.partition}:iam::aws:policy/AmazonEKS_CNI_Policy",
"arn:${local.partition}:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy",
aws_iam_policy.worker_autoscaling.arn
the role is no longer needed on the node
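For readers unfamiliar with the `${local.partition}` interpolation in the policy list above, a small sketch of how the managed-policy ARNs expand per AWS partition. The helper name `managed_policy_arn` and the `MANAGED_POLICIES` list are illustrative only; the policy names are copied from the diff.

```python
from typing import List

# Managed policy paths as they appear in the diff above.
MANAGED_POLICIES: List[str] = [
    "AmazonEKSWorkerNodePolicy",
    "AmazonEKS_CNI_Policy",
    "service-role/AmazonEBSCSIDriverPolicy",
]


def managed_policy_arn(partition: str, policy_path: str) -> str:
    """Build an AWS-managed policy ARN for a partition such as 'aws' or 'aws-us-gov'."""
    return f"arn:{partition}:iam::aws:policy/{policy_path}"


def node_policy_arns(partition: str = "aws") -> List[str]:
    """Expand every managed policy in the list for one partition (hypothetical helper)."""
    return [managed_policy_arn(partition, path) for path in MANAGED_POLICIES]
```

Keeping the partition as a parameter is what lets the same list work in the commercial, GovCloud, and China partitions.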
repository = "https://kubernetes.github.io/autoscaler"
chart = "cluster-autoscaler"
version = "9.19.0"
also took the opportunity to update the version here
Unit Tests / Pytest (3.10) (pull_request) is failing due to a warning about the upcoming Python 3.10 EOL in a year. I've opened #3170 to mitigate that.
  aws = {
    source = "hashicorp/aws"
-   version = "5.33.0"
+   version = "6.18.0"
nice!
  if launch_template and getattr(launch_template, "ami_id", None):
      return "CUSTOM"

  if gpu_enabled:
-     return "AL2_x86_64_GPU"
+     return "AL2023_x86_64_NVIDIA"

- return "AL2_x86_64"
+ return "AL2023_x86_64_STANDARD"
Can you check that NVIDIA works with the new AMIs?
Currently waiting for quota increase approval since we're on OT Nebari dev AWS account now instead of QS.
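For anyone testing this thread's change locally, the AMI-type selection from the diff above can be sketched as a standalone function. The function name `default_ami_type` and its signature are assumptions for illustration; only the three branches and their return values come from the diff.

```python
from types import SimpleNamespace
from typing import Any, Optional


def default_ami_type(launch_template: Optional[Any] = None, gpu_enabled: bool = False) -> str:
    """Pick an EKS node group AMI type, mirroring the branches shown in the diff."""
    # A launch template that pins its own AMI takes precedence over the defaults.
    if launch_template and getattr(launch_template, "ami_id", None):
        return "CUSTOM"
    # GPU node groups get the AL2023 NVIDIA variant (previously AL2_x86_64_GPU).
    if gpu_enabled:
        return "AL2023_x86_64_NVIDIA"
    # Everything else gets the standard AL2023 image (previously AL2_x86_64).
    return "AL2023_x86_64_STANDARD"


# Usage: SimpleNamespace stands in for whatever launch-template object the caller has.
pinned = SimpleNamespace(ami_id="ami-0123456789abcdef0")  # made-up AMI id
print(default_ami_type(pinned, gpu_enabled=True))
print(default_ami_type(gpu_enabled=True))
print(default_ami_type())
```

Note that a pinned AMI wins even when `gpu_enabled` is set, so GPU support on a custom AMI is the template author's responsibility.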
@dcmcand recommends we use https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html instead of IRSA
I looked into it a bit (including #3171). It appears that IRSA is much closer to GKE Workload Identity and Azure Workload Identity. IRSA is still fully supported and already enabled on the cluster. I'm not opposed to migrating to Pod Identity on EKS, but I think it should be done in a separate issue and not be a blocker for this PR. Even if we switch, we'd still need to maintain IRSA capability in the short term at least. The Nebari MLflow plugin (and possibly others) uses it.
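As background for the IRSA discussion above: with IRSA, a pod assumes an IAM role through an annotation on its Kubernetes service account. Below is a minimal sketch of such a manifest built as a Python dict. The helper name `irsa_service_account` and the example names and role ARN are made up; the annotation key `eks.amazonaws.com/role-arn` is the one EKS uses.

```python
from typing import Any, Dict

# Annotation key that EKS reads to map a service account to an IAM role.
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_service_account(name: str, namespace: str, role_arn: str) -> Dict[str, Any]:
    """Build a ServiceAccount manifest annotated for IRSA (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {IRSA_ANNOTATION: role_arn},
        },
    }


# Usage with made-up values; a real role ARN would come from the Terraform outputs.
manifest = irsa_service_account(
    "cluster-autoscaler",
    "kube-system",
    "arn:aws:iam::111122223333:role/example-autoscaler-role",
)
```

Pod Identity replaces this annotation with an association managed on the EKS side, which is part of why a migration would touch plugins that currently rely on IRSA.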
Hey @Adam-D-Lewis, thanks for working on this. What's the latest status?
It's ready for review, @viniciusdc. I have kept the IRSA approach for now for the reasons I listed above.
Closes #3152