@Adam-D-Lewis Adam-D-Lewis commented Oct 23, 2025

Closes #3152

AWS is deprecating Amazon Linux 2 (AL2) AMIs for EKS after November 26, 2025. Kubernetes 1.32 is the last version that will support AL2 AMIs. From version 1.33 onwards, only AL2023 and Bottlerocket AMIs will be available.

This change updates the default AMI types for EKS node groups:

  • AL2_x86_64 → AL2023_x86_64_STANDARD
  • AL2_x86_64_GPU → AL2023_x86_64_NVIDIA


References:
- https://docs.aws.amazon.com/eks/latest/userguide/eks-ami-deprecation-faqs.html

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Oct 23, 2025

I haven't tested this yet, but the code changes are minimal. I think we should deploy on AWS and do our standard release workflows to test.

@viniciusdc
Contributor

This is a very welcome PR, thanks @Adam-D-Lewis. I can review it once Claude finishes. It's important since AWS will be deprecating those AMIs soon.

viniciusdc self-requested a review October 23, 2025 16:15
@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Oct 23, 2025

> This is a very welcome PR, thanks @Adam-D-Lewis. I can review it once Claude finishes. It's important since AWS will be deprecating those AMIs soon.

@viniciusdc Claude is done editing. If you want to take over testing that is welcome. Thank you!

Adam-D-Lewis marked this pull request as ready for review October 23, 2025 16:21
Adam-D-Lewis requested a review from a team as a code owner October 23, 2025 16:21
Adam-D-Lewis requested review from dcmcand and removed request for a team October 23, 2025 16:21
@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Oct 27, 2025

The Keycloak PostgreSQL pod can't start because the EBS CSI controller isn't provisioning a volume for it.

Looks like we can either set a custom launch template with http_put_response_hop_limit = 2 in its metadata_options, or use IRSA to give the EBS CSI driver pod the permissions it needs.
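
To make the first option concrete, here is a minimal sketch of the metadata options a custom launch template would carry. As far as I understand the AWS docs, AL2023 managed node groups default to an IMDSv2 hop limit of 1, which keeps pods on the container network from reaching the node's credentials; raising it to 2 allows that extra hop. The key names mirror the `MetadataOptions` structure of the EC2 launch template API (which Terraform exposes as `metadata_options`). The helper name `build_metadata_options` and the choice to require IMDSv2 tokens are assumptions for illustration, not what this PR implements.

```python
def build_metadata_options(hop_limit: int = 2) -> dict:
    """Build a MetadataOptions mapping for an EC2 launch template.

    Hypothetical helper: only the key names come from the EC2 API.
    """
    # EC2 accepts hop limits from 1 to 64.
    if not 1 <= hop_limit <= 64:
        raise ValueError("hop limit must be between 1 and 64")
    return {
        "HttpEndpoint": "enabled",             # keep IMDS reachable
        "HttpTokens": "required",              # assumption: enforce IMDSv2
        "HttpPutResponseHopLimit": hop_limit,  # 2 lets pods reach IMDS
    }


options = build_metadata_options()
print(options)
```

The trade-off is that a hop limit of 2 exposes the node role to every pod on the node, which is one reason per-pod credentials (the second option) are often preferred.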

Relevant docs:

@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Oct 27, 2025

The autoscaler is also not working for the same reason.

Relevant docs

@Adam-D-Lewis
Member Author

It's working now; I had to add a few of the recommended permissions. See https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md

"arn:${local.partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
"arn:${local.partition}:iam::aws:policy/AmazonEKS_CNI_Policy",
"arn:${local.partition}:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy",
aws_iam_policy.worker_autoscaling.arn
Member Author

@Adam-D-Lewis Adam-D-Lewis Oct 27, 2025

the role is no longer needed on the node


repository = "https://kubernetes.github.io/autoscaler"
chart = "cluster-autoscaler"
version = "9.19.0"
Member Author

also took the opportunity to update the version here

@Adam-D-Lewis
Member Author

Unit Tests / Pytest (3.10) (pull_request) is failing due to a warning about the upcoming Python 3.10 EOL in a year. I've opened #3170 to mitigate that.

 aws = {
   source  = "hashicorp/aws"
-  version = "5.33.0"
+  version = "6.18.0"
Contributor

nice!

Comment on lines 228 to 236
     if launch_template and getattr(launch_template, "ami_id", None):
         return "CUSTOM"

     if gpu_enabled:
-        return "AL2_x86_64_GPU"
+        return "AL2023_x86_64_NVIDIA"

-    return "AL2_x86_64"
+    return "AL2023_x86_64_STANDARD"
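
For readers without the full file, the post-change selection logic in this hunk can be sketched as a standalone function. The function name `default_ami_type` and the `LaunchTemplate` dataclass are hypothetical stand-ins (the hunk does not show the enclosing definition); only the three return values and the `ami_id` check come from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaunchTemplate:
    """Hypothetical stand-in for a node group's launch template settings."""
    ami_id: Optional[str] = None


def default_ami_type(launch_template: Optional[LaunchTemplate], gpu_enabled: bool) -> str:
    """Pick the EKS AMI type for a node group (sketch of the hunk's logic)."""
    # A user-supplied AMI takes precedence over any default.
    if launch_template and getattr(launch_template, "ami_id", None):
        return "CUSTOM"
    # GPU node groups use the NVIDIA flavour of AL2023.
    if gpu_enabled:
        return "AL2023_x86_64_NVIDIA"
    return "AL2023_x86_64_STANDARD"


print(default_ami_type(None, gpu_enabled=False))
print(default_ami_type(None, gpu_enabled=True))
print(default_ami_type(LaunchTemplate(ami_id="ami-0123456789abcdef0"), gpu_enabled=True))
```

The AMI ID in the last call is a made-up placeholder; the point is that a custom AMI short-circuits the GPU check.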


Contributor

Can you check that NVIDIA works with the new AMIs?

Member Author

Currently waiting for a quota increase approval, since we're now on the OT Nebari dev AWS account instead of QS.

@Adam-D-Lewis
Member Author

@dcmcand recommends we use https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html instead of IRSA

@Adam-D-Lewis
Member Author

Adam-D-Lewis commented Oct 28, 2025

> @dcmcand recommends we use https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html instead of IRSA

I looked into it a bit (including #3171). It appears that IRSA is much closer to GKE Workload Identity and Azure Workload Identity. IRSA is still fully supported and already enabled on the cluster. I'm not opposed to migrating to Pod Identity on EKS, but I think it should be done in a separate issue and not be a blocker for this PR.

Even if we switch, we'd still need to maintain IRSA capability, in the short term at least. The Nebari MLflow plugin (and possibly others) uses it.
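
To illustrate what the two approaches look like at the IAM level, here is a rough sketch of the role trust policies each one relies on, built as plain Python dicts. This is my understanding of the AWS documentation, not code from this PR: IRSA federates through the cluster's OIDC provider and pins the role to one service account, while Pod Identity trusts the `pods.eks.amazonaws.com` service principal. The function names, the account ID, the region, and the OIDC ID are placeholders; `ebs-csi-controller-sa` in `kube-system` is assumed to be the EBS CSI driver's service account.

```python
import json


def irsa_trust_policy(oidc_provider_arn: str, oidc_issuer: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy for IRSA: a web-identity role bound to one service account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": oidc_provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                # oidc_issuer is the issuer URL without the https:// prefix.
                f"{oidc_issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{oidc_issuer}:aud": "sts.amazonaws.com",
            }},
        }],
    }


def pod_identity_trust_policy() -> dict:
    """Trust policy for EKS Pod Identity: the EKS pods service assumes the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "pods.eks.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:TagSession"],
        }],
    }


# Placeholder identifiers, not values from this deployment.
issuer = "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
irsa = irsa_trust_policy(
    oidc_provider_arn=f"arn:aws:iam::111122223333:oidc-provider/{issuer}",
    oidc_issuer=issuer,
    namespace="kube-system",
    service_account="ebs-csi-controller-sa",
)
pod_identity = pod_identity_trust_policy()

print(json.dumps(irsa, indent=2))
print(json.dumps(pod_identity, indent=2))
```

With IRSA the binding to a service account lives in the trust policy itself; with Pod Identity it moves into a separate pod identity association on the cluster, so a migration would touch both the roles and the cluster configuration.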

@viniciusdc
Contributor

Hey @Adam-D-Lewis, thanks for working on this. What's the latest status?

@Adam-D-Lewis
Member Author

It's ready for review, @viniciusdc. I have kept the IRSA approach for now, for the reasons I listed above.

Development

Successfully merging this pull request may close these issues.

[BUG] - Deprecation warning for AWS: AL2 Images will no longer be published November 26th 2025

4 participants