✨Add support for AMD SEV-SNP instances #5598

fangge1212 · 2025-07-23T10:02:34Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds support for AMD SEV-SNP instances.

Special notes for your reviewer:

I'm unsure how best to represent AmdSevSNP enablement/disablement in AWSMachineSpec.
Currently, I've introduced a cpuOptions struct with an enum-type field AmdSevSNP (values: enabled, disabled), which is consistent with AWS's approach.
Alternatively, we could use an enum with values like AmdSevSNP (and TDX in the future), which would make it easier to support other confidential computing technologies (I submitted another PR, ✨ Add support for AMD SEV-SNP instances #5605). I'm not sure which one is better.
Do we need to check that the specified instance type is valid for AmdSevSnp?
Currently, I hard-coded the supported instance type list, but keeping that list up to date may be a burden. Perhaps we could skip the instance type validation and rely on AWS to fail if an unsupported instance type is used for AmdSevSnp?
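
To make the two API shapes in the notes above easier to compare, here is a minimal sketch of both. All type, field, and constant names (CPUOptions, ConfidentialComputePolicy, MachineSpecA, MachineSpecB, and so on) are hypothetical stand-ins chosen for illustration; they are not the names used in this PR or in #5605.

```go
package main

import "fmt"

// Option A (hypothetical names): mirror the shape of EC2's CPU options.
type AmdSevSnpSpecification string

const (
	AmdSevSnpEnabled  AmdSevSnpSpecification = "enabled"
	AmdSevSnpDisabled AmdSevSnpSpecification = "disabled"
)

// CPUOptions groups per-CPU settings; an empty AmdSevSnp leaves the AWS default.
type CPUOptions struct {
	AmdSevSnp AmdSevSnpSpecification `json:"amdSevSnp,omitempty"`
}

// Option B (hypothetical names): one enum for all confidential computing modes.
type ConfidentialComputePolicy string

const (
	ConfidentialComputeDisabled  ConfidentialComputePolicy = "Disabled"
	ConfidentialComputeAMDSEVSNP ConfidentialComputePolicy = "AMDSEVSNP"
	// A future technology such as TDX would be one more value here.
)

// MachineSpecA and MachineSpecB are simplified stand-ins for AWSMachineSpec.
type MachineSpecA struct {
	CPUOptions *CPUOptions `json:"cpuOptions,omitempty"`
}

type MachineSpecB struct {
	ConfidentialCompute ConfidentialComputePolicy `json:"confidentialCompute,omitempty"`
}

// sevSnpRequestedA reports whether option A asks for SEV-SNP.
func sevSnpRequestedA(s MachineSpecA) bool {
	return s.CPUOptions != nil && s.CPUOptions.AmdSevSnp == AmdSevSnpEnabled
}

// sevSnpRequestedB reports whether option B asks for SEV-SNP.
func sevSnpRequestedB(s MachineSpecB) bool {
	return s.ConfidentialCompute == ConfidentialComputeAMDSEVSNP
}

func main() {
	a := MachineSpecA{CPUOptions: &CPUOptions{AmdSevSnp: AmdSevSnpEnabled}}
	b := MachineSpecB{ConfidentialCompute: ConfidentialComputeAMDSEVSNP}
	fmt.Println(sevSnpRequestedA(a), sevSnpRequestedB(b))
}
```

With option A, each new technology adds a field, and combinations of fields need cross-validation. With option B, the enum makes the modes mutually exclusive by construction.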

Checklist:

Release note:

Add support for AMD SEV-SNP instances

linux-foundation-easycla · 2025-07-23T10:02:41Z

The committers listed above are authorized under a signed CLA.

✅ login: fangge1212 / name: Fangge Jin (bbe6376)

k8s-ci-robot · 2025-07-23T10:02:42Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2025-07-23T10:02:43Z

Welcome @fangge1212!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-aws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-aws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2025-07-23T10:02:44Z

Hi @fangge1212. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

bgartzi · 2025-07-23T15:46:40Z

api/v1beta2/awsmachine_types.go


+	// CpuOptions is the set of cpu options for the instance
+	// +optional
+	CpuOptions *CpuOptions `json:"cpuOptions,omitempty"`


I'm not personally sure about this one. It mimics the way it is structured on the AWS end, which is okay. But what if TDX support arrives at AWS at some point? I think it would be easier to extend an enum for confidential CPU/VM types (e.g. Disabled/SNP/TDX) than to add one boolean flag for each supported confidential VM type. This is also the approach that other cluster-api providers went for, IIUC.

In that case, I would personally propose a dedicated enum field such as ConfidentialComputing in the API, which would later end up configuring the appropriate EC2 CpuOptions. However, let's wait for feedback from people who know this API and codebase.

Wait for more feedback before changing it

It looks like CAPZ uses a pair of data fields (https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/virtualmachines/spec_test.go#L114) and CAPG uses an enum (https://github.com/kubernetes-sigs/cluster-api-provider-gcp/blob/b0fd7d672dfb152eedb49be3dabcd7c9c6cb31fe/api/v1beta1/gcpmachine_types.go#L130).

Since these appear to be CPU extensions, is there ever a case where having both SNP and TDX is a valid configuration?

Currently, TDX is unsupported on AWS instances. I think both of them can appear in the spec configuration, but only one of them can be enabled.

In case TDX is supported in AWS at some point, having both SNP and TDX in the same instance should not be a valid configuration. SNP is an AMD technology, while TDX is Intel's.

bgartzi · 2025-07-23T15:50:46Z

api/v1beta2/awsmachine_webhook.go

+func (r *AWSMachine) validateInstanceTypeForConfidentialCompute() field.ErrorList {
+	var allErrs field.ErrorList
+	if r.Spec.CpuOptions != nil {
+		if r.Spec.CpuOptions.AmdSevSnp != nil && *r.Spec.CpuOptions.AmdSevSnp && !slices.Contains(instanceTypesSupportingAmdSevsnp, r.Spec.InstanceType) {


Is there a way of also verifying that an AMI with uefi or uefi-preferred boot modes is being configured for the instance? I couldn't find much following AMIReference.

One method is to use the describe-images API to retrieve the AMI BootMode:

# aws ec2 describe-images --image-ids ami-0fe07c1aadd2e4ac9 \
    --query 'Images[*].BootMode'
[
    "uefi-preferred"
]

But AFAIK, it is not possible to do this in the webhook.

Okay, thanks!

Yes, we would need an ec2 client, which does not seem to be available within this context.
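
If an EC2 client were available at some later point (for example in the controller rather than the webhook), the boot mode check discussed above could look roughly like the sketch below. The bootModeLookup type is a hypothetical abstraction standing in for a DescribeImages call; the function names and the fake lookup are assumptions made for illustration and do not exist in this codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// bootModeLookup is a hypothetical abstraction over an EC2 DescribeImages
// call: given an AMI ID, it returns that image's BootMode string.
type bootModeLookup func(imageID string) (string, error)

// validateBootModeForSevSnp returns nil when the AMI's boot mode is one of
// the two modes mentioned in the review ("uefi" or "uefi-preferred").
func validateBootModeForSevSnp(lookup bootModeLookup, imageID string) error {
	mode, err := lookup(imageID)
	if err != nil {
		return fmt.Errorf("looking up boot mode for %s: %w", imageID, err)
	}
	switch mode {
	case "uefi", "uefi-preferred":
		return nil
	default:
		return fmt.Errorf("AMI %s has boot mode %q; expected uefi or uefi-preferred", imageID, mode)
	}
}

// fakeBootModes is an in-memory stand-in for the real API, for illustration only.
func fakeBootModes(imageID string) (string, error) {
	modes := map[string]string{
		"ami-uefi":   "uefi-preferred",
		"ami-legacy": "legacy-bios",
	}
	mode, ok := modes[imageID]
	if !ok {
		return "", errors.New("image not found")
	}
	return mode, nil
}

func main() {
	for _, id := range []string{"ami-uefi", "ami-legacy", "ami-missing"} {
		fmt.Println(id, validateBootModeForSevSnp(fakeBootModes, id))
	}
}
```

Passing the lookup as a function keeps the validation logic testable without a live EC2 client.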

bgartzi · 2025-07-24T10:46:37Z

pkg/cloud/services/ec2/instances.go

+	if i.CpuOptions != nil {
+		input.CpuOptions = &ec2.CpuOptionsRequest{}
+
+		if i.CpuOptions.AmdSevSnp != nil {
+			val := ec2.AmdSevSnpSpecificationDisabled
+			if *i.CpuOptions.AmdSevSnp {
+				val = ec2.AmdSevSnpSpecificationEnabled
+			}
+			input.CpuOptions.AmdSevSnp = aws.String(val)
+		}
+	}


Just a style nit: could we instead focus only on the condition that we are interested in? For example:

Suggested change

-	if i.CpuOptions != nil {
-		input.CpuOptions = &ec2.CpuOptionsRequest{}
-
-		if i.CpuOptions.AmdSevSnp != nil {
-			val := ec2.AmdSevSnpSpecificationDisabled
-			if *i.CpuOptions.AmdSevSnp {
-				val = ec2.AmdSevSnpSpecificationEnabled
-			}
-			input.CpuOptions.AmdSevSnp = aws.String(val)
-		}
-	}
+	if i.CpuOptions != nil && i.CpuOptions.AmdSevSnp != nil && *i.CpuOptions.AmdSevSnp {
+		input.CpuOptions = &ec2.CpuOptionsRequest{
+			AmdSevSnp: aws.String(ec2.AmdSevSnpSpecificationEnabled),
+		}
+	}

One question: if the user sets CpuOptions.AmdSevSnp to false in AWSMachineSpec, do we need to reflect this in runInstancesInput, or just leave it at the default?

I expect to add other confidential computing technologies under the same "if i.CpuOptions != nil" condition, which is why I wrote the logic this way.

If we decide to use a dedicated enum field such as ConfidentialComputing in the API, I will use switch-case logic here.
If we decide to use CpuOptions to mimic the AWS end, I will change the type of CpuOptions.AmdSevSnp to string to make it consistent with AWS, so no type conversion is needed here.

One question: if the user sets CpuOptions.AmdSevSnp to false in AWSMachineSpec, do we need to reflect this in runInstancesInput, or just leave it at the default?

We expect CpuOptions.AmdSevSnp = false to be the default, so nothing should change if the user sets it explicitly. Right now the logic does not set any SEV-SNP field in the runInstancesInput struct, so I wouldn't be concerned about reflecting that configuration.

I expect to add other confidential computing technologies under the same "if i.CpuOptions != nil" condition, which is why I wrote the logic this way.
If we decide to use a dedicated enum field such as ConfidentialComputing in the API, I will use switch-case logic here.

That makes sense.

I will change the type of CpuOptions.AmdSevSnp to string to make it consistent with AWS, so no type conversion is needed here.

Okay, let's see what the maintainers think about the naming conventions for those API enum options.
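
As an illustration of the switch-case variant mentioned in this thread, the sketch below maps a hypothetical ConfidentialComputePolicy enum onto a CPU options request. The enum, its values, and the local cpuOptionsRequest struct are assumptions for illustration; the struct only stands in for the EC2 SDK's CpuOptionsRequest so that the example is self-contained.

```go
package main

import "fmt"

// ConfidentialComputePolicy is a hypothetical API enum; the names are
// illustrative and not taken from this PR.
type ConfidentialComputePolicy string

const (
	PolicyDisabled  ConfidentialComputePolicy = "Disabled"
	PolicyAMDSEVSNP ConfidentialComputePolicy = "AMDSEVSNP"
)

// cpuOptionsRequest is a local stand-in for the EC2 SDK request type.
type cpuOptionsRequest struct {
	AmdSevSnp *string
}

// cpuOptionsFor translates the policy into request options. It returns nil
// options for the unset and Disabled cases, leaving the AWS default in place.
func cpuOptionsFor(policy ConfidentialComputePolicy) (*cpuOptionsRequest, error) {
	switch policy {
	case "", PolicyDisabled:
		return nil, nil
	case PolicyAMDSEVSNP:
		enabled := "enabled"
		return &cpuOptionsRequest{AmdSevSnp: &enabled}, nil
	default:
		return nil, fmt.Errorf("unsupported confidential compute policy %q", policy)
	}
}

func main() {
	for _, p := range []ConfidentialComputePolicy{"", PolicyDisabled, PolicyAMDSEVSNP, "TDX"} {
		opts, err := cpuOptionsFor(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if opts == nil {
			fmt.Printf("%q: no CPU options set\n", p)
			continue
		}
		fmt.Printf("%q: AmdSevSnp=%s\n", p, *opts.AmdSevSnp)
	}
}
```

Supporting another technology later would mean adding one enum value and one case, with unknown values rejected by the default branch.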

nrb · 2025-07-29T20:22:12Z

/ok-to-test

nrb · 2025-07-29T21:19:31Z

api/v1beta2/awsmachine_types.go

+
+// Confidential computing support depends on the instance type.
+// Only certain instance types in M6a, R6a and C6a series support AMD SEV-SNP. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sev-snp.html
+var (


While having this list allows us to avoid an API call to validate whether the instance type is supported or not, it's also going to be a burden to maintain and keep up-to-date.

I wonder if we could describe the instance and check for support via the returned API object, raising an error if it's not present, something like how CAPZ does it, though obviously their data types are different.

CAPZ uses a cache to store the SKU data and then reads the data from that cache. Doing the same here would seem to require a lot of code changes.
Or should we skip the check and rely on AWS to fail when creating the instance?
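
For comparison with the hard-coded list, a lookup-based check might be sketched as follows. The featureLookup type is a hypothetical abstraction over an instance type description call, and the feature string "amd-sev-snp" is an assumption about how support would be reported; neither is taken from this PR.

```go
package main

import "fmt"

// sevSnpFeature is the feature name assumed to mark SEV-SNP support in the
// instance type description; treat this value as an assumption.
const sevSnpFeature = "amd-sev-snp"

// featureLookup is a hypothetical abstraction over describing an instance
// type: it returns the processor features that the type reports.
type featureLookup func(instanceType string) ([]string, error)

// validateInstanceTypeForSevSnp returns nil only when the looked-up features
// include sevSnpFeature.
func validateInstanceTypeForSevSnp(lookup featureLookup, instanceType string) error {
	features, err := lookup(instanceType)
	if err != nil {
		return fmt.Errorf("describing instance type %s: %w", instanceType, err)
	}
	for _, f := range features {
		if f == sevSnpFeature {
			return nil
		}
	}
	return fmt.Errorf("instance type %s does not report %s support", instanceType, sevSnpFeature)
}

// fakeFeatures is an in-memory stand-in for the real API, for illustration only.
func fakeFeatures(instanceType string) ([]string, error) {
	table := map[string][]string{
		"m6a.large": {sevSnpFeature},
		"t3.micro":  {},
	}
	features, ok := table[instanceType]
	if !ok {
		return nil, fmt.Errorf("unknown instance type %q", instanceType)
	}
	return features, nil
}

func main() {
	for _, it := range []string{"m6a.large", "t3.micro", "x9.bogus"} {
		fmt.Println(it, validateInstanceTypeForSevSnp(fakeFeatures, it))
	}
}
```

This trades the maintenance burden of a static list for an API call at reconcile time; a cache like the one CAPZ uses would reduce the number of calls at the cost of more code.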

fangge1212 · 2025-07-30T08:47:55Z

/retest

fangge1212 · 2025-07-31T03:10:36Z

/retest

k8s-ci-robot added the needs-priority label Jul 23, 2025

k8s-ci-robot requested review from Ankitasw and dlipovetsky July 23, 2025 10:02

k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 23, 2025

fangge1212 changed the title ~~[WIP]Add support for AMD SEV-SNP instances~~ [WIP]:sparkles:Add support for AMD SEV-SNP instances Jul 23, 2025

fangge1212 changed the title ~~[WIP]:sparkles:Add support for AMD SEV-SNP instances~~ [WIP] ✨Add support for AMD SEV-SNP instances Jul 23, 2025

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 23, 2025

fangge1212 force-pushed the amd_sev_snp branch 2 times, most recently from e68885d to bbf4d9c Compare July 24, 2025 10:08

bgartzi reviewed Jul 24, 2025

fangge1212 changed the title ~~[WIP] ✨Add support for AMD SEV-SNP instances~~ ✨Add support for AMD SEV-SNP instances Jul 25, 2025

fangge1212 marked this pull request as ready for review July 25, 2025 01:20

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 25, 2025

k8s-ci-robot requested review from damdo and fiunchinho July 25, 2025 01:20

fangge1212 mentioned this pull request Jul 28, 2025

Support AMD SEV-SNP on AWS openshift/api#2424

Merged

fangge1212 force-pushed the amd_sev_snp branch from bbf4d9c to 7f675da Compare July 29, 2025 07:42

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 29, 2025

nrb reviewed Jul 29, 2025

fangge1212 force-pushed the amd_sev_snp branch from 7f675da to e4a06eb Compare July 30, 2025 11:57

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2025

fangge1212 force-pushed the amd_sev_snp branch 2 times, most recently from 4d01d6f to c1591a8 Compare July 31, 2025 02:17

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2025

fangge1212 force-pushed the amd_sev_snp branch from c1591a8 to 102a876 Compare July 31, 2025 02:47

fangge1212 force-pushed the amd_sev_snp branch 5 times, most recently from 4423997 to 37b4865 Compare July 31, 2025 04:25

Add support for AMD SEV-SNP instances

bbe6376

This commit adds support for AMD SEV-SNP instances, so users can utilize confidential computing technology on cluster nodes. Signed-off-by: Fangge Jin <[email protected]>

fangge1212 force-pushed the amd_sev_snp branch from 37b4865 to bbe6376 Compare August 1, 2025 01:19

fangge1212 mentioned this pull request Aug 1, 2025

✨ Add support for AMD SEV-SNP instances #5605

Merged

fangge1212 closed this Aug 6, 2025
