
✨Add support for AMD SEV-SNP instances #5598


Closed
wants to merge 1 commit into from


fangge1212

@fangge1212 fangge1212 commented Jul 23, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds support for AMD SEV-SNP instances.

Special notes for your reviewer:

  1. I'm unsure how best to represent AmdSevSNP enablement/disablement in AWSMachineSpec.
    Currently, I've introduced a cpuOptions struct with an enum-typed field AmdSevSNP (values: enabled, disabled), which is consistent with AWS's approach.
    Alternatively, we could use a single enum with values like AmdSevSNP (and TDX in the future), which would make it easier to support other confidential computing technologies (I submitted another PR, ✨ Add support for AMD SEV-SNP instances #5605). I'm not sure which one is better.
  2. Do we need to check that the specified instance type supports AmdSevSnp?
    Currently, I hard-code the list of supported instance types, but keeping that list up to date may be a burden. Perhaps we could skip the instance type validation and rely on AWS to fail if an unsupported instance type is used with AmdSevSnp?
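To make note 1 concrete, here is a rough sketch of how each API shape might serialize. The Go type, constant, and JSON field names (`CPUOptions`, `AmdSevSnpSpecification`, `ConfidentialCompute`, `SpecA`, `SpecB`) are hypothetical and not taken from either PR; only the `enabled`/`disabled` strings mirror EC2's `AmdSevSnpSpecification` values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Option A (hypothetical names): mirror EC2's CpuOptions, one enum field per technology.
type AmdSevSnpSpecification string

const (
	AmdSevSnpEnabled  AmdSevSnpSpecification = "enabled"
	AmdSevSnpDisabled AmdSevSnpSpecification = "disabled"
)

type CPUOptions struct {
	AmdSevSnp AmdSevSnpSpecification `json:"amdSevSnp,omitempty"`
}

type SpecA struct {
	CPUOptions *CPUOptions `json:"cpuOptions,omitempty"`
}

// Option B (hypothetical names): one technology-level enum; a future TDX
// value would be a new constant rather than a new field.
type ConfidentialCompute string

const (
	ConfidentialComputeDisabled  ConfidentialCompute = "Disabled"
	ConfidentialComputeAMDSEVSNP ConfidentialCompute = "AMDSEVSNP"
)

type SpecB struct {
	ConfidentialCompute ConfidentialCompute `json:"confidentialCompute,omitempty"`
}

// toJSON renders a spec fragment as it would appear in the serialized object.
func toJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(toJSON(SpecA{CPUOptions: &CPUOptions{AmdSevSnp: AmdSevSnpEnabled}}))
	fmt.Println(toJSON(SpecB{ConfidentialCompute: ConfidentialComputeAMDSEVSNP}))
}
```

One difference worth noting: with the single-enum shape, a spec that enables two technologies at once cannot be expressed at all, whereas the per-field shape needs a validation rule to reject it.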

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Add support for AMD SEV-SNP instances

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jul 23, 2025

linux-foundation-easycla bot commented Jul 23, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: fangge1212 / name: Fangge Jin (bbe6376)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @fangge1212!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-aws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-aws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 23, 2025
@k8s-ci-robot
Contributor

Hi @fangge1212. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@fangge1212 fangge1212 changed the title [WIP]Add support for AMD SEV-SNP instances [WIP]:sparkles:Add support for AMD SEV-SNP instances Jul 23, 2025
@fangge1212 fangge1212 changed the title [WIP]:sparkles:Add support for AMD SEV-SNP instances [WIP] ✨Add support for AMD SEV-SNP instances Jul 23, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 23, 2025
@fangge1212 fangge1212 force-pushed the amd_sev_snp branch 2 times, most recently from e68885d to bbf4d9c Compare July 24, 2025 10:08
@@ -116,6 +132,10 @@ type AWSMachineSpec struct {
// +kubebuilder:validation:MinLength:=2
InstanceType string `json:"instanceType"`

// CpuOptions is the set of cpu options for the instance
// +optional
CpuOptions *CpuOptions `json:"cpuOptions,omitempty"`

I'm not personally sure about this one. It mimics the way it is structured on the AWS end, which is okay. But what if TDX support arrives at AWS at some point? I think it would be easier to extend a single enum of confidential CPU/VM types (e.g. Disabled/SNP/TDX) than to add one boolean flag per supported confidential VM type. This is also the approach that other cluster-api providers went for, IIUC.

In that case, I would personally propose a dedicated enum field such as ConfidentialComputing in the API, which would later end up configuring the appropriate EC2 CpuOptions. However, let's wait for feedback from people who know this API and codebase.

Author


I'll wait for more feedback before changing it.

Contributor


It looks like CAPZ uses a pair of data fields (https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/virtualmachines/spec_test.go#L114) and CAPG uses an enum (https://github.com/kubernetes-sigs/cluster-api-provider-gcp/blob/b0fd7d672dfb152eedb49be3dabcd7c9c6cb31fe/api/v1beta1/gcpmachine_types.go#L130).

Since these appear to be CPU extensions, is there ever a case where having both SNP and TDX is a valid configuration?

Author


TDX is currently not supported on AWS instances. I think both fields could appear in the spec, but only one of them can be enabled.


In case TDX is supported in AWS at some point, having both SNP and TDX in the same instance should not be a valid configuration. SNP is an AMD technology, while TDX is Intel's.
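If the API keeps one field per technology (the CAPZ-style shape), the webhook would need a rule like the one below to reject specs that enable both. This is a minimal sketch using only the standard library: `IntelTdx` is a hypothetical field (AWS does not offer TDX today), and a real CAPA webhook would return a `field.ErrorList` rather than a plain `error`.

```go
package main

import (
	"errors"
	"fmt"
)

// CPUOptions uses hypothetical per-technology fields. IntelTdx does not
// exist on AWS today; it is here only to illustrate the conflict check.
type CPUOptions struct {
	AmdSevSnp *bool
	IntelTdx  *bool
}

// enabled treats a nil pointer as "not enabled".
func enabled(b *bool) bool { return b != nil && *b }

// validateConfidentialCompute rejects specs that enable more than one
// confidential computing technology: SNP is AMD's and TDX is Intel's,
// so they cannot both apply to the same instance.
func validateConfidentialCompute(opts *CPUOptions) error {
	if opts == nil {
		return nil
	}
	if enabled(opts.AmdSevSnp) && enabled(opts.IntelTdx) {
		return errors.New("cpuOptions: amdSevSnp and intelTdx are mutually exclusive")
	}
	return nil
}

func main() {
	on := true
	fmt.Println(validateConfidentialCompute(&CPUOptions{AmdSevSnp: &on}))
	fmt.Println(validateConfidentialCompute(&CPUOptions{AmdSevSnp: &on, IntelTdx: &on}))
}
```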

func (r *AWSMachine) validateInstanceTypeForConfidentialCompute() field.ErrorList {
var allErrs field.ErrorList
if r.Spec.CpuOptions != nil {
if r.Spec.CpuOptions.AmdSevSnp != nil && *r.Spec.CpuOptions.AmdSevSnp && !slices.Contains(instanceTypesSupportingAmdSevsnp, r.Spec.InstanceType) {

Is there a way of also verifying that an AMI with uefi or uefi-preferred boot modes is being configured for the instance? I couldn't find much following AMIReference.

Author

@fangge1212 fangge1212 Jul 25, 2025


One method is to use the describe-images API to retrieve the AMI BootMode:

# aws ec2 describe-images --image-ids ami-0fe07c1aadd2e4ac9 \
  --query 'Images[*].BootMode'
[
    "uefi-preferred"
]

But AFAIK, it is not possible to do this in the webhook.


Okay, thanks!

Yes, we would need an ec2 client, which does not seem to be available within this context.
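Since the webhook has no EC2 client, one option is to perform the boot-mode check somewhere a client is available, such as during reconciliation. The sketch below assumes a hypothetical `bootModeLookup` function standing in for a `DescribeImages` call (returning the AMI's `BootMode`, as in the CLI output above) and accepts only `uefi` and `uefi-preferred`, per the requirement raised in the review. `fakeLookup` and the AMI IDs are made up.

```go
package main

import "fmt"

// bootModeLookup is a hypothetical stand-in for an EC2 DescribeImages call
// that returns the AMI's BootMode ("legacy-bios", "uefi" or "uefi-preferred").
type bootModeLookup func(amiID string) (string, error)

// checkAMIForSevSnp returns an error unless the AMI boots with UEFI.
// An empty or unknown boot mode is treated as unsupported.
func checkAMIForSevSnp(amiID string, lookup bootModeLookup) error {
	mode, err := lookup(amiID)
	if err != nil {
		return fmt.Errorf("describing AMI %s: %w", amiID, err)
	}
	switch mode {
	case "uefi", "uefi-preferred":
		return nil
	default:
		return fmt.Errorf("AMI %s has boot mode %q; SEV-SNP requires uefi or uefi-preferred", amiID, mode)
	}
}

// fakeLookup is a hypothetical in-memory lookup with made-up AMI IDs.
func fakeLookup(amiID string) (string, error) {
	modes := map[string]string{
		"ami-uefi": "uefi-preferred",
		"ami-bios": "legacy-bios",
	}
	mode, ok := modes[amiID]
	if !ok {
		return "", fmt.Errorf("image %s not found", amiID)
	}
	return mode, nil
}

func main() {
	fmt.Println(checkAMIForSevSnp("ami-uefi", fakeLookup))
	fmt.Println(checkAMIForSevSnp("ami-bios", fakeLookup))
}
```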

Comment on lines 566 to 571
if i.CpuOptions != nil {
	input.CpuOptions = &ec2.CpuOptionsRequest{}

	if i.CpuOptions.AmdSevSnp != nil {
		val := ec2.AmdSevSnpSpecificationDisabled
		if *i.CpuOptions.AmdSevSnp {
			val = ec2.AmdSevSnpSpecificationEnabled
		}
		input.CpuOptions.AmdSevSnp = aws.String(val)
	}
}

Just a style nit: could we focus only on the condition we are interested in? For example:

Suggested change
- if i.CpuOptions != nil {
- 	input.CpuOptions = &ec2.CpuOptionsRequest{}
- 	if i.CpuOptions.AmdSevSnp != nil {
- 		val := ec2.AmdSevSnpSpecificationDisabled
- 		if *i.CpuOptions.AmdSevSnp {
- 			val = ec2.AmdSevSnpSpecificationEnabled
- 		}
- 		input.CpuOptions.AmdSevSnp = aws.String(val)
- 	}
- }
+ if i.CpuOptions != nil && i.CpuOptions.AmdSevSnp != nil && *i.CpuOptions.AmdSevSnp {
+ 	input.CpuOptions = &ec2.CpuOptionsRequest{
+ 		AmdSevSnp: aws.String(ec2.AmdSevSnpSpecificationEnabled),
+ 	}
+ }
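The practical difference between the two versions is what happens when the user explicitly sets `AmdSevSnp` to false. The runnable sketch below reproduces both versions against local stand-in types (`cpuOptions`, `cpuOptionsRequest`: assumed shapes that mimic the CAPA and aws-sdk-go types, not the real ones).

```go
package main

import "fmt"

// Stand-ins for ec2.AmdSevSnpSpecificationEnabled/Disabled.
const (
	amdSevSnpEnabled  = "enabled"
	amdSevSnpDisabled = "disabled"
)

// Assumed shapes mimicking ec2.CpuOptionsRequest and the PR's CpuOptions.
type cpuOptionsRequest struct{ AmdSevSnp *string }
type cpuOptions struct{ AmdSevSnp *bool }

func strPtr(s string) *string { return &s }

// original mirrors the PR: an explicit false is sent as "disabled".
func original(o *cpuOptions) *cpuOptionsRequest {
	if o == nil {
		return nil
	}
	req := &cpuOptionsRequest{}
	if o.AmdSevSnp != nil {
		val := amdSevSnpDisabled
		if *o.AmdSevSnp {
			val = amdSevSnpEnabled
		}
		req.AmdSevSnp = strPtr(val)
	}
	return req
}

// suggested mirrors the review suggestion: CpuOptions is set only when SEV-SNP is on.
func suggested(o *cpuOptions) *cpuOptionsRequest {
	if o != nil && o.AmdSevSnp != nil && *o.AmdSevSnp {
		return &cpuOptionsRequest{AmdSevSnp: strPtr(amdSevSnpEnabled)}
	}
	return nil
}

// describe summarizes what would be sent in the RunInstances request.
func describe(r *cpuOptionsRequest) string {
	if r == nil {
		return "CpuOptions unset"
	}
	if r.AmdSevSnp == nil {
		return "CpuOptions set, AmdSevSnp unset"
	}
	return "AmdSevSnp=" + *r.AmdSevSnp
}

func main() {
	off := false
	fmt.Println(describe(original(&cpuOptions{AmdSevSnp: &off})))
	fmt.Println(describe(suggested(&cpuOptions{AmdSevSnp: &off})))
}
```

The PR's version sends an explicit `disabled`, while the suggested version omits `CpuOptions` entirely; both should yield a non-SNP instance, assuming SEV-SNP is off by default on the EC2 side.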

Author


  1. One question: if the user sets CpuOptions.AmdSevSnp to false in AWSMachineSpec, do we need to reflect this in runInstancesInput, or just leave it at the default?
  2. I expect to add other confidential computing technologies under the same "if i.CpuOptions != nil" condition, which is why the logic is structured this way.

Author


If we decide to use a dedicated enum field such as ConfidentialComputing in the API, I will use switch-case logic here.
If we decide to use CpuOptions to mimic the AWS API, I will change the type of CpuOptions.AmdSevSnp to string to make it consistent with AWS, so no type conversion is needed here.
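If the enum route is chosen, the translation to EC2 `CpuOptions` could be a single switch, as described here. A sketch under assumed names: the `ConfidentialCompute` enum and its values are hypothetical, and `cpuOptionsRequest` is a stand-in for `ec2.CpuOptionsRequest`.

```go
package main

import "fmt"

// Hypothetical API enum; names are illustrative, not from the PR.
type ConfidentialCompute string

const (
	ConfidentialComputeDisabled  ConfidentialCompute = "Disabled"
	ConfidentialComputeAMDSEVSNP ConfidentialCompute = "AMDSEVSNP"
)

// Stand-in for ec2.CpuOptionsRequest.
type cpuOptionsRequest struct{ AmdSevSnp *string }

// cpuOptionsFor translates the API enum into EC2 CpuOptions. A new
// technology (e.g. TDX, if AWS ever supports it) would be a new case.
func cpuOptionsFor(cc ConfidentialCompute) (*cpuOptionsRequest, error) {
	switch cc {
	case "", ConfidentialComputeDisabled:
		// Leave CpuOptions unset and rely on the EC2 default.
		return nil, nil
	case ConfidentialComputeAMDSEVSNP:
		v := "enabled" // value of ec2.AmdSevSnpSpecificationEnabled
		return &cpuOptionsRequest{AmdSevSnp: &v}, nil
	default:
		return nil, fmt.Errorf("unsupported confidential compute type %q", cc)
	}
}

func main() {
	req, err := cpuOptionsFor(ConfidentialComputeAMDSEVSNP)
	fmt.Println(*req.AmdSevSnp, err)
}
```

Unknown values return an error instead of silently launching a non-confidential instance; in practice a kubebuilder enum validation marker on the field would likely reject them earlier.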


> One question: if the user sets CpuOptions.AmdSevSnp to false in AWSMachineSpec, do we need to reflect this in runInstancesInput, or just leave it at the default?

We expect CpuOptions.AmdSevSnp = false to be the default, so nothing should change if the user sets it explicitly. Right now the logic does not set any SEV-SNP field in the runInstancesInput struct, so I wouldn't be concerned about reflecting that configuration.

> I expect to add other confidential computing technologies under the same "if i.CpuOptions != nil" condition, which is why the logic is structured this way.
> If we decide to use a dedicated enum field such as ConfidentialComputing in the API, I will use switch-case logic here.

That makes sense.

> I will change the type of CpuOptions.AmdSevSnp to string to make it consistent with AWS, so no type conversion is needed here.

Okay, let's see what maintainers think about the naming conventions for those API enum options.

@fangge1212 fangge1212 changed the title [WIP] ✨Add support for AMD SEV-SNP instances ✨Add support for AMD SEV-SNP instances Jul 25, 2025
@fangge1212 fangge1212 marked this pull request as ready for review July 25, 2025 01:20
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 25, 2025
@k8s-ci-robot k8s-ci-robot requested review from damdo and fiunchinho July 25, 2025 01:20
@nrb
Contributor

nrb commented Jul 29, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 29, 2025

// Confidential computing support depends on the instance type.
// Only certain instance types in M6a, R6a and C6a series support AMD SEV-SNP. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sev-snp.html
var (
Contributor


While having this list allows us to avoid an API call to validate whether the instance type is supported or not, it's also going to be a burden to maintain and keep up-to-date.

I wonder if we could describe the instance and check for support via the returned API object, raising an error if it's not present, something like how CAPZ does it, though obviously their data types are different.
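One way to avoid the hard-coded list is to ask EC2 which processor features an instance type supports. As we understand it, `DescribeInstanceTypes` reports this in `ProcessorInfo.SupportedFeatures`, with the value `amd-sev-snp` for SEV-SNP-capable types; this should be verified against the EC2 API reference. The sketch hides the API call behind a hypothetical `instanceTypeFeatures` function, and `fakeDescribe` returns made-up data for placeholder instance type names.

```go
package main

import (
	"fmt"
	"slices"
)

// Value reported in ProcessorInfo.SupportedFeatures for SEV-SNP support.
const featureAmdSevSnp = "amd-sev-snp"

// instanceTypeFeatures is a hypothetical stand-in for a DescribeInstanceTypes
// call returning ProcessorInfo.SupportedFeatures for one instance type.
type instanceTypeFeatures func(instanceType string) ([]string, error)

// validateSevSnpSupport errors if the instance type does not advertise SEV-SNP.
func validateSevSnpSupport(instanceType string, describe instanceTypeFeatures) error {
	features, err := describe(instanceType)
	if err != nil {
		return fmt.Errorf("describing instance type %s: %w", instanceType, err)
	}
	if !slices.Contains(features, featureAmdSevSnp) {
		return fmt.Errorf("instance type %s does not support AMD SEV-SNP", instanceType)
	}
	return nil
}

// fakeDescribe returns made-up data; the instance type names are placeholders.
func fakeDescribe(instanceType string) ([]string, error) {
	if instanceType == "type-with-snp" {
		return []string{featureAmdSevSnp}, nil
	}
	return []string{}, nil
}

func main() {
	fmt.Println(validateSevSnpSupport("type-with-snp", fakeDescribe))
	fmt.Println(validateSevSnpSupport("type-without-snp", fakeDescribe))
}
```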

Author


CAPZ uses a cache to store the SKU data and then reads the data from that cache. Doing the same here seems to require a lot of code changes.
Or should we skip the check and rely on AWS to fail when creating the instance?
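Caching does not have to be as elaborate as CAPZ's SKU cache. A minimal sketch, assuming a hypothetical lookup function that returns an instance type's supported processor features: memoize successful results in a mutex-guarded map so each instance type is described at most once per process. Errors are not cached, entries never expire, and the lock is held during the lookup (which serializes calls); all of these are simplifications.

```go
package main

import (
	"fmt"
	"sync"
)

// featureCache memoizes a hypothetical per-instance-type feature lookup.
type featureCache struct {
	mu     sync.Mutex
	lookup func(instanceType string) ([]string, error)
	byType map[string][]string
	calls  int // number of underlying lookups, for illustration
}

func newFeatureCache(lookup func(string) ([]string, error)) *featureCache {
	return &featureCache{lookup: lookup, byType: map[string][]string{}}
}

// Features returns cached features, calling the lookup only on a miss.
func (c *featureCache) Features(instanceType string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if f, ok := c.byType[instanceType]; ok {
		return f, nil
	}
	f, err := c.lookup(instanceType)
	c.calls++
	if err != nil {
		return nil, err // errors are not cached, so the next call retries
	}
	c.byType[instanceType] = f
	return f, nil
}

func main() {
	c := newFeatureCache(func(instanceType string) ([]string, error) {
		return []string{"amd-sev-snp"}, nil
	})
	_, _ = c.Features("type-a")
	_, _ = c.Features("type-a")
	fmt.Println(c.calls)
}
```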

@fangge1212
Author

fangge1212 commented Jul 30, 2025

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2025
@fangge1212 fangge1212 force-pushed the amd_sev_snp branch 2 times, most recently from 4d01d6f to c1591a8 Compare July 31, 2025 02:17
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2025
@fangge1212
Author

/retest

@fangge1212 fangge1212 force-pushed the amd_sev_snp branch 5 times, most recently from 4423997 to 37b4865 Compare July 31, 2025 04:25
This commit adds support for AMD SEV-SNP instances, so users can
utilize confidential computing technology on cluster nodes.

Signed-off-by: Fangge Jin <[email protected]>
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants