-
Notifications
You must be signed in to change notification settings - Fork 623
✨Add support for AMD SEV-SNP instances #5598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Welcome @fangge1212! |
Hi @fangge1212. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
e68885d
to
bbf4d9c
Compare
api/v1beta2/awsmachine_types.go
Outdated
@@ -116,6 +132,10 @@ type AWSMachineSpec struct { | |||
// +kubebuilder:validation:MinLength:=2 | |||
InstanceType string `json:"instanceType"` | |||
|
|||
// CpuOptions is the set of cpu options for the instance | |||
// +optional | |||
CpuOptions *CpuOptions `json:"cpuOptions,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not personally sure about this one. It mimics the way it is structured on the AWS end, that's okay. But what if TDX support arrives at AWS at some point? I think it could be easier to extend an enum for confidential cpu/vm types (e.g. Disabled
/SNP
/TDX
) than one boolean flag for each supported confidential VM type. This is also the approach that others cluster-api-providers went for IIUC.
In that case, I would personally propose a dedicated enum field such as ConfidentialComputing in the API, that would latter end up configuring the appropriate EC2 CpuOptions. However, let's wait for feedback for people that know about this API and codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait for more feedback before changing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like CAPZ uses a pair of data fields (https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/virtualmachines/spec_test.go#L114) and CAPG uses an enum (https://github.com/kubernetes-sigs/cluster-api-provider-gcp/blob/b0fd7d672dfb152eedb49be3dabcd7c9c6cb31fe/api/v1beta1/gcpmachine_types.go#L130).
Since these appear to be CPU extensions, is there ever a case where having both SNP and TDX is a valid configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently TDX is unsupported for AWS instance. I think both of them can appear in the spec configuration, but only one of them can be enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case TDX is supported in AWS at some point, having both SNP and TDX in the same instance should not be a valid configuration. SNP is an AMD technology, while TDX is Intel's.
api/v1beta2/awsmachine_webhook.go
Outdated
func (r *AWSMachine) validateInstanceTypeForConfidentialCompute() field.ErrorList { | ||
var allErrs field.ErrorList | ||
if r.Spec.CpuOptions != nil { | ||
if r.Spec.CpuOptions.AmdSevSnp != nil && *r.Spec.CpuOptions.AmdSevSnp && !slices.Contains(instanceTypesSupportingAmdSevsnp, r.Spec.InstanceType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way of also verifying that an AMI with uefi
or uefi-preferred
boot modes is being configured for the instance? I couldn't find much following AMIReference
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One method is to use the describe-images API to retrieve the AMI BootMode:
# aws ec2 describe-images --image-ids ami-0fe07c1aadd2e4ac9 \
--query 'Images[*].BootMode'
[
"uefi-preferred"
]
But AFAIK, it is not possible to do this in webhook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks!
Yes, we would need an ec2 client, which does not seem to be available within this context.
pkg/cloud/services/ec2/instances.go
Outdated
if i.CpuOptions != nil { | ||
input.CpuOptions = &ec2.CpuOptionsRequest{} | ||
|
||
if i.CpuOptions.AmdSevSnp != nil { | ||
val := ec2.AmdSevSnpSpecificationDisabled | ||
if *i.CpuOptions.AmdSevSnp { | ||
val = ec2.AmdSevSnpSpecificationEnabled | ||
} | ||
input.CpuOptions.AmdSevSnp = aws.String(val) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a style nit: could we instead just focus on the condition that we are interested in? such as
if i.CpuOptions != nil { | |
input.CpuOptions = &ec2.CpuOptionsRequest{} | |
if i.CpuOptions.AmdSevSnp != nil { | |
val := ec2.AmdSevSnpSpecificationDisabled | |
if *i.CpuOptions.AmdSevSnp { | |
val = ec2.AmdSevSnpSpecificationEnabled | |
} | |
input.CpuOptions.AmdSevSnp = aws.String(val) | |
} | |
} | |
if i.CpuOptions != nil && i.CpuOptions.AmdSevSnp != nil && *i.CpuOptions.AmdSevSnp { | |
input.CpuOptions = &ec2.CpuOptionsRequest{ | |
AmdSevSnp: aws.String(ec2.AmdSevSnpSpecificationEnabled), | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- One question: If user sets CpuOptions.AmdSevSnp to false in AWSMachineSpec, do we need to reflect this in runInstancesInput? Or just leave it to default?
- I expect to add other confidential computing technology in a same "if i.CpuOptions != nil" condition, so I have such logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to use a dedicated enum field such as ConfidentialComputing in the API, I will use switch-case logic here.
If we decide to use CpuOptions to mimic the way on the AWS end, I will change the type of CpuOptions.AmdSevSnp to string to make it consistent with the AWS end, so no type convesion is needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question: If user sets CpuOptions.AmdSevSnp to false in AWSMachineSpec, do we need to reflect this in runInstancesInput? Or just leave it to default?
We expect CpuOptions.AmdSevSnp = false
to be the default, so nothing should change if the user sets it explicitly. Right now the logic does not set any SEV-SNP field in the runInstancesInput struct, so I wouldn't be concerned about reflecting that configuration.
I expect to add other confidential computing technology in a same "if i.CpuOptions != nil" condition, so I have such logic.
If we decide to use a dedicated enum field such as ConfidentialComputing in the API, I will use switch-case logic here.
That makes sense,
I will change the type of CpuOptions.AmdSevSnp to string to make it consistent with the AWS end, so no type convesion is needed here.
Okay, let's see what maintainers think about those API enum opetion naming conventions.
/ok-to-test |
|
||
// Confidentail computing support depends on the instance type. | ||
// Only certain instance types in M6a, R6a and C6a series support AMD SEV-SNP. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sev-snp.html | ||
var ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While having this list allows us to avoid an API call to validate whether the instance type is supported or not, it's also going to be a burden to maintain and keep up-to-date.
I wonder if we could describe the instance and check for support via the returned API object, raising an error if it's not present, something like how CAPZ does it, though obviously their data types are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CAPZ uses cache to store the sku data, and then get data from the cache. It seems a lot of code changes if we want to do the same.
Or we don't do the check and rely on the AWS end to fail when creating the instance?
/retest |
4d01d6f
to
c1591a8
Compare
/retest |
4423997
to
37b4865
Compare
This commit adds support for AMD SEV-SNP instances, so users can utilize confidential computing technology on cluster nodes. Signed-off-by: Fangge Jin <[email protected]>
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds support for AMD SEV-SNP instances.
Special notes for your reviewer:
Currently, I’ve introduced a cpuOptions struct with a enum type field AmdSevSNP(values: enabled, disabled), which is consistent with AWS's approach.
Alternatively, we could use an enum with values like AmdSevSNP(and TDX in the fugure), which would make it easier to support other confidential computing technologies(I submitted another pr ✨ Add support for AMD SEV-SNP instances #5605). I'm not sure which one is better.
Currently, I hard coded the supported instance type list, but it may be a burden to keep the list up-to-date. Perhaps we skip the instance type validation and can rely on the AWS end to fail if a unsupported instance type is used for AmdSevSnp?
Checklist:
Release note: