feat: add example for GKE confidential nodes with GPU #2426
arthurlapertosa wants to merge 7 commits into terraform-google-modules:main
Conversation
@apeabody could you please run the build for this PR?

/gcbrun

@apeabody I don't have access to the GCP cloud build project. Could you please send me the error?

@apeabody could you please re-run the build?

/gcbrun

@apeabody I think the build wasn't properly triggered, could you please take a look?

/gcbrun
Might have been too quick after the merge, it's running now.
apeabody left a comment
Thanks @arthurlapertosa!

/gemini review
Code Review
This pull request introduces a new example for creating a GKE cluster with confidential nodes and GPUs. This is a valuable addition. The changes include modifications to several Terraform modules to support confidential_instance_type and guest_accelerator configurations, along with the new example files and corresponding integration tests. The implementation is mostly correct, but I've found a few issues related to version constraints, external dependencies, and a bug in the for_each logic that need to be addressed.
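To make the `for_each` point concrete, here is a minimal sketch of how a `guest_accelerator` block is commonly gated with a `dynamic` block. It is not the module's actual code: the variable names (`node_pools`, `cluster_name`, `region`) and the map keys (`accelerator_type`, `accelerator_count`) are assumptions chosen for illustration.

```hcl
# Hypothetical inputs, not taken from the PR.
variable "cluster_name" {
  type = string
}

variable "region" {
  type = string
}

variable "node_pools" {
  type = list(map(string))
  default = [
    {
      name              = "gpu-pool"
      machine_type      = "a3-highgpu-1g"
      accelerator_type  = "nvidia-h100-80gb"
      accelerator_count = "1"
    },
  ]
}

resource "google_container_node_pool" "pools" {
  # Key each pool by name so resource addresses do not depend on list order.
  for_each = { for np in var.node_pools : np.name => np }

  name     = each.key
  cluster  = var.cluster_name
  location = var.region

  node_config {
    machine_type = each.value.machine_type

    # Emit the block only when a positive GPU count is set.
    # An empty list produces no guest_accelerator block at all.
    dynamic "guest_accelerator" {
      for_each = tonumber(lookup(each.value, "accelerator_count", "0")) > 0 ? [1] : []
      content {
        type  = each.value.accelerator_type
        count = tonumber(each.value.accelerator_count)
      }
    }
  }
}
```

Inside `content`, `each.value` still refers to the resource-level `for_each`, because the dynamic block's own iterator is named `guest_accelerator`.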

@apeabody I don't have access to the build error, could you please send it to me?

@apeabody could you please rerun the build and the checks?

/gcbrun
It's expired, triggered a fresh test
Thanks - This change looks pretty good, but we still need to enable the new test. Can you please add to the cloudbuild similar to https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/pull/2458/files#diff-35f3e17c28a8a8de710f4be35fb5448d0e33c2d9b89fb2be7499b9830f890d12

@apeabody sorry for the delay. Can you run the build, please?

/gcbrun

/gemini review
Code Review
This pull request introduces a new example for creating a GKE cluster with confidential nodes and GPUs. The implementation is solid, including updates to various modules to support confidential_instance_type and GPU configurations, and the addition of a comprehensive integration test. My review focuses on the new example files, where I've identified opportunities to simplify the Terraform code, remove unused resources, and improve documentation and test clarity. Overall, these are great additions that enhance the module's capabilities.

    cluster_type           = "confidential-gpu"
    network_name           = "confidential-gpu-network-${random_string.suffix.result}"
    subnet_name            = "confidential-gpu-subnet"
    master_auth_subnetwork = "confidential-gpu-master-subnet"

    region     = var.region
    zones      = var.zones
    network    = module.gcp-network.network_name
    subnetwork = local.subnet_names[index(module.gcp-network.subnets_names, local.subnet_name)]

    {
      subnet_name           = local.master_auth_subnetwork
      subnet_ip             = "10.60.0.0/17"
      subnet_region         = var.region
      subnet_private_access = true
    },

    output "location" {
      value = module.gke.location
    }

    "nodeConfig.diskType",
    "nodeConfig.enableConfidentialStorage",
    "nodeConfig.machineType",
    "nodeConfig.diskType",

@apeabody
Hi @arthurlapertosa - They are ephemeral projects, so unfortunately that isn't really feasible. Could you switch to something more like the NVIDIA T4, for which we should have quota?

I ran some tests, and apparently only the NVIDIA H100 is supported with confidential computing. The docs also say it is only possible with the H100: https://docs.cloud.google.com/confidential-computing/confidential-vm/docs/create-a-confidential-vm-instance-with-gpu.
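
For reference, a rough sketch of what a confidential H100 node pool could look like with the `google_container_node_pool` resource. The values here are assumptions for illustration only: the `a3-highgpu-1g` machine type, the `TDX` instance type, the `nvidia-h100-80gb` accelerator name, and the `cluster_name` and `region` variables are not taken from this PR, and the actual example may differ.

```hcl
# Hypothetical inputs, not taken from the PR.
variable "cluster_name" {
  type = string
}

variable "region" {
  type = string
}

resource "google_container_node_pool" "confidential_gpu" {
  name       = "confidential-gpu-pool"
  cluster    = var.cluster_name
  location   = var.region
  node_count = 1

  node_config {
    # Assumed machine type for a single H100.
    machine_type = "a3-highgpu-1g"

    # Confidential nodes, with the confidential technology set explicitly.
    confidential_nodes {
      enabled                    = true
      confidential_instance_type = "TDX" # assumption
    }

    guest_accelerator {
      type  = "nvidia-h100-80gb" # assumption
      count = 1

      gpu_driver_installation_config {
        gpu_driver_version = "LATEST"
      }
    }
  }
}
```

Swapping in a T4 would mean changing `type` and `machine_type`, which is the combination the tests above suggest is not accepted together with confidential computing.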
Unfortunately the test projects are ephemeral. |