Add GPU/Accelerator support to VMs in GCPMachineTemplate #1341

jwmay2012 · 2024-10-22T17:30:56Z

What type of PR is this?

/kind feature

What this PR does / why we need it:
Adds the ability to configure Guest Accelerators like GPUs in a GCPMachineTemplate
Fixes #289

Special notes for your reviewer:
Tested: machines are created with GPUs attached. After installing the drivers and the NVIDIA container runtime on the node, I was able to run a GPU workload successfully in a Pod.
If you request an accelerator on an incompatible instance type, the instance reconcile fails with an error from GCP that describes the improper API use.

OnHostMaintenance must be set to TERMINATE for GPU-enabled machines, as noted in the Terraform provider docs:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#guest_accelerator
I confirmed this is correct: GCP rejects the instance reconcile otherwise.
I set this field automatically for those machines.
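
For reviewers who want the rule in code form, here is a minimal sketch of the defaulting described above. It is not the code from this PR: `GuestAccelerator`, `MachineSpec`, and `effectiveOnHostMaintenance` are hypothetical names invented for illustration, and the choice to override an explicit MIGRATE value is an assumption.

```go
package main

import "fmt"

// GuestAccelerator is a hypothetical stand-in for one accelerator entry
// on a machine spec. The field names used in the PR may differ.
type GuestAccelerator struct {
	Type  string // accelerator type name, e.g. "nvidia-tesla-t4"
	Count int64  // number of accelerators to attach
}

// MachineSpec is a hypothetical, trimmed-down machine spec.
type MachineSpec struct {
	InstanceType      string
	OnHostMaintenance string // "MIGRATE" or "TERMINATE"; empty means unset
	GuestAccelerators []GuestAccelerator
}

// effectiveOnHostMaintenance returns the maintenance policy to send to GCP.
// Assumption: whenever accelerators are requested, the policy is forced to
// TERMINATE, because GCP rejects GPU instances with any other value.
func effectiveOnHostMaintenance(spec MachineSpec) string {
	if len(spec.GuestAccelerators) > 0 {
		return "TERMINATE"
	}
	return spec.OnHostMaintenance
}

func main() {
	gpu := MachineSpec{
		InstanceType:      "n1-standard-4",
		OnHostMaintenance: "MIGRATE",
		GuestAccelerators: []GuestAccelerator{{Type: "nvidia-tesla-t4", Count: 1}},
	}
	plain := MachineSpec{InstanceType: "n1-standard-4", OnHostMaintenance: "MIGRATE"}

	fmt.Println(effectiveOnHostMaintenance(gpu))   // prints "TERMINATE"
	fmt.Println(effectiveOnHostMaintenance(plain)) // prints "MIGRATE"
}
```

Keeping the rule in one small function makes it easy to unit test without calling the GCP API.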

TODOs:

squashed commits
includes documentation
adds unit tests

Release note:

Add GPU/Accelerator support for VMs in GCPMachineTemplate

k8s-ci-robot · 2024-10-22T17:31:06Z

Hi @jwmay2012. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

netlify · 2024-10-22T17:31:14Z

✅ Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!

Name	Link
🔨 Latest commit	`aed6b4c`
🔍 Latest deploy log	https://app.netlify.com/projects/kubernetes-sigs-cluster-api-gcp/deploys/68d2c82892b2350008155960
😎 Deploy Preview	https://deploy-preview-1341--kubernetes-sigs-cluster-api-gcp.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

salasberryfin · 2024-10-22T19:31:41Z

Thanks @jwmay2012

/ok-to-test

richardcase · 2024-10-23T12:04:37Z

@jwmay2012 - would you be able to run make lint on this change?

reyvonger · 2024-12-11T18:14:43Z

Could you please provide an estimate of when this change might be included in a release?

salasberryfin

Just a minor comment, otherwise it looks good to me.

cloud/scope/machine.go

salasberryfin · 2025-01-02T09:32:06Z

Thanks @jwmay2012.

/lgtm

jwmay2012 · 2025-01-09T19:12:57Z

We good to merge? Been running a custom CAPG with these changes for a while and would love to get this upstream :)

damdo · 2025-01-27T14:29:17Z

@richardcase are you happy with this? If so would you be able to stamp your approval on it?
Thanks!

cpanato

/hold defer to @richardcase

k8s-ci-robot · 2025-01-27T14:30:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cpanato, jwmay2012

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [cpanato]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-triage-robot · 2025-04-27T15:15:45Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

reyvonger · 2025-04-27T19:49:47Z

bump

damdo · 2025-05-21T07:58:26Z

/cc @elmiko

elmiko

makes sense to me

k8s-triage-robot · 2025-06-26T16:24:49Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle rotten
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

reyvonger · 2025-06-26T16:25:59Z

bump

elmiko · 2025-06-30T19:23:18Z

/remove-lifecycle rotten

reyvonger · 2025-07-15T14:39:20Z

@dims @richardcase
help

damdo

Richard is not a maintainer of the project anymore.
Considering @salasberryfin and @cpanato are happy with it, I am happy to remove the hold.

/unhold

damdo · 2025-07-15T15:40:26Z

@jwmay2012 could you please rebase? Thanks!

damdo · 2025-07-15T15:40:41Z

Or @reyvonger

damdo

/lgtm

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 22, 2024

k8s-ci-robot requested review from dims and richardcase October 22, 2024 17:31

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 22, 2024

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 22, 2024

jwmay2012 force-pushed the guest-accelerators branch from cddf743 to b0eb05e Compare October 22, 2024 17:34

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 22, 2024

jwmay2012 force-pushed the guest-accelerators branch from b0eb05e to 7b8e870 Compare October 24, 2024 15:39

salasberryfin reviewed Dec 19, 2024

jwmay2012 force-pushed the guest-accelerators branch from 7b8e870 to 2358039 Compare December 19, 2024 16:35

k8s-ci-robot assigned salasberryfin Jan 2, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 2, 2025

cpanato approved these changes Jan 27, 2025

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 27, 2025

k8s-ci-robot assigned cpanato Jan 27, 2025

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2025

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 27, 2025

k8s-ci-robot requested a review from elmiko May 21, 2025 07:58

elmiko reviewed May 27, 2025

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 26, 2025

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 30, 2025

damdo reviewed Jul 15, 2025

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 15, 2025

Add GPU/Accelerator support to VMs

aed6b4c

jwmay2012 force-pushed the guest-accelerators branch from 2358039 to aed6b4c Compare September 23, 2025 16:17

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 23, 2025

damdo reviewed Sep 23, 2025

k8s-ci-robot assigned damdo Sep 23, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 23, 2025

k8s-ci-robot merged commit 5f2aa05 into kubernetes-sigs:main Sep 23, 2025
17 checks passed


9 participants
