Merge pull request #42408 from jeana-redhat/OSDOCS-3278_GPU_on_GCP

jeana-redhat · web-flow · commit d55e7e77b205 · 2022-03-02T14:42:48.000-05:00
OSDOCS-3278: GPU support for GCP
diff --git a/machine_management/creating_machinesets/creating-machineset-gcp.adoc b/machine_management/creating_machinesets/creating-machineset-gcp.adoc
@@ -21,3 +21,5 @@ include::modules/machineset-non-guaranteed-instance.adoc[leveloffset=+1]
 include::modules/machineset-creating-non-guaranteed-instances.adoc[leveloffset=+1]
 
 include::modules/machineset-enabling-customer-managed-encryption.adoc[leveloffset=+1]
+
+include::modules/machineset-gcp-enabling-gpu-support.adoc[leveloffset=+1]
diff --git a/modules/machineset-gcp-enabling-gpu-support.adoc b/modules/machineset-gcp-enabling-gpu-support.adoc
@@ -0,0 +1,110 @@
+// Module included in the following assemblies:
+//
+// * machine_management/creating_machinesets/creating-machineset-gcp.adoc
+
+:_content-type: PROCEDURE
+[id="machineset-gcp-enabling-gpu-support_{context}"]
+= Enabling GPU support for a machine set
+
+Google Cloud Platform (GCP) Compute Engine enables users to add GPUs to VM instances. Workloads that benefit from access to GPU resources can perform better on compute machines with this feature enabled. {product-title} on GCP supports NVIDIA GPU models in the A2 and N1 machine series.
+
+.Supported GPU configurations
+|====
+|Model name |GPU type |Machine types ^[1]^
+
+|NVIDIA A100
+|`nvidia-tesla-a100`
+a|* `a2-highgpu-1g`
+* `a2-highgpu-2g`
+* `a2-highgpu-4g`
+* `a2-highgpu-8g`
+* `a2-megagpu-16g`
+
+|NVIDIA K80
+|`nvidia-tesla-k80`
+.5+a|* `n1-standard-1`
+* `n1-standard-2`
+* `n1-standard-4`
+* `n1-standard-8`
+* `n1-standard-16`
+* `n1-standard-32`
+* `n1-standard-64`
+* `n1-standard-96`
+* `n1-highmem-2`
+* `n1-highmem-4`
+* `n1-highmem-8`
+* `n1-highmem-16`
+* `n1-highmem-32`
+* `n1-highmem-64`
+* `n1-highmem-96`
+* `n1-highcpu-2`
+* `n1-highcpu-4`
+* `n1-highcpu-8`
+* `n1-highcpu-16`
+* `n1-highcpu-32`
+* `n1-highcpu-64`
+* `n1-highcpu-96`
+
+|NVIDIA P100
+|`nvidia-tesla-p100`
+
+|NVIDIA P4
+|`nvidia-tesla-p4`
+
+|NVIDIA T4
+|`nvidia-tesla-t4`
+
+|NVIDIA V100
+|`nvidia-tesla-v100`
+
+|====
+[.small]
+--
+1. For more information about machine types, including specifications, compatibility, regional availability, and limitations, see the GCP Compute Engine documentation about link:https://cloud.google.com/compute/docs/general-purpose-machines#n1_machines[N1 machine series], link:https://cloud.google.com/compute/docs/accelerator-optimized-machines#a2_vms[A2 machine series], and link:https://cloud.google.com/compute/docs/gpus/gpu-regions-zones#gpu_regions_and_zones[GPU regions and zones availability].
+--
+
+You can define which supported GPU to use for an instance by using the Machine API.
+
+You can configure machines in the N1 machine series to deploy with one of the supported GPU types. Machines in the A2 machine series come with associated GPUs and cannot use guest accelerators.
+
+[NOTE]
+====
+GPUs for graphics workloads are not supported.
+====
+
+.Procedure
+
+. In a text editor, open the YAML file for an existing machine set or create a new one.
+
+. Specify a GPU configuration under the `providerSpec` field in your machine set YAML file. See the following examples of valid configurations:
++
+.Example configuration for the A2 machine series
+[source,yaml]
+----
+providerSpec:
+  value:
+    machineType: a2-highgpu-1g <1>
+    onHostMaintenance: Terminate <2>
+    restartPolicy: Always <3>
+----
+<1> Specify the machine type. Ensure that the machine type is included in the A2 machine series.
+<2> When using GPU support, you must set `onHostMaintenance` to `Terminate`.
+<3> Specify the restart policy for machines deployed by the machine set. Allowed values are `Always` or `Never`.
++
+.Example configuration for the N1 machine series
+[source,yaml]
+----
+providerSpec:
+  value:
+    gpus:
+    - count: 1 <1>
+      type: nvidia-tesla-p100 <2>
+    machineType: n1-standard-1 <3>
+    onHostMaintenance: Terminate <4>
+    restartPolicy: Always <5>
+----
+<1> Specify the number of GPUs to attach to the machine.
+<2> Specify the type of GPUs to attach to the machine. Ensure that the machine type and GPU type are compatible.
+<3> Specify the machine type. Ensure that the machine type and GPU type are compatible.
+<4> When using GPU support, you must set `onHostMaintenance` to `Terminate`.
+<5> Specify the restart policy for machines deployed by the machine set. Allowed values are `Always` or `Never`.
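+
+For orientation, the following sketch shows where the `providerSpec` stanza sits within a complete machine set definition. It is an illustration under stated assumptions, not a deployable file: the name `example-gpu-machineset`, the replica count, and the choice of `n1-standard-4` with `nvidia-tesla-t4` are placeholders, and other fields that a real machine set requires are omitted.
+
+.Sketch of a machine set with an N1 GPU configuration
+[source,yaml]
+----
+apiVersion: machine.openshift.io/v1beta1
+kind: MachineSet
+metadata:
+  name: example-gpu-machineset # hypothetical name, for illustration only
+  namespace: openshift-machine-api
+spec:
+  replicas: 1 # illustrative value
+  template:
+    spec:
+      providerSpec:
+        value:
+          gpus:
+          - count: 1
+            type: nvidia-tesla-t4 # assumed GPU type; choose one from the table
+          machineType: n1-standard-4 # assumed machine type; must be compatible with the GPU type
+          onHostMaintenance: Terminate # required when using GPU support
+          restartPolicy: Always
+          # Remaining required fields (selector, labels, disks, network, and so on) are omitted.
+----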