|
| 1 | +// Module included in the following assemblies: |
| 2 | +// |
| 3 | +// * machine_management/creating-machinesets/creating-machineset-aws.adoc |
| 4 | + |
| 5 | +:_content-type: PROCEDURE |
| 6 | +[id="nvidia-gpu-aws-adding-a-gpu-node_{context}"] |
| 7 | += Adding a GPU node to an existing {product-title} cluster |
| 8 | + |
| 9 | +You can copy and modify a default compute machine set configuration to create a GPU-enabled machine set and machines for the AWS EC2 cloud provider. |
| 10 | + |
| 11 | +The following table lists the validated instance types: |
| 12 | + |
| 13 | +[cols="1,1,1,1"] |
| 14 | +|=== |
| 15 | +|Instance type |NVIDIA GPU accelerator |Maximum number of GPUs |Architecture |
| 16 | + |
| 17 | +|`p4d.24xlarge` |
| 18 | +|A100 |
| 19 | +|8 |
| 20 | +|x86 |
| 21 | + |
| 22 | +|`g4dn.xlarge` |
| 23 | +|T4 |
| 24 | +|1 |
| 25 | +|x86 |
| 26 | +|=== |
| 27 | + |
| 28 | +.Procedure |
| 29 | + |
| 30 | +. View the existing nodes by running the following command. Each node is an instance of a machine definition with a specific AWS region and {product-title} role. |
| 31 | ++ |
| 32 | +[source,terminal] |
| 33 | +---- |
| 34 | +$ oc get nodes |
| 35 | +---- |
| 36 | ++ |
| 37 | +.Example output |
| 38 | ++ |
| 39 | +[source,terminal] |
| 40 | +---- |
| 41 | +NAME STATUS ROLES AGE VERSION |
| 42 | +ip-10-0-52-50.us-east-2.compute.internal Ready worker 3d17h v1.25.4+86bd4ff |
| 43 | +ip-10-0-58-24.us-east-2.compute.internal Ready control-plane,master 3d17h v1.25.4+86bd4ff |
| 44 | +ip-10-0-68-148.us-east-2.compute.internal Ready worker 3d17h v1.25.4+86bd4ff |
| 45 | +ip-10-0-68-68.us-east-2.compute.internal Ready control-plane,master 3d17h v1.25.4+86bd4ff |
| 46 | +ip-10-0-72-170.us-east-2.compute.internal Ready control-plane,master 3d17h v1.25.4+86bd4ff |
| 47 | +ip-10-0-74-50.us-east-2.compute.internal Ready worker 3d17h v1.25.4+86bd4ff |
| 48 | +---- |
| 49 | + |
| 50 | +. View the compute machine sets that exist in the `openshift-machine-api` namespace by running the following command. Each compute machine set is associated with a different availability zone within the AWS region. The installer automatically load balances compute machines across availability zones. |
| 51 | ++ |
| 52 | +[source,terminal] |
| 53 | +---- |
| 54 | +$ oc get machinesets -n openshift-machine-api |
| 55 | +---- |
| 56 | ++ |
| 57 | +.Example output |
| 58 | ++ |
| 59 | +[source,terminal] |
| 60 | +---- |
| 61 | +NAME DESIRED CURRENT READY AVAILABLE AGE |
| 62 | +preserve-dsoc12r4-ktjfc-worker-us-east-2a 1 1 1 1 3d11h |
| 63 | +preserve-dsoc12r4-ktjfc-worker-us-east-2b 2 2 2 2 3d11h |
| 64 | +---- |
| 65 | + |
| 66 | +. View the machines that exist in the `openshift-machine-api` namespace by running the following command. In this example, each compute machine set has one or two compute machines, and you can scale a compute machine set to add a node in a particular region and zone. |
| 67 | ++ |
| 68 | +[source,terminal] |
| 69 | +---- |
| 70 | +$ oc get machines -n openshift-machine-api | grep worker |
| 71 | +---- |
| 72 | ++ |
| 73 | +.Example output |
| 74 | ++ |
| 75 | +[source,terminal] |
| 76 | +---- |
| 77 | +preserve-dsoc12r4-ktjfc-worker-us-east-2a-dts8r Running m5.xlarge us-east-2 us-east-2a 3d11h |
| 78 | +preserve-dsoc12r4-ktjfc-worker-us-east-2b-dkv7w Running m5.xlarge us-east-2 us-east-2b 3d11h |
| 79 | +preserve-dsoc12r4-ktjfc-worker-us-east-2b-k58cw Running m5.xlarge us-east-2 us-east-2b 3d11h |
| 80 | +---- |
| 81 | + |
| 82 | +. Make a copy of one of the existing compute `MachineSet` definitions and output the result to a JSON file by running the following command. This will be the basis for the GPU-enabled compute machine set definition. |
| 83 | ++ |
| 84 | +[source,terminal] |
| 85 | +---- |
| 86 | +$ oc get machineset preserve-dsoc12r4-ktjfc-worker-us-east-2a -n openshift-machine-api -o json > <output_file.json> |
| 87 | +---- |
| 88 | + |
| 89 | +. Edit the JSON file and make the following changes to the new `MachineSet` definition: |
| 90 | ++ |
| 91 | +* Add `gpu` to the machine set name, for example by replacing `worker` with `worker-gpu`. This will be the name of the new machine set. |
| 92 | +* Change the instance type of the new `MachineSet` definition to `g4dn.xlarge`, which includes an NVIDIA Tesla T4 GPU. |
| 93 | +To learn more about AWS `g4dn` instance types, see link:https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing[Accelerated Computing]. |
| 94 | ++ |
| 95 | +[source,terminal] |
| 96 | +---- |
| 97 | +$ jq .spec.template.spec.providerSpec.value.instanceType preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json |
| 98 | + |
| 99 | +"g4dn.xlarge" |
| 100 | +---- |
| 101 | ++ |
| 102 | +The `<output_file.json>` file is saved as `preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json`. |
| 103 | + |
| 104 | +. Update the following fields in `preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json`: |
| 105 | ++ |
| 106 | +* `.metadata.name` to a name containing `gpu`. |
| 107 | + |
| 108 | +* `.spec.selector.matchLabels["machine.openshift.io/cluster-api-machineset"]` to |
| 109 | +match the new `.metadata.name`. |
| 110 | + |
| 111 | +* `.spec.template.metadata.labels["machine.openshift.io/cluster-api-machineset"]` |
| 112 | +to match the new `.metadata.name`. |
| 113 | + |
| 114 | +* `.spec.template.spec.providerSpec.value.instanceType` to `g4dn.xlarge`. |
| 115 | + |
| 116 | +. To verify your changes, perform a `diff` of the original compute machine set definition and the new GPU-enabled machine set definition by running the following command: |
| 117 | ++ |
| 118 | +[source,terminal] |
| 119 | +---- |
| 120 | +$ oc -n openshift-machine-api get machineset preserve-dsoc12r4-ktjfc-worker-us-east-2a -o json | diff preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json - |
| 121 | +---- |
| 122 | ++ |
| 123 | +.Example output |
| 124 | ++ |
| 125 | +[source,terminal] |
| 126 | +---- |
| 127 | +10c10 |
| 128 | + |
| 129 | +< "name": "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a", |
| 130 | +--- |
| 131 | +> "name": "preserve-dsoc12r4-ktjfc-worker-us-east-2a", |
| 132 | + |
| 133 | +21c21 |
| 134 | + |
| 135 | +< "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a" |
| 136 | +--- |
| 137 | +> "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-us-east-2a" |
| 138 | + |
| 139 | +31c31 |
| 140 | + |
| 141 | +< "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a" |
| 142 | +--- |
| 143 | +> "machine.openshift.io/cluster-api-machineset": "preserve-dsoc12r4-ktjfc-worker-us-east-2a" |
| 144 | + |
| 145 | +60c60 |
| 146 | + |
| 147 | +< "instanceType": "g4dn.xlarge", |
| 148 | +--- |
| 149 | +> "instanceType": "m5.xlarge", |
| 150 | +---- |
| 151 | + |
| 152 | +. Create the GPU-enabled compute machine set from the definition by running the following command: |
| 153 | ++ |
| 154 | +[source,terminal] |
| 155 | +---- |
| 156 | +$ oc create -f preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a.json |
| 157 | +---- |
| 158 | ++ |
| 159 | +.Example output |
| 160 | ++ |
| 161 | +[source,terminal] |
| 162 | +---- |
| 163 | +machineset.machine.openshift.io/preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a created |
| 164 | +---- |
| 165 | + |
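The field edits and the `diff` check in the procedure can also be scripted. The following is a minimal sketch in Python, not part of the documented procedure: the function names `make_gpu_machineset` and `changed_paths` are hypothetical, and the sample dictionary is a trimmed stand-in for the exported JSON, which contains many more fields.

```python
import copy

# Label key used by the selector and the template, as listed in the procedure.
LABEL = "machine.openshift.io/cluster-api-machineset"


def make_gpu_machineset(definition, new_name, instance_type="g4dn.xlarge"):
    """Return a copy of a MachineSet definition with the four fields updated."""
    gpu = copy.deepcopy(definition)  # leave the exported definition untouched
    gpu["metadata"]["name"] = new_name
    gpu["spec"]["selector"]["matchLabels"][LABEL] = new_name
    gpu["spec"]["template"]["metadata"]["labels"][LABEL] = new_name
    gpu["spec"]["template"]["spec"]["providerSpec"]["value"]["instanceType"] = instance_type
    return gpu


def changed_paths(old, new, prefix=""):
    """List the paths whose values differ between two nested dictionaries."""
    paths = []
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}.{key}"
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            paths.extend(changed_paths(a, b, path))
        elif a != b:
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Trimmed stand-in for the exported definition (assumption: real files have more fields).
    name = "preserve-dsoc12r4-ktjfc-worker-us-east-2a"
    original = {
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": {LABEL: name}},
            "template": {
                "metadata": {"labels": {LABEL: name}},
                "spec": {"providerSpec": {"value": {"instanceType": "m5.xlarge"}}},
            },
        },
    }
    gpu = make_gpu_machineset(original, "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a")
    for path in changed_paths(original, gpu):
        print(path)
```

In practice you would load the exported file with `json.load`, pass the result to `make_gpu_machineset`, and write the returned dictionary to the new file with `json.dump`. Copying the definition first keeps the original available for comparison, which mirrors the `diff` step.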
| 166 | +.Verification |
| 167 | + |
| 168 | +. View the machine set you created by running the following command: |
| 169 | ++ |
| 170 | +[source,terminal] |
| 171 | +---- |
| 172 | +$ oc -n openshift-machine-api get machinesets | grep gpu |
| 173 | +---- |
| 174 | ++ |
| 175 | +The `MachineSet` replica count is set to `1`, so a new `Machine` object is created automatically. |
| 176 | + |
| 177 | ++ |
| 178 | +.Example output |
| 179 | ++ |
| 180 | +[source,terminal] |
| 181 | +---- |
| 182 | +preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a 1 1 1 1 4m21s |
| 183 | +---- |
| 184 | + |
| 185 | +. View the `Machine` object that the machine set created by running the following command: |
| 186 | ++ |
| 187 | +[source,terminal] |
| 188 | +---- |
| 189 | +$ oc -n openshift-machine-api get machines | grep gpu |
| 190 | +---- |
| 191 | ++ |
| 192 | +.Example output |
| 193 | ++ |
| 194 | +[source,terminal] |
| 195 | +---- |
| 196 | +preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a Running g4dn.xlarge us-east-2 us-east-2a 4m36s |
| 197 | +---- |
| 198 | + |
| 199 | +Note that there is no need to specify a namespace for the node. The node definition is cluster scoped. |
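
The verification can also be done against JSON output instead of `grep`. The following sketch assumes that the output of `oc -n openshift-machine-api get machines -o json` is a list object with an `items` array, that each machine carries the `machine.openshift.io/cluster-api-machineset` label, and that its phase is reported in `.status.phase`. The function names and the sample machine name are hypothetical.

```python
# Label that ties a Machine object to the machine set that created it.
LABEL = "machine.openshift.io/cluster-api-machineset"


def machine_phases(machine_list, machineset_name):
    """Map machine name to phase for machines that belong to the given machine set."""
    phases = {}
    for item in machine_list.get("items", []):
        metadata = item.get("metadata", {})
        if metadata.get("labels", {}).get(LABEL) == machineset_name:
            # Assumption: the phase lives in .status.phase; fall back to "Unknown".
            phases[metadata["name"]] = item.get("status", {}).get("phase", "Unknown")
    return phases


def all_running(phases):
    """True only if at least one machine was found and every one is Running."""
    return bool(phases) and all(phase == "Running" for phase in phases.values())


if __name__ == "__main__":
    # Hand-written sample standing in for parsed `oc get machines -o json` output.
    machineset = "preserve-dsoc12r4-ktjfc-worker-gpu-us-east-2a"
    sample = {
        "items": [
            {
                "metadata": {
                    "name": "example-gpu-machine",  # hypothetical machine name
                    "labels": {LABEL: machineset},
                },
                "status": {"phase": "Running"},
            }
        ]
    }
    phases = machine_phases(sample, machineset)
    print(phases, all_running(phases))
```

Filtering on the label rather than on a substring of the name avoids matching unrelated machines whose names happen to contain `gpu`.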