Commit 63ede0a

More nits I missed under the fold
Signed-off-by: Laura Lorenz <[email protected]>
1 parent 57667f5 commit 63ede0a

File tree

1 file changed: +66 -79 lines changed


content/en/docs/tutorials/cluster-management/install-use-dra.md

Lines changed: 66 additions & 79 deletions
@@ -91,10 +91,12 @@ To enable the DRA feature, you must enable the following feature gates and API g
 
 <!-- lessoncontent -->
 
-## Explore the DRA initial state
+## Explore the initial cluster state {#explore-initial-state}
 
-With no driver installed or Pod claims yet to satisfy, you can observe the
-initial state of a cluster with DRA enabled.
+You can spend some time to observe the initial state of a cluster with DRA
+enabled, especially if you have not used these APIs extensively before. If you
+set up a new cluster for this tutorial, with no driver installed and no Pod
+claims yet to satisfy, the output of these commands won't show any resources.
 
 1. Get a list of {{< glossary_tooltip text="DeviceClasses" term_id="deviceclass" >}}:
 
@@ -106,10 +108,6 @@ initial state of a cluster with DRA enabled.
 No resources found
 ```
 
-If you set up a new blank cluster for this tutorial, it's normal to find that
-there are no DeviceClasses. [Learn more about DeviceClasses
-here.](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#deviceclass)
-
 1. Get a list of {{< glossary_tooltip text="ResourceSlices" term_id="resourceslice" >}}:
 
 ```shell
@@ -120,11 +118,7 @@ initial state of a cluster with DRA enabled.
 No resources found
 ```
 
-If you set up a new blank cluster for this tutorial, it's normal to find that
-there are no ResourceSlices advertised. [Learn more about ResourceSlices
-here.](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceslice)
-
-1. View {{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}} and {{<
+1. Get a list of {{< glossary_tooltip text="ResourceClaims" term_id="resourceclaim" >}} and {{<
 glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate"
 >}}
 
@@ -138,12 +132,6 @@ glossary_tooltip text="ResourceClaimTemplates" term_id="resourceclaimtemplate"
 No resources found
 ```
 
-If you set up a new blank cluster for this tutorial, it's normal to find that
-there are no ResourceClaims or ResourceClaimTemplates as you, the user, have
-not created any. [Learn more about ResourceClaims and ResourceClaimTemplates
-here.](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaims-templates)
-
-
 At this point, you have confirmed that DRA is enabled and configured properly in
 the cluster, and that no DRA drivers have advertised any resources to the DRA
 APIs yet.
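For reference, the inspection steps touched by the hunks above boil down to listing the DRA API kinds. The exact commands are not shown in this diff; a sketch of equivalent commands, assuming a fresh cluster with DRA enabled and no driver installed, would be:

```shell
# List the DRA API kinds inspected in this part of the tutorial.
# On a new cluster with no driver and no user-created claims, each
# command is expected to report "No resources found".
kubectl get deviceclasses
kubectl get resourceslices
kubectl get resourceclaims,resourceclaimtemplates --all-namespaces
```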
@@ -158,15 +146,22 @@ selection of the nodes (using {{< glossary_tooltip text="selectors"
 term_id="selector" >}} or similar mechanisms) in your cluster.
 
 Check your driver's documentation for specific installation instructions, which
-may include a Helm chart, a set of manifests, or other deployment tooling.
+might include a Helm chart, a set of manifests, or other deployment tooling.
 
 This tutorial uses an example driver which can be found in the
 [kubernetes-sigs/dra-example-driver](https://github.com/kubernetes-sigs/dra-example-driver)
-repository to demonstrate driver installation.
+repository to demonstrate driver installation. This example driver advertises
+simulated GPUs to Kubernetes for your Pods to interact with.
 
-### Prepare your cluster for driver installation
+### Prepare your cluster for driver installation {#prepare-cluster-driver}
+
+To simplify cleanup, create a namespace named dra-tutorial:
+
+1. Create the namespace:
 
-To make it easier to cleanup later, create a namespace called `dra-tutorial` in your cluster.
+```shell
+kubectl create namespace dra-tutorial
+```
 
 In a production environment, you would likely be using a previously released or
 qualified image from the driver vendor or your own organization, and your nodes
@@ -175,12 +170,6 @@ hosted. In this tutorial, you will use a publicly released image of the
 dra-example-driver to simulate access to a DRA driver image.
 
 
-1. Create the namespace:
-
-```shell
-kubectl create namespace dra-tutorial
-```
-
 1. Confirm your nodes have access to the image by running the following
 from within one of your cluster's nodes:
 
@@ -231,12 +220,10 @@ on this cluster:
 ```
 
 1. Create a {{< glossary_tooltip term_id="priority-class" >}} for the DRA
-driver. The DRA driver component is responsible for important lifecycle
-operations for Pods with claims, so you don't want it to be preempted. Learn
-more about [pod priority and preemption
-here](/docs/concepts/scheduling-eviction/pod-priority-preemption/). Learn
-more about [good practices when maintaining a DRA driver
-here](/docs/concepts/cluster-administration/dra/).
+driver. The PriorityClass prevents preemption of the DRA driver component,
+which is responsible for important lifecycle operations for Pods with
+claims. Learn more about [pod priority and preemption
+here](/docs/concepts/scheduling-eviction/pod-priority-preemption/).
 
 {{% code_sample language="yaml" file="dra/driver-install/priorityclass.yaml" %}}
 
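The PriorityClass manifest itself lives in the referenced `dra/driver-install/priorityclass.yaml` file and is not part of this diff. As a minimal sketch of the kind of object that step creates (the name, value, and description below are illustrative assumptions, not taken from that file):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  # Hypothetical name; the tutorial's manifest defines its own.
  name: dra-driver-high-priority
value: 1000000
globalDefault: false
description: "Keeps the DRA driver Pods from being preempted."
```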
@@ -245,21 +232,22 @@ on this cluster:
 ```
 
 1. Deploy the actual DRA driver as a DaemonSet configured to run the example
-driver binary with the permissions provisioned above.
+driver binary with the permissions provisioned above. The DaemonSet has the
+permissions that you granted to the ServiceAccount in the previous steps.
 
 {{% code_sample language="yaml" file="dra/driver-install/daemonset.yaml" %}}
 
 ```shell
 kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/daemonset.yaml
 ```
-It is configured with
+The DaemonSet is configured with
 the volume mounts necessary to interact with the underlying Container Device
-Interface (CDI) directory, and to expose its socket to kubelet via the
-kubelet plugins directory.
+Interface (CDI) directory, and to expose its socket to `kubelet` via the
+`kubelet/plugins` directory.
 
-### Verify the DRA driver installation
+### Verify the DRA driver installation {#verify-driver-install}
 
-1. Observe the Pods of the DRA driver DaemonSet across all worker nodes:
+1. Get a list of the Pods of the DRA driver DaemonSet across all worker nodes:
 
 ```shell
 kubectl get pod -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial
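The authoritative volume configuration is in the referenced `dra/driver-install/daemonset.yaml` and is not shown here. DRA driver DaemonSets typically mount host paths along these lines; this fragment is illustrative only and the exact paths and names may differ for your driver:

```yaml
# Illustrative hostPath volumes a DRA driver DaemonSet commonly uses:
# the kubelet plugin directories for socket registration, and the CDI
# directory for writing device specifications.
volumes:
- name: plugins-registry
  hostPath:
    path: /var/lib/kubelet/plugins_registry
- name: plugins
  hostPath:
    path: /var/lib/kubelet/plugins
- name: cdi
  hostPath:
    path: /var/run/cdi
```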
@@ -293,7 +281,7 @@ At this point, you have successfully installed the example DRA driver, and
 confirmed its initial configuration. You're now ready to use DRA to schedule
 Pods.
 
-## Claim resources and deploy a Pod
+## Claim resources and deploy a Pod {#claim-resources-pod}
 
 To request resources using DRA, you create ResourceClaims or
 ResourceClaimTemplates that define the resources that your Pods need. In the
@@ -309,12 +297,11 @@ learn more about ResourceClaims.
 
 ### Create the ResourceClaim
 
-The Pod manifest itself will include a reference to its relevant ResourceClaim
-object, which you will create now. Whatever the claim, the `deviceClassName` is
-a required field, narrowing down the scope of the request to a specific device
-class. The request itself can include a {{< glossary_tooltip term_id="cel" >}}
-expression that references attributes that may be advertised by the driver
-managing that device class.
+In this section, you create a ResourceClaim and reference it in a Pod. Whatever
+the claim, the `deviceClassName` is a required field, narrowing down the scope
+of the request to a specific device class. The request itself can include a {{<
+glossary_tooltip term_id="cel" >}} expression that references attributes that
+may be advertised by the driver managing that device class.
 
 In this example, you will create a request for any GPU advertising over 10Gi
 memory capacity. The attribute exposing capacity from the example driver takes
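The ResourceClaim manifest used by the tutorial is not part of this diff. As a rough sketch, assuming the `resource.k8s.io/v1beta1` API and a `gpu.example.com` DeviceClass published by the example driver, a claim that filters GPUs by advertised memory capacity could look like the following; the field layout and CEL expression here are assumptions, so check the tutorial's actual `resourceclaim.yaml`:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: some-gpu
  namespace: dra-tutorial
spec:
  devices:
    requests:
    - name: some-gpu
      # Required: restricts the request to devices of this class.
      deviceClassName: gpu.example.com
      selectors:
      - cel:
          # Assumed capacity name; the example driver advertises per-GPU
          # memory capacity that a CEL expression can compare against.
          expression: "device.capacity['gpu.example.com'].memory.compareTo(quantity('10Gi')) >= 0"
```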
@@ -341,20 +328,6 @@ underlying container.
 kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/example/pod.yaml
 ```
 
-### Explore the DRA state
-
-The cluster now tries to schedule that Pod to a node where Kubernetes can
-satisfy the ResourceClaim. In our situation, the DRA driver is deployed on all
-nodes, and is advertising mock GPUs on all nodes, all of which have enough
-capacity advertised to satisfy the Pod's claim, so this Pod may be scheduled to
-any node and any of the mock GPUs on that node may be allocated.
-
-The mock GPU driver injects environment variables in each container it is
-allocated to in order to indicate which GPUs _would_ have been injected into
-them by a real resource driver and how they would have been configured, so you
-can check those environment variables to see how the Pods have been handled by
-the system.
-
 1. Confirm the pod has deployed:
 
 ```shell
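The Pod manifest applied above (`dra/driver-install/example/pod.yaml`) is also outside this diff. For orientation, a Pod consumes a ResourceClaim through `spec.resourceClaims` plus per-container `resources.claims` references, roughly like this sketch; the image, command, and in-Pod claim name `gpu` are assumptions, while `pod0`, `ctr0`, and `some-gpu` appear elsewhere in the tutorial:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod0
  namespace: dra-tutorial
spec:
  containers:
  - name: ctr0
    # Assumed image and command; the container only needs to print its
    # environment so injected GPU_DEVICE_* variables show up in the logs.
    image: ubuntu:24.04
    command: ["bash", "-c", "export; sleep 9999"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    # References the ResourceClaim created earlier in the tutorial.
    resourceClaimName: some-gpu
```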
@@ -367,7 +340,22 @@ the system.
 pod0 1/1 Running 0 9s
 ```
 
-1. Observe the pod logs which report the name of the mock GPU allocated:
+### Explore the DRA state
+
+After you create the Pod, the cluster tries to schedule that Pod to a node where
+Kubernetes can satisfy the ResourceClaim. In this tutorial, the DRA driver is
+deployed on all nodes, and is advertising mock GPUs on all nodes, all of which
+have enough capacity advertised to satisfy the Pod's claim, so Kubernetes can
+schedule this Pod on any node and can allocate any of the mock GPUs on that
+node.
+
+When Kubernetes allocates a mock GPU to a Pod, the example driver adds
+environment variables in each container it is allocated to in order to indicate
+which GPUs _would_ have been injected into them by a real resource driver and
+how they would have been configured, so you can check those environment
+variables to see how the Pods have been handled by the system.
+
+1. Check the Pod logs, which report the name of the mock GPU that was allocated:
 
 ```shell
 kubectl logs pod0 -c ctr0 -n dra-tutorial | grep -E "GPU_DEVICE_[0-9]+=" | grep -v "RESOURCE_CLAIM"
@@ -378,10 +366,7 @@ the system.
 declare -x GPU_DEVICE_4="gpu-4"
 ```
 
-1. Observe the ResourceClaim object:
-
-You can observe the ResourceClaim more closely, first only to see its state
-is allocated and reserved.
+1. Check the state of the ResourceClaim object:
 
 ```shell
 kubectl get resourceclaims -n dra-tutorial
@@ -394,9 +379,12 @@ the system.
 some-gpu allocated,reserved 34s
 ```
 
-Looking deeper at the `some-gpu` ResourceClaim, you can see that the status
-stanza includes information about the device that has been allocated and for
-what pod it has been reserved for:
+In this output, the `STATE` column shows that the ResourceClaim is allocated
+and reserved.
+
+1. Check the details of the `some-gpu` ResourceClaim. The `status` stanza of
+the ResourceClaim has information about the allocated device and the Pod it
+has been reserved for:
 
 ```shell
 kubectl get resourceclaim some-gpu -n dra-tutorial -o yaml
@@ -453,8 +441,8 @@ the system.
 resourceVersion: ""
 {{< /highlight >}}
 
-1. Observe the driver by checking the pod logs for pods backing the driver
-daemonset:
+1. To check how the driver handled device allocation, get the logs for the
+driver DaemonSet Pods:
 
 ```shell
 kubectl logs -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial
@@ -466,17 +454,16 @@ the system.
 I0729 05:11:52.684450 1 driver.go:112] Returning newly prepared devices for claim '79e1e8d8-7e53-4362-aad1-eca97678339e': [&Device{RequestNames:[some-gpu],PoolName:kind-worker,DeviceName:gpu-4,CDIDeviceIDs:[k8s.gpu.example.com/gpu=common k8s.gpu.example.com/gpu=79e1e8d8-7e53-4362-aad1-eca97678339e-gpu-4],}]
 ```
 
-You have now successfully deployed a Pod with a DRA based claim, and seen it
-scheduled to an appropriate node and the associated DRA APIs updated to reflect
-its status.
+You have now successfully deployed a Pod that claims devices using DRA, verified
+that the Pod was scheduled to an appropriate node, and saw that the associated
+DRA API kinds were updated with the allocation status.
 
-## Remove the Pod with a claim
+## Delete a Pod that has a claim {#delete-pod-claim}
 
 When a Pod with a claim is deleted, the DRA driver deallocates the resource so
-it can be available for future scheduling. You can observe that by deleting our
-pod with a claim and seeing that the state of the ResourceClaim changes.
-
-### Delete the pod using the resource claim
+it can be available for future scheduling. To validate this behavior, delete the
+Pod that you created in the previous steps and watch the corresponding changes
+to the ResourceClaim and driver.
 
 1. Delete the `pod0` Pod:
 
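The remaining steps of this section fall below the fold of the diff. As a hedged sketch of what they amount to (not the tutorial's literal commands), deleting the Pod and re-checking the claim would look something like:

```shell
# Delete the Pod that holds the claim, then watch the ResourceClaim state;
# once the driver deallocates the device, the claim should no longer show
# "allocated,reserved".
kubectl delete pod pod0 -n dra-tutorial
kubectl get resourceclaims -n dra-tutorial
```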