// Module included in the following assemblies:
//
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc

:_module-type: PROCEDURE
[id="cnf-checking-numa-aware-scheduler-logs_{context}"]
= Checking the NUMA-aware scheduler logs

Troubleshoot problems with the NUMA-aware scheduler by reviewing the logs. If required, you can increase the scheduler log level by modifying the `spec.logLevel` field of the `NUMAResourcesScheduler` resource. Acceptable values are `Normal`, `Debug`, and `Trace`, with `Trace` being the most verbose option.

[NOTE]
====
To change the log level of the secondary scheduler, delete the running scheduler resource and re-deploy it with the changed log level. The scheduler is unavailable for scheduling new workloads during this downtime.
====
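
Because the accepted values are a closed set, a small guard in a deployment script can catch a typo before you tear down and recreate the resource. A minimal sketch, assuming a helper named `valid_level` (illustrative only, not part of any OpenShift tooling):

[source,bash]
----
#!/bin/bash
# Illustrative guard: accept only the log levels that the
# NUMAResourcesScheduler spec recognizes.
valid_level() {
  case "$1" in
    Normal|Debug|Trace) return 0 ;;
    *) return 1 ;;
  esac
}

if valid_level "Debug"; then echo "Debug accepted"; fi
if ! valid_level "Verbose"; then echo "Verbose rejected"; fi
----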

.Prerequisites

* Install the OpenShift CLI (`oc`).
* Log in as a user with `cluster-admin` privileges.

.Procedure

. Delete the currently running `NUMAResourcesScheduler` resource:

.. Get the active `NUMAResourcesScheduler` by running the following command:
+
[source,terminal]
----
$ oc get NUMAResourcesScheduler
----
+
.Example output
[source,terminal]
----
NAME                     AGE
numaresourcesscheduler   90m
----

.. Delete the secondary scheduler resource by running the following command:
+
[source,terminal]
----
$ oc delete NUMAResourcesScheduler numaresourcesscheduler
----
+
.Example output
[source,terminal]
----
numaresourcesscheduler.nodetopology.openshift.io "numaresourcesscheduler" deleted
----

. Save the following YAML in the file `nro-scheduler-debug.yaml`. This example changes the log level to `Debug`:
+
[source,yaml]
----
apiVersion: nodetopology.openshift.io/v1alpha1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-container-rhel8:v4.10"
  logLevel: Debug
----
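+
Before creating the resource, a quick read-back of the saved manifest can confirm it requests the intended level and save a redeploy cycle. A sketch that recreates the manifest from this step so the example is self-contained (on a real system, run only the `awk` line against the file you saved):
+
[source,bash]
----
# Recreate the manifest from the step above, then read back the
# requested log level with a plain-text check.
cat <<'EOF' > nro-scheduler-debug.yaml
apiVersion: nodetopology.openshift.io/v1alpha1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-container-rhel8:v4.10"
  logLevel: Debug
EOF

# Print the value of the logLevel field.
awk '/logLevel:/ {print $2}' nro-scheduler-debug.yaml
----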

. Create the updated `NUMAResourcesScheduler` resource with `Debug` logging by running the following command:
+
[source,terminal]
----
$ oc create -f nro-scheduler-debug.yaml
----
+
.Example output
[source,terminal]
----
numaresourcesscheduler.nodetopology.openshift.io/numaresourcesscheduler created
----

.Verification steps

. Check that the NUMA-aware scheduler was successfully deployed:

.. Run the following command to check that the CRD is created successfully:
+
[source,terminal]
----
$ oc get crd | grep numaresourcesschedulers
----
+
.Example output
[source,terminal]
----
NAME                                                CREATED AT
numaresourcesschedulers.nodetopology.openshift.io   2022-02-25T11:57:03Z
----

.. Check that the new custom scheduler is available by running the following command:
+
[source,terminal]
----
$ oc get numaresourcesschedulers.nodetopology.openshift.io
----
+
.Example output
[source,terminal]
----
NAME                     AGE
numaresourcesscheduler   3h26m
----

. Check that the logs for the scheduler show the increased log level:

.. Get the list of pods running in the `openshift-numaresources` namespace by running the following command:
+
[source,terminal]
----
$ oc get pods -n openshift-numaresources
----
+
.Example output
[source,terminal]
----
NAME                                               READY   STATUS    RESTARTS   AGE
numaresources-controller-manager-d87d79587-76mrm   1/1     Running   0          46h
numaresourcesoperator-worker-5wm2k                 2/2     Running   0          45h
numaresourcesoperator-worker-pb75c                 2/2     Running   0          45h
secondary-scheduler-7976c4d466-qm4sc               1/1     Running   0          21m
----
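+
Rather than copying the pod name by hand into the next command, you can extract it from the `oc get pods` output, since the generated name always begins with the `secondary-scheduler` prefix. A sketch against the sample table above (in a live cluster, pipe the real command output instead of the saved sample):
+
[source,bash]
----
# Sample output of `oc get pods -n openshift-numaresources`; in practice,
# replace this variable with the live command's output.
pods='NAME                                               READY   STATUS    RESTARTS   AGE
numaresources-controller-manager-d87d79587-76mrm   1/1     Running   0          46h
numaresourcesoperator-worker-5wm2k                 2/2     Running   0          45h
numaresourcesoperator-worker-pb75c                 2/2     Running   0          45h
secondary-scheduler-7976c4d466-qm4sc               1/1     Running   0          21m'

# Keep only the secondary scheduler pod name (first column).
sched_pod=$(printf '%s\n' "$pods" | awk '/^secondary-scheduler/ {print $1}')
echo "$sched_pod"
----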

.. Get the logs for the secondary scheduler pod by running the following command, replacing the pod name with the one from your cluster:
+
[source,terminal]
----
$ oc logs secondary-scheduler-7976c4d466-qm4sc -n openshift-numaresources
----
+
.Example output
[source,terminal]
----
...
I0223 11:04:55.614788 1 reflector.go:535] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Namespace total 11 items received
I0223 11:04:56.609114 1 reflector.go:535] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.ReplicationController total 10 items received
I0223 11:05:22.626818 1 reflector.go:535] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.StorageClass total 7 items received
I0223 11:05:31.610356 1 reflector.go:535] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.PodDisruptionBudget total 7 items received
I0223 11:05:31.713032 1 eventhandlers.go:186] "Add event for scheduled pod" pod="openshift-marketplace/certified-operators-thtvq"
I0223 11:05:53.461016 1 eventhandlers.go:244] "Delete event for scheduled pod" pod="openshift-marketplace/certified-operators-thtvq"
----
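+
At `Debug` level the scheduler writes standard klog-formatted lines, so ordinary text tools are enough to narrow the output to what you need. A sketch that counts the pod-event entries in the sample lines above (against a live cluster you would pipe the `oc logs` output into `grep` instead):
+
[source,bash]
----
# Two of the sample log lines from the example output above.
log='I0223 11:05:31.713032 1 eventhandlers.go:186] "Add event for scheduled pod" pod="openshift-marketplace/certified-operators-thtvq"
I0223 11:05:53.461016 1 eventhandlers.go:244] "Delete event for scheduled pod" pod="openshift-marketplace/certified-operators-thtvq"'

# Count scheduling-event entries; grep -c prints the number of matching lines.
printf '%s\n' "$log" | grep -c 'eventhandlers'
----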