Commit bc1d292

Polish the 'Setting Up a Kubernetes Environment with GPUs' tutorial by (#51)
including a troubleshooting tip

Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
1 parent 4d7d063 commit bc1d292

1 file changed: +24 -0 lines changed

tutorials/00-install-kubernetes-env.md

Lines changed: 24 additions & 0 deletions
@@ -126,6 +126,30 @@ Before you begin, ensure the following:
 TEST SUITE: None
 ```
 
+4. Some troubleshooting tips for installing gpu-operator:
+
+If gpu-operator fails to start because of the commonly seen “too many open files” issue for minikube (and [kind](https://kind.sigs.k8s.io/)), then the quick fix below may help.
+
+The issue shows up as one or more gpu-operator pods in `CrashLoopBackOff` status and can be confirmed by checking their logs. For example:
+
+```console
+$ sudo kubectl -n gpu-operator logs daemonset/nvidia-device-plugin-daemonset -c nvidia-device-plugin
+IS_HOST_DRIVER=true
+NVIDIA_DRIVER_ROOT=/
+DRIVER_ROOT_CTR_PATH=/host
+NVIDIA_DEV_ROOT=/
+DEV_ROOT_CTR_PATH=/host
+Starting nvidia-device-plugin
+I0131 19:35:42.895845 1 main.go:235] "Starting NVIDIA Device Plugin" version=<
+d475b2cf
+commit: d475b2cfcf12b983a4975d4fc59d91af432cf28e
+>
+I0131 19:35:42.895917 1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
+E0131 19:35:42.895933 1 main.go:173] failed to create FS watcher for /var/lib/kubelet/device-plugins/: too many open files
+```
+
+The fix is [well documented](https://kind.sigs.k8s.io/docs/user/known-issues#pod-errors-due-to-too-many-open-files) by kind and also works for minikube.
+
 ### Step 4: Verifying GPU Configuration
 
 1. Ensure Minikube is running:
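
Reviewer note (not part of the commit): the kind known-issues page linked in the added tip attributes this error to exhausted inotify limits on the host and suggests raising two sysctl values. The following is a minimal sketch of that fix, assuming a Linux host that exposes `/proc/sys/fs/inotify`. The helper names `suggest_sysctl` and `read_limit` are hypothetical, and the target values are the ones kind's page lists at the time of writing, so check the page before relying on them. The script only prints the commands it would suggest and does not change any setting itself.

```shell
#!/bin/sh
# Target values as listed on the kind known-issues page (assumption: still current).
WANT_WATCHES=524288
WANT_INSTANCES=512

# Hypothetical helper: print the sysctl command that raises KEY to WANT,
# or print nothing when CURRENT is already at or above WANT.
suggest_sysctl() {
  key=$1
  current=$2
  want=$3
  if [ "$current" -lt "$want" ]; then
    echo "sudo sysctl ${key}=${want}"
  fi
}

# Hypothetical helper: read a current inotify limit from /proc.
# Falls back to 0 when the file is unreadable, so the command is still suggested.
read_limit() {
  cat "/proc/sys/fs/inotify/$1" 2>/dev/null || echo 0
}

suggest_sysctl fs.inotify.max_user_watches "$(read_limit max_user_watches)" "$WANT_WATCHES"
suggest_sysctl fs.inotify.max_user_instances "$(read_limit max_user_instances)" "$WANT_INSTANCES"
```

Running the printed commands changes the limits only until the next reboot. The kind page also describes persisting them through `/etc/sysctl.conf`. After raising the limits, the crashing gpu-operator pods may need to be restarted before they recover.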
