No devices found. Waiting indefinitely #1552

@alrex66608

SOS, help!!!

Version info:
OS: Ubuntu 24.04
Kubernetes version: 1.28.15
containerd version: 1.7.28
k8s-device-plugin: v0.17.1

GPU detection works normally on the host.

The runtime has already been configured:
nvidia-ctk runtime configure --runtime=containerd
systemctl daemon-reload
systemctl restart containerd
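
One thing worth checking after these commands is which runtime containerd actually treats as its default, since the device plugin pod only sees the GPU if it is started through the NVIDIA runtime. The sketch below is a diagnostic aid, not a confirmed diagnosis of this issue. It assumes the config lives at /etc/containerd/config.toml and uses the `default_runtime_name` key of the containerd 1.x CRI plugin section; the helper name `default_runtime` is hypothetical.

```shell
# Hypothetical helper: print the default runtime named in a containerd config.
# Assumption: the config uses the `default_runtime_name` key (containerd 1.x, config version 2).
default_runtime() {
  config="${1:-/etc/containerd/config.toml}"
  # Extract the quoted value of the first default_runtime_name entry.
  sed -n 's/^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1
}

# Only report when the assumed config path exists and is readable.
if [ -r /etc/containerd/config.toml ]; then
  echo "default runtime: $(default_runtime)"
fi
```

If this prints something other than `nvidia` (for example `runc`), the plugin pod is probably not being started with the NVIDIA runtime. In that case, either rerun `nvidia-ctk runtime configure --runtime=containerd` with the `--set-as-default` flag, or run the pod under a RuntimeClass that points at the `nvidia` handler.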

But the device plugin still cannot detect the GPU:

root@master:~/demo/nvidia-device-plugin# kubectl logs nvidia-device-plugin-daemonset-sdd6n -n kube-system
I1204 07:31:31.764138       1 main.go:235] "Starting NVIDIA Device Plugin" version=<
	3c378193
	commit: 3c378193fcebf6e955f0d65bd6f2aeed099ad8ea
 >
I1204 07:31:31.764189       1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
I1204 07:31:31.764229       1 main.go:245] Starting OS watcher.
I1204 07:31:31.764515       1 main.go:260] Starting Plugins.
I1204 07:31:31.764549       1 main.go:317] Loading configuration.
I1204 07:31:31.765073       1 main.go:342] Updating config with default resource matching patterns.
I1204 07:31:31.765318       1 main.go:353] 
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": false,
    "mpsRoot": "",
    "nvidiaDriverRoot": "/",
    "nvidiaDevRoot": "/",
    "gdsEnabled": false,
    "mofedEnabled": false,
    "useNodeFeatureAPI": null,
    "deviceDiscoveryStrategy": "auto",
    "plugin": {
      "passDeviceSpecs": false,
      "deviceListStrategy": [
        "envvar"
      ],
      "deviceIDStrategy": "uuid",
      "cdiAnnotationPrefix": "cdi.k8s.io/",
      "nvidiaCTKPath": "/usr/bin/nvidia-ctk",
      "containerDriverRoot": "/driver-root"
    }
  },
  "resources": {
    "gpus": [
      {
        "pattern": "*",
        "name": "nvidia.com/gpu"
      }
    ]
  },
  "sharing": {
    "timeSlicing": {}
  },
  "imex": {}
}
I1204 07:31:31.765334       1 main.go:356] Retrieving plugins.
E1204 07:31:31.765477       1 factory.go:112] Incompatible strategy detected auto
E1204 07:31:31.765489       1 factory.go:113] If this is a GPU node, did you configure the NVIDIA Container Toolkit?
E1204 07:31:31.765494       1 factory.go:114] You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
E1204 07:31:31.765499       1 factory.go:115] You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
E1204 07:31:31.765504       1 factory.go:116] If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
I1204 07:31:31.765516       1 main.go:381] No devices found. Waiting indefinitely.
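
The relevant lines in the output above are the ones whose klog severity prefix is `E`. A minimal sketch for pulling only those lines out of a longer log follows. It assumes the standard klog line format (severity letter, `mmdd`, then a timestamp); the helper name `plugin_errors` is hypothetical.

```shell
# Hypothetical helper: keep only klog lines logged at error (E) or fatal (F) severity.
# Reads log text on stdin; returns success even when nothing matches.
plugin_errors() {
  grep -E '^[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.' || true
}

# Example use against the pod from this issue:
#   kubectl logs nvidia-device-plugin-daemonset-sdd6n -n kube-system | plugin_errors
```

For the log in this issue, that leaves only the five `factory.go` lines, starting with "Incompatible strategy detected auto".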
