Description
The exporter never manages to start. The logs from the HAMi exporter pod are as follows:
kubectl logs -n kube-system hami-device-exporter-nprt6
I0302 09:47:46.050488 1 flags.go:36] FLAG: --mig-strategy="none"
I0302 09:47:46.050562 1 flags.go:36] FLAG: --fail-on-init-error="true"
I0302 09:47:46.050569 1 flags.go:36] FLAG: --nvidia-driver-root="/"
I0302 09:47:46.050575 1 flags.go:36] FLAG: --dev-root=""
I0302 09:47:46.050579 1 flags.go:36] FLAG: --nvidia-dev-root=""
I0302 09:47:46.050583 1 flags.go:36] FLAG: --pass-device-specs="false"
I0302 09:47:46.050589 1 flags.go:36] FLAG: --device-list-strategy="[envvar]"
I0302 09:47:46.050600 1 flags.go:36] FLAG: --device-id-strategy="uuid"
I0302 09:47:46.050606 1 flags.go:36] FLAG: --gdrcopy-enabled="false"
I0302 09:47:46.050610 1 flags.go:36] FLAG: --gds-enabled="false"
I0302 09:47:46.050615 1 flags.go:36] FLAG: --mofed-enabled="false"
I0302 09:47:46.050619 1 flags.go:36] FLAG: --kubelet-socket="/var/lib/kubelet/device-plugins/kubelet.sock"
I0302 09:47:46.050625 1 flags.go:36] FLAG: --config-file=""
I0302 09:47:46.050629 1 flags.go:36] FLAG: --cdi-annotation-prefix="cdi.k8s.io/"
I0302 09:47:46.050635 1 flags.go:36] FLAG: --nvidia-cdi-hook-path="/usr/bin/nvidia-ctk"
I0302 09:47:46.050640 1 flags.go:36] FLAG: --nvidia-ctk-path="/usr/bin/nvidia-ctk"
I0302 09:47:46.050646 1 flags.go:36] FLAG: --driver-root-ctr-path="/driver-root"
I0302 09:47:46.050650 1 flags.go:36] FLAG: --container-driver-root="/driver-root"
I0302 09:47:46.050660 1 flags.go:36] FLAG: --device-discovery-strategy="auto"
I0302 09:47:46.050666 1 flags.go:36] FLAG: --imex-channel-ids="[]int{}"
I0302 09:47:46.050677 1 flags.go:36] FLAG: --imex-required="false"
I0302 09:47:46.050682 1 flags.go:36] FLAG: --v="0"
I0302 09:47:46.050688 1 flags.go:36] FLAG: --node-name="phy-10-19-196-70"
I0302 09:47:46.050694 1 flags.go:36] FLAG: --device-split-count="2"
I0302 09:47:46.050700 1 flags.go:36] FLAG: --device-memory-scaling="1"
I0302 09:47:46.050706 1 flags.go:36] FLAG: --device-cores-scaling="1"
I0302 09:47:46.050711 1 flags.go:36] FLAG: --disable-core-limit="false"
I0302 09:47:46.050716 1 flags.go:36] FLAG: --resource-name="nvidia.com/gpu"
I0302 09:47:46.050721 1 flags.go:36] FLAG: --help="false"
I0302 09:47:46.050725 1 flags.go:36] FLAG: --h="false"
I0302 09:47:46.050776 1 client.go:97] BuildConfigFromFlags failed for file /root/.kube/config: stat /root/.kube/config: no such file or directory. Using in-cluster config.
I0302 09:47:46.051269 1 main.go:267] Starting FS watcher for /var/lib/kubelet/device-plugins
I0302 09:47:46.051319 1 main.go:275] Start working on node phy-10-19-196-70
I0302 09:47:46.051325 1 main.go:276] Starting OS watcher.
I0302 09:47:46.051574 1 main.go:291] Starting Plugins.
I0302 09:47:46.051598 1 main.go:348] Loading configuration.
I0302 09:47:46.051856 1 vgpucfg.go:104] flags= [--mig-strategy value the desired strategy for exposing MIG devices on GPUs that support it:
[none | single | mixed] (default: "none") [$MIG_STRATEGY] --fail-on-init-error fail the plugin if an error is encountered during initialization, otherwise block indefinitely (default: true) [$FAIL_ON_INIT_ERROR] --nvidia-driver-root value the root path for the NVIDIA driver installation (typical values are '/' or '/run/nvidia/driver') (default: "/") [$NVIDIA_DRIVER_ROOT] --dev-root value, --nvidia-dev-root value the root path for the NVIDIA device nodes on the host (typical values are '/' or '/run/nvidia/driver') [$NVIDIA_DEV_ROOT] --pass-device-specs pass the list of DeviceSpecs to the kubelet on Allocate() (default: false) [$PASS_DEVICE_SPECS] --device-list-strategy value [ --device-list-strategy value ] the desired strategy for passing the device list to the underlying runtime:
[envvar | volume-mounts | cdi-annotations] (default: "envvar") [$DEVICE_LIST_STRATEGY] --device-id-strategy value the desired strategy for passing device IDs to the underlying runtime:
[uuid | index] (default: "uuid") [$DEVICE_ID_STRATEGY] --gdrcopy-enabled ensure that containers that request NVIDIA GPU resources are started with GDRCopy support (default: false) [$GDRCOPY_ENABLED] --gds-enabled ensure that containers are started with NVIDIA_GDS=enabled (default: false) [$GDS_ENABLED] --mofed-enabled ensure that containers are started with NVIDIA_MOFED=enabled (default: false) [$MOFED_ENABLED] --kubelet-socket value specify the socket for communicating with the kubelet; if this is empty, no connection with the kubelet is attempted (default: "/var/lib/kubelet/device-plugins/kubelet.sock") [$KUBELET_SOCKET] --config-file value the path to a config file as an alternative to command line options or environment variables [$CONFIG_FILE] --cdi-annotation-prefix value the prefix to use for CDI container annotation keys (default: "cdi.k8s.io/") [$CDI_ANNOTATION_PREFIX] --nvidia-cdi-hook-path value, --nvidia-ctk-path valuethe path to use for NVIDIA CDI hooks in the generated CDI specification (default: "/usr/bin/nvidia-ctk") [$NVIDIA_CDI_HOOK_PATH, $NVIDIA_CTK_PATH] --driver-root-ctr-path value, --container-driver-root value the path where the NVIDIA driver root is mounted in the container; used for generating CDI specifications (default: "/driver-root") [$DRIVER_ROOT_CTR_PATH, $CONTAINER_DRIVER_ROOT] --device-discovery-strategy value the strategy to use to discover devices: 'auto', 'nvml', or 'tegra' (default: "auto") [$DEVICE_DISCOVERY_STRATEGY] --imex-channel-ids value [ --imex-channel-ids value ] A list of IMEX channels to inject. 
[$IMEX_CHANNEL_IDS] --imex-required The specified IMEX channels are required (default: false) [$IMEX_REQUIRED] -v value number for the log level verbosity (default: 0) --node-name value node name (default: "phy-10-19-196-70") [$NodeName] --device-split-count value the number for NVIDIA device split (default: 2) [$DEVICE_SPLIT_COUNT] --device-memory-scaling value the ratio for NVIDIA device memory scaling (default: 1) [$DEVICE_MEMORY_SCALING] --device-cores-scaling value the ratio for NVIDIA device cores scaling (default: 1) [$DEVICE_CORES_SCALING] --disable-core-limit If set, the core utilization limit will be ignored (default: false) [$DISABLE_CORE_LIMIT] --resource-name value the name of field for number GPU visible in container (default: "nvidia.com/gpu")]
I0302 09:47:46.052065 1 config.go:463] Reading config file from path:
F0302 09:47:46.052076 1 vgpucfg.go:116] failed to load ascend vnpu config file : open : no such file or directory