
[question] Error while loading shared libraries: libcuda.so.1: on Pod Initialization (CrashLoop) #2781

@jothipillay

What happened:

I followed the instructions to set up Koordinator v1.7 with Hami Core. The goal is GPU memory isolation and fractional GPU allocation.

When I use the sample pod spec provided, the pod goes into CrashLoopBackOff with the following error:
"error while loading shared libraries: libcuda.so.1:"

Full screenshot below:
[Image]

Could someone advise what I did wrong, or suggest troubleshooting steps? Which component is this issue related to?
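
One way to narrow this down is to check whether the dynamic loader can resolve libcuda.so.1 from inside the failing container. The following is a minimal sketch, assuming Python is available in the container image; the helper name `can_load` is hypothetical and is not part of Koordinator or HAMi.

```python
import ctypes
import os


def can_load(lib_name):
    """Try to load lib_name via the dynamic loader; return (ok, detail)."""
    try:
        ctypes.CDLL(lib_name)
        return True, "loaded"
    except OSError as exc:
        # The loader's message states why resolution failed (e.g. file not found).
        return False, str(exc)


if __name__ == "__main__":
    ok, detail = can_load("libcuda.so.1")
    print("libcuda.so.1:", "OK" if ok else "FAILED", "-", detail)
    # Environment variables that influence where the loader looks.
    for var in ("LD_LIBRARY_PATH", "LD_PRELOAD"):
        print(f"{var}={os.environ.get(var, '')}")
```

If the load fails, the library is most likely not visible in the container at all. libcuda.so.1 ships with the host NVIDIA driver and is normally mounted into the container by the NVIDIA container runtime, so comparing this output between the host and the pod may show which side is missing it.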

Environment:

  • Koordinator version: v1.7
  • Kubernetes version (use kubectl version): v1.29.x
  • docker/containerd version: containerd 1.7.28
  • OS (e.g: cat /etc/os-release): Ubuntu 22.04
  • Kernel (e.g. uname -a): 5.15.0-161-generic

Anything else we need to know:
GPUEnvInject is enabled in Koordlet.
I also tried Koordinator v1.6 with Hami Core Distribute and hit the same error.

Labels: kind/question (Support request or question relating to Koordinator)
