
[BUG]: Vulkan doesn't fall back to CPU-only if memory can't be allocated on GPU #1290

@johnearnshaw

Description


I've been following and playing around with LLamaSharp since the project first started - it's cool, BTW!

Since I updated CUDA, I've not been able to get it working, even though I have CUDA 13, CUDA 12.9, and CUDA 12.4 installed, with my CUDA_PATH environment variable set to 12.4.


Although this question is about Vulkan...

I've tested on 3 different machines (laptops): 2 with an Intel GPU as primary and an Nvidia GPU as secondary (one with a GeForce 960 with 6GB dedicated RAM, and one with a Quadro with 6GB dedicated RAM), and another laptop with 2x dedicated GeForce GTX 1080 GPUs, both with 8GB dedicated RAM.

On all three machines, Vulkan always returns an error saying it can't allocate memory, regardless of the configurations I've tested. If the CUDA and Vulkan runtimes are installed together, Vulkan takes priority and throws the error.

It won't fall back to CUDA, and then it doesn't fall back to CPU-only.

Is there a fix for this with multiple GPU runtimes installed?

And is there a fix for the Vulkan not allocating memory error?
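For anyone hitting the same thing: a possible workaround (a sketch only, not verified against 0.25.0) is to steer backend selection explicitly before any other LLamaSharp call via `NativeLibraryConfig`. The method names below (`WithCuda`, `WithVulkan`, `WithAutoFallback`) are my best reading of the LLamaSharp docs and may differ between versions, so treat this as an assumption to check against the `NativeLibraryConfig` API, not a confirmed fix:

```csharp
using LLama.Native;

// Must run before touching any other LLamaSharp API, otherwise the
// native library has already been loaded and this has no effect.
// Assumed API shape -- verify against your LLamaSharp version.
NativeLibraryConfig.All
    .WithCuda(true)          // prefer the CUDA backend
    .WithVulkan(false)       // skip the Vulkan backend entirely
    .WithAutoFallback(true); // try to fall back (e.g. to CPU) on load failure
```

If that works, it would at least sidestep Vulkan winning the priority race, even though the underlying allocation error and the missing automatic fallback would still be bugs.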

Reproduction Steps

Install the Vulkan, CUDA, and CPU-only runtimes together and test on (incompatible???) hardware. Remove the Vulkan dependency and test again. CUDA will say 'no compatible device' and fall back to CPU; Vulkan says 'unable to allocate memory' and the model doesn't load for the CPU fallback, so the next call to the model (inference) throws an exception.

I also tried to override the native library loading using my own cross-platform assembly loader library (I have old-school native/P/Invoke experience). This library works well for loading native Chromium .dll/.so files, Whisper.cpp, and other native C++ libraries into a .NET application.

https://github.com/netmodules/NetTools.AssemblyLoader

It's a netstandard2.1 class library that works as a replacement for the built-in assembly loading for assemblies in the 'runtimes' directory, etc. It lets you specify your own path to the assemblies to load via cross-platform P/Invoke, based on your own runtime logic.

My assembly loader doesn't seem to work for overriding LLamaSharp's native assembly loading, though, so I commented out the dependency.
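A likely reason an external loader can't intercept it: LLamaSharp appears to have its own explicit-path hook, which has to run before anything else in the library. The sketch below assumes a `WithLibrary(path)` method on `NativeLibraryConfig` (per the LLamaSharp docs; the exact signature and the path shown are illustrative, not confirmed for 0.25.0):

```csharp
using LLama.Native;

// Assumed API: point LLamaSharp at a specific native llama library
// instead of letting it run its own backend search. The path below is
// hypothetical -- substitute the actual location of the backend you want.
NativeLibraryConfig.All.WithLibrary(
    @"runtimes\win-x64\native\cuda12\llama.dll");
```

If LLamaSharp resolves and caches the native library through this mechanism on first use, a general-purpose assembly loader that hooks the default .NET native resolution would never be consulted, which would match what I'm seeing.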

Environment & Configuration

  • Operating system: Windows 11

  • .NET runtime version: netstandard2.1 class library with dependency on LlamaSharp running in a net8.0 console application.

  • LLamaSharp version: 0.25.0

<ItemGroup>
	<PackageReference Include="LLamaSharp" Version="0.25.0" />
	<!--<PackageReference Include="LLamaSharp.Backend.Vulkan" Version="0.25.0" />-->
	<PackageReference Include="LLamaSharp.Backend.Cuda12" Version="0.25.0" />
	<!--<PackageReference Include="LLamaSharp.Backend.Cuda11" Version="0.24.0" />-->
	<!--<PackageReference Include="LLamaSharp.Backend.OpenCL" Version="0.13.0" />
	<PackageReference Include="LLamaSharp.Backend.MacMetal" Version="0.7.0" />-->
	<PackageReference Include="LLamaSharp.Backend.Cpu" Version="0.25.0" />
	<!--<PackageReference Include="NetTools.AssemblyLoader" Version="0.0.12" />-->
	<PackageReference Include="NetTools.Serialization.Json" Version="1.1.43" />
</ItemGroup>
  • CUDA version (if you are using cuda backend): 12.4, 12.9, and 13.0 tested. (Primarily 12.4)

  • CPU & GPU device: Intel i7 with Nvidia GeForce 960 6GB, Intel i7 with Nvidia Quadro 6GB, and Intel i7 with 2x Nvidia GeForce GTX 1080 8GB

Known Workarounds

No response
