Description
Describe the Issue
I added a second Nvidia RTX 4090 with 48 GB to my setup. Running on a single GPU, I can offload a 24B Q8 model (e.g. Cydonia) with a context size of 131k, which allocates around 47 GB of VRAM.
If I switch KoboldCpp to a multi-GPU setup and start with the exact same settings, the 47 GB allocation is shared equally between both 4090s. In theory I should now have 96 GB of VRAM available.
But if I slightly increase the context size or choose a bigger model (>24B), I run into an out-of-memory error.
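For reference, here is a rough back-of-the-envelope estimate of where the ~47 GB goes in the single-GPU case. It assumes Cydonia uses the Mistral Small 24B architecture (40 layers, 8 KV heads, head dimension 128) with Q8_0 weights and an fp16 KV cache; those architecture numbers are my assumptions, not measured values:

GIB = 1024 ** 3

params = 24e9
weight_bytes = params * 8.5 / 8            # Q8_0 is roughly 8.5 bits per weight

n_layers, n_kv_heads, head_dim = 40, 8, 128
ctx = 131072
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx * 2   # K and V caches, fp16 (2 bytes)

print(f"weights  ~{weight_bytes / GIB:.1f} GiB")            # ~23.7 GiB
print(f"KV cache ~{kv_bytes / GIB:.1f} GiB")                # ~20.0 GiB
print(f"total    ~{(weight_bytes + kv_bytes) / GIB:.1f} GiB plus compute buffers")

With compute buffers on top, that lines up with the ~47 GB I see allocated.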
On CUDA:
cudaMalloc failed: OOM
failed to allocate CUDA buffer
On Vulkan:
Requested buffer size exceeds device max_buffer_size limit
I sincerely do not understand what is causing this. How can I allocate the full 96 GB of available VRAM? Any help would be appreciated.
Additional Information:
Windows 11
2 x Nvidia 4090 48GB
KoboldCpp version: any
Both CUDA and Vulkan, same result
Tensor split 1:1 (a rough example launch command is sketched after this list)
Tried "No mmap" without success