Memory allocation on multi GPU setup #1882

@Suppe2000

Description

Describe the Issue
I added a second Nvidia 4090 with 48 GB to my setup. On a single GPU I can offload a 24B Q8 model (e.g. Cydonia) with a context size of 131k, allocating around 47 GB of VRAM.

If I switch KoboldCpp to a multi-GPU setup and start with the exact same settings, the 47 GB allocation is shared equally between both 4090s. In theory I should now have 96 GB of VRAM available.
But if I slightly increase the context size or choose a bigger model (>24B), I run into an out-of-memory error.

On CUDA:
cudaMalloc failed: OOM
failed to allocate CUDA buffer

On Vulkan:
Requested buffer size exceeds device max_buffer_size limit

I sincerely do not understand what is causing this. How can I allocate the whole 96 GB of available VRAM? Any help would be appreciated.

Additional Information:
Windows 11
2 x Nvidia 4090 48GB
KoboldCpp version: any
Both CUDA / Vulkan, same result
Tensor split 1:1
"No mmap" tried without success
