I am using KoboldCPP on my Linux system, and I have noticed that the number of layers I can offload to VRAM is considerably lower (and much less reliable) on more recent versions.

I have three GPUs (an A6000 Ampere and two 3090s).

With version 1.76 I can offload, for example, 83 layers of a particular model, using 2.0,1.0,1.0 as my tensor split.

With version 1.93.1 and the same settings, I first have to change my tensor split to 1.0,2.0,1.0 (it seems the GPU order has changed?).

In addition, I can only load between 76 and 80 layers of the same model before I get OOMs. Worse, the limit isn't stable: after successfully loading the model with 80 layers offloaded, I restarted and got an OOM at 80 layers, and had to reduce the layer count with no other changes made.

I tried again under identical conditions with version 1.76, and 83 layers of the model loaded with no issues.
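For reference, here is roughly how I am launching it in both cases (the model path is a placeholder, and I am going from memory on the exact flag syntax):

```sh
# Version 1.76: 83 layers load fine with this split order
./koboldcpp --model /path/to/model.gguf --usecublas \
    --gpulayers 83 --tensor_split 2.0 1.0 1.0

# Version 1.93.1: the split order has to be reshuffled, and even then
# only 76-80 layers fit before an OOM
./koboldcpp --model /path/to/model.gguf --usecublas \
    --gpulayers 80 --tensor_split 1.0 2.0 1.0
```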
Can anyone confirm this behaviour?