I am using KoboldCPP on my Linux system, and I have noticed that the number of layers I can offload to VRAM is considerably lower (and much less reliable) on more recent versions.

I have three GPUs (an A6000 Ampere and two 3090s).

With version 1.76 I can offload, for example, 83 layers of a particular model, using 2.0,1.0,1.0 as my tensor split.

With version 1.93.1 and the same settings, I first have to change my tensor split to 1.0,2.0,1.0 (it seems the GPU order has changed?).

In addition, I can only load between 76 and 80 layers of the same model before I get OOMs. Worse, the limit isn't stable: after successfully loading the model with 80 layers offloaded, I restarted and got an OOM at 80 layers, and had to reduce the layer count with no other changes made.

I tried again under identical conditions with version 1.76, and 83 layers of the model loaded with no issues.
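For reference, here is roughly how I am launching it in both cases (the model path is a placeholder, and I am going from memory on the exact flag syntax):

```sh
# Version 1.76: 83 layers load fine with this split order
./koboldcpp --model /path/to/model.gguf --usecublas \
    --gpulayers 83 --tensor_split 2.0 1.0 1.0

# Version 1.93.1: the split order has to be reshuffled, and even then
# only 76-80 layers fit before an OOM
./koboldcpp --model /path/to/model.gguf --usecublas \
    --gpulayers 80 --tensor_split 1.0 2.0 1.0
```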
Can anyone confirm this behaviour?