Help in converting llama-cpp command to KoboldCpp #1684
-
I got this command from Reddit to run GLM-4.5-Air on my machine (Windows, 2x3090 + 64 GB DDR5):
The model runs perfectly with this command and produces about 10 t/s. Here is how the tensors are split:
I tried to run the same model with the latest KoboldCpp (the backend I normally use for all my other models), but it apparently failed to load due to insufficient VRAM. Here is the full output from the terminal:
Can you tell me what I am doing wrong w.r.t. llama-cpp? Thanks.
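For reference, a minimal sketch of the kind of llama-server invocation typically shared for running GLM-4.5-Air across two GPUs. The original Reddit command is not quoted above, so the model filename, quant, layer count, and context size below are assumptions; only the flag names (`-m`, `-ngl`, `-ts`, `-c`) are actual llama-server options:

```sh
# A hypothetical llama-server command of this general shape
# (values are assumptions, not the original Reddit command):
#   -m   : model file to load (assumed filename/quant)
#   -ngl : number of layers to offload to the GPUs
#   -ts  : tensor split ratio across the two 3090s
#   -c   : context size (assumed)
llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 99 -ts 2 1 -c 8192
```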
Replies: 1 comment 6 replies
-
This option would be the equivalent of `-ts` on your llama-server command line, so something like: `--tensor_split 2 1`.
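For concreteness, a minimal sketch of what the equivalent KoboldCpp command line could look like. The flag names (`--model`, `--usecublas`, `--gpulayers`, `--tensor_split`, `--contextsize`) are real KoboldCpp options, but the model filename, layer count, and context size are assumptions carried over from the question, and `--tensor_split` only takes effect with a GPU backend such as `--usecublas`:

```sh
# Hypothetical KoboldCpp invocation mirroring the llama-server flags above
# (filename, layer count, and context size are assumptions):
koboldcpp.exe --model GLM-4.5-Air-Q4_K_M.gguf --usecublas --gpulayers 99 --tensor_split 2 1 --contextsize 8192
```

The `2 1` ratio works the same way as llama-server's `-ts 2 1`: the split tensors are distributed proportionally, so roughly two thirds go to the first 3090 and one third to the second.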