Replies: 2 comments
-
Some of them you'd have to find by trial and error. I think with an 8GB card you should be able to safely offload about 24 layers or so for a 13B model with CLBlast. SmartContext is a feature which halves your usable context but makes full reprocessing of it necessary less often. mmap is memory-mapped I/O; generally you don't need to change it. Read more here: https://en.wikipedia.org/wiki/Mmap
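For reference, the same settings the Easy Launcher exposes can also be passed on the command line. A rough sketch, assuming current koboldcpp flag names (check `--help` on your build, as names can change between versions):

```shell
# Offload 24 layers to the GPU via CLBlast (platform 0, device 0),
# enable SmartContext, and leave mmap at its default (enabled).
python koboldcpp.py chronos-hermes-13b.ggmlv3.q4_K_S.bin \
  --useclblast 0 0 \
  --gpulayers 24 \
  --smartcontext \
  --blasbatchsize 512
```

Adjusting `--gpulayers` up or down between runs is the easiest way to do the trial-and-error sweep.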
-
Thanks. I did a little testing, and it seems that 30 layers gives me 1.5 T/s. I'll try with 24 layers and see if I get a better rate. Is the default BLAS batch size (what is this?) of 512 the best, or does that require experimentation as well? Attached is my testing; it looks like 24 layers is a little slower than 30.
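If it helps with the sweep, the comparison boils down to tokens generated divided by wall time per run. A minimal sketch (a hypothetical helper, not part of koboldcpp, with made-up numbers in place of the attached results):

```python
# Compare generation rates across GPU layer counts.
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Generation rate for one benchmark run."""
    return tokens / seconds

# layers -> (tokens generated, wall-clock seconds); numbers are illustrative.
runs = {30: (120, 80.0), 24: (120, 85.0)}

for layers, (tok, sec) in sorted(runs.items()):
    print(f"{layers} layers: {tokens_per_second(tok, sec):.2f} T/s")

best = max(runs, key=lambda n: tokens_per_second(*runs[n]))
print("fastest:", best, "layers")
```

Running each layer count on the same prompt keeps the comparison fair, since prompt processing time varies with context length.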
-
Using the Easy Launcher, there are some setting names that aren't very intuitive.
I have an RX 6600 XT 8GB GPU and a 4-core i3-9100F CPU with 16GB of system RAM.
Using a 13B model (chronos-hermes-13b.ggmlv3.q4_K_S), what settings would be best to offload as much as possible to the GPU?
Also, could someone explain the checkbox options (SmartContext, Disable MMAP, etc)?
I understand a bit about running Stable Diffusion via DirectML, and why it's slow (come on AMD.. port ROCm to Windows already!) but I'm only just starting to cut my teeth on LLMs.
Don't worry, I'm under no illusions that this'll be fast, I'm just hoping it'll be a little faster than running on pure CPU.
Thank you.