-
Would you mind sharing the exact commands you used to get those numbers (and any details about the quant)?
-
Please tell us your command-line parameters. I cannot run Maverick, but here is how I run Scout on a 32-core Ryzen 5975WX with a 16 GB RTX 4080:
where
The above command puts all attention tensors, shared experts, and the first 10 layers of
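The actual command did not survive in this thread, so the following is only a hypothetical sketch of the idea behind it: each tensor is assigned to a device by matching its name against regex rules (first match wins), loosely in the spirit of llama.cpp's `--override-tensor` option, whose exact semantics may differ. The tensor names assume the GGUF `blk.<layer>.<tensor>.weight` convention, and the rule that routed experts (`*_exps`) in layers 0-9 stay on the GPU is an assumed reading of the truncated sentence above, not the poster's confirmed setup.

```python
import re

# Hypothetical placement rules, checked in order (first match wins).
# Tensor names are assumed to follow the GGUF "blk.<layer>.<tensor>.weight" pattern.
RULES = [
    # Routed-expert tensors in layers 0-9 stay on the GPU (assumption).
    (re.compile(r"^blk\.[0-9]\.ffn_\w+_exps\."), "GPU"),
    # Routed-expert tensors in all other layers go to system RAM / CPU.
    (re.compile(r"_exps\."), "CPU"),
]
# Everything else (attention, shared experts "*_shexp", norms, ...) defaults to the GPU.
DEFAULT_DEVICE = "GPU"


def place(tensor_name: str) -> str:
    """Return the device a tensor would be assigned to under RULES."""
    for pattern, device in RULES:
        if pattern.search(tensor_name):
            return device
    return DEFAULT_DEVICE


if __name__ == "__main__":
    for name in [
        "blk.0.attn_q.weight",
        "blk.3.ffn_up_exps.weight",
        "blk.12.ffn_up_exps.weight",
        "blk.12.ffn_up_shexp.weight",
    ]:
        print(f"{name:30} -> {place(name)}")
```

With these rules, `blk.3.ffn_up_exps.weight` lands on the GPU while `blk.12.ffn_up_exps.weight` goes to the CPU; attention and shared-expert tensors stay on the GPU in every layer.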
-
Nice, thank you! Your command gave me a 5x speedup in prompt processing. I'm running Unsloth's 4.5-bit dynamic GGUF. On my original test I now get: New command:
-
Any idea what the deal is with prompt speeds on Maverick?
One RTX 3090 and a 56-core DDR4 EPYC, Q4.5 quant, ~3500-token prompt:
Prompt 6.24 T/s
Generation 31.7 T/s
Same setup with the GPU disabled:
Prompt 95 T/s
Generation 5.6 T/s
Is it possible to leave prompt processing on the CPU and still use the GPU for generation?
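To see why the question matters, here is the end-to-end arithmetic using the numbers above. The 500-token response length is an assumption, and the "hybrid" case is purely hypothetical: it simply adds the CPU prompt time to the GPU generation time and ignores any cost of moving data between the two.

```python
PROMPT_TOKENS = 3500  # approximate prompt length from the post
GEN_TOKENS = 500      # assumed response length (not from the post)

# Measured throughput from the post, in tokens per second.
GPU = {"prompt": 6.24, "gen": 31.7}
CPU = {"prompt": 95.0, "gen": 5.6}


def total_seconds(prompt_tps: float, gen_tps: float,
                  n_prompt: int = PROMPT_TOKENS, n_gen: int = GEN_TOKENS) -> float:
    """Time to process the prompt plus time to generate the response."""
    return n_prompt / prompt_tps + n_gen / gen_tps


if __name__ == "__main__":
    scenarios = {
        "GPU enabled": total_seconds(GPU["prompt"], GPU["gen"]),
        "GPU disabled": total_seconds(CPU["prompt"], CPU["gen"]),
        # Hypothetical: CPU-speed prompt processing + GPU-speed generation.
        "hybrid (hypothetical)": total_seconds(CPU["prompt"], GPU["gen"]),
    }
    for label, secs in scenarios.items():
        print(f"{label:22} {secs:7.1f} s")
```

Under these assumptions the GPU-enabled run spends about 561 s of its roughly 577 s on the prompt, versus about 126 s in total for the CPU-only run; the hypothetical hybrid would come to roughly 53 s.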