-
Small update: I replaced --numa isolate with --numa numactl. This does what I thought isolate would do. Still no luck finding settings that actually use both CPUs.
-
There have been a lot of discussions around the Internet about [...]. I don't have access to a dual-socket system, so I have done nothing related to NUMA in [...].
-
With a single 8480+ and a 3090 I get excellent speeds: ~40 T/s on Maverick.
After installing a second CPU and another 8 sticks of RAM, I can't get good speeds.
--numa distribute gives ~27 T/s
--numa isolate (and -t 56) is even slower at ~10 T/s
(With cache cleared between tests)
This is with Sub-NUMA Clustering disabled, so there are only 2 NUMA nodes total.
Any recommendations for settings that will get over 40 T/s?
Do I not understand what numa isolate does? I thought that would be the same as a single CPU.
llama-server -m Maverick-UD-IQ4_XS.gguf -c 32000 -fa -fmoe -amb 512 -rtr -ctk q8_0 -ctv q8_0 -ngl 99 -ot ".ffn_._exps.*=CPU" --numa isolate -t 56
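For anyone wanting to reproduce the single-socket comparison, here is a minimal sketch of pinning the server to one node with numactl. It assumes numactl is installed, that node 0 is the socket whose cores and memory should be used, and that 56 threads matches one 8480+. NODE, THREADS and RUN are illustrative variables, not part of the original setup; the llama-server flags are copied unchanged from the command above. Whether this actually restores ~40 T/s is untested.

```shell
#!/bin/sh
# Sketch: keep CPU threads and memory allocations on a single NUMA node.
# NODE, THREADS and RUN are hypothetical knobs introduced for illustration.
NODE="${NODE:-0}"        # NUMA node to bind to (assumption: node 0)
THREADS="${THREADS:-56}" # one thread per physical core of a single 8480+

# Build the command line; numactl restricts both the CPU set and the
# memory policy, and --numa numactl tells the server to respect that map.
set -- numactl --cpunodebind="$NODE" --membind="$NODE" \
  llama-server -m Maverick-UD-IQ4_XS.gguf -c 32000 -fa -fmoe -amb 512 -rtr \
  -ctk q8_0 -ctv q8_0 -ngl 99 -ot ".ffn_._exps.*=CPU" \
  --numa numactl -t "$THREADS"

# Dry run by default: print the command. Set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
  exec "$@"
else
  echo "$*"
fi
```

Running it with no variables set only prints the command, so the binding can be checked before launching. Comparing NODE=0 against NODE=1 may also show whether the socket closest to the GPU matters.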