Please advise on the best-of-both-worlds setup. First, do I need the two models? Second, how much CPU RAM and GPU VRAM do I need for inference? I'll have an Intel Xeon 6 6979P with AMX, 120 cores and 240 threads; at INT8 that's about 512 TOPS (CPU only). Also, do the CPU and its memory affect TPS during inference? I want the active parameters to be in GPU VRAM.
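On whether CPU memory affects TPS: at batch size 1, decode is typically memory-bandwidth-bound rather than compute-bound, so RAM speed matters a lot once expert weights live in RAM. Here is a minimal back-of-envelope sketch, assuming a DeepSeek-style MoE with roughly 30B active parameters resident on the CPU, INT4 weights, ~70% achievable bandwidth, and DDR5-6400 across the 6979P's 12 channels (all of these figures are assumptions, not measurements):

```python
# Back-of-envelope decode-speed estimate: at batch 1, decode is usually
# memory-bandwidth-bound, so the upper bound on tokens/s is roughly
# (effective RAM bandwidth) / (bytes of weights read per token).

active_params_on_cpu = 30e9   # ASSUMED: expert params touched per token that live in RAM
bytes_per_param      = 0.5    # INT4 ~ 0.5 byte per parameter

channels = 12                 # Xeon 6 6900-series: 12 DDR5 channels
mts      = 6400               # ASSUMED: DDR5-6400 (mega-transfers/s per channel)
bw_peak  = channels * mts * 1e6 * 8   # bytes/s (8 bytes per transfer)
bw_eff   = 0.7 * bw_peak              # ASSUMED: ~70% achievable efficiency

bytes_per_token = active_params_on_cpu * bytes_per_param
tps_upper_bound = bw_eff / bytes_per_token
print(f"peak BW ≈ {bw_peak/1e9:.0f} GB/s, est. decode ≤ {tps_upper_bound:.1f} tok/s")
```

Under these assumptions the ceiling is a few tens of tokens/s; the 512 INT8 TOPS of compute is rarely the limit at batch 1 and matters more for prefill and larger batches.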
My specs: one Intel 6979P, 120 cores / 240 threads. At INT8 that's 2,048 ops/cycle per core, so 245,760 total INT8 ops/cycle per CPU.
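The arithmetic checks out: 2,048 INT8 ops/cycle per AMX core × 120 cores = 245,760 ops/cycle, and at a sustained clock near 2.08 GHz (an assumed figure, not a spec-sheet number) that lands on the ~512 TOPS quoted above. A quick sanity check:

```python
# Sanity-check the INT8 throughput math (core count and ops/cycle are from
# the comment above; the sustained AMX clock is an assumption).
cores         = 120
ops_per_cycle = 2048                 # INT8 ops/cycle/core with AMX
total_per_cyc = cores * ops_per_cycle
assert total_per_cyc == 245_760

amx_clock_hz = 2.083e9               # ASSUMED sustained clock under AMX load
tops = total_per_cyc * amx_clock_hz / 1e12
print(f"{total_per_cyc} ops/cycle × {amx_clock_hz/1e9:.2f} GHz ≈ {tops:.0f} INT8 TOPS")
```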
I didn't understand why we need two models, CPU weights and GPU weights. KTransformers keeps the MoE active parameters on the GPU, so in FP16 or BF16 the active parameters take about 64 GB, i.e. two RTX 5090s. I know the CPU weights are INT4, so about 509 GB; I'll have one Intel 6979P and 12 DIMMs of 64 GB RAM (64 GB × 12 = 768 GB). Is RAM speed critical for inference after the model is loaded (past the cold initial load)? Please help: how much VRAM and how much CPU RAM do I need? And should I use two models, GPU weights and CPU weights? If CPU RAM holds 509 GB, what about VRAM for the GPU weights?
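A rough fit check of the numbers in this comment (the weight sizes are the commenter's own figures; the headroom interpretation and the KV-cache caveat are assumptions):

```python
# Memory-fit sketch for the CPU/GPU split described above. Sizes are taken
# from the comment; the need for KV-cache/activation headroom is assumed.

GiB = 2**30
cpu_weights_int4 = 509 * GiB        # expert weights kept in RAM (per the comment)
ram_total        = 12 * 64 * GiB    # 12 × 64 GB DIMMs
active_fp16      = 64 * GiB         # active parameters in FP16/BF16 (per the comment)
vram_total       = 2 * 32 * GiB     # two RTX 5090s

ram_headroom  = ram_total - cpu_weights_int4   # room left for OS, buffers
vram_headroom = vram_total - active_fp16       # room left for KV cache, activations

print(f"RAM headroom:  {ram_headroom/GiB:.0f} GiB")
print(f"VRAM headroom: {vram_headroom/GiB:.0f} GiB")
```

The RAM side fits with ~259 GiB to spare, but the VRAM headroom comes out to zero: KV cache and activations still need space on the GPU, which suggests the GPU-side weights would likely need to be quantized as well rather than held in full BF16.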