Please advise on the best-of-both-worlds setup. First, do I need the two models? Second, how much CPU RAM and GPU VRAM do I need for inference? I'll have an Intel Xeon 6 6979P with AMX, 120 cores and 240 threads; at INT8 that's about 512 TOPS (CPU only). Also, do the CPU and its memory affect TPS during inference? I want the active parameters to be in GPU VRAM.
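On whether CPU memory affects TPS: at batch size 1, decode is typically memory-bandwidth-bound rather than compute-bound, so RAM speed matters a lot once expert weights live in RAM. Here is a minimal back-of-envelope sketch, assuming a DeepSeek-style MoE with roughly 30B active parameters resident on the CPU, INT4 weights, ~70% achievable bandwidth, and DDR5-6400 across the 6979P's 12 channels (all of these figures are assumptions, not measurements):

```python
# Back-of-envelope decode-speed estimate: at batch 1, decode is usually
# memory-bandwidth-bound, so the upper bound on tokens/s is roughly
# (effective RAM bandwidth) / (bytes of weights read per token).

active_params_on_cpu = 30e9   # ASSUMED: expert params touched per token that live in RAM
bytes_per_param      = 0.5    # INT4 ~ 0.5 byte per parameter

channels = 12                 # Xeon 6 6900-series: 12 DDR5 channels
mts      = 6400               # ASSUMED: DDR5-6400 (mega-transfers/s per channel)
bw_peak  = channels * mts * 1e6 * 8   # bytes/s (8 bytes per transfer)
bw_eff   = 0.7 * bw_peak              # ASSUMED: ~70% achievable efficiency

bytes_per_token = active_params_on_cpu * bytes_per_param
tps_upper_bound = bw_eff / bytes_per_token
print(f"peak BW ≈ {bw_peak/1e9:.0f} GB/s, est. decode ≤ {tps_upper_bound:.1f} tok/s")
```

Under these assumptions the ceiling is a few tens of tokens/s; the 512 INT8 TOPS of compute is rarely the limit at batch 1 and matters more for prefill and larger batches.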
My specs: one Intel 6979P, 120 cores / 240 threads. At INT8 that's 2,048 ops/cycle per core, so 245,760 total INT8 ops/cycle per CPU.
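The arithmetic checks out: 2,048 INT8 ops/cycle per AMX core × 120 cores = 245,760 ops/cycle, and at a sustained clock near 2.08 GHz (an assumed figure, not a spec-sheet number) that lands on the ~512 TOPS quoted above. A quick sanity check:

```python
# Sanity-check the INT8 throughput math (core count and ops/cycle are from
# the comment above; the sustained AMX clock is an assumption).
cores         = 120
ops_per_cycle = 2048                 # INT8 ops/cycle/core with AMX
total_per_cyc = cores * ops_per_cycle
assert total_per_cyc == 245_760

amx_clock_hz = 2.083e9               # ASSUMED sustained clock under AMX load
tops = total_per_cyc * amx_clock_hz / 1e12
print(f"{total_per_cyc} ops/cycle × {amx_clock_hz/1e9:.2f} GHz ≈ {tops:.0f} INT8 TOPS")
```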
I didn't understand why we need two models, CPU weights and GPU weights. KTransformers keeps the MoE active parameters on the GPU, so in FP16 or BF16 the active parameters take about 64 GB, i.e. two RTX 5090s. I know the CPU weights are INT4, so about 509 GB; I'll have one Intel 6979P and 12 DIMMs of 64 GB RAM (64 GB × 12 = 768 GB). Is RAM speed critical for inference after the model is loaded (past the cold initial load)? Please help: how much VRAM and how much CPU RAM do I need? And should I use two models, GPU weights and CPU weights? If CPU RAM holds 509 GB, what about VRAM for the GPU weights?
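A rough fit check of the numbers in this comment (the weight sizes are the commenter's own figures; the headroom interpretation and the KV-cache caveat are assumptions):

```python
# Memory-fit sketch for the CPU/GPU split described above. Sizes are taken
# from the comment; the need for KV-cache/activation headroom is assumed.

GiB = 2**30
cpu_weights_int4 = 509 * GiB        # expert weights kept in RAM (per the comment)
ram_total        = 12 * 64 * GiB    # 12 × 64 GB DIMMs
active_fp16      = 64 * GiB         # active parameters in FP16/BF16 (per the comment)
vram_total       = 2 * 32 * GiB     # two RTX 5090s

ram_headroom  = ram_total - cpu_weights_int4   # room left for OS, buffers
vram_headroom = vram_total - active_fp16       # room left for KV cache, activations

print(f"RAM headroom:  {ram_headroom/GiB:.0f} GiB")
print(f"VRAM headroom: {vram_headroom/GiB:.0f} GiB")
```

The RAM side fits with ~259 GiB to spare, but the VRAM headroom comes out to zero: KV cache and activations still need space on the GPU, which suggests the GPU-side weights would likely need to be quantized as well rather than held in full BF16.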