DeepSeek R1 unsloth 2.51 bits: multi-GPU shows no performance advantage over single GPU #921

yuliao0214 · 2025-03-18T06:21:11Z

yuliao0214
Mar 18, 2025

Model: DeepSeek-R1-GGUF (unsloth), 2.58 bit (60 layers)
Machine:
- Xeon(R) Platinum 8457C * 2;
- DRAM 720GB;
- 4090D 24G VRAM * 8
ktransformers: v0.2.3

Test results: eight 4090D cards show no performance improvement over a single 4090D.

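| Optimize rules | Threads | Prompt length | Output length | TTFT (s) | TPOT (ms) | Prefill (tokens/s) | Decode (tokens/s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 gpu | 90 | 3500 | 1500 | 27.7 | 55.9 | 126.8 | 17.9 |
| 1 gpu | 45 | 3500 | 1500 | 34.8 | 57.5 | 100.7 | 17.4 |
| 2 gpu | 90 | 3500 | 1500 | 26.8 | 54.9 | 130.6 | 18.2 |
| 4 gpu | 90 | 3500 | 1500 | 28.2 | 55.4 | 124.1 | 18.1 |
| 8 gpu | 90 | 3500 | 1500 | 28.6 | 56.5 | 122.5 | 17.7 |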
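For readers comparing the columns, the throughput figures can be approximately re-derived from the latency figures. The sketch below assumes decode throughput is 1000 / TPOT and prefill throughput is prompt length / TTFT; the function names are illustrative only, and the derived prefill values differ slightly from the reported ones, presumably because the benchmark measured them directly.

```python
# Rows copied from the table above:
# (optimize rule, threads, prompt length, TTFT in s, TPOT in ms, reported decode tokens/s)
ROWS = [
    ("1 gpu", 90, 3500, 27.7, 55.9, 17.9),
    ("1 gpu", 45, 3500, 34.8, 57.5, 17.4),
    ("2 gpu", 90, 3500, 26.8, 54.9, 18.2),
    ("4 gpu", 90, 3500, 28.2, 55.4, 18.1),
    ("8 gpu", 90, 3500, 28.6, 56.5, 17.7),
]


def decode_tps(tpot_ms: float) -> float:
    """Decode throughput, assuming one token is produced every TPOT milliseconds."""
    return 1000.0 / tpot_ms


def prefill_tps(prompt_len: int, ttft_s: float) -> float:
    """Prefill throughput, assuming TTFT is dominated by processing the prompt."""
    return prompt_len / ttft_s


if __name__ == "__main__":
    baseline = decode_tps(ROWS[0][4])  # 1 gpu, 90 threads
    for rule, threads, prompt_len, ttft_s, tpot_ms, reported in ROWS:
        derived = decode_tps(tpot_ms)
        print(
            f"{rule} ({threads} threads): "
            f"decode {derived:.1f} tok/s (reported {reported}), "
            f"prefill {prefill_tps(prompt_len, ttft_s):.1f} tok/s, "
            f"decode vs. 1 gpu baseline: {derived / baseline:.2f}x"
        )
```

Under these assumptions every multi-GPU row lands within a few percent of the single-GPU baseline, which matches the observation that adding cards did not change decode speed.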

The corresponding optimization rules are:

DeepSeek-V3-Chat.yaml
DeepSeek-V3-Chat-multi-gpu.yaml
DeepSeek-V3-Chat-multi-gpu-4.yaml
DeepSeek-V3-Chat-multi-gpu-8.yaml

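Looking at GPU utilization and VRAM usage, there is no significant difference between the multi-GPU and single-GPU runs. How can this be resolved?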

Answered by Azure-Tang


yuliao0214 · 2025-03-18T06:22:16Z

yuliao0214
Mar 18, 2025
Author

@Azure-Tang need help

0 replies

Azure-Tang · 2025-03-18T09:26:12Z

Azure-Tang
Mar 18, 2025
Maintainer

You're correct. Currently, KT's multi-GPU implementation is based on pipeline parallelism, which is designed for users with multiple GPUs but limited VRAM on each device. At this stage, multi-GPU mode doesn't provide acceleration benefits; rather, it enables model deployment across multiple smaller GPUs.

We are actively working on improving this functionality to deliver performance enhancements in future releases.
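To illustrate why a pipeline-style split does not speed up a single request, here is a toy model. It is not KT's implementation: it simply assumes the layers are divided evenly into sequential stages, one per GPU, and that a single request must pass through every stage in order. The per-layer time, weight size, and hop cost are hypothetical numbers chosen for illustration.

```python
def per_token_latency_ms(
    layer_ms: float, n_layers: int, n_gpus: int, hop_ms: float = 0.0
) -> float:
    """Latency of one decode step for a single request under an even pipeline split.

    Stages run one after another, so the per-stage times add up to the same
    total as on one GPU; each stage boundary adds a device-to-device hop.
    """
    stage_ms = layer_ms * n_layers / n_gpus  # work done by each stage
    return n_gpus * stage_ms + (n_gpus - 1) * hop_ms


def weights_per_gpu_gb(total_weights_gb: float, n_gpus: int) -> float:
    """GPU-resident weights each device holds when layers are split evenly."""
    return total_weights_gb / n_gpus


if __name__ == "__main__":
    LAYER_MS = 1.0          # hypothetical time per layer
    N_LAYERS = 60           # layer count quoted in the question
    GPU_WEIGHTS_GB = 16.0   # hypothetical size of the GPU-resident weights

    for n_gpus in (1, 2, 4, 8):
        latency = per_token_latency_ms(LAYER_MS, N_LAYERS, n_gpus, hop_ms=0.1)
        vram = weights_per_gpu_gb(GPU_WEIGHTS_GB, n_gpus)
        print(f"{n_gpus} gpu: {latency:.1f} ms/token, {vram:.1f} GB weights per GPU")
```

In this model the per-token latency stays flat (or grows slightly with the hop cost) as GPUs are added, while the memory needed per device shrinks, which is the trade-off described in the answer.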

1 reply

yuliao0214 Mar 18, 2025
Author

Thanks for the answer, looking forward to the subsequent multi-GPU optimization.

abfmwei · 2025-03-19T04:36:07Z

abfmwei
Mar 19, 2025

May I ask what speed this machine gets running Q4 on a single CPU?

1 reply

yuliao0214 Mar 20, 2025
Author

Q8 decode is around 12 tokens/s; I haven't run Q4.
