In lmdeploy, when TP is set to 2 and the model does not fit on a single GPU, is the model split by layers and loaded onto the two GPUs? #1474

hello-gary-2022 · 2024-04-23T00:55:53Z

hello-gary-2022
Apr 23, 2024

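In lmdeploy, TP can be used to set the number of GPUs to 2. If a model cannot be loaded directly into the memory of a single GPU, will this parameter split the model by layers and load it into the memory of both GPUs for inference? Can lmdeploy load the model and run inference in that situation? Does TP here mean pure tensor parallelism, or does it also include model parallelism?

For example, in case 2 below, what result does lmdeploy produce, and how does it work?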

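Hardware: 2 × T4 GPU, 16 GB of memory each
case 1:
1. Set TP=1 for inference.
2. When loading an fp16 model with 14B parameters for inference, a single GPU cannot hold the model.
case 2:
1. Set TP=2 for inference.
2. When loading an fp16 model with 14B parameters for inference, what is the result, and how does it work?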
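
The two cases above can be checked with some rough arithmetic. The sketch below is only an estimate under stated assumptions: it counts weight memory only (no KV cache, activations, or CUDA context), assumes the weights are sharded evenly across TP ranks, and treats "16G" as 16 GiB. The function name `weight_gib_per_gpu` is hypothetical and not part of lmdeploy.

```python
GIB = 2 ** 30  # bytes per GiB


def weight_gib_per_gpu(n_params: float, bytes_per_param: int, tp: int) -> float:
    """Estimate weight memory per GPU in GiB, assuming an even split across tp ranks."""
    total_bytes = n_params * bytes_per_param
    return total_bytes / tp / GIB


GPU_MEMORY_GIB = 16   # assumption: one T4, taken as 16 GiB
N_PARAMS = 14e9       # 14B parameters
FP16_BYTES = 2        # fp16 stores each parameter in 2 bytes

for tp in (1, 2):
    need = weight_gib_per_gpu(N_PARAMS, FP16_BYTES, tp)
    fits = need < GPU_MEMORY_GIB
    print(f"TP={tp}: about {need:.1f} GiB of weights per GPU, fits in 16 GiB: {fits}")
```

Under these assumptions, case 1 needs about 26 GiB on one GPU and cannot fit, while case 2 needs about 13 GiB per GPU for weights. That leaves only a few GiB per card for the KV cache and runtime overhead, so whether case 2 runs in practice depends on those settings.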

zhyncs · 2024-07-06T09:29:51Z

zhyncs
Jul 6, 2024
Collaborator

Tensor Parallelism
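
To illustrate what tensor parallelism means in general (as opposed to assigning whole layers to different GPUs), here is a minimal pure-Python sketch of a column-parallel linear layer. It is not lmdeploy's implementation; the helper names are hypothetical, and plain lists stand in for GPU tensors. Each "rank" holds a slice of the columns of one layer's weight matrix, computes its part of the output, and the parts are concatenated (the role an all-gather plays on real hardware).

```python
def matmul(x, w):
    """Multiply a length-k vector x by a k x n matrix w (list of rows)."""
    n = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n)]


def split_columns(w, tp):
    """Split the columns of w into tp equal shards, one per rank."""
    n = len(w[0])
    assert n % tp == 0, "column count must be divisible by tp"
    step = n // tp
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(tp)]


def column_parallel_matmul(x, w, tp):
    """Compute x @ w with the weight columns sharded across tp ranks."""
    shards = split_columns(w, tp)
    # On real hardware each partial product would run on its own GPU.
    partials = [matmul(x, shard) for shard in shards]
    # Concatenating the partial outputs mimics an all-gather.
    return [value for part in partials for value in part]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)
sharded = column_parallel_matmul(x, w, tp=2)
print(full == sharded)
```

The point of the sketch is that every rank participates in every layer and stores only a fraction of that layer's weights, which is why a model too large for one GPU can still be loaded when its weights are sharded this way.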
