
Commit af31a16

HDCharles and yiliu30 authored and committed

fix ddp for nvfp4 on A100 (vllm-project#2404)

Depends on vllm-project/compressed-tensors#603.

Summary: NCCL does not allow broadcasting fp8 on A100, but we can work around it with this util.

Test Plan: <details> Test Script </details>

Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>

1 parent 6b8a224 commit af31a16

File tree

1 file changed: +4 −2 lines changed
  • src/llmcompressor/modifiers/quantization/gptq

src/llmcompressor/modifiers/quantization/gptq/base.py

Lines changed: 4 additions & 2 deletions

@@ -2,7 +2,7 @@
 from typing import Dict, List, Optional, Tuple, Union
 
 import torch
-from compressed_tensors.offload.dist_utils import is_distributed
+from compressed_tensors.offload.dist_utils import as_broadcastable, is_distributed
 from compressed_tensors.quantization import (
     QuantizationConfig,
     QuantizationScheme,
@@ -358,7 +358,9 @@ def _broadcast_quantized_params(self, module_list, module_to_rank):
                 if getattr(module, attr, None) is not None:
                     pending_comms.append(
                         dist.broadcast(
-                            getattr(module, attr), src=src_rank, async_op=True
+                            as_broadcastable(getattr(module, attr)),
+                            src=src_rank,
+                            async_op=True,
                         )
                     )
         wait_for_comms(pending_comms)
