
Commit 9858710

extend moe padding to DUMMY weights (#211)

* extend moe padding to DUMMY weights

1 parent 5c50fca

1 file changed: +12 -0 lines changed


vllm/model_executor/model_loader/loader.py

Lines changed: 12 additions & 0 deletions
@@ -407,6 +407,18 @@ def load_model(self, *, model_config: ModelConfig,
             # NOTE(woosuk): For accurate performance evaluation, we assign
             # random values to the weights.
             initialize_dummy_weights(model)
+
+            for _, module in model.named_modules():
+                quant_method = getattr(module, "quant_method", None)
+                if quant_method is not None:
+                    # When quant methods need to process weights after loading
+                    # (for repacking, quantizing, etc), they expect parameters
+                    # to be on the global target device. This scope is for the
+                    # case where cpu offloading is used, where we will move the
+                    # parameters onto device for processing and back off after.
+                    with device_loading_context(
+                            module, torch.device(device_config.device)):
+                        quant_method.process_weights_after_loading(module)
         return model.eval()
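The added block calls device_loading_context, a helper defined elsewhere in loader.py whose body is not part of this diff. Going only by the comment in the hunk above, a minimal sketch of what such a context manager could look like is below; the details (tracking offloaded parameters by name and restoring them afterwards) are illustrative assumptions, not the actual vLLM implementation.

from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def device_loading_context(module: nn.Module, target_device: torch.device):
    # Hypothetical sketch: temporarily move a module's CPU-offloaded
    # parameters onto target_device, then move them back on exit.
    if target_device.type == "cpu":
        # Target is already the CPU; nothing to shuttle.
        yield module
        return

    offloaded: dict[str, torch.device] = {}
    for name, param in module.named_parameters():
        if param.device.type == "cpu":
            # Record the original device so it can be restored later.
            offloaded[name] = param.device
            param.data = param.data.to(target_device)
    try:
        yield module
    finally:
        # Move the parameters we touched back off the device.
        for name, param in module.named_parameters():
            if name in offloaded:
                param.data = param.data.to(offloaded[name])

With a helper along these lines, process_weights_after_loading can assume every parameter lives on device_config.device even when CPU offloading is enabled, which is what lets the dummy-weight path run the same post-load processing (presumably including the MoE weight padding from the commit title) as the regular load path.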

0 commit comments