This repository was archived by the owner on Sep 4, 2025. It is now read-only.

Commit 4f419c0

Fix ShardedStateLoader for vllm fp8 quantization (vllm-project#7708)
1 parent a3fce56 commit 4f419c0

File tree

1 file changed: 4 additions, 0 deletions

vllm/model_executor/model_loader/loader.py

Lines changed: 4 additions & 0 deletions
@@ -579,6 +579,10 @@ def load_model(self, *, model_config: ModelConfig,
             with torch.device(device_config.device):
                 model = _initialize_model(model_config, self.load_config,
                                           lora_config, cache_config)
+                for _, module in model.named_modules():
+                    quant_method = getattr(module, "quant_method", None)
+                    if quant_method is not None:
+                        quant_method.process_weights_after_loading(module)
             rank = get_tensor_model_parallel_rank()
             pattern = os.path.join(
                 local_model_path,
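For context, the added loop walks every submodule after the model is initialized and, for any module that carries a quant_method (as fp8-quantized layers do), calls process_weights_after_loading before the sharded safetensors state is read. Below is a minimal, self-contained sketch of that pattern; the classes FakeQuantMethod and TinyModel are illustrative stand-ins and not part of vllm.

import torch
import torch.nn as nn


class FakeQuantMethod:
    """Illustrative stand-in for a vllm quantization method (e.g. fp8)."""

    def process_weights_after_loading(self, module: nn.Module) -> None:
        # A real quant method would repack or rescale weights here; this
        # sketch just marks the module so we can see the hook ran.
        module.weights_processed = True


class TinyModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # vllm attaches a quant_method to quantized layers; faked here.
        self.linear.quant_method = FakeQuantMethod()


model = TinyModel()

# The pattern added by this commit: visit all submodules and, for any that
# carry a quant_method, finalize their weights after initialization.
for _, module in model.named_modules():
    quant_method = getattr(module, "quant_method", None)
    if quant_method is not None:
        quant_method.process_weights_after_loading(module)

print(getattr(model.linear, "weights_processed", False))  # True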

0 commit comments