refactor(smoothquant): address HDCharles review comments

David Zheng · dzhengAP · commit 5d037f78e76b · 2026-03-19T12:25:57.000-07:00
- Move is_distributed() guard to _apply_smoothing() hot path so
  _reduce_activation_scales() only handles the distributed case,
  improving readability for single-GPU readers
- Remove redundant unit tests (subsumed by 2n_calls test; empty scales
  test unnecessary per reviewer feedback)
- Remove test_smoothquant_ddp_script_runs_cleanly (too expensive for CI)
- Switch integration test model to nm-testing/tinysmokellama-3.2
  (CI-friendly tiny model per HDCharles suggestion)
- Switch DDP example model to Qwen/Qwen2-7B-Instruct (DDP more
  meaningful for larger models)
- Fix --nproc arg conflict with torchrun, rename to --num_gpus
- Add benchmark_smoothquant_ddp.py for reproducing speedup numbers

Distributed speedup on 4x V100 32GB (Qwen2-7B-Instruct, 512 samples):

  1 GPU: 94.1 min | 8.93 GB peak mem | 1.00x
  2 GPU: 58.7 min | 7.06 GB peak mem | 1.60x
  4 GPU: 28.7 min | 7.06 GB peak mem | 3.28x

Addresses review comments from HDCharles on PR vllm-project#2471

Signed-off-by: David Zheng <dqzheng1996@gmail.com>
diff --git a/examples/quantization_w8a8_int8/smoothquant_ddp_example.py b/examples/quantization_w8a8_int8/smoothquant_ddp_example.py
@@ -30,7 +30,7 @@
 # ---------------------------------------------------------------------------
 # Config
 # ---------------------------------------------------------------------------
-MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+MODEL_ID = "Qwen/Qwen2-7B-Instruct"
 DATASET_ID = "HuggingFaceH4/ultrachat_200k"
 DATASET_SPLIT = "train_sft"
 NUM_CALIBRATION_SAMPLES = 512
diff --git a/src/llmcompressor/modifiers/transform/smoothquant/base.py b/src/llmcompressor/modifiers/transform/smoothquant/base.py
@@ -343,7 +343,8 @@ def _apply_smoothing(self, model: Module):
 
         This modifies the weights of the model in-place.
         """
-        self._reduce_activation_scales()
+        if is_distributed():
+            self._reduce_activation_scales()
 
         for mapping in self.resolved_mappings_:
             if mapping.smooth_name not in self.scales_:
diff --git a/tests/llmcompressor/modifiers/transform/smoothquant/test_smoothquant_distributed.py b/tests/llmcompressor/modifiers/transform/smoothquant/test_smoothquant_distributed.py