Commit 5d037f7

David Zheng (dzhengAP) authored and committed
refactor(smoothquant): address HDCharles review comments
- Move is_distributed() guard to _apply_smoothing() hotpath so
  _reduce_activation_scales() only handles the distributed case,
  improving readability for single-GPU readers
- Remove redundant unit tests (subsumed by 2n_calls test; empty scales
  test unnecessary per reviewer feedback)
- Remove test_smoothquant_ddp_script_runs_cleanly (too expensive for CI)
- Switch integration test model to nm-testing/tinysmokellama-3.2
  (CI-friendly tiny model per HDCharles suggestion)
- Switch DDP example model to Qwen/Qwen2-7B-Instruct (DDP more
  meaningful for larger models)
- Fix --nproc arg conflict with torchrun, rename to --num_gpus
- Add benchmark_smoothquant_ddp.py for reproducing speedup numbers

Distributed speedup on 4x V100 32GB (Qwen2-7B-Instruct, 512 samples):
  1 GPU: 94.1 min | 8.93 GB peak mem | 1.00x
  2 GPU: 58.7 min | 7.06 GB peak mem | 1.60x
  4 GPU: 28.7 min | 7.06 GB peak mem | 3.28x

Addresses review comments from HDCharles on PR vllm-project#2471

Signed-off-by: David Zheng <dqzheng1996@gmail.com>
1 parent fae564f commit 5d037f7
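
The speedup column in the commit message can be rechecked from the reported wall-clock times. The sketch below is not part of the commit; it only recomputes speedup relative to the 1-GPU run and derives parallel efficiency (speedup divided by GPU count), which the commit message does not report. The helper name `scaling_table` is made up for illustration.

```python
# Wall-clock minutes copied from the commit message (4x V100 32GB run).
timings_min = {1: 94.1, 2: 58.7, 4: 28.7}


def scaling_table(timings):
    """Return {gpus: (speedup, efficiency)} relative to the 1-GPU run."""
    baseline = timings[1]
    table = {}
    for gpus, minutes in sorted(timings.items()):
        speedup = baseline / minutes   # how many times faster than 1 GPU
        efficiency = speedup / gpus    # fraction of ideal linear scaling
        table[gpus] = (speedup, efficiency)
    return table


table = scaling_table(timings_min)
for gpus, (speedup, efficiency) in table.items():
    print(f"{gpus} GPU: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

The recomputed speedups agree with the 1.60x and 3.28x figures quoted above.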

3 files changed: +75 -191 lines changed

examples/quantization_w8a8_int8/smoothquant_ddp_example.py

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@
 # ---------------------------------------------------------------------------
 # Config
 # ---------------------------------------------------------------------------
-MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+MODEL_ID = "Qwen/Qwen2-7B-Instruct"
 DATASET_ID = "HuggingFaceH4/ultrachat_200k"
 DATASET_SPLIT = "train_sft"
 NUM_CALIBRATION_SAMPLES = 512

src/llmcompressor/modifiers/transform/smoothquant/base.py

Lines changed: 2 additions & 1 deletion
@@ -343,7 +343,8 @@ def _apply_smoothing(self, model: Module):
 
         This modifies the weights of the model in-place.
         """
-        self._reduce_activation_scales()
+        if is_distributed():
+            self._reduce_activation_scales()
 
         for mapping in self.resolved_mappings_:
             if mapping.smooth_name not in self.scales_:
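
The control flow introduced by this hunk can be illustrated with a minimal, torch-free sketch. Everything here is a stand-in: `SmoothingSketch`, the injected `is_distributed` callable, and the `reduce_calls` counter are hypothetical and are not the library's code. In a PyTorch setting such a predicate is commonly built from `torch.distributed.is_available()` and `torch.distributed.is_initialized()`, but how llmcompressor's `is_distributed()` is implemented is not shown in this diff.

```python
from typing import Callable, Dict, List


class SmoothingSketch:
    """Stand-in for the modifier; only the guard placement mirrors the diff."""

    def __init__(self, is_distributed: Callable[[], bool]):
        # Hypothetical injection point so the sketch runs without torch.
        self.is_distributed = is_distributed
        self.scales_: Dict[str, List[float]] = {"layer0": [1.0, 2.0]}
        self.reduce_calls = 0

    def _reduce_activation_scales(self) -> None:
        # Distributed-only path. A real implementation would combine the
        # per-rank statistics here; this sketch only counts invocations.
        self.reduce_calls += 1

    def _apply_smoothing(self) -> None:
        # The guard lives in the caller, so single-process runs never
        # enter the reduction helper at all.
        if self.is_distributed():
            self._reduce_activation_scales()
        # ... smoothing over self.scales_ would follow here ...


single = SmoothingSketch(is_distributed=lambda: False)
single._apply_smoothing()

multi = SmoothingSketch(is_distributed=lambda: True)
multi._apply_smoothing()

print(single.reduce_calls, multi.reduce_calls)
```

Keeping the check at the call site means the helper can assume a distributed context, which is the readability benefit the commit message describes.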
