
Commit 8f6a0b5

[BugFix]: Sparse2of4 example sparsity-only case (#1282)
The Sparse2of4 example supports two modes of operation:

- Sparsity-only
- Sparsity + Quantization (only supported for symmetric quantization)

In a recent update, we removed the `ConstantPruningModifier` from the Sparse2of4 example and added a check to raise an error if **asymmetric quantization** was being used. However, this check was placed outside the quantization-specific code path and unconditionally read the `scheme` attribute of the quantization modifier (`base_recipe[1].scheme`). This caused failures in sparsity-only runs, where no quantization modifier is present, as reported in [INFERENG-483](https://issues.redhat.com/browse/INFERENG-483).

### Fix

This PR moves the asymmetric quantization check inside the quantization code path, so it is only evaluated when quantization is actually enabled.

### Testing

The example was tested by running it both with and without the `--fp8` flag; both runs completed successfully.

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
1 parent 85152fd commit 8f6a0b5
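The failure described above comes down to control-flow ordering in `get_recipe`. The sketch below models it in a self-contained way, assuming only what the diff shows: `base_recipe` is a list whose second element (`base_recipe[1]`) is the quantization modifier when FP8 is enabled. `SparsityModifierStub`, `QuantModifierStub`, `get_recipe_before_fix`, and `get_recipe_after_fix` are hypothetical names, not llm-compressor APIs, and the `IndexError` is what this stub raises, not necessarily the exact error seen in the real script.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparsityModifierStub:
    """Hypothetical stand-in for the 2:4 sparsity modifier (not an llm-compressor class)."""

    mask_structure: str = "2:4"


@dataclass
class QuantModifierStub:
    """Hypothetical stand-in for the quantization modifier; only `scheme` matters here."""

    scheme: str = "FP8_DYNAMIC"  # illustrative preset name (assumption)


def get_recipe_before_fix(fp8_enabled: bool) -> List[object]:
    """Mimics the old ordering: the scheme is read on every code path."""
    base_recipe: List[object] = [SparsityModifierStub()]
    if fp8_enabled:
        base_recipe.append(QuantModifierStub())
    # Bug: in sparsity-only mode the list has one element, so index 1 does not exist.
    _q_scheme = base_recipe[1].scheme
    return base_recipe


def get_recipe_after_fix(fp8_enabled: bool) -> List[object]:
    """Mimics the fixed ordering: the scheme is read only when quantization is enabled."""
    base_recipe: List[object] = [SparsityModifierStub()]
    if fp8_enabled:
        base_recipe.append(QuantModifierStub())
        # Safe: the quantization stub was appended just above.
        _q_scheme = base_recipe[1].scheme
    return base_recipe


if __name__ == "__main__":
    try:
        get_recipe_before_fix(fp8_enabled=False)
    except IndexError as err:
        print(f"before fix, sparsity-only: IndexError ({err})")
    print(f"after fix, sparsity-only: {len(get_recipe_after_fix(False))} modifier(s)")
    print(f"after fix, with FP8: {len(get_recipe_after_fix(True))} modifier(s)")
```

Moving the read under the `if fp8_enabled:` branch mirrors the two test runs in the commit message: with the fix, both the sparsity-only call and the FP8 call complete.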


1 file changed: +7 -7 lines


examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py

Lines changed: 7 additions & 7 deletions
```diff
@@ -60,13 +60,13 @@ def get_recipe(fp8_enabled):
         )
         save_dir = MODEL_ID.split("/")[1] + "2of4-W8A8-FP8-Dynamic-Per-Token"
 
-    # check that asymmetric quantization is not being used
-    q_scheme = base_recipe[1].scheme
-    if not isinstance(q_scheme, str) and not q_scheme["weights"].symmetric:
-        raise ValueError(
-            "Asymmetric quantization with 2of4 sparsity is not supported by vLLM. "
-            "Please use symmetric quantization"
-        )
+        # check that asymmetric quantization is not being used
+        q_scheme = base_recipe[1].scheme
+        if not isinstance(q_scheme, str) and not q_scheme["weights"].symmetric:
+            raise ValueError(
+                "Asymmetric quantization with 2of4 sparsity is not supported by vLLM. "
+                "Please use symmetric quantization"
+            )
 
     return base_recipe, save_dir
 
```
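The condition in the diff treats the two kinds of `scheme` value differently: a plain string (a preset name) is never inspected, while anything else is indexed with `"weights"` and rejected if `.symmetric` is false. The sketch below isolates that condition so it can be exercised on its own. `WeightArgsStub` and `check_symmetric` are hypothetical names, and the `{"weights": ...}` mapping only models the shape the diff's check assumes, not the full llm-compressor scheme type.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class WeightArgsStub:
    """Hypothetical stand-in for weight quantization args; only `symmetric` is modeled."""

    symmetric: bool


# Either a preset name (str) or a mapping whose "weights" entry carries `symmetric`,
# matching the shape the check in the diff expects (assumption).
Scheme = Union[str, Dict[str, WeightArgsStub]]


def check_symmetric(q_scheme: Scheme) -> None:
    """Same condition as the diff: string presets are skipped, explicit configs are inspected."""
    if not isinstance(q_scheme, str) and not q_scheme["weights"].symmetric:
        raise ValueError(
            "Asymmetric quantization with 2of4 sparsity is not supported by vLLM. "
            "Please use symmetric quantization"
        )


if __name__ == "__main__":
    check_symmetric("FP8_DYNAMIC")  # preset name: not inspected, so it passes
    check_symmetric({"weights": WeightArgsStub(symmetric=True)})  # symmetric: passes
    try:
        check_symmetric({"weights": WeightArgsStub(symmetric=False)})
    except ValueError as err:
        print(f"rejected: {err}")
```

Because of the `isinstance(q_scheme, str)` short-circuit, a scheme given by preset name passes this check regardless of which preset it names; only explicitly configured schemes have their `symmetric` flag examined.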
