[BugFix]: Sparse2of4 example sparsity-only case (#1282)

rahul-tuli · web-flow · commit 8f6a0b5fc939 · 2025-03-25T11:56:50.000-04:00
The Sparse2of4 example supports two modes of operation:

- Sparsity-only
- Sparsity + Quantization (only supported for symmetric quantization)

In a recent update, we removed the `ConstantPruningModifier` from the Sparse2of4 example and added a check to raise an error if **asymmetric quantization** was being used. However, this check was incorrectly placed outside the quantization-specific code path and unconditionally accessed the scheme attribute from the quantization modifier. This caused failures in sparsity-only cases, as reported in [INFERENG-483](https://issues.redhat.com/browse/INFERENG-483).

### Fix

This PR moves the asymmetric quantization check inside the quantization code path to ensure it is only evaluated when quantization is actually enabled.

### Testing

The example was tested by running it both with and without the `--fp8` flag. In both cases, successful completion was verified.

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
diff --git a/examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py b/examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py
@@ -60,13 +60,13 @@ def get_recipe(fp8_enabled):
         )
         save_dir = MODEL_ID.split("/")[1] + "2of4-W8A8-FP8-Dynamic-Per-Token"
 
-    # check that asymmetric quantization is not being used
-    q_scheme = base_recipe[1].scheme
-    if not isinstance(q_scheme, str) and not q_scheme["weights"].symmetric:
-        raise ValueError(
-            "Asymmetric quantization with 2of4 sparsity is not supported by vLLM. "
-            "Please use symmetric quantization"
-        )
+        # check that asymmetric quantization is not being used
+        q_scheme = base_recipe[1].scheme
+        if not isinstance(q_scheme, str) and not q_scheme["weights"].symmetric:
+            raise ValueError(
+                "Asymmetric quantization with 2of4 sparsity is not supported by vLLM. "
+                "Please use symmetric quantization"
+            )
 
     return base_recipe, save_dir
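
### Illustration

The control flow after this change can be sketched as follows. This is a minimal, self-contained approximation and not the example's actual code: `FakeSparsityModifier`, `FakeQuantModifier`, `FakeWeightArgs`, the `symmetric` parameter, and the `save_dir` strings are hypothetical stand-ins introduced only so the snippet runs without any external dependency. It assumes the recipe is a list whose second entry exists only when FP8 quantization is enabled, which is why the check has to live inside the `if fp8_enabled:` branch.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class FakeSparsityModifier:
    """Hypothetical stand-in for the 2:4 sparsity modifier (has no `scheme`)."""
    sparsity: float = 0.5


@dataclass
class FakeWeightArgs:
    """Hypothetical stand-in for per-weight quantization arguments."""
    symmetric: bool = True


@dataclass
class FakeQuantModifier:
    """Hypothetical stand-in for the quantization modifier."""
    scheme: Any = "FP8_DYNAMIC"


def get_recipe(fp8_enabled: bool, symmetric: bool = True) -> Tuple[List[Any], str]:
    # Sparsity-only recipe: a single modifier, so base_recipe[1] does not exist.
    base_recipe: List[Any] = [FakeSparsityModifier()]
    save_dir = "model-2of4-sparse"  # hypothetical directory name

    if fp8_enabled:
        base_recipe.append(
            FakeQuantModifier(scheme={"weights": FakeWeightArgs(symmetric=symmetric)})
        )
        save_dir = "model-2of4-W8A8-FP8-Dynamic-Per-Token"  # hypothetical

        # The check runs only here, where base_recipe[1] is guaranteed to exist.
        q_scheme = base_recipe[1].scheme
        if not isinstance(q_scheme, str) and not q_scheme["weights"].symmetric:
            raise ValueError(
                "Asymmetric quantization with 2of4 sparsity is not supported by vLLM. "
                "Please use symmetric quantization"
            )

    return base_recipe, save_dir
```

With the check at function level instead, calling `get_recipe(fp8_enabled=False)` in this sketch would fail on `base_recipe[1]` before returning. The exact exception raised by the real example is not stated in the commit message.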