Commit 27dccc1
Drop flash_attn skip for quantizing_moe example tests (#1396)
SUMMARY:
Drop the skip that required `flash_attn` to be installed in the tests
for the `quantizing_moe` examples. Recent CI failures involving this
package's CUDA compatibility with the newly released PyTorch 2.7.0
have shown that it is not required for these tests.
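The skip being dropped amounts to a runtime check for the package. A minimal sketch of that pattern, assuming a plain importability check (the helper name below is illustrative, not this repo's actual code):

```python
import importlib.util


def package_installed(name: str) -> bool:
    """Return True if the named top-level package can be imported."""
    return importlib.util.find_spec(name) is not None


# Before this change, the example tests were skipped roughly like:
#
#   @pytest.mark.skipif(not package_installed("flash_attn"),
#                       reason="requires flash_attn")
#   def test_deepseek_example_script(...):
#       ...
#
# Dropping the skip means the tests run whether or not flash_attn is
# present, since the package turned out not to be needed.
```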
TEST PLAN:
An [internal test run][1] that drops the installation of `flash-attn`
and runs the changes on this branch indicates that the tests will pass
(one successful so far; the PR will be marked ready once the run
completes and the remaining tests show the expected results).
Specific relevant output (will update with other tests’ results):
```
tests/examples/test_quantizing_moe.py::TestQuantizingMOE::test_deepseek_example_script[deepseek_moe_w8a8_int8.py] PASSED
tests/examples/test_quantizing_moe.py::TestQuantizingMOE::test_deepseek_example_script[deepseek_moe_w8a8_fp8.py] PASSED
```
[1]:
https://github.com/neuralmagic/llm-compressor-testing/actions/runs/14712618904
Signed-off-by: Domenic Barbuzzi <dbarbuzz@redhat.com>
1 file changed: 0 additions, 6 deletions