2 files changed (+24 −18)

src/llmcompressor/transformers/compression
+ # Compression Formats
+
+ The following table outlines the quantization and sparsity compression
+ formats that can be applied to a model during compression. The format is
+ determined by the quantization scheme and the sparsity type. For more
+ details on the quantization schemes, see `guides/compression_schemes.md`.
+
+ | Quantization  | Sparsity | Quant Compressor     | Sparsity Compressor |
+ |---------------|----------|----------------------|---------------------|
+ | W8A8 - int    | None     | int_quantized        | Dense               |
+ | W8A8 - float  | None     | float_quantized      | Dense               |
+ | W4A16 - float | None     | nvfp4_pack_quantized | Dense               |
+ | W4A4 - float  | None     | nvfp4_pack_quantized | Dense               |
+ | W4A16 - int   | None     | pack_quantized       | Dense               |
+ | W8A16 - int   | None     | pack_quantized       | Dense               |
+ | W8A16 - float | None     | naive_quantized      | Dense               |
+ | W8A8 - int    | 2:4      | int_quantized        | Sparse24            |
+ | W8A8 - float  | 2:4      | float_quantized      | Sparse24            |
+ | W4A16 - int   | 2:4      | marlin_24            | Dense               |
+ | W8A16 - int   | 2:4      | marlin_24            | Dense               |
+ | W8A16 - float | 2:4      | naive_quantized      | Dense               |
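The mapping in the table above can be sketched as a simple lookup. This is a hypothetical illustration only; the function and dictionary names below are not part of the llmcompressor API:

```python
# Hypothetical sketch of the (quantization scheme, sparsity) -> compressor
# format lookup summarized in the table above. Keys and values mirror the
# table; this is NOT the actual llmcompressor implementation.
QUANT_COMPRESSOR = {
    ("W8A8-int", None): "int_quantized",
    ("W8A8-float", None): "float_quantized",
    ("W4A16-float", None): "nvfp4_pack_quantized",
    ("W4A4-float", None): "nvfp4_pack_quantized",
    ("W4A16-int", None): "pack_quantized",
    ("W8A16-int", None): "pack_quantized",
    ("W8A16-float", None): "naive_quantized",
    ("W8A8-int", "2:4"): "int_quantized",
    ("W8A8-float", "2:4"): "float_quantized",
    ("W4A16-int", "2:4"): "marlin_24",
    ("W8A16-int", "2:4"): "marlin_24",
    ("W8A16-float", "2:4"): "naive_quantized",
}

# Per the table, only the W8A8 schemes use the Sparse24 sparsity compressor.
SPARSE24_SCHEMES = {"W8A8-int", "W8A8-float"}


def infer_formats(scheme, sparsity=None):
    """Return (quant compressor, sparsity compressor) for a scheme/sparsity pair."""
    quant = QUANT_COMPRESSOR[(scheme, sparsity)]
    sparse = "Sparse24" if sparsity == "2:4" and scheme in SPARSE24_SCHEMES else "Dense"
    return quant, sparse
```

Note that a 2:4 sparse model does not always get a sparse compressor: for the W4A16/W8A16 int schemes the weights are repacked by `marlin_24`, so the sparsity compressor stays `Dense`.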
@@ -18,24 +18,7 @@ def infer_quantization_format(
     Infers the quantization format for a model based on its state and provided
     compression arguments.

-    The following table outlines the possible quantization and sparsity formats
-    along with their corresponding compressor formats:
-
-    +---------------+----------+----------------------+---------------------+
-    | Quantization  | Sparsity | Quant Compressor     | Sparsity Compressor |
-    |               |          | Format               | Format              |
-    +---------------+----------+----------------------+---------------------+
-    | W8A8 - int    | None     | int_quantized        | Dense               |
-    | W8A8 - float  | None     | float_quantized      | Dense               |
-    | W4A16 - int   | None     | pack_quantized       | Dense               |
-    | W8A16 - int   | None     | pack_quantized       | Dense               |
-    | W8A16 - float | None     | naive_quantized      | Dense               |
-    | W8A8 - int    | 2:4      | int_quantized        | Sparse24            |
-    | W8A8 - float  | 2:4      | float_quantized      | Sparse24            |
-    | W4A16 - int   | 2:4      | marlin_24            | Dense               |
-    | W8A16 - int   | 2:4      | marlin_24            | Dense               |
-    | W8A16 - float | 2:4      | naive_quantized      | Dense               |
-    +---------------+----------+----------------------+---------------------+
+    For a summary of the formats, see `docs/guides/compression_formats.md`.

     :param model: model to check for quantization, if the model is not quantized no
         quantization format is returned