
Commit 0d28f89

[Docs] Remove compression format summary table from src (#1732)

SUMMARY:
- Moved the table from src and added it under docs
- Added nvfp4 to the table as well

1 parent b1eb4b7 · commit 0d28f89

File tree

2 files changed: +24 -18 lines


docs/guides/compression_formats.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Compression Formats
+
+The following table outlines the possible quantization and sparsity
+compression formats that are applied to a model during compression.
+The formats are determined according to the quantization scheme and
+sparsity type. For more details on the quantization schemes, see
+`guides/compression_schemes.md`.
+
+
+| Quantization  | Sparsity | Quant Compressor     | Sparsity Compressor |
+|---------------|----------|----------------------|---------------------|
+| W8A8 - int    | None     | int_quantized        | Dense               |
+| W8A8 - float  | None     | float_quantized      | Dense               |
+| W4A16 - float | None     | nvfp4_pack_quantized | Dense               |
+| W4A4 - float  | None     | nvfp4_pack_quantized | Dense               |
+| W4A16 - int   | None     | pack_quantized       | Dense               |
+| W8A16 - int   | None     | pack_quantized       | Dense               |
+| W8A16 - float | None     | naive_quantized      | Dense               |
+| W8A8 - int    | 2:4      | int_quantized        | Sparse24            |
+| W8A8 - float  | 2:4      | float_quantized      | Sparse24            |
+| W4A16 - int   | 2:4      | marlin_24            | Dense               |
+| W8A16 - int   | 2:4      | marlin_24            | Dense               |
+| W8A16 - float | 2:4      | naive_quantized      | Dense               |
src/llmcompressor/transformers/compression/quantization_format.py

Lines changed: 1 addition & 18 deletions
@@ -18,24 +18,7 @@ def infer_quantization_format(
     Infers the quantization format for a model based on its state and provided
     compression arguments.

-    The following table outlines the possible quantization and sparsity formats
-    along with their corresponding compressor formats:
-
-    +---------------+----------+----------------------+---------------------+
-    | Quantization  | Sparsity | Quant Compressor     | Sparsity Compressor |
-    |               |          | Format               | Format              |
-    +---------------+----------+----------------------+---------------------+
-    | W8A8 - int    | None     | int_quantized        | Dense               |
-    | W8A8 - float  | None     | float_quantized      | Dense               |
-    | W4A16 - int   | None     | pack_quantized       | Dense               |
-    | W8A16 - int   | None     | pack_quantized       | Dense               |
-    | W8A16 - float | None     | naive_quantized      | Dense               |
-    | W8A8 - int    | 2:4      | int_quantized        | Sparse24            |
-    | W8A8 - float  | 2:4      | float_quantized      | Sparse24            |
-    | W4A16 - int   | 2:4      | marlin_24            | Dense               |
-    | W8A16 - int   | 2:4      | marlin_24            | Dense               |
-    | W8A16 - float | 2:4      | naive_quantized      | Dense               |
-    +---------------+----------+----------------------+---------------------+
+    For a summary of the formats, see `docs/guides/compression_formats.md`.

     :param model: model to check for quantization, if the model is not quantized no
         quantization format is returned
0 commit comments
